pitrou commented on code in PR #39216:
URL: https://github.com/apache/arrow/pull/39216#discussion_r1455833560
##########
cpp/src/arrow/dataset/file_parquet_encryption_test.cc:
##########
@@ -188,8 +286,13 @@ TEST_F(DatasetEncryptionTest,
WriteReadDatasetWithEncryption) {
AssertTablesEqual(*combined_table, *table_);
}
-// Read a single parquet file with and without decryption properties.
-TEST_F(DatasetEncryptionTest, ReadSingleFile) {
+void read_single_file(
Review Comment:
Please, let's follow the style conventions. Function names should be
`CamelCase`.
##########
python/pyarrow/_parquet_encryption.pyx:
##########
@@ -103,6 +114,19 @@ cdef class EncryptionConfiguration(_Weakrefable):
["{}: {}".format(k, ", ".join(v)) for k, v in value.items()])
self.configuration.get().column_keys = tobytes(column_keys)
+ @property
+ def uniform_encryption(self):
+ """Encrypt footer and all columns with the same encryption key
(footer_key).
+
+ This cannot be used together with column_keys.
+ """
Review Comment:
```suggestion
"""
Whether to encrypt the footer and all columns with the same key.
If true, the columns as well as the footer will be encrypted with
footer_key.
If false, the columns will be encrypted as specified in column_keys.
It is an error to set both uniform_encryption and column_keys.
"""
```
##########
python/pyarrow/tests/parquet/test_encryption.py:
##########
@@ -236,6 +236,35 @@ def kms_factory(kms_connection_configuration):
# Write with encryption properties
write_encrypted_parquet(path, data_table, encryption_config,
kms_connection_config, crypto_factory)
+
+def test_encrypted_parquet_write_col_key_and_uniform_encryption(tempdir,
data_table):
+ """Write an encrypted parquet, but give both column key and uniform
encryption.
+
+ This should raise an error.
+ """
Review Comment:
```suggestion
"""
Try to write an encrypted parquet with both column keys and uniform
encryption.
"""
```
##########
python/pyarrow/tests/parquet/test_encryption.py:
##########
@@ -236,6 +236,35 @@ def kms_factory(kms_connection_configuration):
# Write with encryption properties
write_encrypted_parquet(path, data_table, encryption_config,
kms_connection_config, crypto_factory)
+
+def test_encrypted_parquet_write_col_key_and_uniform_encryption(tempdir,
data_table):
+ """Write an encrypted parquet, but give both column key and uniform
encryption.
+
+ This should raise an error.
+ """
+ path = tempdir / 'encrypted_table_no_col_key.in_mem.parquet'
+
+ # Encrypt the footer with the footer key
+ encryption_config = pe.EncryptionConfiguration(
+ footer_key=FOOTER_KEY_NAME)
Review Comment:
Where are uniform_encryption and column_keys set?
##########
cpp/src/arrow/dataset/file_parquet_encryption_test.cc:
##########
@@ -188,8 +286,13 @@ TEST_F(DatasetEncryptionTest,
WriteReadDatasetWithEncryption) {
AssertTablesEqual(*combined_table, *table_);
}
-// Read a single parquet file with and without decryption properties.
-TEST_F(DatasetEncryptionTest, ReadSingleFile) {
+void read_single_file(
+ std::shared_ptr<fs::FileSystem> file_system_, std::shared_ptr<Table>
table_,
Review Comment:
Local variables and function arguments should not have a trailing
underscore. The trailing underscore is for class member variables.
##########
python/pyarrow/tests/parquet/test_encryption.py:
##########
@@ -236,6 +236,35 @@ def kms_factory(kms_connection_configuration):
# Write with encryption properties
write_encrypted_parquet(path, data_table, encryption_config,
kms_connection_config, crypto_factory)
+
+def test_encrypted_parquet_write_col_key_and_uniform_encryption(tempdir,
data_table):
+ """Write an encrypted parquet, but give both column key and uniform
encryption.
+
+ This should raise an error.
+ """
+ path = tempdir / 'encrypted_table_no_col_key.in_mem.parquet'
Review Comment:
If we're specifying a file name we should be careful not to reuse the same
name as in the previous test.
##########
python/pyarrow/_parquet_encryption.pyx:
##########
@@ -103,6 +114,19 @@ cdef class EncryptionConfiguration(_Weakrefable):
["{}: {}".format(k, ", ".join(v)) for k, v in value.items()])
self.configuration.get().column_keys = tobytes(column_keys)
+ @property
+ def uniform_encryption(self):
+ """Encrypt footer and all columns with the same encryption key
(footer_key).
+
+ This cannot be used together with column_keys.
+ """
+ return self.configuration.get().uniform_encryption
+
+
Review Comment:
Nit: please remove this superfluous empty line.
##########
cpp/src/parquet/encryption/crypto_factory.h:
##########
@@ -41,6 +41,7 @@ struct PARQUET_EXPORT EncryptionConfiguration {
: footer_key(footer_key) {}
/// ID of the master key for footer encryption/signing
+ /// If uniform_encryption is True, will use this key to encrypt all columns
as well.
Review Comment:
Nit: let's add a blank line between summary and description.
```suggestion
/// \brief ID of the master key for footer encryption/signing
///
/// If uniform_encryption is True, this key will encrypt all columns as
well.
```
##########
cpp/src/arrow/dataset/file_parquet_encryption_test.cc:
##########
@@ -142,12 +145,107 @@ class DatasetEncryptionTest : public ::testing::Test {
kms_connection_config_;
};
-// This test demonstrates the process of writing a partitioned Parquet file
with the same
-// encryption properties applied to each file within the dataset. The
encryption
-// properties are determined based on the selected columns. After writing the
dataset, the
-// test reads the data back and verifies that it can be successfully decrypted
and
-// scanned.
-TEST_F(DatasetEncryptionTest, WriteReadDatasetWithEncryption) {
+class DatasetEncryptionTestUniformEncryption : public ::testing::Test {
+ public:
+ // This function creates a mock file system using the current time point,
creates a
+ // directory with the given base directory path, and writes a dataset to it
using
+ // provided Parquet file write options. These include an encryption
configuration
+ // which uses EncryptionConfiguration.column_keys to specify the encryption
key for
+ // each column.
+ // The dataset is partitioned using a Hive partitioning scheme.
+ // The function also checks if the written files exist in the file system.
+ static void SetUpTestSuite() {
Review Comment:
Could you please avoid copy-pasting all this? You should instead create a
base class to receive all common setup functionality.
##########
python/pyarrow/tests/test_dataset_encryption.py:
##########
@@ -73,6 +74,18 @@ def create_encryption_config():
)
+def create_uniform_encryption_config():
+ return pe.EncryptionConfiguration(
+ footer_key=FOOTER_KEY_NAME,
+ plaintext_footer=False,
+ uniform_encryption=True,
+ encryption_algorithm="AES_GCM_V1",
+ # requires timedelta or an assertion is raised
Review Comment:
I don't think this comment is useful
```suggestion
```
##########
python/setup.py:
##########
@@ -181,8 +181,12 @@ def initialize_options(self):
self.bundle_cython_cpp = strtobool(
os.environ.get('PYARROW_BUNDLE_CYTHON_CPP', '0'))
- self.with_parquet_encryption = (self.with_parquet_encryption and
- self.with_parquet)
+ if self.with_parquet_encryption and not self.with_parquet:
+ raise ValueError(
+ "Cannot build Parquet encryption without Parquet. Set
PYARROW_WITH_PARQUET=1 "
+ "or use --with-parquet-encryption."
Review Comment:
It seems like this error message is not consistent?
```suggestion
"Cannot build Parquet encryption without Parquet. Set
PYARROW_WITH_PARQUET=1 "
"or use --with-parquet."
```
Also, I'm not sure about this change. What do you think @jorisvandenbossche ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]