pitrou commented on code in PR #39216:
URL: https://github.com/apache/arrow/pull/39216#discussion_r1455833560


##########
cpp/src/arrow/dataset/file_parquet_encryption_test.cc:
##########
@@ -188,8 +286,13 @@ TEST_F(DatasetEncryptionTest, 
WriteReadDatasetWithEncryption) {
   AssertTablesEqual(*combined_table, *table_);
 }
 
-// Read a single parquet file with and without decryption properties.
-TEST_F(DatasetEncryptionTest, ReadSingleFile) {
+void read_single_file(

Review Comment:
   Please, let's follow the style conventions. Function names should be 
`CamelCase`.



##########
python/pyarrow/_parquet_encryption.pyx:
##########
@@ -103,6 +114,19 @@ cdef class EncryptionConfiguration(_Weakrefable):
                 ["{}: {}".format(k, ", ".join(v)) for k, v in value.items()])
             self.configuration.get().column_keys = tobytes(column_keys)
 
+    @property
+    def uniform_encryption(self):
+        """Encrypt footer and all columns with the same encryption key 
(footer_key).
+        
+        This cannot be used together with column_keys.
+        """

Review Comment:
   ```suggestion
           """
           Whether to encrypt the footer and all columns with the same key.
           
           If true, the columns as well as the footer will be encrypted with 
footer_key.
           If false, the columns will be encrypted as specified in column_keys.
           
           It is an error to set both uniform_encryption and column_keys.
           """
   ```



##########
python/pyarrow/tests/parquet/test_encryption.py:
##########
@@ -236,6 +236,35 @@ def kms_factory(kms_connection_configuration):
         # Write with encryption properties
         write_encrypted_parquet(path, data_table, encryption_config,
                                 kms_connection_config, crypto_factory)
+        
+def test_encrypted_parquet_write_col_key_and_uniform_encryption(tempdir, 
data_table):
+    """Write an encrypted parquet, but give both column key and uniform 
encryption.
+    
+    This should raise an error.
+    """

Review Comment:
   ```suggestion
       """
       Try to write an encrypted parquet with both column keys and uniform 
encryption.
       """
   ```



##########
python/pyarrow/tests/parquet/test_encryption.py:
##########
@@ -236,6 +236,35 @@ def kms_factory(kms_connection_configuration):
         # Write with encryption properties
         write_encrypted_parquet(path, data_table, encryption_config,
                                 kms_connection_config, crypto_factory)
+        
+def test_encrypted_parquet_write_col_key_and_uniform_encryption(tempdir, 
data_table):
+    """Write an encrypted parquet, but give both column key and uniform 
encryption.
+    
+    This should raise an error.
+    """
+    path = tempdir / 'encrypted_table_no_col_key.in_mem.parquet'
+
+    # Encrypt the footer with the footer key
+    encryption_config = pe.EncryptionConfiguration(
+        footer_key=FOOTER_KEY_NAME)

Review Comment:
   Where are uniform_encryption and column_keys set?



##########
cpp/src/arrow/dataset/file_parquet_encryption_test.cc:
##########
@@ -188,8 +286,13 @@ TEST_F(DatasetEncryptionTest, 
WriteReadDatasetWithEncryption) {
   AssertTablesEqual(*combined_table, *table_);
 }
 
-// Read a single parquet file with and without decryption properties.
-TEST_F(DatasetEncryptionTest, ReadSingleFile) {
+void read_single_file(
+    std::shared_ptr<fs::FileSystem> file_system_, std::shared_ptr<Table> 
table_,

Review Comment:
   Local variables and function arguments should not have a trailing 
underscore. The trailing underscore is for class member variables.



##########
python/pyarrow/tests/parquet/test_encryption.py:
##########
@@ -236,6 +236,35 @@ def kms_factory(kms_connection_configuration):
         # Write with encryption properties
         write_encrypted_parquet(path, data_table, encryption_config,
                                 kms_connection_config, crypto_factory)
+        
+def test_encrypted_parquet_write_col_key_and_uniform_encryption(tempdir, 
data_table):
+    """Write an encrypted parquet, but give both column key and uniform 
encryption.
+    
+    This should raise an error.
+    """
+    path = tempdir / 'encrypted_table_no_col_key.in_mem.parquet'

Review Comment:
   If we're specifying a file name we should be careful not to reuse the same 
name as in the previous test.



##########
python/pyarrow/_parquet_encryption.pyx:
##########
@@ -103,6 +114,19 @@ cdef class EncryptionConfiguration(_Weakrefable):
                 ["{}: {}".format(k, ", ".join(v)) for k, v in value.items()])
             self.configuration.get().column_keys = tobytes(column_keys)
 
+    @property
+    def uniform_encryption(self):
+        """Encrypt footer and all columns with the same encryption key 
(footer_key).
+        
+        This cannot be used together with column_keys.
+        """
+        return self.configuration.get().uniform_encryption
+
+

Review Comment:
   Nit: please remove this superfluous empty line.



##########
cpp/src/parquet/encryption/crypto_factory.h:
##########
@@ -41,6 +41,7 @@ struct PARQUET_EXPORT EncryptionConfiguration {
       : footer_key(footer_key) {}
 
   /// ID of the master key for footer encryption/signing
+  /// If uniform_encryption is True, will use this key to encrypt all columns 
as well.

Review Comment:
   Nit: let's add a blank line between summary and description.
   ```suggestion
     /// \brief ID of the master key for footer encryption/signing
     ///
     /// If uniform_encryption is True, this key will encrypt all columns as 
well.
   ```



##########
cpp/src/arrow/dataset/file_parquet_encryption_test.cc:
##########
@@ -142,12 +145,107 @@ class DatasetEncryptionTest : public ::testing::Test {
       kms_connection_config_;
 };
 
-// This test demonstrates the process of writing a partitioned Parquet file 
with the same
-// encryption properties applied to each file within the dataset. The 
encryption
-// properties are determined based on the selected columns. After writing the 
dataset, the
-// test reads the data back and verifies that it can be successfully decrypted 
and
-// scanned.
-TEST_F(DatasetEncryptionTest, WriteReadDatasetWithEncryption) {
+class DatasetEncryptionTestUniformEncryption : public ::testing::Test {
+ public:
+  // This function creates a mock file system using the current time point, 
creates a
+  // directory with the given base directory path, and writes a dataset to it 
using
+  // provided Parquet file write options. These include an encryption 
configuration
+  // which uses EncryptionConfiguration.column_keys to specify the encryption 
key for
+  // each column.
+  // The dataset is partitioned using a Hive partitioning scheme.
+  // The function also checks if the written files exist in the file system.
+  static void SetUpTestSuite() {

Review Comment:
   Could you please avoid copy-pasting all this? You should instead create a 
base class to receive all common setup functionality.



##########
python/pyarrow/tests/test_dataset_encryption.py:
##########
@@ -73,6 +74,18 @@ def create_encryption_config():
     )
 
 
+def create_uniform_encryption_config():
+    return pe.EncryptionConfiguration(
+        footer_key=FOOTER_KEY_NAME,
+        plaintext_footer=False,
+        uniform_encryption=True,
+        encryption_algorithm="AES_GCM_V1",
+        # requires timedelta or an assertion is raised

Review Comment:
   I don't think this comment is useful
   ```suggestion
   ```



##########
python/setup.py:
##########
@@ -181,8 +181,12 @@ def initialize_options(self):
         self.bundle_cython_cpp = strtobool(
             os.environ.get('PYARROW_BUNDLE_CYTHON_CPP', '0'))
 
-        self.with_parquet_encryption = (self.with_parquet_encryption and
-                                        self.with_parquet)
+        if self.with_parquet_encryption and not self.with_parquet:
+            raise ValueError(
+                "Cannot build Parquet encryption without Parquet. Set 
PYARROW_WITH_PARQUET=1 "
+                "or use --with-parquet-encryption."

Review Comment:
   It seems like this error message is not consistent?
   ```suggestion
                   "Cannot build Parquet encryption without Parquet. Set 
PYARROW_WITH_PARQUET=1 "
                   "or use --with-parquet."
   ```
   
   Also, I'm not sure about this change. What do you think @jorisvandenbossche ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to