adamreeve commented on code in PR #8305:
URL: https://github.com/apache/arrow-rs/pull/8305#discussion_r2625210327


##########
parquet/src/file/metadata/writer.rs:
##########
@@ -755,34 +755,39 @@ impl MetadataObjectWriter {
     ) -> Result<ColumnChunkMetaData> {
         // Column crypto metadata should have already been set when the column 
was created.
         // Here we apply the encryption by encrypting the column metadata if 
required.
-        match column_chunk.column_crypto_metadata.as_deref() {
-            None => {}
+        let encryptor = match column_chunk.column_crypto_metadata.as_deref() {
+            None => None,
             Some(ColumnCryptoMetaData::ENCRYPTION_WITH_FOOTER_KEY) => {
-                // When uniform encryption is used the footer is already 
encrypted,
-                // so the column chunk does not need additional encryption.
+                // We always encrypt column metadata separately and store in
+                // encrypted_column_metadata. This allows skipping the 
plaintext meta_data
+                // field in the writer to reduce footer size. If 
encrypted_column_metadata
+                // were not set, the reader would not be able to read the 
column metadata.
+                Some(file_encryptor.get_footer_encryptor()?)

Review Comment:
   I don't think this is the right approach. If the footer is already encrypted 
we shouldn't need to double-encrypt the column metadata. This goes against what 
the spec describes and might be incompatible with some readers, plus it will 
require extra decryption time when reading.
   
   If it's too complex to wire through the information about whether the footer 
is a plaintext or not to `serialize_column_meta_data`, I think we should stick 
with what you had before, or go back to what we currently do in the main branch 
where the full column metadata is skipped rather than just the stats. My 
understanding is that writing the metadata in plaintext without the stats was 
for backwards compatibility with readers that expect the column metadata to 
always be present when encryption was first being added. But that might not be 
such a concern now if readers know that field is optional.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to