adamreeve commented on code in PR #7111:
URL: https://github.com/apache/arrow-rs/pull/7111#discussion_r2009421835


##########
parquet/src/file/metadata/writer.rs:
##########
@@ -133,17 +190,53 @@ impl<'a, W: Write> ThriftMetadataWriter<'a, W> {
 
         // Write file metadata
         let start_pos = self.buf.bytes_written();
-        {
-            let mut protocol = TCompactOutputProtocol::new(&mut self.buf);
-            file_metadata.write_to_out_protocol(&mut protocol)?;
+        match self.file_encryptor.as_ref() {
+            #[cfg(feature = "encryption")]
+            Some(file_encryptor) if 
file_encryptor.properties().encrypt_footer() => {
+                // First write FileCryptoMetadata
+                let crypto_metadata = 
Self::file_crypto_metadata(file_encryptor)?;
+                let mut protocol = TCompactOutputProtocol::new(&mut self.buf);
+                crypto_metadata.write_to_out_protocol(&mut protocol)?;
+
+                // Then write encrypted footer
+                let aad = create_footer_aad(file_encryptor.file_aad())?;
+                let mut encryptor = file_encryptor.get_footer_encryptor()?;
+                encrypt_object(&file_metadata, &mut encryptor, &mut self.buf, 
&aad)?;
+            }
+            _ => {
+                let mut protocol = TCompactOutputProtocol::new(&mut self.buf);
+                file_metadata.write_to_out_protocol(&mut protocol)?;
+            }
         }
+
         let end_pos = self.buf.bytes_written();
 
         // Write footer
         let metadata_len = (end_pos - start_pos) as u32;
 
         self.buf.write_all(&metadata_len.to_le_bytes())?;
-        self.buf.write_all(&PARQUET_MAGIC)?;
+        #[cfg(feature = "encryption")]
+        let magic = get_file_magic(
+            self.file_encryptor
+                .as_ref()
+                .map(|encryptor| encryptor.properties()),
+        );
+        #[cfg(not(feature = "encryption"))]
+        let magic = get_file_magic();
+        self.buf.write_all(magic)?;
+
+        // TODO: Discuss what to do here with community
+        // The argument for returning unencrypted rowgroups is that any users 
working with
+        // the returned FIleMetaData have no easy way to decrypt it and hence 
expect an
+        // unencrypted structure be returned. So, the argument here would be 
backward compatibility.
+        // Return unencrypted row_group for use in program
+        // E.g. when collecting statistics.
+        // Related to this see: https://github.com/apache/arrow-rs/issues/7254

Review Comment:
   This is an issue we ran into when trying to integrate these changes into 
delta-rs, which reads the row group metadata after writing files to get column 
statistics. Making clients decrypt the metadata in this scenario doesn't seem 
reasonable.
   
   The suggestion to return `ParquetMetaData` instead of the Thrift 
`FileMetaData` in #7254 seems relevant here, if that was done then that might 
also solve this problem, depending on how it was implemented.
   
   We'd appreciate some feedback on this area and if there's a better way to 
handle this than our current solution of copying and restoring the unencrypted 
metadata after the file is written. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to