adamreeve commented on code in PR #8305:
URL: https://github.com/apache/arrow-rs/pull/8305#discussion_r2625210327
##########
parquet/src/file/metadata/writer.rs:
##########
@@ -755,34 +755,39 @@ impl MetadataObjectWriter {
) -> Result<ColumnChunkMetaData> {
// Column crypto metadata should have already been set when the column
was created.
// Here we apply the encryption by encrypting the column metadata if
required.
- match column_chunk.column_crypto_metadata.as_deref() {
- None => {}
+ let encryptor = match column_chunk.column_crypto_metadata.as_deref() {
+ None => None,
Some(ColumnCryptoMetaData::ENCRYPTION_WITH_FOOTER_KEY) => {
- // When uniform encryption is used the footer is already
encrypted,
- // so the column chunk does not need additional encryption.
+ // We always encrypt column metadata separately and store in
+ // encrypted_column_metadata. This allows skipping the
plaintext meta_data
+ // field in the writer to reduce footer size. If
encrypted_column_metadata
+ // were not set, the reader would not be able to read the
column metadata.
+ Some(file_encryptor.get_footer_encryptor()?)
Review Comment:
I don't think this is the right approach. If the footer is already encrypted
we shouldn't need to double-encrypt the column metadata. This goes against what
the spec describes and might be incompatible with some readers, plus it will
require extra decryption time when reading.
If it's too complex to wire through the information about whether the footer
is a plaintext or not to `serialize_column_meta_data`, I think we should stick
with what you had before, or go back to what we currently do in the main branch
where the full column metadata is skipped rather than just the stats. My
understanding is that writing the metadata in plaintext without the stats was
for backwards compatibility with readers that expect the column metadata to
always be present when encryption was first being added. But that might not be
such a concern now if readers know that field is optional.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]