ggershinsky commented on PR #41821:
URL: https://github.com/apache/arrow/pull/41821#issuecomment-2145109371

   > Although I understand the intention of this issue and the corresponding 
fix, I don't think the design of parquet encryption has included the 
`_metadata` summary file because it may point to several different encrypted 
parquet files. It would be great if @ggershinsky could advise on whether there 
is any defect in this use case.
   
   Yep, we haven't worked on supporting this (basically, there was no 
requirement; it seemed to be heading towards deprecation).
   In general, using different encryption keys for different data files is 
considered good security practice (mainly because there is a limit on the 
number of crypto operations with one key; also, the scope of a key leak is 
smaller) - that's why we generate a fresh key for each parquet file in most of 
the APIs (Parquet, Arrow, Spark, Iceberg, etc). However, there are obviously 
some low-level parquet APIs that allow passing the same key to many files - if 
used carefully (making sure, somehow, not to exceed the limit), this might be 
ok in some cases. The limit is high (~1 billion pages, something like 
10TB-1PB of data), but if it is exceeded, the cipher breaks and the data can be 
decrypted.
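
   As a rough illustration of where the 10TB-1PB range comes from, here is a 
back-of-the-envelope sketch. It takes the ~1 billion page figure above as 
given; the two page sizes (10 kB and 1 MB) and the helper names are 
assumptions chosen for illustration, not values read from any Parquet API.

```python
# Rough per-key data budget when one key is reused across files.
# PAGE_LIMIT is the approximate figure quoted in the comment above;
# the page sizes below are illustrative assumptions.
PAGE_LIMIT = 1_000_000_000


def max_bytes_per_key(page_size_bytes, page_limit=PAGE_LIMIT):
    """Upper bound on data encrypted with one key at a given page size."""
    return page_size_bytes * page_limit


def pages_remaining(pages_written, page_limit=PAGE_LIMIT):
    """Pages that can still be written with this key (never negative)."""
    return max(page_limit - pages_written, 0)


small_pages = max_bytes_per_key(10 * 1000)    # assumed 10 kB pages
large_pages = max_bytes_per_key(1000 * 1000)  # assumed 1 MB pages

print(small_pages / 1e12, "TB")
print(large_pages / 1e15, "PB")
```

Anyone passing the same key to many files would need to track something like 
`pages_remaining` across all writers that share that key.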
   Another option could be to create a separate key for the `_metadata` summary 
file, and manage it separately from the data file keys.
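
   A minimal sketch of that idea, assuming plain random 128-bit keys. The 
function name `generate_file_keys` and the in-memory dict are hypothetical, 
this is not the Parquet or Arrow key management API, and a real setup would 
wrap and store these keys via a KMS.

```python
import secrets


def generate_file_keys(data_files, key_len=16):
    """Return a fresh random key per data file plus one for `_metadata`.

    Hypothetical helper for illustration only; key_len=16 gives
    128-bit keys.
    """
    keys = {name: secrets.token_bytes(key_len) for name in data_files}
    # The summary file gets its own key, managed apart from the data keys.
    keys["_metadata"] = secrets.token_bytes(key_len)
    return keys


keys = generate_file_keys(["part-0.parquet", "part-1.parquet"])
print(sorted(keys))
```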

