Hi all,

We are working on Parquet modular encryption, and are currently adding
a high-level interface that allows encrypting/decrypting Parquet files via
properties only (without calling the low-level API). In the
Spark/parquet-mr domain, we're using the Hadoop configuration properties
for that purpose - they are already passed from Spark to Parquet, and allow
adding custom key-value properties that can carry the list of encrypted
columns, key identities, etc., as described in
https://docs.google.com/document/d/1boH6HPkG0ZhgxcaRkGk3QpZ8X_J91uXZwVGwYN45St4/edit?usp=sharing
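For example, on the Spark side a job can carry such properties roughly as
follows (a minimal sketch; the "parquet.encryption.*" property names here are
illustrative - the actual keys are the ones defined in the design document
above):

    from pyspark.sql import SparkSession

    # Spark copies every "spark.hadoop.*" setting into the Hadoop
    # Configuration object handed to the Parquet writer, so custom
    # key-value pairs reach parquet-mr without extra plumbing.
    spark = (
        SparkSession.builder
        .appName("parquet-encryption-demo")
        .config("spark.hadoop.parquet.encryption.footer.key", "key0")
        .config("spark.hadoop.parquet.encryption.column.keys", "key1:col_a,col_b")
        .getOrCreate()
    )

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.write.parquet("/tmp/encrypted_table")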

I'm not sufficiently familiar with the pandas/pyarrow/parquet-cpp
ecosystem. Is there an analog of the Hadoop configuration (a free-form
key-value map, passed all the way down to parquet-cpp)? Or a more structured
configuration object (to which we would need to add the encryption-related
properties)? All suggestions are welcome.

Cheers, Gidon
