[GitHub] [parquet-mr] gszadovszky commented on pull request #808: Parquet-1396: Cryptodata Interface for Schema Activation of Parquet E…

GitBox Tue, 11 Aug 2020 05:47:22 -0700


gszadovszky commented on pull request #808:
URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-671925400



   @shangxinli, the column-wise configuration you are talking about 
(PARQUET-1784: Column-wise configuration (#754)) is only a specified key format 
and the related helper implementations for the Hadoop conf. We might have used 
this format to specify the encryption properties, but I'm afraid it is too late 
to do that, and I am not even sure it would make sense to set such properties 
with a completely different approach than the one the other components in the 
Hadoop ecosystem use.
   
   I tend to agree with @ggershinsky. The way you want to extend the parquet 
schema is a general extension for adding any metadata to any schema element. 
However, I cannot see any purpose for it other than the one you have described. 
Moreover, this way you are extending only the schema objects that are used 
inside parquet-mr. This metadata won't be written to the parquet files, nor 
serialized/deserialized to/from the metastore as is. Anything you want to be in 
this metadata has to be implemented either inside parquet-mr or in the plugins.
   
   The advantage you have described of adding the encryption properties to the 
schema is that it is easier and less error-prone to define the properties right 
next to the schema elements (columns). But you can also write helper methods 
that write the proper key/values to the Hadoop conf or the extra metadata. 
These helpers can be unit tested to ensure they work correctly. This way 
the implementation of the ParquetWriteSupport can be compact and type/value 
checked.
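
   To illustrate the kind of helper I mean, here is a minimal sketch. 
Everything in it is hypothetical: the class name, the property prefix and the 
`#` separator between property and column path are assumptions made for the 
example, not existing parquet-mr API. A plain `Map<String, String>` stands in 
for the Hadoop conf so that the snippet is self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper; names and key format are illustrative only. */
final class EncryptionConfHelper {
    // Hypothetical property prefix, not a real parquet-mr property name.
    static final String COLUMN_KEY_PREFIX = "example.encryption.column.key.id";

    private EncryptionConfHelper() {}

    /** Builds the conf key for one column, e.g. "<prefix>#address.zip". */
    static String columnKeyProperty(String columnPath) {
        Objects.requireNonNull(columnPath, "columnPath");
        if (columnPath.isEmpty()) {
            throw new IllegalArgumentException("columnPath must not be empty");
        }
        return COLUMN_KEY_PREFIX + "#" + columnPath;
    }

    /** Stores the key id for a column; rejects null or empty values early. */
    static void setColumnKeyId(Map<String, String> conf, String columnPath, String keyId) {
        Objects.requireNonNull(conf, "conf");
        Objects.requireNonNull(keyId, "keyId");
        if (keyId.isEmpty()) {
            throw new IllegalArgumentException("keyId must not be empty");
        }
        conf.put(columnKeyProperty(columnPath), keyId);
    }

    /** Returns the key id configured for a column, or null if none is set. */
    static String getColumnKeyId(Map<String, String> conf, String columnPath) {
        return conf.get(columnKeyProperty(columnPath));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setColumnKeyId(conf, "address.zip", "k1");
        System.out.println(conf);
    }
}
```

   With a real Hadoop `Configuration` the helper would call 
`conf.set(name, value)` and `conf.get(name)` instead of the map methods. The 
write support would then only call `setColumnKeyId(...)` per column, and the 
key format and value checks would be covered by the unit tests of the helper.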


