gszadovszky commented on pull request #808:
URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-672774756


   @shangxinli,
   If we agree that extending the schema with metadata is a good idea and, as 
you said, serialization/deserialization is also required, then we need to 
change the format first. The schema objects in parquet-mr exist only in the 
parquet-mr runtime. To serialize them, we need to convert this object 
structure to the thrift object structure defined in the format. If the format 
does not have the new metadata fields, we cannot serialize/deserialize 
them. So it is a much bigger topic. Also, I'd like to see this feature 
separated from the encryption, as it would be a general approach for storing 
metadata in the schema. Meanwhile, I am not convinced that we need such an 
extension.
   
   About the namespace prefix etc.: I don't agree that this is not user 
friendly. That's why I've suggested implementing a helper API, so that the 
user doesn't need to deal with the conf keys (and values) directly. 
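
   To illustrate the helper API idea, here is a minimal sketch. Everything in 
it is hypothetical: the class name `ColumnConfHelper`, the key prefix and the 
key layout are made up for illustration and are not part of parquet-mr, and a 
plain `Map` stands in for the Hadoop configuration object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that hides namespaced configuration keys from the user. */
final class ColumnConfHelper {
    // Assumed prefix, for illustration only; not a real parquet-mr key.
    private static final String PREFIX = "example.column.metadata.";

    // Stand-in for a Hadoop-style configuration (string keys, string values).
    private final Map<String, String> conf;

    ColumnConfHelper(Map<String, String> conf) {
        this.conf = conf;
    }

    /** Stores a metadata value for the given column under a namespaced key. */
    ColumnConfHelper set(String columnPath, String name, String value) {
        conf.put(key(columnPath, name), value);
        return this;
    }

    /** Returns the stored value, or null if nothing was set for this column and name. */
    String get(String columnPath, String name) {
        return conf.get(key(columnPath, name));
    }

    // Builds the full key; '#' separates the (dotted) column path from the name.
    private static String key(String columnPath, String name) {
        return PREFIX + columnPath + "#" + name;
    }
}
```

   With something like this, the user would call 
`new ColumnConfHelper(conf).set("a.b", "k", "v")` and never see or type the 
prefixed key itself.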
   
   @ggershinsky,
   I don't agree that, for the sake of transparency, we cannot have a meeting 
about this topic. What we have to do is document here what we discussed and 
what the conclusions were. Meanwhile, I am not sure a meeting would help, but 
I am happy to participate if anyone thinks otherwise.
   
   Also, if we think we are getting stuck on this issue, I would suggest 
involving other members of the community, for example by drawing their 
attention to this PR on the dev list or bringing up the topic at the next 
Parquet sync.



