gszadovszky commented on pull request #808: URL: https://github.com/apache/parquet-mr/pull/808#issuecomment-672774756
@shangxinli, if we agree that extending the schema with metadata is a good idea and, as you said, serialization/deserialization is also required, then we need to change the format first. The schema objects in parquet-mr exist only in the parquet-mr runtime. To serialize them, we need to convert this object structure to the Thrift object structure defined in the format. If the format does not have the new metadata fields, we cannot serialize/deserialize them. So it is a much bigger topic. Also, I'd like to see this feature separated from the encryption, as it would be a general approach for storing metadata in the schema. Meanwhile, I am not convinced that we need such an extension.

About the namespace prefix etc.: I don't agree that this is not user-friendly. That's why I've suggested implementing a helper API so the user doesn't need to deal with the conf keys (and values) directly.

@ggershinsky, I don't agree that we cannot have a meeting about this topic for reasons of transparency. What we have to do is document here what we have discussed and what the conclusions are. Meanwhile, I am not sure a meeting would help, but I am happy to participate if anyone thinks otherwise. Also, if we think we are getting stuck on this issue, I would suggest involving other members of the community. Maybe draw their attention to this PR on the dev list, or bring up the topic at the next Parquet sync.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org