wgtmac commented on PR #1014:
URL: https://github.com/apache/parquet-mr/pull/1014#issuecomment-1382815489

   > > * I'd prefer creating a new JIRA for this refactor as a prerequisite.
Merging multiple files into a single one with customized pruning, encryption, and
codec is also on my mind and will be supported later. I will create separate
JIRAs as sub-tasks of PARQUET-2075 and work on them progressively.
   > 
   > Perfect! :)
   > 
   > > * Putting the original `created_by` into `key_value_metadata` is a good
idea. However, it is tricky if a file has been rewritten several times.
What about adding a key named `original_created_by` to `key_value_metadata` and
concatenating all old `created_by`s to it?
   > 
   > It sounds good to me. Maybe have the latest one at the beginning and use 
the separator `'\n'`?
   
   I am afraid some implementations may drop characters after `'\n'` when 
displaying the string content. Let me do some investigation.
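
To make the proposal above concrete, here is a minimal sketch of the concatenation scheme as discussed: the newest old `created_by` goes first and entries are joined with `'\n'`. This is not parquet-mr code. The helper name `append_original_created_by`, the plain-dict stand-in for `key_value_metadata`, and the version strings are assumptions made for illustration only.

```python
# Key name proposed in the thread; everything else here is hypothetical.
ORIGINAL_CREATED_BY_KEY = "original_created_by"


def append_original_created_by(key_value_metadata, old_created_by, separator="\n"):
    """Return a copy of the metadata with old_created_by prepended to the history.

    key_value_metadata is modeled as a plain dict of str -> str; the real
    footer structure in parquet-mr is different.
    """
    updated = dict(key_value_metadata)
    previous = updated.get(ORIGINAL_CREATED_BY_KEY)
    if previous:
        # Latest entry first, older entries after the separator.
        updated[ORIGINAL_CREATED_BY_KEY] = old_created_by + separator + previous
    else:
        updated[ORIGINAL_CREATED_BY_KEY] = old_created_by
    return updated


# Two successive rewrites of the same file (made-up writer strings).
meta = append_original_created_by({}, "writer-a version 1.0")
meta = append_original_created_by(meta, "writer-b version 2.0")

# A reader that only shows text up to the first newline would see just the
# latest entry, which is the display concern raised above.
first_line_only = meta[ORIGINAL_CREATED_BY_KEY].split("\n")[0]
```

With this layout the full history stays recoverable by splitting on the separator, but any tool that truncates at the first `'\n'` would show only the most recent writer.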


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
