Cheng Lian created PARQUET-194:
----------------------------------

             Summary: Provide callback to allow user defined key-value metadata 
merging strategy
                 Key: PARQUET-194
                 URL: https://issues.apache.org/jira/browse/PARQUET-194
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.6.0
            Reporter: Cheng Lian


When merging footers, Parquet doesn't know how to merge conflicting 
user-defined key-value metadata entries and simply throws an exception. It 
would be better to provide a callback that lets users define their own 
metadata merging strategy.
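
A minimal sketch of what such a callback could look like, written as standalone Java so it does not depend on parquet-mr. The names {{KeyValueMergeStrategy}}, {{STRICT}}, {{FIRST_WINS}} and {{merge}} are hypothetical and not part of any existing Parquet API. It assumes the values collected for each key across footers are available as a non-empty set per key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MetadataMergeSketch {

    /** Hypothetical callback: reduces all values seen for one key to a single value. */
    public interface KeyValueMergeStrategy {
        String merge(String key, Set<String> values);
    }

    /** Fails on any conflict, similar to the behavior described in this issue. */
    public static final KeyValueMergeStrategy STRICT = (key, values) -> {
        if (values.size() > 1) {
            throw new IllegalStateException(
                "key " + key + " has conflicting values: " + values);
        }
        return values.iterator().next();
    };

    /** Example of a user-supplied strategy: keep the first value encountered. */
    public static final KeyValueMergeStrategy FIRST_WINS =
        (key, values) -> values.iterator().next();

    /** Applies the given strategy to every key; assumes each value set is non-empty. */
    public static Map<String, String> merge(
            Map<String, Set<String>> collected, KeyValueMergeStrategy strategy) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : collected.entrySet()) {
            merged.put(entry.getKey(), strategy.merge(entry.getKey(), entry.getValue()));
        }
        return merged;
    }
}
```

With something along these lines, a caller that stores its own schema under a metadata key could pass a strategy that reconciles the conflicting schema strings instead of failing, while callers that pass nothing would keep the strict behavior.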

For example, in Spark SQL, we store our own schema information in Parquet files 
as key-value metadata (similar to parquet-avro). While trying to add schema 
merging support for reading Parquet files with different but compatible 
schemas, {{InitContext.getMergedKeyValueMetaData}} throws because different 
Parquet data files store different Spark SQL schemas. As a workaround, we 
have to subclass {{ParquetInputFormat}} and merge the schemas within an 
overridden {{getSplits}}, which is hacky and inconvenient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
