Cheng Lian created PARQUET-194:
----------------------------------
Summary: Provide callback to allow user defined key-value metadata
merging strategy
Key: PARQUET-194
URL: https://issues.apache.org/jira/browse/PARQUET-194
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.6.0
Reporter: Cheng Lian
When merging footers, Parquet doesn't know how to merge conflicting user-defined
key-value metadata entries, and simply throws an exception. It would be better to
provide callbacks that let users define their own metadata merging strategies.
For example, in Spark SQL, we store our own schema information in Parquet files
as key-value metadata (similar to parquet-avro). While trying to add schema
merging support for reading Parquet files with different but compatible
schemas, {{InitContext.getMergedKeyValueMetaData}} throws because different
Parquet data files store different Spark SQL schemas under the same key. As a
workaround, we have to subclass {{ParquetInputFormat}} and merge the schemas
within {{getSplits}}, which is hacky and inconvenient.
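
A rough sketch of what such a callback could look like is given below. It is illustrative only and does not use any existing parquet-mr API: {{KeyValueMetadataMerger}}, {{mergeKeyValueMetadata}}, {{STRICT}} and the key {{example.schema}} are all hypothetical names, and the per-key {{Set<String>}} input is an assumption about how the values collected from the footers would be handed to the merge step. A real Spark SQL merger would parse and merge the schemas; here the values are just joined with a separator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class MetadataMergeSketch {

    /** Hypothetical callback: resolves one key whose values differ across files. */
    interface KeyValueMetadataMerger {
        String merge(String key, Set<String> conflictingValues);
    }

    /** Strategy matching the behaviour described above: fail on any conflict. */
    static final KeyValueMetadataMerger STRICT = (key, values) -> {
        throw new IllegalStateException(
            "could not merge metadata: key " + key + " has conflicting values: " + values);
    };

    /**
     * Collapses the per-key value sets gathered from all footers into one map.
     * Keys with a single value are copied as-is; keys with several distinct
     * values are passed to the user-supplied merger.
     */
    static Map<String, String> mergeKeyValueMetadata(
            Map<String, Set<String>> collected, KeyValueMetadataMerger merger) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : collected.entrySet()) {
            Set<String> values = entry.getValue();
            if (values.size() == 1) {
                merged.put(entry.getKey(), values.iterator().next());
            } else if (values.size() > 1) {
                merged.put(entry.getKey(), merger.merge(entry.getKey(), values));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> collected = new LinkedHashMap<>();
        collected.put("writer", new LinkedHashSet<>(Arrays.asList("v1")));
        // Two files carrying different (but compatible) schemas under the same key.
        collected.put("example.schema",
            new LinkedHashSet<>(Arrays.asList("a:int", "a:int,b:string")));

        // User-defined strategy: handle the schema key, stay strict for all others.
        KeyValueMetadataMerger schemaAware = (key, values) ->
            key.equals("example.schema")
                ? String.join("|", values)   // placeholder for real schema merging
                : STRICT.merge(key, values);

        System.out.println(mergeKeyValueMetadata(collected, schemaAware));
    }
}
```

With something along these lines, the strict strategy could remain the default, so existing users would see no change in behaviour, while Spark SQL could plug in its own schema-aware merger instead of subclassing {{ParquetInputFormat}}.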