Abhishek Agarwal created PARQUET-47:
---------------------------------------
Summary: SERDE backed schema for parquet storage in Hive
Key: PARQUET-47
URL: https://issues.apache.org/jira/browse/PARQUET-47
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Reporter: Abhishek Agarwal
As of now, for a Hive table stored as Parquet, the schema can only be specified
in the Hive MetaStore. For our use case, we would like the schema to be provided
by a Thrift SerDe rather than by the MetaStore. Using Thrift IDL as the schema
provider allows us to maintain a consistent schema across execution engines
other than Hive, such as Pig and native MapReduce.
Additionally, for a large sparse schema, it is much easier to build Thrift
objects and use parquet-thrift/elephant-bird to convert them into
columns/tuples than to construct the whole large tuple directly.