Abhishek Agarwal created PARQUET-47:
---------------------------------------

             Summary: SERDE backed schema for parquet storage in Hive
                 Key: PARQUET-47
                 URL: https://issues.apache.org/jira/browse/PARQUET-47
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
            Reporter: Abhishek Agarwal


Currently, for a Hive table stored as Parquet, the schema can only be specified 
in the Hive MetaStore. For our use case, we want the schema to be provided 
by the Thrift SerDe rather than the MetaStore. Using Thrift IDL as the schema 
provider allows us to maintain a consistent schema across Hive and other 
execution engines, such as Pig and native MapReduce.

Additionally, for a large sparse schema, it is much easier to build Thrift 
objects and use parquet-thrift/elephant-bird to convert them into 
columns/tuples than to construct the entire tuple directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
