[ https://issues.apache.org/jira/browse/PARQUET-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221542#comment-14221542 ]
Ashish Kumar Singh edited comment on PARQUET-47 at 11/21/14 11:33 PM:
----------------------------------------------------------------------
I have started working on this. I could not assign this JIRA to myself. If
someone could do that, it would be helpful.
was (Author: singhashish):
I have started working on this. I could assign this JIRA to myself. If someone
could, that will be helpful.
> SERDE backed schema for parquet storage in Hive
> -----------------------------------------------
>
> Key: PARQUET-47
> URL: https://issues.apache.org/jira/browse/PARQUET-47
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Abhishek Agarwal
>
> As of now, for a Hive table stored as Parquet, the schema can only be
> specified in the Hive MetaStore. For our use case, it is desirable that the
> schema be provided by the Thrift SerDe rather than the MetaStore. Using Thrift
> IDL as the schema provider allows us to maintain a consistent schema across
> execution engines other than Hive, such as Pig and native MR.
> Additionally, for a large sparse schema, it is much easier to build Thrift
> objects and use parquet-thrift/elephant-bird to convert them into
> columns/tuples than to construct the whole large tuple directly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)