[jira] [Comment Edited] (PARQUET-47) SERDE backed schema for parquet storage in Hive

Ashish Kumar Singh (JIRA) Fri, 21 Nov 2014 15:34:44 -0800

    [ 
https://issues.apache.org/jira/browse/PARQUET-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221542#comment-14221542
 ]


Ashish Kumar Singh edited comment on PARQUET-47 at 11/21/14 11:33 PM:
----------------------------------------------------------------------

I have started working on this, but I could not assign this JIRA to myself. If 
someone could assign it to me, that would be helpful.


was (Author: singhashish):
I have started working on this. I could assign this JIRA to myself. If someone 
could, that will be helpful.

> SERDE backed schema for parquet storage in Hive
> -----------------------------------------------
>
>                 Key: PARQUET-47
>                 URL: https://issues.apache.org/jira/browse/PARQUET-47
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Abhishek Agarwal
>
> As of now, for a Hive table stored as Parquet, the schema can only be 
> specified in the Hive MetaStore. For our use case, we want the schema to be 
> provided by a Thrift SerDe rather than the MetaStore. Using Thrift IDL as the 
> schema provider allows us to maintain a consistent schema across execution 
> engines other than Hive, such as Pig and native MR. 
> Additionally, for a large sparse schema, it is much easier to build Thrift 
> objects and use parquet-thrift/elephant-bird to convert them into 
> columns/tuples than to construct the whole tuple directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
