[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

Kevin Wilfong (JIRA) Fri, 01 Feb 2013 11:48:15 -0800

    [ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569010#comment-13569010 ]


Kevin Wilfong commented on HIVE-3874:
-------------------------------------

The reason I supported the idea was that I was hoping this would get it into 
the repo sooner.  Based on my experiences trying it so far, it seems a little 
unstable, but I would like to help fix the issues.  Getting the code in contrib 
would make it easier for other contributors to provide fixes as we develop 
them, without suggesting to users that it is as solid as any other piece of 
code in Hive (relatively speaking of course).  I assumed it would be pulled 
into the serde module after this (short) period of cleanup.

If people are willing to pull this into the serde module, knowing about that 
instability and that people would be working to fix it, I'd be happy with 
that too.
                
> Create a new Optimized Row Columnar file format for Hive
> --------------------------------------------------------
>
>                 Key: HIVE-3874
>                 URL: https://issues.apache.org/jira/browse/HIVE-3874
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file
