[
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688397#action_12688397
]
Joydeep Sen Sarma commented on HIVE-352:
----------------------------------------
If you are doing B2.2, I think it's still pretty easy to make sure that we
don't decompress all columns when we only want a few. Using SequenceFile
record compression, that's exactly what will happen (every column gets
decompressed), and I think the performance gain might be much less (the
benefit would be reduced primarily to better compression of the data due to
the columnar format).
In the past I have written a dummy Writable class that doesn't deserialize,
but just passes the input stream provided by Hadoop straight through to the
application. (The serialization framework does this in a less hacky way, and
we could do that as well.) If you do it this way, the Hive SerDe can receive
one massive blob of binary data and then, based on the header metadata,
decompress only the relevant parts of it. I.e., I don't think we ever need to
do B2.1 if we do B2.2 this way.
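The lazy-Writable idea above could be sketched roughly as follows. This is a
minimal, self-contained illustration, not Hive's actual code: the class name
LazyColumnBlob, the header layout (a column count followed by per-column
compressed lengths), and the use of java.util.zip are all assumptions made
for the example. The point is only that readFields() copies the record bytes
and parses the small header, while individual columns are inflated on demand.

```java
import java.io.*;
import java.util.zip.*;

// Hypothetical sketch: each record is one opaque blob whose header lists
// per-column compressed chunk lengths, so a reader can inflate only the
// columns it needs. All names and the layout here are illustrative.
public class LazyColumnBlob {
    private byte[] raw;     // whole record, kept compressed and undeserialized
    private int[] offsets;  // start of each column's chunk inside raw
    private int[] lengths;  // compressed length of each column's chunk

    // Analogous to Writable.readFields: copy the bytes and parse only the
    // small header; no column data is decompressed here.
    public void readFields(DataInput in) throws IOException {
        raw = new byte[in.readInt()];
        in.readFully(raw);
        DataInputStream hdr =
            new DataInputStream(new ByteArrayInputStream(raw));
        int cols = hdr.readInt();
        offsets = new int[cols];
        lengths = new int[cols];
        int off = 4 + 4 * cols;  // header: numCols + one length per column
        for (int i = 0; i < cols; i++) {
            lengths[i] = hdr.readInt();
            offsets[i] = off;
            off += lengths[i];
        }
    }

    // Decompress a single column on demand; other chunks stay compressed.
    public byte[] column(int i) throws IOException {
        Inflater inf = new Inflater();
        inf.setInput(raw, offsets[i], lengths[i]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && inf.needsInput()) break;
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    // Writer side: deflate each column separately, then prepend the header.
    public static byte[] encode(byte[][] cols) throws IOException {
        ByteArrayOutputStream chunks = new ByteArrayOutputStream();
        int[] lens = new int[cols.length];
        for (int i = 0; i < cols.length; i++) {
            Deflater def = new Deflater();
            def.setInput(cols[i]);
            def.finish();
            byte[] buf = new byte[512];
            while (!def.finished()) {
                int n = def.deflate(buf);
                chunks.write(buf, 0, n);
                lens[i] += n;
            }
            def.end();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream d = new DataOutputStream(out);
        d.writeInt(cols.length);
        for (int l : lens) d.writeInt(l);
        chunks.writeTo(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[][] cols = { "col0col0col0".getBytes(), "col1col1col1".getBytes() };
        byte[] blob = encode(cols);

        // Simulate the framework handing us the raw record bytes.
        ByteArrayOutputStream rec = new ByteArrayOutputStream();
        DataOutputStream d = new DataOutputStream(rec);
        d.writeInt(blob.length);
        d.write(blob);

        LazyColumnBlob b = new LazyColumnBlob();
        b.readFields(new DataInputStream(
            new ByteArrayInputStream(rec.toByteArray())));
        // Only column 1 is inflated; column 0's chunk is never touched.
        System.out.println(new String(b.column(1)));  // prints "col1col1col1"
    }
}
```

A real implementation would use Hadoop's Writable interface and the
configured CompressionCodec rather than raw Deflater/Inflater, but the
control flow (parse header eagerly, decompress columns lazily) is the same.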
> Make Hive support column based storage
> --------------------------------------
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: He Yongqiang
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS; I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.