[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683353#action_12683353 ]
Zheng Shao commented on HIVE-352:
---------------------------------
Let's do B2.2 first. I guess there will need to be an interface change to
make it possible (a SerDe currently deserializes only one row out of one
Writable, while here we are looking for multiple rows per Writable). We can
then use the SequenceFile compression support transparently.
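As a rough sketch of the kind of interface change this implies (the names
below are hypothetical and illustrative only, not the existing Hive SerDe
API), a deserializer could hand back an iterator of rows rather than exactly
one row per Writable:

{code:java}
// Hypothetical interface sketch -- illustrative names, not the current
// Hive SerDe API: deserialize many rows out of a single Writable.
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Writable;

public interface MultiRowDeserializer {
  /**
   * Deserialize one Writable (e.g. one compressed block of column data)
   * into zero or more rows, instead of exactly one row per Writable.
   */
  Iterator<Object> deserializeRows(Writable blob) throws IOException;
}
{code}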
Once B2.2 is done, we can move to B2.1. As Joydeep said, we may need to extend
SequenceFile to make splitting work. At the same time we might want to use
SequenceFile record compression (instead of SequenceFile block compression) if
we can make relatively big records; that will save us the time of decompressing
unnecessary columns. Or we can disable SequenceFile compression and compress
record by record ourselves. As Joydeep said, we will have to decide whether we
want to keep a large number of codecs open at the same time, or buffer all
uncompressed data and compress the columns one by one when writing out.
BZip2Codec needs 100KB to 900KB per codec instance.
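To make the second option concrete, here is a minimal sketch of the
"buffer uncompressed, compress one column at a time" approach. The class and
method names (ColumnBlockWriter, append, flushBlock) are made up for
illustration; only the Hadoop compression classes are real.

{code:java}
// Sketch: buffer each column uncompressed in memory, then compress the
// columns one at a time at flush, so only one compressor is live at once
// instead of one open BZip2 codec (100KB-900KB of state) per column.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class ColumnBlockWriter {
  private final ByteArrayOutputStream[] columnBuffers; // uncompressed column data
  private final CompressionCodec codec;                // one shared codec instance

  public ColumnBlockWriter(int numColumns, Configuration conf) {
    columnBuffers = new ByteArrayOutputStream[numColumns];
    for (int i = 0; i < numColumns; i++) {
      columnBuffers[i] = new ByteArrayOutputStream();
    }
    codec = ReflectionUtils.newInstance(BZip2Codec.class, conf);
  }

  /** Append one field's serialized bytes to its column buffer (uncompressed). */
  public void append(int column, byte[] fieldBytes) throws IOException {
    columnBuffers[column].write(fieldBytes);
  }

  /** At a block boundary, compress and emit one column at a time. */
  public void flushBlock(OutputStream out) throws IOException {
    for (ByteArrayOutputStream buffer : columnBuffers) {
      CompressionOutputStream cos = codec.createOutputStream(out);
      cos.write(buffer.toByteArray());
      cos.finish(); // flush this column's compressed bytes without closing out
      buffer.reset();
    }
  }
}
{code}

The trade-off is memory for buffered uncompressed data versus memory for many
live compressors; with BZip2's per-codec overhead, a wide table makes keeping
one open codec per column expensive.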
> Make Hive support column based storage
> --------------------------------------
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: He Yongqiang
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have already done some work on column-based storage on top of
> HDFS; I think it will need some review and refactoring to port it to Hive.
> Any thoughts?