[ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683353#action_12683353
 ] 

Zheng Shao commented on HIVE-352:
---------------------------------

Let's do B2.2 first. I guess there will need to be some interface change to 
make it possible (a SerDe currently deserializes only one row out of one 
Writable, while we are looking for multiple rows per Writable). We can use the 
SequenceFile compression support transparently.
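The interface change could look roughly like the following, a minimal sketch assuming a hypothetical multi-row contract (the interface and class names here are illustrative, not the actual Hive SerDe API): one Writable payload decodes to a list of rows instead of exactly one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical extension of the SerDe contract: one serialized block
// (the Writable payload) may decode to many rows instead of exactly one.
interface MultiRowDeserializer {
    // Returns every row packed into the given serialized block.
    List<List<String>> deserializeBlock(byte[] block);
}

// Toy implementation: rows separated by '\n', fields by '\u0001'
// (Hive's default field delimiter).
class DelimitedBlockDeserializer implements MultiRowDeserializer {
    @Override
    public List<List<String>> deserializeBlock(byte[] block) {
        String text = new String(block, StandardCharsets.UTF_8);
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\n")) {
            rows.add(Arrays.asList(line.split("\u0001")));
        }
        return rows;
    }
}
```

A caller can then hand the deserializer a whole compressed-and-decoded block and iterate the returned rows, which is what a columnar or block-packed layout needs.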

Once B2.2 is done, we can move to B2.1. As Joydeep said, we may need to extend 
SequenceFile to make splitting work. At the same time we might want to use 
SequenceFile record-compression (instead of SequenceFile block-compression) if 
we can make relatively big records. That will save us the time of decompressing 
unnecessary columns. Or we can disable SequenceFile compression and compress 
record by record ourselves. As Joydeep said, we will have to decide whether 
we want to keep a large number of codecs open at the same time, or buffer all 
uncompressed data and compress column by column when writing out. 
BZip2Codec needs 100KB to 900KB per compression codec instance.
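The buffer-then-compress option can be sketched as below. This is a simplified standalone model, using java.util.zip's Deflater in place of a Hadoop codec such as BZip2Codec, and all class and method names are illustrative: raw values are buffered per column, and on flush each column is compressed in turn, so only one codec instance is live at any moment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Illustrative column writer: buffers raw bytes per column and compresses
// one column at a time on flush, so a single codec instance is allocated
// at any moment (instead of one open codec per column).
class ColumnBlockWriter {
    private final List<ByteArrayOutputStream> columns = new ArrayList<>();

    ColumnBlockWriter(int numColumns) {
        for (int i = 0; i < numColumns; i++) {
            columns.add(new ByteArrayOutputStream());
        }
    }

    void append(int column, byte[] value) {
        columns.get(column).write(value, 0, value.length);
    }

    // Compress each buffered column in turn and return the compressed blocks.
    List<byte[]> flush() throws IOException {
        List<byte[]> out = new ArrayList<>();
        for (ByteArrayOutputStream col : columns) {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            Deflater deflater = new Deflater();  // one codec at a time
            try (DeflaterOutputStream dos =
                     new DeflaterOutputStream(compressed, deflater)) {
                col.writeTo(dos);
            }
            deflater.end();                      // release codec memory promptly
            out.add(compressed.toByteArray());
            col.reset();
        }
        return out;
    }
}
```

The trade-off is exactly the one above: this keeps codec memory constant at the cost of holding all uncompressed column data in memory until the block is flushed.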


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>
> Column-based storage has been proven to be a better storage layout for OLAP. 
> Hive does a great job on raw row-oriented storage. In this issue, we will 
> enhance Hive to support column-based storage. 
> Actually, we have done some work on column-based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.