[ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-352:
------------------------------

    Attachment: hive-352-2009-4-27.patch

hive-352-2009-4-27.patch switches back to bulk compression and now also
compresses the key part.

Here are the results on TPC-H's lineitem table:
Direct (incremental) compression, key part not compressed:
274982705   hdfs://10.61.0.160:9000/user/hdfs/tpch1G_rc
Buffer first, then compress (bulk compression), key part compressed:
188401365   hdfs://10.61.0.160:9000/user/hdfs/tpch1G_newRC
That is, the bulk-compressed output is roughly 31% smaller.


BTW, I also tried to implement direct (incremental) compression and to
decompress a value buffer's columns part by part. But at the last step (when
implementing ValueBuffer's readFields), I found that it is not easy to
implement. We hold only one InputStream to the underlying file, so we would
need to seek back and forth to decompress part of each column, and we would
also need to hold one decompress stream per column. If we seek the InputStream,
the decompress streams become corrupt.
To avoid all this, we would need to read all the needed columns' compressed
data into memory and decompress it in memory. But we would still need one
decompress stream per column. I stopped implementing this at the last step; if
it is needed, I can finish it.
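
To make the in-memory approach concrete, here is a minimal sketch. It is not
the code in the patch. It assumes each column is compressed as an independent
unit and that the compressed length of each column is known before the value
part is read (for example from the key part). java.util.zip stands in for the
Hadoop codec, and ColumnBufferSketch, loadCompressed and openColumn are
hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class ColumnBufferSketch {

    // Compress one column's bytes as an independent unit.
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out)) {
            dos.write(raw);
        }
        return out.toByteArray();
    }

    // Read exactly len bytes from a forward-only stream.
    public static byte[] readFully(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) throw new EOFException("unexpected end of stream");
            off += n;
        }
        return buf;
    }

    // Skip exactly len bytes; InputStream.skip may skip fewer than asked.
    public static void skipFully(InputStream in, long len) throws IOException {
        while (len > 0) {
            long n = in.skip(len);
            if (n <= 0) {
                if (in.read() < 0) throw new EOFException("unexpected end of stream");
                n = 1;
            }
            len -= n;
        }
    }

    // Single forward pass over the file stream, no seeking: copy the needed
    // columns' compressed bytes into memory and skip the others.
    public static byte[][] loadCompressed(InputStream file, int[] compressedLens,
                                          boolean[] needed) throws IOException {
        byte[][] chunks = new byte[compressedLens.length][];
        for (int c = 0; c < compressedLens.length; c++) {
            if (needed[c]) {
                chunks[c] = readFully(file, compressedLens[c]);
            } else {
                skipFully(file, compressedLens[c]);
            }
        }
        return chunks;
    }

    // One decompress stream per column, each over its own in-memory buffer,
    // so reading part of one column cannot disturb another column's stream.
    public static InputStream openColumn(byte[] compressedChunk) {
        return new InflaterInputStream(new ByteArrayInputStream(compressedChunk));
    }

    public static void main(String[] args) throws IOException {
        byte[][] cols = {
            "1,1,2,3".getBytes(StandardCharsets.UTF_8),
            "skipped".getBytes(StandardCharsets.UTF_8),
            "A,B,C,D".getBytes(StandardCharsets.UTF_8),
        };
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        int[] lens = new int[cols.length];
        for (int c = 0; c < cols.length; c++) {
            byte[] z = compress(cols[c]);
            lens[c] = z.length;
            file.write(z);
        }
        byte[][] chunks = loadCompressed(new ByteArrayInputStream(file.toByteArray()),
                                         lens, new boolean[] {true, false, true});
        InputStream c0 = openColumn(chunks[0]);
        InputStream c2 = openColumn(chunks[2]);
        // Interleaved partial reads: each column is decompressed part by part.
        String first = new String(readFully(c0, 3), StandardCharsets.UTF_8)
                     + "|" + new String(readFully(c2, 3), StandardCharsets.UTF_8);
        String rest = new String(readFully(c0, 4), StandardCharsets.UTF_8)
                    + "|" + new String(readFully(c2, 4), StandardCharsets.UTF_8);
        System.out.println(first + " then " + rest);
    }
}
```

The cost is the one already mentioned: the compressed bytes of every needed
column stay in memory, and there is still one decompress stream per column.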

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22 
> progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, hive-352-2009-4-19.patch, 
> hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch, 
> hive-352-2009-4-23.patch, hive-352-2009-4-27.patch, 
> HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven a better storage layout for OLAP. 
> Hive does a great job on raw row-oriented storage. In this issue, we will 
> enhance Hive to support column-based storage. 
> Actually, we have done some work on column-based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
