[jira] Commented: (HIVE-352) Make Hive support column based storage

Zheng Shao (JIRA) Thu, 23 Apr 2009 03:09:12 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701876#action_12701876
 ]


Zheng Shao commented on HIVE-352:
---------------------------------

I ran Yongqiang's tests with the Hadoop native library, using DefaultCodec 
for both RCFile and SequenceFile.

Reading all columns from RCFile is about twice as fast as reading the 
SequenceFile (2227 ms versus 4343 ms), probably because we do bulk 
decompression and make one less copy of the data.
This result looks reasonable. 
{code}
Write RCFile with 80 random string columns and 100000 rows cost 25464 milliseconds. And the file's on disk size is 91874941
Write SequenceFile with 80 random string columns and 100000 rows cost 35711 milliseconds. And the file's on disk size is 102521005
Read only one column of a RCFile with 80 random string columns and 100000 rows cost 594 milliseconds.
Read only first and last columns of a RCFile with 80 random string columns and 100000 rows cost 600 milliseconds.
Read all columns of a RCFile with 80 random string columns and 100000 rows cost 2227 milliseconds.
Read SequenceFile with 80  random string columns and 100000 rows cost 4343 milliseconds.
{code}
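
To make the "around 2 times" comparison concrete, here is a small sketch 
that derives the ratios from the numbers in the output above. The variable 
and function names are only illustrative and are not part of the test code; 
the figures are copied from this single run, so treat the ratios as rough.

```python
# Timings (ms) and on-disk sizes (bytes) copied from the test output above.
# All names here are illustrative; they do not come from the Hive test code.
write_ms = {"RCFile": 25464, "SequenceFile": 35711}
size_bytes = {"RCFile": 91874941, "SequenceFile": 102521005}
read_ms = {
    "RCFile, one column": 594,
    "RCFile, first and last columns": 600,
    "RCFile, all columns": 2227,
    "SequenceFile": 4343,
}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Full-table read: SequenceFile time divided by RCFile all-columns time.
full_read_speedup = speedup(read_ms["SequenceFile"], read_ms["RCFile, all columns"])

# Single-column read: RCFile only has to decompress the requested column.
one_column_speedup = speedup(read_ms["SequenceFile"], read_ms["RCFile, one column"])

# Write time and relative on-disk size saving.
write_speedup = speedup(write_ms["SequenceFile"], write_ms["RCFile"])
size_saving = 1 - size_bytes["RCFile"] / size_bytes["SequenceFile"]

print(f"read all columns: {full_read_speedup:.2f}x faster")
print(f"read one column:  {one_column_speedup:.2f}x faster")
print(f"write:            {write_speedup:.2f}x faster")
print(f"on-disk size:     {size_saving:.1%} smaller")
```

On these figures the full read comes out at roughly 1.95x, which matches the 
"around 2 times" estimate, while reading a single column is faster still 
because the other 79 columns are skipped.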


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22 
> progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, hive-352-2009-4-19.patch, 
> hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch, 
> hive-352-2009-4-23.patch, HIve-352-draft-2009-03-28.patch, 
> Hive-352-draft-2009-03-30.patch
>
>
> Column based storage has been proven a better storage layout for OLAP. 
> Hive does a great job on raw row oriented storage. In this issue, we will 
> enhance Hive to support column based storage. 
> Actually, we have done some work on column based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
