[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701876#action_12701876 ]
Zheng Shao commented on HIVE-352:
---------------------------------

I ran Yongqiang's tests with the Hadoop native library, using DefaultCodec for both RCFile and SequenceFile. RCFile reads about twice as fast as SequenceFile (2227 ms vs. 4343 ms to read all columns), probably because we do bulk decompression and make one less copy of the data. This result looks reasonable.

{code}
Write RCFile with 80 random string columns and 100000 rows cost 25464 milliseconds. And the file's on disk size is 91874941
Write SequenceFile with 80 random string columns and 100000 rows cost 35711 milliseconds. And the file's on disk size is 102521005
Read only one column of a RCFile with 80 random string columns and 100000 rows cost 594 milliseconds.
Read only first and last columns of a RCFile with 80 random string columns and 100000 rows cost 600 milliseconds.
Read all columns of a RCFile with 80 random string columns and 100000 rows cost 2227 milliseconds.
Read SequenceFile with 80 random string columns and 100000 rows cost 4343 milliseconds.
{code}

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt,
>                      4-22 progress.txt, hive-352-2009-4-15.patch,
>                      hive-352-2009-4-16.patch, hive-352-2009-4-17.patch,
>                      hive-352-2009-4-19.patch, hive-352-2009-4-22-2.patch,
>                      hive-352-2009-4-22.patch, hive-352-2009-4-23.patch,
>                      HIve-352-draft-2009-03-28.patch,
>                      Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job with raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have already done some work on column-based storage on top of
> HDFS; I think it will need some review and refactoring to port it to Hive.
> Any thoughts?
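The "around 2 times" figure can be checked directly against the timings Zheng Shao posted. The short script below only recomputes ratios from those reported numbers; it does not rerun the benchmark, and the variable names are illustrative.

```python
# Numbers copied from the benchmark output in Zheng Shao's comment above;
# this script only recomputes ratios, it does not rerun the test.
WRITE_MS = {"RCFile": 25464, "SequenceFile": 35711}
SIZE_ON_DISK = {"RCFile": 91874941, "SequenceFile": 102521005}  # as printed by the test
READ_MS = {
    "RCFile, one column": 594,
    "RCFile, first and last columns": 600,
    "RCFile, all columns": 2227,
    "SequenceFile, all columns": 4343,
}


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rounded to two decimal places."""
    return round(numerator / denominator, 2)


# Full read: how many times faster RCFile is than SequenceFile.
full_read_speedup = ratio(READ_MS["SequenceFile, all columns"], READ_MS["RCFile, all columns"])
# Column projection: full RCFile read vs. reading a single column.
projection_speedup = ratio(READ_MS["RCFile, all columns"], READ_MS["RCFile, one column"])
# Write time and on-disk size of RCFile relative to SequenceFile.
write_time_ratio = ratio(WRITE_MS["RCFile"], WRITE_MS["SequenceFile"])
size_ratio = ratio(SIZE_ON_DISK["RCFile"], SIZE_ON_DISK["SequenceFile"])

print(f"Full read: RCFile is {full_read_speedup}x faster than SequenceFile")
print(f"One column vs. all columns (RCFile): {projection_speedup}x faster")
print(f"RCFile write time: {write_time_ratio} of SequenceFile's")
print(f"RCFile size on disk: {size_ratio} of SequenceFile's")
```

With these numbers the full-read ratio is 1.95, consistent with "around 2 times"; reading a single column is about 3.75x faster than reading all 80, the RCFile was written in about 71% of the SequenceFile's time, and it is roughly 10% smaller on disk.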
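To illustrate why reading one or two columns is so much cheaper than reading all of them, here is a toy sketch loosely modeled on the idea behind RCFile: a group of rows is stored column by column, with each column compressed as one chunk. It is not RCFile's on-disk format. `build_row_group`, `read_columns` and the NUL separator are hypothetical, and Python's `zlib` stands in for Hadoop's zlib-based DefaultCodec.

```python
import zlib
from typing import Dict, List, Sequence

# Hypothetical separator between values inside one column chunk;
# this toy assumes no value contains a NUL byte.
SEP = b"\x00"


def build_row_group(rows: Sequence[Sequence[str]]) -> List[bytes]:
    """Store a group of rows column by column: one zlib-compressed chunk per column."""
    if not rows:
        return []
    chunks = []
    for col in range(len(rows[0])):
        column_bytes = SEP.join(row[col].encode("utf-8") for row in rows)
        chunks.append(zlib.compress(column_bytes))
    return chunks


def read_columns(chunks: List[bytes], wanted: Sequence[int]) -> Dict[int, List[str]]:
    """Decompress only the requested column chunks; all other chunks are skipped."""
    result = {}
    for col in wanted:
        raw = zlib.decompress(chunks[col])  # one bulk decompression per column
        result[col] = [value.decode("utf-8") for value in raw.split(SEP)]
    return result


# 1000 rows x 80 string columns, loosely mirroring the shape of the test above.
rows = [[f"r{r}c{c}" for c in range(80)] for r in range(1000)]
chunks = build_row_group(rows)
first_and_last = read_columns(chunks, [0, 79])  # touches 2 of the 80 chunks
print(first_and_last[0][:2], first_and_last[79][:2])  # ['r0c0', 'r1c0'] ['r0c79', 'r1c79']
```

In a row-oriented file, by contrast, every record's bytes have to be decompressed and parsed to reach one field, which is one plausible reason the one-column RCFile read (594 ms) is so much cheaper than the full read (2227 ms). A real format would record value lengths rather than rely on a separator byte; the separator just keeps the sketch short.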
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.