[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702146#action_12702146 ]
He Yongqiang commented on HIVE-352:
-----------------------------------

>>Can we also get some numbers on the amount of memory usage?

I reran the test (the same test as Zheng's, but with no native codec) on my local machine, using the local fs and DefaultCodec. It read all columns of an RCFile with 80 columns and 100000 rows (size: 91849881 bytes). The maximum memory usage is shown below (I ran the command 'ps -o vsz,rss,rsz,%mem -p 549' every minute):

VSZ     RSS    RSZ    %MEM
766732  63472  63472  -3.0

BTW, my physical memory is 3GB.

>>Was this just a hdfs read or the measurement of a Hive query?

The test was just a file read test. However, with no native codec, my results differ a lot from Zheng's, in that SequenceFile does much worse in my test.

{noformat}
Write RCFile with 80 random string columns and 100000 rows cost 30643 milliseconds. And the file's on disk size is 91849881
Write SequenceFile with 80 random string columns and 100000 rows cost 62034 milliseconds. And the file's on disk size is 102521005
Read only one column of a RCFile with 80 random string columns and 100000 rows cost 703 milliseconds.
Read only first and last columns of a RCFile with 80 random string columns and 100000 rows cost 526 milliseconds.
Read all columns of a RCFile with 80 random string columns and 100000 rows cost 3131 milliseconds.
Read SequenceFile with 80 random string columns and 100000 rows cost 47876 milliseconds.
{noformat}

Why does the native codec matter so much for SequenceFile and not for RCFile? It should influence both RCFile and SequenceFile in the same way.
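To make the comparison easier to read, here is a minimal sketch that derives ratios from the figures in the benchmark output. The constant and function names are made up for illustration and are not part of Hive or the test program; only the numbers come from the run reported in this comment.

```python
# Figures copied from the benchmark output in this comment.
RCFILE_BYTES = 91_849_881
SEQFILE_BYTES = 102_521_005
RCFILE_WRITE_MS = 30_643
SEQFILE_WRITE_MS = 62_034
RCFILE_READ_ONE_MS = 703
RCFILE_READ_ALL_MS = 3_131
SEQFILE_READ_MS = 47_876


def ratio(numerator, denominator):
    """Return numerator / denominator as a float."""
    return numerator / denominator


def summarize():
    """Derived ratios; the key names are illustrative only."""
    return {
        # On-disk size of the RCFile relative to the SequenceFile.
        "size_rcfile_over_seqfile": ratio(RCFILE_BYTES, SEQFILE_BYTES),
        # How many times longer the SequenceFile write took.
        "write_slowdown_seqfile": ratio(SEQFILE_WRITE_MS, RCFILE_WRITE_MS),
        # How many times longer the full SequenceFile read took.
        "read_slowdown_seqfile": ratio(SEQFILE_READ_MS, RCFILE_READ_ALL_MS),
        # Cost of reading one RCFile column relative to reading all 80.
        "read_one_over_read_all": ratio(RCFILE_READ_ONE_MS, RCFILE_READ_ALL_MS),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

On these numbers the SequenceFile write takes about twice as long as the RCFile write, and the full SequenceFile read takes about 15 times as long as the full RCFile read. That gap is what the native codec question is about.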
> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22 progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, hive-352-2009-4-17.patch, hive-352-2009-4-19.patch, hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch, hive-352-2009-4-23.patch, HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column based storage has been proven a better storage layout for OLAP.
> Hive does a great job on raw row oriented storage. In this issue, we will enhance Hive to support column based storage.
> Actually, we have done some work on column based storage on top of hdfs; I think it will need some review and refactoring to port it to Hive.
> Any thoughts?