[ https://issues.apache.org/jira/browse/HIVE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744635#action_12744635 ]
Ning Zhang commented on HIVE-756:
---------------------------------

The call ret.set(i, BytesRefWritable.ZeroBytesRefWritable); at RCFile.java:1273 seems unnecessary here. When the BytesRefArrayWritable is constructed, each member is already initialized to the same value as BytesRefWritable.ZeroBytesRefWritable. So as long as the list of projected columns does not change during the table scan (i.e., across calls to the iterator RCFileRecord.next()), we don't need to set these values again.

The reason I'm picky about this small thing is that CPU cost can differ hugely depending on whether we maintain reasonable invariants (assertions) in the two nested loops (over rows and over columns), remove unnecessary code, and reduce the number of loops. The code inside the loop/iterator should be really lean and do only what is absolutely necessary. In my test, these simple changes reduced the iterator fetch time from 5 sec to less than 1 sec and improved overall query performance by about 15-20%.

In this case the invariant is that the projected columns do not change during the table scan. Please let me know if you think there are cases that break the invariant, and I'll revert the changes.

> performance improvement for RCFile and ColumnarSerDe in Hive
> ------------------------------------------------------------
>
>                 Key: HIVE-756
>                 URL: https://issues.apache.org/jira/browse/HIVE-756
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: hive-756.patch, hive-756_2.patch
>
>
> There are some easy performance improvements in the columnar storage in Hive
> that I found during the Hackathon.
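The invariant described in the comment can be illustrated with a small, self-contained Java sketch. This is not Hive's RCFile code. ProjectedRowReader, ZERO, setProjection and the byte[][] row layout are hypothetical stand-ins for BytesRefArrayWritable and BytesRefWritable.ZeroBytesRefWritable. Every slot is set to the shared sentinel once, in the constructor. next() then writes only the projected columns of a reused row, so the per-row loop does no reset work. If the projection ever did change, the reset would happen once at that point rather than once per row. An assertion (checked only under java -ea) guards the invariant.

```java
import java.util.Arrays;

// Hypothetical sketch, not Hive's RCFile implementation: the class, ZERO and the
// byte[][] layout stand in for BytesRefArrayWritable / ZeroBytesRefWritable.
class ProjectedRowReader {
    // Shared empty sentinel, playing the role of ZeroBytesRefWritable.
    static final byte[] ZERO = new byte[0];

    private final byte[][] row; // reused output row, one slot per table column
    private int[] projected;    // projected column indices, fixed for the scan

    ProjectedRowReader(int numColumns, int[] projected) {
        this.row = new byte[numColumns][];
        Arrays.fill(row, ZERO); // every slot starts as the sentinel, exactly once
        this.projected = projected.clone();
    }

    // Copies only the projected columns of row rowId. Non-projected slots are never
    // written here, so they still hold ZERO and need no per-row reset.
    byte[][] next(byte[][][] columns, int rowId) {
        for (int col : projected) {
            row[col] = columns[col][rowId];
        }
        assert nonProjectedAreZero() : "non-projected slot is not ZERO";
        return row;
    }

    // If the projection had to change mid-scan, the reset belongs here,
    // once per change, rather than inside next() once per row.
    void setProjection(int[] newProjected) {
        Arrays.fill(row, ZERO);
        projected = newProjected.clone();
    }

    // Invariant check; only evaluated when assertions are enabled (java -ea).
    private boolean nonProjectedAreZero() {
        boolean[] isProjected = new boolean[row.length];
        for (int col : projected) isProjected[col] = true;
        for (int i = 0; i < row.length; i++) {
            if (!isProjected[i] && row[i] != ZERO) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Two rows of a three-column table, stored column by column.
        byte[][][] columns = {
            { {1}, {2} },
            { {10}, {20} },
            { {7}, {8} },
        };
        ProjectedRowReader reader = new ProjectedRowReader(3, new int[] {0, 2});
        for (int r = 0; r < 2; r++) {
            byte[][] row = reader.next(columns, r);
            System.out.println(Arrays.toString(row[0]) + " " + (row[1] == ZERO)
                    + " " + Arrays.toString(row[2]));
        }
    }
}
```

Running main prints "[1] true [7]" and then "[2] true [8]". Column 1 is never projected, so it keeps the sentinel from the constructor without being reset on every row.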