[ https://issues.apache.org/jira/browse/HBASE-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627494#comment-13627494 ]
Ted Yu commented on HBASE-8316:
-------------------------------

Here is the javadoc for requestSeek():
{code}
   * Similar to {@link #seek} (or {@link #reseek} if forward is true) but only
   * does a seek operation after checking that it is really necessary for the
   * row/column combination specified by the kv parameter. This function was
   * added to avoid unnecessary disk seeks by checking row-column Bloom filters
   * before a seek on multi-column get/scan queries, and to optimize by looking
   * up more recent files first.
{code}
Looks like requestSeek() should perform better.

> JoinedHeap for essential column families should reseek instead of seek
> ----------------------------------------------------------------------
>
>                 Key: HBASE-8316
>                 URL: https://issues.apache.org/jira/browse/HBASE-8316
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Filters, Performance, regionserver
>            Reporter: Lars Hofhansl
>             Fix For: 0.98.0, 0.94.7, 0.95.1
>
>         Attachments: 8316-0.94.txt, 8316-0.96.txt, 8316-trunk.txt
>
>
> This was raised by the Phoenix team. During a profiling session we noticed
> that catching the joinedHeap up to the current rows via seek causes a
> performance regression, which makes the joinedHeap only efficient when either
> a high or low percentage is matched by the filter.
> (High is fine, because the joinedHeap will not get behind as often and does
> not need to be caught up, low is fine, because the seek isn't happening
> frequently).
> In our tests we found that the solution is quite simple: Replace seek with
> reseek. Patch coming soon.
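
For readers following the seek versus reseek distinction in the issue description, here is a rough toy model of why catching a scanner up with reseek can be cheaper. It is only a sketch under stated assumptions, not HBase code: ToyBlockScanner, its fixed-size "blocks", the indexLookups counter and catchUp() are all hypothetical names made up for illustration. The model assumes seek always goes back to the index, while reseek first looks forward in the block the scanner is already positioned in and only falls back to the index when the target lies beyond it.

```java
import java.util.Arrays;

/** Toy model only; none of these names exist in HBase. */
public class SeekVsReseekSketch {

    /** Forward scanner over sorted row keys, grouped into fixed-size "blocks". */
    static final class ToyBlockScanner {
        private final int[] rows;     // sorted, unique row keys
        private final int blockSize;  // rows per hypothetical block
        private int pos = 0;          // index of the current row
        int indexLookups = 0;         // crude stand-in for the expensive part of a seek

        ToyBlockScanner(int[] sortedRows, int blockSize) {
            this.rows = sortedRows;
            this.blockSize = blockSize;
        }

        /** seek: ignores the current position and always consults the index. */
        void seek(int target) {
            indexLookups++;
            int i = Arrays.binarySearch(rows, target);
            pos = i >= 0 ? i : -i - 1; // first row >= target
        }

        /** reseek: assumes target is at or after the current row. */
        void reseek(int target) {
            int blockEnd = Math.min(rows.length, (pos / blockSize + 1) * blockSize);
            // Walk forward inside the current block first; no index lookup needed.
            for (int i = pos; i < blockEnd; i++) {
                if (rows[i] >= target) {
                    pos = i;
                    return;
                }
            }
            // Target is past the current block: fall back to a full seek.
            seek(target);
        }

        boolean hasCurrent() { return pos < rows.length; }
        int current() { return rows[pos]; }
    }

    /** Catches a scanner up to every second row and reports the index lookups used. */
    static int catchUp(boolean useReseek) {
        int[] rows = new int[1000];
        for (int i = 0; i < rows.length; i++) rows[i] = i;
        ToyBlockScanner scanner = new ToyBlockScanner(rows, 100);
        for (int target = 0; target < rows.length; target += 2) {
            if (useReseek) scanner.reseek(target); else scanner.seek(target);
        }
        return scanner.indexLookups;
    }

    public static void main(String[] args) {
        System.out.println("seek   index lookups: " + catchUp(false));
        System.out.println("reseek index lookups: " + catchUp(true));
    }
}
```

In this toy, the seek variant pays one index lookup per catch-up, while the reseek variant pays one only when the target crosses a block boundary. That is roughly the "medium match percentage" case described above, where the joinedHeap falls behind often but never by very much. How closely the real scanners follow this pattern is an assumption here, not something the issue text confirms.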