[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114113#comment-13114113 ]
jirapos...@reviews.apache.org commented on HBASE-4433:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2044/
-----------------------------------------------------------

Review request for Michael Stack, Jonathan Gray and Mikhail Bautin.


Summary
-------

Avoids extra next (potentially seek) calls when we are done with each column requested.


This addresses bug HBASE-4433.
    https://issues.apache.org/jira/browse/HBASE-4433


Diffs
-----

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1175286
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1175286
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1175286
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 1175286
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1175286
  http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java 1175286

Diff: https://reviews.apache.org/r/2044/diff


Testing
-------

Ran TestBlocksRead/TestExplicitColumnTracker/TestQueryMatcher. Running the full suite now.


Thanks,

Kannan


> avoid extra next (potentially a seek) if done with column/row
> -------------------------------------------------------------
>
>                 Key: HBASE-4433
>                 URL: https://issues.apache.org/jira/browse/HBASE-4433
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> [Noticed this in 89, but quite likely true of trunk as well.]
> When we are done with the requested column(s), the code still does an extra
> next() call before it realizes that it is actually done. This extra next()
> call could potentially result in an unnecessary extra block load. This is
> likely to be especially bad for CFs where the KVs are large blobs and each
> KV may occupy a block of its own, so the next() can often load a new,
> unrelated block unnecessarily.
> --
> For the simple case of reading, say, the top-most column in a row in a single
> file, where each column (KV) is a block of its own, it seems that we
> are reading 3 blocks instead of 1 block!
> I am working on a simple patch, and with that the number of seeks is down to
> 2.
> [There is still an extra seek left. I think there were two levels of
> extra/unnecessary next() we were doing without actually confirming that the
> next was needed. One is at the StoreScanner/ScanQueryMatcher level, which this
> diff avoids. I think the other is at hfs.next() (at the storefile scanner
> level), which happens whenever an HFile scanner serves out data, and
> perhaps that's the additional seek that we need to avoid. But I want to
> tackle this optimization first, as the two issues seem unrelated.]
> --
> The basic idea of the patch I am working on/testing is as follows. The
> ExplicitColumnTracker currently returns "INCLUDE" to the ScanQueryMatcher if
> the KV needs to be included, and then, if done, only in the next call does it
> return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases
> where ExplicitColumnTracker knows it is done with a particular column/row, the
> patch attempts to combine the INCLUDE code and the done hint into a single
> match code: INCLUDE_AND_SEEK_NEXT_COL or INCLUDE_AND_SEEK_NEXT_ROW.
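
To illustrate the combined match codes described in the quoted issue text, here is a minimal Java sketch. It is not the actual patch or the real ExplicitColumnTracker API: the class name ColumnTrackerSketch, the check() method, and the use of plain strings for column qualifiers are all assumptions made for the example. Only the match code names are taken from the description.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the tracker described above.
// This is NOT the real ExplicitColumnTracker; names and signatures are assumed.
final class ColumnTrackerSketch {
    enum MatchCode {
        INCLUDE,
        SEEK_NEXT_COL,
        SEEK_NEXT_ROW,
        INCLUDE_AND_SEEK_NEXT_COL,
        INCLUDE_AND_SEEK_NEXT_ROW
    }

    private final List<String> columns; // requested qualifiers, sorted ascending
    private final int maxVersions;      // versions wanted per column
    private int index = 0;              // position of the column currently wanted
    private int versionsSeen = 0;       // versions already included for that column

    ColumnTrackerSketch(List<String> sortedColumns, int maxVersions) {
        this.columns = sortedColumns;
        this.maxVersions = maxVersions;
    }

    // Decide what to do with a KV whose qualifier is given (within one row).
    MatchCode check(String qualifier) {
        while (true) {
            if (index >= columns.size()) {
                return MatchCode.SEEK_NEXT_ROW; // nothing left to read in this row
            }
            int cmp = qualifier.compareTo(columns.get(index));
            if (cmp < 0) {
                // KV sorts before the wanted column: hint a seek forward.
                return MatchCode.SEEK_NEXT_COL;
            }
            if (cmp > 0) {
                // The wanted column is absent from this row; try the next one.
                index++;
                versionsSeen = 0;
                continue;
            }
            versionsSeen++;
            if (versionsSeen < maxVersions) {
                return MatchCode.INCLUDE; // more versions of this column wanted
            }
            // Last wanted version: report "include" and "done" in ONE code,
            // so the caller need not call next() just to learn it is done.
            index++;
            versionsSeen = 0;
            return index == columns.size()
                ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
        }
    }
}
```

In this sketch, a caller that receives one of the INCLUDE_AND_SEEK_* codes would add the KV to its results and then seek straight away, instead of calling next() and consulting the tracker again only to find out it was already done.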