avoid extra next (potentially a seek) if done with column/row
-------------------------------------------------------------

                 Key: HBASE-4433
                 URL: https://issues.apache.org/jira/browse/HBASE-4433
             Project: HBase
          Issue Type: Bug
            Reporter: Kannan Muthukkaruppan
            Assignee: Kannan Muthukkaruppan


[Noticed this in 89, but quite likely true of trunk as well.]

When we are done with the requested column(s), the code still makes an extra
next() call before it realizes that it is actually done. This extra next() call
can result in an unnecessary block load. This is likely to be especially bad
for CFs whose KVs are large blobs, where each KV may occupy a block of its
own, so the next() can often load a new, unrelated block unnecessarily.

--

For the simple case of reading the top-most column in a row in a single file,
where each column (KV) is a block of its own, it seems that we are reading
3 blocks instead of 1 block!

I am working on a simple patch and with that the number of seeks is down to 2. 

[There is still an extra seek left.  I think there were two levels of 
extra/unnecessary next() we were doing without actually confirming that the 
next was needed. One at the StoreScanner/ScanQueryMatcher level which this diff 
avoids. I think the other is at hfs.next() (at the storefile scanner level) 
that's happening whenever an HFile scanner serves out data-- and perhaps 
that's the additional seek that we need to avoid. But I want to tackle this 
optimization first as the two issues seem unrelated.]

-- 

The basic idea of the patch I am working on/testing is as follows. The 
ExplicitColumnTracker currently returns "INCLUDE" to the ScanQueryMatcher if 
the KV needs to be included and then, if done, returns the appropriate
SEEK_NEXT_COL or SEEK_NEXT_ROW hint only on the next call. For the cases when 
ExplicitColumnTracker knows it is done with a particular column/row, the patch 
attempts to combine the INCLUDE code and done hint into a single match code-- 
INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
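
A rough sketch of the idea, not the actual patch: the class name, the trimmed
MatchCode enum, and cellsTouched() below are hypothetical and greatly
simplified compared with the real ExplicitColumnTracker and ScanQueryMatcher.
It only counts how many KVs the scan loop has to visit before it learns it is
done with the row, with and without a combined match code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of this is HBase's real code.
final class CombinedMatchCodeSketch {

    // Trimmed-down set of match codes, assumed for illustration only.
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_ROW }

    /**
     * Walks the columns of one row looking for a single wanted column and
     * returns how many KVs were visited before the loop stopped. Each visit
     * stands in for one next() call, which may mean one block load.
     */
    static int cellsTouched(List<String> columnsInRow, String wanted, boolean combined) {
        boolean done = false; // true once the wanted column has been included
        int touched = 0;
        for (String col : columnsInRow) {
            touched++;
            MatchCode code;
            if (done) {
                // Two-step behaviour: the "done" hint arrives one KV late.
                code = MatchCode.SEEK_NEXT_ROW;
            } else if (col.equals(wanted)) {
                done = true;
                // Combined behaviour: include and signal "done" in one code.
                code = combined ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW : MatchCode.INCLUDE;
            } else {
                code = MatchCode.SKIP;
            }
            if (code == MatchCode.SEEK_NEXT_ROW || code == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW) {
                break; // the scan loop can leave this row now
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c");
        System.out.println("two-step: " + cellsTouched(row, "a", false));
        System.out.println("combined: " + cellsTouched(row, "a", true));
    }
}
```

For a row with columns a, b, c and a request for a, the two-step variant
visits 2 KVs and the combined variant visits 1. When each KV sits in its own
block, that difference is one block load.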






--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira