[ 
https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121482#comment-13121482
 ] 

Jonathan Gray commented on HBASE-4465:
--------------------------------------

Committed to trunk.  What's the status on the 89 branch?  Should we keep this 
open?
                
> Lazy-seek optimization for StoreFile scanners
> ---------------------------------------------
>
>                 Key: HBASE-4465
>                 URL: https://issues.apache.org/jira/browse/HBASE-4465
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>              Labels: optimization, seek
>             Fix For: 0.89.20100924, 0.94.0
>
>         Attachments: 
> HBASE-4465_Lazy-seek_optimization_for_St-20111005121052-b2ea8753.patch
>
>
> Previously, if we had several StoreFiles for a column family in a region, we 
> would seek in each of them and only then merge the results, even though the 
> row/column we are looking for might only be in the most recent (and the 
> smallest) file. Now we prioritize our reads from those files so that we check 
> the most recent file first. This is done with a "lazy seek", which 
> pretends that the next value in the StoreFile is (seekRow, seekColumn, 
> lastTimestampInStoreFile), which is earlier in the KV order than anything 
> that might actually occur in the file. So if we don't find the result in 
> earlier files, that fake KV will bubble up to the top of the KV heap and a 
> real seek will be done. This is expected to significantly reduce the amount 
> of disk IO (as of 09/22/2011 we are doing dark launch testing and 
> measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful 
> discussions on this and the idea of putting fake KVs with the highest 
> timestamp of the StoreFile in the scanner priority queue.
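
The lazy-seek idea described above can be illustrated with a small, self-contained sketch. This is not the code from the attached patch: every name here (LazySeekSketch, FakeStoreFile, Candidate, getNewest) is hypothetical, and the model is reduced to a point lookup of the newest version of one row/column key. Each file first enters the heap as a fake entry carrying the file's highest timestamp, and a real seek is only paid for when that fake entry reaches the top of the heap.

```java
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Illustrative sketch only; all names are hypothetical, not HBase APIs.
final class LazySeekSketch {

    /** Stand-in for a StoreFile: row/column key -> (timestamp -> value). */
    static final class FakeStoreFile {
        final Map<String, TreeMap<Long, String>> cells = new TreeMap<>();
        long maxTimestamp = Long.MIN_VALUE;
        int realSeeks = 0; // counts the "expensive" seeks actually performed

        FakeStoreFile put(String key, long ts, String value) {
            cells.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
            maxTimestamp = Math.max(maxTimestamp, ts);
            return this;
        }

        /** The costly operation: newest version of the key in this file, or null. */
        Map.Entry<Long, String> seek(String key) {
            realSeeks++;
            TreeMap<Long, String> versions = cells.get(key);
            return versions == null ? null : versions.lastEntry();
        }
    }

    /** Heap entry: either a fake placeholder for a file or a real cell from it. */
    static final class Candidate {
        final FakeStoreFile file;
        final long ts;
        final String value;
        final boolean fake;

        Candidate(FakeStoreFile file, long ts, String value, boolean fake) {
            this.file = file;
            this.ts = ts;
            this.value = value;
            this.fake = fake;
        }
    }

    /** Returns the newest value for the key across all files, or null if absent. */
    static String getNewest(List<FakeStoreFile> files, String key) {
        // Newer timestamps sort first; on a tie a fake entry sorts before a
        // real one, so it is resolved before any real cell is returned.
        PriorityQueue<Candidate> heap = new PriorityQueue<>((a, b) -> {
            int byTs = Long.compare(b.ts, a.ts);
            return byTs != 0 ? byTs : Boolean.compare(b.fake, a.fake);
        });
        for (FakeStoreFile f : files) {
            if (!f.cells.isEmpty()) {
                // Lazy seek: pretend the file's next cell has its max timestamp.
                heap.add(new Candidate(f, f.maxTimestamp, null, true));
            }
        }
        while (!heap.isEmpty()) {
            Candidate top = heap.poll();
            if (!top.fake) {
                return top.value; // a real cell on top is the newest overall
            }
            // The fake entry bubbled up to the top: do the real seek now.
            Map.Entry<Long, String> real = top.file.seek(key);
            if (real != null) {
                heap.add(new Candidate(top.file, real.getKey(), real.getValue(), false));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FakeStoreFile recent = new FakeStoreFile().put("r1/c1", 300L, "new");
        FakeStoreFile older = new FakeStoreFile().put("r1/c1", 100L, "old");
        System.out.println(getNewest(List.of(older, recent), "r1/c1"));
        System.out.println("seeks in older file: " + older.realSeeks);
    }
}
```

In this toy model, a lookup that is satisfied by the most recent file never touches the older file, which is the disk IO saving the description refers to. A real scanner also has to handle multi-row scans, deletes, and the full KV ordering, none of which is modeled here.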


        
