[ https://issues.apache.org/jira/browse/HBASE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121458#comment-13121458 ]
Mikhail Bautin commented on HBASE-4465:
---------------------------------------

All unit tests passed; the patch is ready to be committed.

> Lazy-seek optimization for StoreFile scanners
> ---------------------------------------------
>
>                 Key: HBASE-4465
>                 URL: https://issues.apache.org/jira/browse/HBASE-4465
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>              Labels: optimization, seek
>             Fix For: 0.89.20100924, 0.94.0
>
>
> Previously, if we had several StoreFiles for a column family in a region, we
> would seek in each of them and only then merge the results, even though the
> row/column we are looking for might only be in the most recent (and the
> smallest) file. Now we prioritize our reads from those files so that we check
> the most recent file first. This is done by doing a "lazy seek" which
> pretends that the next value in the StoreFile is (seekRow, seekColumn,
> lastTimestampInStoreFile), which is earlier in the KV order than anything
> that might actually occur in the file. So if we don't find the result in
> the files checked earlier (the more recent ones), that fake KV will bubble
> up to the top of the KV heap and a real seek will be done. This is expected
> to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing
> dark launch testing and measurement).
> This is joint work with Liyin Tang -- huge thanks to him for many helpful
> discussions on this and the idea of putting fake KVs with the highest
> timestamp of the StoreFile in the scanner priority queue.
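
The lazy-seek mechanism described in the issue can be illustrated with a small Java sketch. This is not the HBASE-4465 patch: the names (LazySeekSketch, FileScanner, lazySeek, realSeek, get, demoFiles) are hypothetical, each StoreFile is modeled as an in-memory sorted list, a "real seek" is only a counted linear scan, and ties between equal keys are assumed to go to the newer file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class LazySeekSketch {
    /** Simplified cell; value is null for fake KVs and seek keys. */
    static final class KV {
        final String row, col, value;
        final long ts;
        KV(String row, String col, long ts, String value) {
            this.row = row; this.col = col; this.ts = ts; this.value = value;
        }
    }

    /** KV order: row, then column, then timestamp DESCENDING (newest first). */
    static int compare(KV a, KV b) {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.col.compareTo(b.col);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        return c;
    }

    /** Hypothetical in-memory stand-in for a StoreFile scanner. */
    static final class FileScanner {
        final List<KV> kvs;  // already sorted in KV order
        final int age;       // 0 = most recent file; used only to break ties
        final long maxTs;    // highest timestamp stored in this file
        KV current;          // real KV, fake KV, or null when exhausted
        KV pendingSeek;      // non-null while the real seek is still deferred
        int realSeeks;       // counts simulated disk seeks

        FileScanner(int age, KV... sorted) {
            this.age = age;
            this.kvs = Arrays.asList(sorted);
            long max = Long.MIN_VALUE;
            for (KV kv : sorted) max = Math.max(max, kv.ts);
            this.maxTs = max;
        }

        /** Defer the seek: pretend the next KV is (row, col, maxTs). */
        void lazySeek(String row, String col) {
            pendingSeek = new KV(row, col, Long.MAX_VALUE, null);
            current = new KV(row, col, maxTs, null);
        }

        /** Position on the first KV >= the deferred seek key (linear scan for brevity). */
        void realSeek() {
            realSeeks++;
            current = null;
            for (KV kv : kvs) {
                if (compare(kv, pendingSeek) >= 0) { current = kv; break; }
            }
            pendingSeek = null;
        }
    }

    /** Returns the newest KV for (row, col), or null if no file contains one. */
    static KV get(List<FileScanner> scanners, String row, String col) {
        PriorityQueue<FileScanner> heap = new PriorityQueue<>((a, b) -> {
            int c = compare(a.current, b.current);
            return c != 0 ? c : Integer.compare(a.age, b.age);
        });
        for (FileScanner s : scanners) { s.lazySeek(row, col); heap.add(s); }
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            if (top.pendingSeek != null) {   // a fake KV reached the top:
                top.realSeek();              // only now pay for the real seek
                if (top.current != null) heap.add(top);
                continue;
            }
            KV kv = top.current;             // a real KV sorts first: answer found
            return (kv.row.equals(row) && kv.col.equals(col)) ? kv : null;
        }
        return null;
    }

    static int totalSeeks(List<FileScanner> scanners) {
        int n = 0;
        for (FileScanner s : scanners) n += s.realSeeks;
        return n;
    }

    /** Three made-up files, newest first. */
    static List<FileScanner> demoFiles() {
        return Arrays.asList(
            new FileScanner(0, new KV("row1", "a", 30, "v3"), new KV("row2", "a", 31, "x")),
            new FileScanner(1, new KV("row1", "a", 20, "v2"), new KV("row1", "b", 21, "b-mid")),
            new FileScanner(2, new KV("row1", "a", 10, "v1"), new KV("row1", "c", 5, "c-old")));
    }

    public static void main(String[] args) {
        List<FileScanner> files = demoFiles();
        KV hit = get(files, "row1", "a");
        // prints "v3 after 1 real seek(s)"
        System.out.println(hit.value + " after " + totalSeeks(files) + " real seek(s)");
    }
}
```

In this toy data set the lookup of row1/a touches only the newest file: the real KV found there (timestamp 30) sorts ahead of the fake KVs of the two older files (timestamps 21 and 10), so their seeks never happen. A lookup of row1/c, which exists only in the oldest file, still falls back to a real seek in every file.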