[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488463#comment-13488463 ]
Karthik Ranganathan commented on HBASE-6874:
--------------------------------------------

Thought about the N scanners; it's a complicated change, since you would have to change the entire scan protocol. The next calls in a scan are not numbered, so you could go out of whack when prefetching N (and throw in exceptions). There is also the basic issue right now that scans do retries, which is wrong.

Also, reasoning about it another way: if your in-memory scan throughput is higher than the rate at which you can read from disk, you're probably good. I found that there are other, unrelated bottlenecks preventing this from being the case.

Of course, if the filtering is very heavy then this will break down... you probably want to implement prefetching based on the number of filtered rows, which should not be too hard.

I have a patch I have tested with, but it's waiting on HBASE-6770, which is going to refactor scans quite a bit. Will put a patch out once that is done.

> Implement prefetching for scanners
> ----------------------------------
>
>                 Key: HBASE-6874
>                 URL: https://issues.apache.org/jira/browse/HBASE-6874
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>
> I did some quick experiments by scanning data that should be completely in
> memory and found that adding pre-fetching increases the throughput by about
> 50% from 26MB/s to 39MB/s.
> The idea is to perform the next in a background thread, and keep the result
> ready. When the scanner's next comes in, return the pre-computed result and
> issue another background read.
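
The idea in the issue description (perform the next in a background thread, return the pre-computed result when the caller's next comes in, then issue another background read) could be sketched roughly as follows. This is only an illustration and not the patch mentioned above: BatchSource and PrefetchingSource are hypothetical names rather than HBase classes, an empty batch is assumed to mark the end of the scan, and only one read is kept in flight, so the unnumbered-next problem of prefetching N does not arise here.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a scanner: returns the next batch, or an empty list when done. */
interface BatchSource<T> {
    List<T> next() throws Exception;
}

/** Wraps a BatchSource and keeps exactly one next() call running in the background. */
class PrefetchingSource<T> implements BatchSource<T>, AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Callable<List<T>> read;   // the background read: one call to delegate.next()
    private Future<List<T>> pending;        // result of the read currently in flight
    private boolean exhausted = false;

    PrefetchingSource(BatchSource<T> delegate) {
        this.read = delegate::next;
        this.pending = executor.submit(read);   // start the first read eagerly
    }

    @Override
    public synchronized List<T> next() throws Exception {
        if (exhausted) {
            return Collections.emptyList();
        }
        List<T> batch;
        try {
            // Blocks only if the background read has not finished yet.
            batch = pending.get();
        } catch (ExecutionException e) {
            // Surface the failure to the caller once; no retry is attempted here.
            exhausted = true;
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
        if (batch.isEmpty()) {
            exhausted = true;                   // assumed end-of-scan marker
        } else {
            pending = executor.submit(read);    // issue the next background read
        }
        return batch;
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Because the single-thread executor runs reads one at a time and a new read is submitted only after the previous result has been consumed, batches come back in the same order the underlying source produced them. Callers should close the wrapper when done so the background thread is released.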