[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488473#comment-13488473
 ] 

Lars Hofhansl commented on HBASE-6874:
--------------------------------------

Yeah, it's tricky to do that at the Scanner level.

In our case we have N ClientScanners. In a nutshell, we break the scan up into 
chunks and use a separate ClientScanner for each chunk. We then sort the chunks 
(only the chunks, not all the KVs) at the client by each chunk's start key.
Some of our use cases do relatively large scans (hundreds of millions of rows), 
and we want to engage many cores and spindles at the RegionServers in parallel 
(the chunking controls the level of parallelism we get). This is for online 
analytics over pre-aggregated data.
It's quite possible that our use case is too special to fit into any kind of 
generalized scheme.
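
A minimal sketch of the chunking scheme described above, in plain Java. It 
is an illustration under assumptions, not the actual implementation, and it 
does not use the HBase client API: a TreeMap stands in for the table, 
scanChunk is a hypothetical stand-in for one per-chunk ClientScanner, and the 
names ChunkedScan, Chunk, parallelScan, and boundaries are invented for the 
example.

```java
import java.util.*;
import java.util.concurrent.*;

final class ChunkedScan {
    // One finished chunk: the rows of [startKey, stopKey), already in key order.
    static final class Chunk {
        final String startKey;
        final List<String> rows;
        Chunk(String startKey, List<String> rows) {
            this.startKey = startKey;
            this.rows = rows;
        }
    }

    // Hypothetical stand-in for a per-chunk ClientScanner: reads [start, stop).
    static Chunk scanChunk(NavigableMap<String, String> table, String start, String stop) {
        return new Chunk(start, new ArrayList<>(table.subMap(start, true, stop, false).values()));
    }

    // boundaries must be sorted; chunk i covers [boundaries[i], boundaries[i+1]).
    // parallelism caps how many chunks are scanned at the same time.
    static List<String> parallelScan(NavigableMap<String, String> table,
                                     List<String> boundaries, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            CompletionService<Chunk> done = new ExecutorCompletionService<>(pool);
            int chunks = boundaries.size() - 1;
            for (int i = 0; i < chunks; i++) {
                String start = boundaries.get(i);
                String stop = boundaries.get(i + 1);
                done.submit(() -> scanChunk(table, start, stop));
            }
            // Chunks complete in arbitrary order.
            List<Chunk> finished = new ArrayList<>();
            for (int i = 0; i < chunks; i++) {
                finished.add(done.take().get());
            }
            // Sort only the chunks by start key; rows inside a chunk are already ordered.
            finished.sort(Comparator.comparing((Chunk c) -> c.startKey));
            List<String> out = new ArrayList<>();
            for (Chunk c : finished) {
                out.addAll(c.rows);
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because only the chunk list is sorted, the client-side ordering cost depends 
on the number of chunks rather than the number of KVs.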

                
> Implement prefetching for scanners
> ----------------------------------
>
>                 Key: HBASE-6874
>                 URL: https://issues.apache.org/jira/browse/HBASE-6874
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>
> I did some quick experiments by scanning data that should be completely in 
> memory and found that adding pre-fetching increases the throughput by about 
> 50% from 26MB/s to 39MB/s.
> The idea is to perform the next() call in a background thread and keep the 
> result ready. When the scanner's next() comes in, return the pre-computed 
> result and issue another background read.
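
The prefetching idea in the issue description could look roughly like the 
following plain-Java sketch. It is only an illustration and is not taken 
from the HBase patch: PrefetchingScanner is a hypothetical name, the 
Supplier stands in for whatever call fetches the next batch of rows, and an 
empty list is assumed to signal the end of the scan.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Supplier;

final class PrefetchingScanner implements AutoCloseable {
    // Single background thread, so reads from the source never overlap.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Callable<List<String>> fetch;
    private Future<List<String>> pending;

    // source is an assumed stand-in: each call returns the next batch,
    // or an empty list once the scan is exhausted.
    PrefetchingScanner(Supplier<List<String>> source) {
        this.fetch = source::get;
        this.pending = worker.submit(fetch); // start the first read right away
    }

    // Returns the pre-fetched batch (waiting if it is not ready yet),
    // then issues the next background read before handing the batch back.
    List<String> next() throws InterruptedException, ExecutionException {
        List<String> batch = pending.get();
        if (!batch.isEmpty()) {
            pending = worker.submit(fetch);
        }
        // After an empty batch, pending stays completed and keeps returning it.
        return batch;
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

While the caller processes one batch, the worker thread is already fetching 
the next one, which is where any throughput gain would come from.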

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
