[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2013-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667011#comment-13667011
 ] 

Hudson commented on HBASE-6874:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #542 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/542/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb 
(Revision 1486246)

 Result = FAILURE
jxiang : 
Files : 
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java


> Implement prefetching for scanners
> --
>
> Key: HBASE-6874
> URL: https://issues.apache.org/jira/browse/HBASE-6874
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Karthik Ranganathan
>Assignee: Karthik Ranganathan
> Fix For: 0.89-fb
>
>
> I did some quick experiments by scanning data that should be completely in
> memory and found that adding prefetching increases the throughput by about
> 50%, from 26MB/s to 39MB/s.
> The idea is to perform the next call in a background thread and keep the
> result ready. When the scanner's next call comes in, return the precomputed
> result and issue another background read.
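
A minimal Java sketch of the single-slot prefetch described above, assuming a hypothetical BatchSource interface that stands in for one synchronous scanner next() call (it is not the HBase RegionScanner API). One read is always in flight: next() hands back the result that was computed in the background and immediately schedules the following read.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for one synchronous next() call; returns null at end of scan.
interface BatchSource<T> {
  List<T> fetch() throws Exception;
}

// Single-slot prefetch: exactly one fetch is kept running in the background.
final class PrefetchingScanner<T> implements AutoCloseable {
  private final Callable<List<T>> fetchTask;
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private Future<List<T>> pending;

  PrefetchingScanner(BatchSource<T> source) {
    this.fetchTask = source::fetch;
    this.pending = executor.submit(fetchTask); // start the first read right away
  }

  // Returns the precomputed batch (blocking only if it is not ready yet) and
  // immediately issues the next background read. Returns null once exhausted.
  List<T> next() throws InterruptedException, ExecutionException {
    if (pending == null) {
      return null;
    }
    List<T> result = pending.get();
    pending = (result == null) ? null : executor.submit(fetchTask);
    return result;
  }

  @Override
  public void close() {
    executor.shutdownNow(); // interrupts any read still in flight
  }
}
```

The single-thread executor keeps fetches sequential, so batches come back in scan order while the caller is busy with the previous one.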

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2013-05-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666998#comment-13666998
 ] 

Hudson commented on HBASE-6874:
---

Integrated in hbase-0.95-on-hadoop2 #111 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/111/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb 
(Revision 1486247)

 Result = FAILURE
jxiang : 
Files : 
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/branches/0.95/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java



[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2013-05-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666913#comment-13666913
 ] 

Hudson commented on HBASE-6874:
---

Integrated in hbase-0.95 #214 (See 
[https://builds.apache.org/job/hbase-0.95/214/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb 
(Revision 1486247)

 Result = SUCCESS
jxiang : 
Files : 
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/branches/0.95/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java



[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2013-05-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666879#comment-13666879
 ] 

Hudson commented on HBASE-6874:
---

Integrated in HBase-TRUNK #4142 (See 
[https://builds.apache.org/job/HBase-TRUNK/4142/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb 
(Revision 1486246)

 Result = SUCCESS
jxiang : 
Files : 
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java



[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2013-04-24 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640759#comment-13640759
 ] 

Jimmy Xiang commented on HBASE-6874:


Cool.  I will port it to trunk (HBASE-8420).


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2013-04-24 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640748#comment-13640748
 ] 

Karthik Ranganathan commented on HBASE-6874:


This has been implemented and checked in to the 0.89-fb branch.


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2013-04-24 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640532#comment-13640532
 ] 

Jimmy Xiang commented on HBASE-6874:


[~karthik.ranga], any update on this issue?


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491740#comment-13491740
 ] 

Lars Hofhansl commented on HBASE-6874:
--

Oh, I see. I had assumed the prefetching would be implemented on the client 
only (the RPCs for scanning could be scheduled asynchronously, so the caller 
never has to wait for them); but obviously there are many ways to do this.



[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-06 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491700#comment-13491700
 ] 

Karthik Ranganathan commented on HBASE-6874:


Matt - Yes, we had chatted about that as well. But right now, the focus is to
improve scan performance from memory. We should definitely come back to it.
The thinking is that if we can get one thread reading one block from memory
to outperform the disk, we can get parallelism from multiple ongoing scans.
In addition, when I started this effort, scan performance from memory was
around 20MB/s, so no matter how fast we read from disk, scans would still be
slow. Now I am able to benchmark more than 100MB/s (all results on one
thread), so other optimizations make sense. Will publish my results in detail.


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-06 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491695#comment-13491695
 ] 

Karthik Ranganathan commented on HBASE-6874:


Lars - the dependency on HBASE-6770 is mostly to make the code simpler.
Currently, the HRegionServer loops over numRows, and the RegionScanner loops
over the columns in the various CFs, but only for one row. HBASE-6770 will move
the looping over numRows into the RegionScanner itself, because we need to track
both memory size and number of rows in order to respect the more restrictive
of the two. Once that happens, we can implement prefetching in the
RegionScanner itself, instead of spreading the logic into HRegionServer as
well. So it is more about code simplicity and not having to resolve conflicts.
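
A hypothetical sketch of the row loop described above, respecting both a row limit and a size limit and stopping at whichever binds first. Row, BatchLimits, maxRows and maxBytes are illustrative names, not the HBase RegionScanner code or its configuration keys.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical row: a key plus its serialized size in bytes.
final class Row {
  final String key;
  final long sizeBytes;

  Row(String key, long sizeBytes) {
    this.key = key;
    this.sizeBytes = sizeBytes;
  }
}

final class BatchLimits {
  // Gathers rows until maxRows rows have been collected or the accumulated size
  // has reached maxBytes, whichever happens first. The size check runs before
  // each row is added, so the last row may push the total past maxBytes; this
  // also guarantees progress when a single row is larger than maxBytes.
  static List<Row> nextBatch(Iterator<Row> rows, int maxRows, long maxBytes) {
    List<Row> batch = new ArrayList<>();
    long bytes = 0;
    while (rows.hasNext() && batch.size() < maxRows && (batch.isEmpty() || bytes < maxBytes)) {
      Row row = rows.next();
      batch.add(row);
      bytes += row.sizeBytes;
    }
    return batch;
  }
}
```

With four 40-byte rows, maxRows = 10 and maxBytes = 100, the size limit binds first and the first batch holds three rows (120 bytes).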


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491213#comment-13491213
 ] 

Lars Hofhansl commented on HBASE-6874:
--

Why does this need HBASE-6770? Can't we just prefetch in multiples of the
scanner caching (as long as the prefetching is optional)?
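
One reading of this suggestion, sketched under assumptions: fetch batches of "caching" rows and keep up to "depth" batches buffered ahead of the caller, so the prefetch amount is depth x caching rows. MultiBatchPrefetcher, caching, depth and the Supplier row source are hypothetical; a background thread fills a bounded ArrayBlockingQueue, which blocks the producer once the buffer is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Keeps up to `depth` batches of `caching` rows buffered ahead of the caller.
// Single consumer assumed.
final class MultiBatchPrefetcher<T> implements AutoCloseable {
  private final List<T> end = new ArrayList<>(); // sentinel: identity marks end of scan
  private final BlockingQueue<List<T>> buffer;
  private final Thread producer;
  private boolean done;

  // rows: hypothetical row source that returns null at the end of the scan.
  MultiBatchPrefetcher(Supplier<T> rows, int caching, int depth) {
    if (caching < 1) {
      throw new IllegalArgumentException("caching must be >= 1");
    }
    this.buffer = new ArrayBlockingQueue<>(depth);
    this.producer = new Thread(() -> {
      try {
        while (true) {
          List<T> batch = new ArrayList<>(caching);
          T row;
          while (batch.size() < caching && (row = rows.get()) != null) {
            batch.add(row);
          }
          if (!batch.isEmpty()) {
            buffer.put(batch); // blocks while `depth` batches are already waiting
          }
          if (batch.size() < caching) { // source exhausted
            buffer.put(end);
            return;
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // close() was called; stop prefetching
      }
    });
    producer.setDaemon(true);
    producer.start();
  }

  // Returns the next buffered batch, or null once the scan is exhausted.
  List<T> next() throws InterruptedException {
    if (done) {
      return null;
    }
    List<T> batch = buffer.take();
    if (batch == end) {
      done = true;
      return null;
    }
    return batch;
  }

  @Override
  public void close() {
    producer.interrupt();
  }
}
```

Setting depth to 1 reduces this to the single-slot case; larger depths trade buffered memory for more overlap with a slow consumer.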



[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-05 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490940#comment-13490940
 ] 

Matt Corgan commented on HBASE-6874:


Have you guys considered the possibility of fetching multiple blocks in a single
call to HDFS?  If the compressed block size is 10KB, then maybe large scans should
be requesting 100+ blocks (1MB) at a time, given that rotational drives can read
several MB in the time it takes to do a single seek.  The prefetch thread could
chop the 1MB result into the individual blocks before putting them into the
block cache.
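
A sketch of the chopping step, under simplifying assumptions: blocks are fixed-size and keyed by file offset, and the block cache is a plain Map. Real HFile blocks are variable-sized and carry their own headers, so an actual implementation would parse block boundaries instead of slicing at fixed offsets. MultiBlockRead and cacheBlocks are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;

final class MultiBlockRead {
  // Splits one contiguous read that started at fileOffset into blockSize-sized
  // pieces and caches each under its own file offset. The final piece may be
  // shorter if the read ended mid-block. Returns the number of pieces cached.
  static int cacheBlocks(byte[] chunk, long fileOffset, int blockSize, Map<Long, byte[]> cache) {
    int count = 0;
    for (int start = 0; start < chunk.length; start += blockSize) {
      int end = Math.min(start + blockSize, chunk.length);
      cache.put(fileOffset + start, Arrays.copyOfRange(chunk, start, end));
      count++;
    }
    return count;
  }
}
```

For example, one 1MB read (1,048,576 bytes) with 10KB (10,240-byte) blocks yields 103 cache entries: 102 full blocks plus one 4,096-byte partial block.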


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-05 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490898#comment-13490898
 ] 

Karthik Ranganathan commented on HBASE-6874:


Awesome, then layering in multi-pre-fetch should be very easy!


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489546#comment-13489546
 ] 

stack commented on HBASE-6874:
--

Just to say that in trunk, next invocations now carry a sequence number
(HBASE-5974).
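
A hypothetical sketch of what a per-call sequence number buys, not the HBASE-5974 code or wire format: the client numbers each next() call and the server rejects any call whose number it does not expect. Without the check, a client retrying after a timeout can advance the scanner twice and silently skip a batch; with it, the retry fails loudly and the client can reopen the scanner instead. SequencedScanner and its Supplier batch source are illustrative names.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical server-side scanner state that checks a per-call sequence number.
final class SequencedScanner<T> {
  private final Supplier<List<T>> nextBatch; // hypothetical source of the next batch
  private long expectedSeq = 0;

  SequencedScanner(Supplier<List<T>> nextBatch) {
    this.nextBatch = nextBatch;
  }

  // Serves a next() call only if it carries the expected sequence number.
  // A retry of an already-answered call (or any out-of-order call) is rejected
  // instead of silently advancing the scanner and skipping rows.
  synchronized List<T> next(long callSeq) {
    if (callSeq != expectedSeq) {
      throw new IllegalStateException(
          "out-of-order next(): expected seq " + expectedSeq + " but got " + callSeq);
    }
    expectedSeq++;
    return nextBatch.get();
  }
}
```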


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-11-01 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488689#comment-13488689
 ] 

Karthik Ranganathan commented on HBASE-6874:


Actually, I did this analysis and enhancement for an online analytics use case
as well (and for search indexing), and most of what you say maps one to one. The
only difference, I guess, is that so far we are not relying heavily on
server-side filtering, so we decided to punt on the prefetching=n case for now
(we actually discussed this).


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-10-31 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488473#comment-13488473
 ] 

Lars Hofhansl commented on HBASE-6874:
--

Yeah, it's tricky to do that at the Scanner level.

In our case we have N ClientScanners. We break up the scan into chunks and for
each chunk we use a separate ClientScanner (in a nutshell). We then sort the
chunks (only the chunks, not all the KVs) at the client based on the start key
of each chunk.
Some of our use cases do relatively large scans (hundreds of millions of rows),
and we want to engage many cores and spindles at the RegionServers in parallel
(we control the level of parallelism through the chunking). This is for
online analytics over preaggregated data.
It's quite possible that our use case is too special to fit into any kind of
generalized scheme.
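
A sketch of the chunking approach described above, under stated assumptions: keys are simplified to integers, and scanChunk is a hypothetical stand-in for running one ClientScanner over [lo, hi). Chunks run concurrently and finish in arbitrary order; only the finished chunks are sorted by start key, not every row.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

final class ChunkedScan {
  // Splits [start, stop) into at most `chunks` contiguous key ranges, scans them on
  // `parallelism` threads, then orders the finished chunks by their start key.
  static <T> List<T> scan(int start, int stop, int chunks, int parallelism,
                          BiFunction<Integer, Integer, List<T>> scanChunk)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      CompletionService<Map.Entry<Integer, List<T>>> completion = new ExecutorCompletionService<>(pool);
      int width = (stop - start + chunks - 1) / chunks; // ceiling division
      int submitted = 0;
      for (int lo = start; lo < stop; lo += width) {
        final int chunkStart = lo;
        final int chunkStop = Math.min(lo + width, stop);
        completion.submit(() -> new SimpleEntry<>(chunkStart, scanChunk.apply(chunkStart, chunkStop)));
        submitted++;
      }
      // Collect chunks in completion order, then sort the chunks (not the rows).
      List<Map.Entry<Integer, List<T>>> finished = new ArrayList<>();
      for (int i = 0; i < submitted; i++) {
        finished.add(completion.take().get());
      }
      finished.sort(Map.Entry.comparingByKey());
      List<T> result = new ArrayList<>();
      for (Map.Entry<Integer, List<T>> chunk : finished) {
        result.addAll(chunk.getValue()); // rows within a chunk are already in key order
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

The number of chunks and the pool size are independent knobs here: more chunks than threads evens out skew between ranges, while the pool size caps how many scanners hit the RegionServers at once.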



[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-10-31 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488463#comment-13488463
 ] 

Karthik Ranganathan commented on HBASE-6874:


Thought about the N scanners; it's a complicated change - you would have to
change the entire scan protocol. The next calls in a scan are not numbered,
so you could get out of sync when prefetching N (especially once exceptions
are thrown). There is also the basic issue right now that scans do retries,
which is wrong. Also, reasoning about it another way, if your in-memory scan
throughput is higher than your disk read throughput, you're probably good. I
found that there are other unrelated bottlenecks preventing this from being
the case. Of course, if the filtering is very heavy then this will break
down... you probably want to implement prefetching based on the number of
filtered rows, which should not be too hard.
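
A hypothetical sketch of sizing the prefetch by rows that pass the filter rather than by raw rows: keep reading until "target" matching rows are collected, the source runs out, or a cap on raw rows examined is hit. A Predicate stands in for a server-side filter; FilteredBatch, target and maxScanned are illustrative names, not the HBase Filter API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

final class FilteredBatch<T> {
  final List<T> rows = new ArrayList<>();
  int scanned; // raw rows examined, including those the filter rejected

  // Reads from `source` until `target` rows have passed `filter`, the source is
  // exhausted, or `maxScanned` raw rows have been examined (a guard so a very
  // selective filter cannot scan an entire region for one batch).
  static <E> FilteredBatch<E> collect(Iterator<E> source, Predicate<? super E> filter,
                                      int target, int maxScanned) {
    FilteredBatch<E> batch = new FilteredBatch<>();
    while (batch.rows.size() < target && batch.scanned < maxScanned && source.hasNext()) {
      E row = source.next();
      batch.scanned++;
      if (filter.test(row)) {
        batch.rows.add(row);
      }
    }
    return batch;
  }
}
```

With a filter that keeps one row in ten and a target of 3, a batch examines 21 raw rows to return three; counting raw rows instead would have returned only one match per 10-row batch.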

I have a patch I have tested with, but it's waiting on HBASE-6770, which is
going to refactor scans quite a bit. Will put a patch out once that is done.


[jira] [Commented] (HBASE-6874) Implement prefetching for scanners

2012-10-31 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488448#comment-13488448
 ] 

Lars Hofhansl commented on HBASE-6874:
--

[~karthik.ranga] Do you have a patch? We were just discussing something similar
here and I was about to open a similar issue before I found this one. This is 
even more useful with scanner caching.

One could even go a step further and parallelize the prefetching into N threads 
(useful if the results are heavily prefiltered at the server).

We do our own parallel scanner fetching (not necessarily on region or buffer 
boundaries), but it would be nice if that could be generalized and be part of 
HBase.

