[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667011#comment-13667011 ]

Hudson commented on HBASE-6874:

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #542 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/542/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb (Revision 1486246)

Result = FAILURE
jxiang :
Files :
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java

> Implement prefetching for scanners
> ----------------------------------
>
> Key: HBASE-6874
> URL: https://issues.apache.org/jira/browse/HBASE-6874
> Project: HBase
> Issue Type: Sub-task
> Reporter: Karthik Ranganathan
> Assignee: Karthik Ranganathan
> Fix For: 0.89-fb
>
> I did some quick experiments by scanning data that should be completely in memory and found that adding pre-fetching increases the throughput by about 50% from 26MB/s to 39MB/s.
> The idea is to perform the next in a background thread, and keep the result ready. When the scanner's next comes in, return the pre-computed result and issue another background read.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
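The prefetching idea in the issue description (do the next read in a background thread, hand back the ready result, then start another read) could be sketched roughly as below. This is only an illustration and not the code from the 0.89-fb patch: `BatchSource` and `PrefetchingScanner` are hypothetical names, rows are plain strings, and the sketch keeps exactly one batch in flight.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a scanner: returns up to `limit` rows, empty when exhausted. */
interface BatchSource {
    List<String> nextBatch(int limit);
}

/** Keeps one batch pre-computed on a background thread (illustrative only). */
final class PrefetchingScanner implements AutoCloseable {
    private final BatchSource source;
    private final int batchSize;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Future<List<String>> pending;

    PrefetchingScanner(BatchSource source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
        this.pending = submitRead(); // start the first background read immediately
    }

    private Future<List<String>> submitRead() {
        Callable<List<String>> read = () -> source.nextBatch(batchSize);
        return executor.submit(read);
    }

    /** Returns the pre-computed batch and issues another background read. */
    List<String> next() throws InterruptedException, ExecutionException {
        List<String> result = pending.get(); // blocks only if the prefetch is not done yet
        if (!result.isEmpty()) {
            pending = submitRead();          // overlap the next read with the caller's work
        }
        return result;                       // an empty batch signals end of scan
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Because only one read is outstanding at a time, the underlying source is never touched by two threads at once. The cost is that a batch may be read and then thrown away if the caller closes the scanner early.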
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666998#comment-13666998 ]

Hudson commented on HBASE-6874:

Integrated in hbase-0.95-on-hadoop2 #111 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/111/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb (Revision 1486247)

Result = FAILURE
jxiang :
Files :
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/branches/0.95/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666913#comment-13666913 ]

Hudson commented on HBASE-6874:

Integrated in hbase-0.95 #214 (See [https://builds.apache.org/job/hbase-0.95/214/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb (Revision 1486247)

Result = SUCCESS
jxiang :
Files :
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/branches/0.95/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666879#comment-13666879 ]

Hudson commented on HBASE-6874:

Integrated in HBase-TRUNK #4142 (See [https://builds.apache.org/job/HBase-TRUNK/4142/])
HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb (Revision 1486246)

Result = SUCCESS
jxiang :
Files :
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScannerHolder.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640759#comment-13640759 ]

Jimmy Xiang commented on HBASE-6874:

Cool. I will port it to trunk (HBASE-8420).
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640748#comment-13640748 ]

Karthik Ranganathan commented on HBASE-6874:

This has been implemented and checked in to the 0.89-fb branch.
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640532#comment-13640532 ]

Jimmy Xiang commented on HBASE-6874:

[~karthik.ranga], any update on this issue?
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491740#comment-13491740 ]

Lars Hofhansl commented on HBASE-6874:

Oh, I see. I had assumed the prefetching would be implemented on the client only (the RPCs for scanning could be scheduled asynchronously, so the caller never has to wait for them); but obviously there are many ways to do this.
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491700#comment-13491700 ]

Karthik Ranganathan commented on HBASE-6874:

Matt - Yes, we had chatted about that as well. But right now, the focus is to improve scan performance from memory. We should definitely cycle back to that as well. The thought is that if we can get one thread reading one block from memory to outperform the disk, we can get the parallelism from multiple ongoing scans.

In addition, the scan perf when I started this effort was around 20MB/s from memory, so no matter how much we read from the disk, it would be slow. Now I am able to benchmark more than 100MB/s (all results on one thread), so other things make sense. Will publish my results in detail.
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491695#comment-13491695 ]

Karthik Ranganathan commented on HBASE-6874:

Lars - the dependency on HBASE-6770 is more to make the code simpler. Currently, the HRegionServer loops over numRows, and the RegionScanner loops over the columns in the various CFs, but only for one row. HBASE-6770 will move the looping over numRows into the RegionScanner itself, because we need to track both memory size and number of rows in order to respect the more restrictive of the two. Once that happens, we can implement prefetching in the RegionScanner itself, instead of spreading the logic into HRegionServer as well. So it is more a matter of code simplicity and of not having to resolve conflicts.
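The "more restrictive of the two" rule described above can be illustrated with a small loop that stops as soon as either the row limit or the size limit is hit. This is a hedged sketch, not the HBASE-6770 code: `BatchLimits`, `maxRows` and `maxBytes` are hypothetical names, rows are modelled as raw byte arrays, and the size check is assumed to happen before each row is read, so a batch can overshoot `maxBytes` by at most one row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class BatchLimits {
    /** Collects rows until maxRows or maxBytes is reached, whichever comes first. */
    static List<byte[]> nextBatch(Iterator<byte[]> rows, int maxRows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long bytes = 0;
        // Both limits are checked before every row, so the tighter one ends the batch.
        while (rows.hasNext() && batch.size() < maxRows && bytes < maxBytes) {
            byte[] row = rows.next();
            batch.add(row);
            bytes += row.length;
        }
        return batch;
    }
}
```

With 10-byte rows, `maxRows = 3` and a large `maxBytes` gives three rows per batch, while `maxRows = 100` and `maxBytes = 20` gives two.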
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491213#comment-13491213 ]

Lars Hofhansl commented on HBASE-6874:

Why does this need HBASE-6770? Can't we just prefetch in multiples of the scanner caching (as long as the prefetching is optional)?
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490940#comment-13490940 ]

Matt Corgan commented on HBASE-6874:

Have you guys considered the possibility of fetching multiple blocks in a single call to HDFS? If compressed block size is 10KB, then maybe large scans should be requesting 100+ blocks (1MB) at a time, given that rotational drives can read several MB in the same time they can do a seek. The prefetch thread could chop the 1MB result into the individual blocks before putting them into the block cache.
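The chopping step suggested above (one large read of roughly 100 x 10KB = 1000KB, about 1MB, split back into individual blocks) might look like the following. It is a sketch under a simplifying assumption that every block has the same fixed size; real HFile blocks vary in size and carry their own headers, so an actual implementation would have to read each block's length instead. `BlockChopper` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class BlockChopper {
    /** Splits one bulk read into fixed-size views without copying the bytes. */
    static List<ByteBuffer> chop(ByteBuffer bulk, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<ByteBuffer> blocks = new ArrayList<>();
        ByteBuffer view = bulk.duplicate();       // leave the caller's position untouched
        while (view.hasRemaining()) {
            int len = Math.min(blockSize, view.remaining());
            ByteBuffer block = view.slice();      // shares the underlying bytes
            block.limit(len);                     // restrict the view to one block
            blocks.add(block);
            view.position(view.position() + len); // advance to the next block
        }
        return blocks;
    }
}
```

Each returned buffer is a view over the bulk buffer, so putting the blocks into a cache this way would keep the whole 1MB allocation alive until every block is evicted; copying each block out would avoid that at the cost of an extra copy.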
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490898#comment-13490898 ]

Karthik Ranganathan commented on HBASE-6874:

Awesome, then layering in multi-pre-fetch should be very easy!
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489546#comment-13489546 ]

stack commented on HBASE-6874:

Just to say that in trunk, next invocations now carry a sequence number (HBASE-5974).
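To show why numbering next calls matters for retries and prefetching, here is a minimal sketch of a scanner that rejects any call whose sequence number is not the one it expects. It is not the HBASE-5974 implementation; `SequencedScanner` and `callSeq` are hypothetical names, and the rows are plain strings held in memory.

```java
final class SequencedScanner {
    private final String[] rows;
    private long expectedSeq = 0; // sequence number the next call must carry
    private int position = 0;     // index of the next row to return

    SequencedScanner(String... rows) {
        this.rows = rows.clone();
    }

    /** Returns the next row, or null at the end; rejects out-of-sequence calls. */
    synchronized String next(long callSeq) {
        if (callSeq != expectedSeq) {
            // A retried or reordered call must not silently advance the scanner.
            throw new IllegalStateException(
                "expected call " + expectedSeq + " but got " + callSeq);
        }
        expectedSeq++;
        return position < rows.length ? rows[position++] : null;
    }
}
```

Without the check, a retried call would skip a row; with it, the duplicate is rejected and the scanner state is left unchanged, so the caller can decide how to recover.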
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488689#comment-13488689 ]

Karthik Ranganathan commented on HBASE-6874:

I actually did this analysis and enhancement for an online analytics use case as well (and search indexing), and most of what you say maps one to one. The only difference, I guess, is that so far we are not relying heavily on server-side filtering, so we decided to punt on the prefetching=n case for now (we actually discussed this).
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488473#comment-13488473 ]

Lars Hofhansl commented on HBASE-6874:

Yeah, it's tricky to do that at the Scanner level. In our case we have N ClientScanners. We break up the scan into chunks and for each chunk we use a separate ClientScanner (in a nutshell). We then sort the chunks (only the chunks, not all the KVs) at the client based on the start key for that chunk.

Some of our use cases do relatively large scans (hundreds of millions of rows), and we want to engage many cores and spindles at the RegionServers in parallel (we control the level of parallelism we want by the chunking)... This is for online analytics over preaggregated data.

It's quite possible that our use case is too special to fit into any kind of generalized scheme.
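The chunked approach described above (one scanner per chunk, chunks scanned in parallel, then only the chunks sorted by start key) can be sketched with a thread pool. This is an illustration rather than the code Lars refers to: `ChunkedScan` and the `scanRange` callback are hypothetical, keys are strings compared in natural order (HBase compares row keys as bytes), and each range is a two-element array of start and stop key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ChunkedScan {
    /** Rows read from one key range, tagged with the range's start key. */
    static final class Chunk {
        final String startKey;
        final List<String> rows;

        Chunk(String startKey, List<String> rows) {
            this.startKey = startKey;
            this.rows = rows;
        }
    }

    /** Scans every range in parallel, then concatenates chunks in start-key order. */
    static List<String> scan(List<String[]> ranges,
                             Function<String[], List<String>> scanRange,
                             int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Chunk>> futures = new ArrayList<>();
            for (String[] range : ranges) {
                Callable<Chunk> task = () -> new Chunk(range[0], scanRange.apply(range));
                futures.add(pool.submit(task));
            }
            List<Chunk> chunks = new ArrayList<>();
            for (Future<Chunk> f : futures) {
                chunks.add(f.get());
            }
            // Sort the chunks only; rows inside a chunk are already in scan order.
            chunks.sort(Comparator.comparing((Chunk c) -> c.startKey));
            List<String> out = new ArrayList<>();
            for (Chunk c : chunks) {
                out.addAll(c.rows);
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

The `parallelism` argument plays the role of the chunking-based control over how many cores and spindles are engaged at once. Sorting N chunks is cheap compared with merging every KV, which is why only the chunks are ordered.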
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488463#comment-13488463 ]

Karthik Ranganathan commented on HBASE-6874:

Thought about the N scanners; it's a complicated change - you would have to change the entire scan protocol. The next calls in a scan are not numbered, so you could go out of whack when prefetching N (and throw in exceptions). There is also the basic issue right now that scans do retries, which is wrong.

Also, reasoning about it another way, if your in-memory scan throughput exceeds the rate at which you can read from disk, you're probably good. I found that there are other unrelated bottlenecks preventing this from being the case. Of course, if the filtering is very heavy then this will break down... you probably want to implement prefetching based on the number of filtered rows, which should not be too hard.

I have a patch I have tested with, but it's waiting on HBASE-6770 - that is going to refactor scans quite a bit. Will put a patch out once that is done.
[jira] [Commented] (HBASE-6874) Implement prefetching for scanners
[ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488448#comment-13488448 ]

Lars Hofhansl commented on HBASE-6874:

[~karthik.ranga] Do you have a patch? We were just discussing something similar here and I was about to open a similar issue before I found this one.

This is even more useful with scanner caching. One could even go a step further and parallelize the prefetching into N threads (useful if the results are heavily prefiltered at the server). We do our own parallel scanner fetching (not necessarily on region or buffer boundaries), but it would be nice if that could be generalized and be part of HBase.