[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503574#comment-14503574 ] Jonathan Lawlor commented on HBASE-13090:
Filed HBASE-13514 to address the test failures in branch-1 and branch-1.1.

Progress heartbeats for long running scanners
Key: HBASE-13090
URL: https://issues.apache.org/jira/browse/HBASE-13090
Project: HBase
Issue Type: New Feature
Reporter: Andrew Purtell
Assignee: Jonathan Lawlor
Fix For: 2.0.0, 1.1.0, 1.2.0
Attachments: 13090-branch-1.addendum, HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch, HBASE-13090-v3.patch, HBASE-13090-v4.patch, HBASE-13090-v6.patch, HBASE-13090-v7.patch

It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst-case timeout to use until scans start failing intermittently in production, depending on variable scan criteria. It would be better if the client-server scan protocol could send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
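The heartbeat idea in the description can be sketched in Java. This is a minimal, hypothetical sketch, not HBase's implementation: the names `HeartbeatScanSketch`, `ScanResponse`, `heartbeat` and `nextRow`, the integer row keys, and the injectable clock are all assumptions. The server scans rows until it fills a batch, reaches the end of the region, or exceeds a time limit. In that last case, it returns whatever it has (possibly nothing), flagged as a heartbeat, so the client sees progress before its RPC timeout fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a time-limited server-side scan that can answer with a heartbeat. */
final class HeartbeatScanSketch {

  /** Hypothetical response: matching rows, a heartbeat flag, and the row to resume from. */
  static final class ScanResponse {
    final List<Integer> rows;
    final boolean heartbeat;
    final int nextRow;

    ScanResponse(List<Integer> rows, boolean heartbeat, int nextRow) {
      this.rows = rows;
      this.heartbeat = heartbeat;
      this.nextRow = nextRow;
    }
  }

  /**
   * Scans rows [startRow, endRow), keeping those that pass the filter, until the batch is
   * full, the region is exhausted, or timeLimitMs has elapsed on the supplied clock.
   */
  static ScanResponse scan(int startRow, int endRow, int batchSize, IntPredicate filter,
      long timeLimitMs, LongSupplier clockMs) {
    long deadline = clockMs.getAsLong() + timeLimitMs;
    List<Integer> rows = new ArrayList<>();
    int row = startRow;
    while (row < endRow && rows.size() < batchSize) {
      if (filter.test(row)) {
        rows.add(row);
      }
      row++;
      boolean moreWork = row < endRow && rows.size() < batchSize;
      // Time limit reached with work left: reply early so the client knows we are alive.
      if (moreWork && clockMs.getAsLong() >= deadline) {
        return new ScanResponse(rows, true, row);
      }
    }
    // Batch filled or region exhausted: a normal (non-heartbeat) response.
    return new ScanResponse(rows, false, row);
  }
}
```

Because the time check runs only after a row has been processed, every call advances at least one row. This means a stream of heartbeat responses always reflects real forward progress through the region.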
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503512#comment-14503512 ] Jonathan Lawlor commented on HBASE-13090:
[~tedyu] thanks for digging in here. I have investigated the root cause of this issue, and it appears to come from the field {{MIN_RPC_TIMEOUT}} inside {{RpcRetryingCaller}} in branch-1. In branch-1, this field prevents the RPC timeout from being set to anything less than 2 seconds. In master the field no longer exists, so the timeout can be made as small as we wish. TestScannerHeartbeatMessages specifies an RPC timeout of 0.5 seconds, which is why the test fails in branch-1, where the effective timeout is raised to 2 seconds instead. I will attach a patch shortly to address this issue, thanks!
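The effect described above can be sketched as a simple clamp. This is an illustrative assumption, not the actual {{RpcRetryingCaller}} code: the class and helper names and the 1000 ms server delay are hypothetical; only the 2-second floor and the 0.5-second test timeout come from the comment.

```java
/** Hypothetical sketch of how a minimum RPC timeout can mask a slow server in a test. */
final class RpcTimeoutClampSketch {
  /** The 2-second floor the comment attributes to MIN_RPC_TIMEOUT in branch-1. */
  static final int MIN_RPC_TIMEOUT_MS = 2000;

  /** branch-1 behaviour as described: requests below the floor are raised to it. */
  static int effectiveTimeoutBranch1(int requestedMs) {
    return Math.max(requestedMs, MIN_RPC_TIMEOUT_MS);
  }

  /** master behaviour as described: the requested timeout is used unchanged. */
  static int effectiveTimeoutMaster(int requestedMs) {
    return requestedMs;
  }

  /** True if a server call that takes serverMs exceeds the client's timeout. */
  static boolean timesOut(int serverMs, int timeoutMs) {
    return serverMs > timeoutMs;
  }
}
```

With the test's 500 ms timeout and a hypothetical 1000 ms server delay, `timesOut(1000, effectiveTimeoutMaster(500))` is true, but `timesOut(1000, effectiveTimeoutBranch1(500))` is false. In branch-1 the expected timeout therefore never happens, which is consistent with no exception being raised when heartbeats were disabled.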
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501415#comment-14501415 ] Ted Yu commented on HBASE-13090:
TestScannerHeartbeatMessages fails in branch-1, e.g. https://builds.apache.org/job/HBase-1.1/411/testReport/org.apache.hadoop.hbase.regionserver/TestScannerHeartbeatMessages/testScannerHeartbeatMessages/

In TestScannerHeartbeatMessages#testImportanceOfHeartbeats(), no exception was raised when heartbeatsEnabled was set to false:
{code}
HeartbeatRPCServices.heartbeatsEnabled = false;
try {
  testCallable.call();
} catch (Exception e) {
  return;
{code}
Debugging to see what might be the cause.

BTW, the test table should be deleted at the end of the test:
{code}
@@ -173,6 +173,7 @@ public class TestScannerHeartbeatMessages {
   @AfterClass
   public static void tearDownAfterClass() throws Exception {
+    TEST_UTIL.deleteTable(TABLE_NAME);
     TEST_UTIL.shutdownMiniCluster();
   }
{code}
I sometimes got the following in a subsequent test run:
{code}
  at org.apache.hadoop.hbase.regionserver.TestScannerHeartbeatMessages.createTestTable(TestScannerHeartbeatMessages.java:139)
  at org.apache.hadoop.hbase.regionserver.TestScannerHeartbeatMessages.setUpBeforeClass(TestScannerHeartbeatMessages.java:134)
Caused by: org.apache.hadoop.ipc.RemoteException: testScannerHeartbeatMessagesTable
  at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.prepareCreate(CreateTableProcedure.java:283)
  at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:106)
  at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:1)
{code}
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500349#comment-14500349 ] Jonathan Lawlor commented on HBASE-13090:
[~ndimiduk] I believe the change is solid. I just figured that, with the branch-1.1 release so close, it may be a bit 'risky' to put such a large change in right before the release. While the unit tests added do stress the relevant code paths, it would be nice to run the patch against a workload that was previously having timeout problems, to prove its worth.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500331#comment-14500331 ] Nick Dimiduk commented on HBASE-13090:
bq. I wanted to put this in hbase 1.1 because it's so sweet, but [~jonathan.lawlor] won't let me... says it's too much change too close to release and might be a bit 'risky'.
Ahem. That's a shame; this seems very important for Phoenix users who are pushing the line on OLAP-style queries. What can we do to raise your confidence in the patch for branch-1.1? Integration tests (IT)? ChaosMonkey (CM)?
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500078#comment-14500078 ] stack commented on HBASE-13090:
+1 carried over from rb. This is a beautiful patch. I will commit later today. The long line is from generated code and is the likely cause of the checkstyle complaint. I'll check it out on commit (and do any fixup if needed).
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501030#comment-14501030 ] Hudson commented on HBASE-13090:
FAILURE: Integrated in HBase-1.1 #408 (See [https://builds.apache.org/job/HBase-1.1/408/])
HBASE-13090 Addendum fixes compilation error in TestScannerHeartbeatMessages (tedyu: rev 7b84d7d7812cffb7da3ccfb40123dc43f18e594c)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerHeartbeatMessages.java
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501034#comment-14501034 ] Hudson commented on HBASE-13090:
FAILURE: Integrated in HBase-1.2 #3 (See [https://builds.apache.org/job/HBase-1.2/3/])
HBASE-13090 Addendum fixes compilation error in TestScannerHeartbeatMessages (tedyu: rev b655a9909e1f5ed2823d0adfd1f42d5af5017dd1)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerHeartbeatMessages.java
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500899#comment-14500899 ] Ted Yu commented on HBASE-13090:
I got the following compilation error on branch-1:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure
[ERROR] /Users/tyu/1-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerHeartbeatMessages.java:[349,11] error: unreported exception InterruptedException; must be caught or declared to be thrown
{code}
The attached addendum fixes the compilation.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500924#comment-14500924 ] Jonathan Lawlor commented on HBASE-13090:
[~tedyu] Thanks for catching that. It seems HRegionServer no longer throws InterruptedException in master. Addendum lgtm.
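The branch-1 compilation error is Java's checked-exception rule at work: code at line 349 of TestScannerHeartbeatMessages.java invokes something that declares InterruptedException in branch-1 (per the comment, HRegionServer no longer throws it in master). The addendum's actual diff is not shown here, so the following is only a generic sketch of the two usual fixes. It uses a hypothetical `Branch1Server` stand-in for the branch-1 API: either declare the exception, or catch it, restore the thread's interrupt status, and rethrow it as an `InterruptedIOException`.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Generic sketch of two ways to satisfy "unreported exception InterruptedException". */
final class CheckedExceptionSketch {

  /** Hypothetical stand-in for a branch-1 API whose constructor declares InterruptedException. */
  static final class Branch1Server {
    Branch1Server() throws InterruptedException {
      // A real server might block here; this stand-in does nothing.
    }
  }

  /** Fix 1: declare the checked exception so the caller must deal with it. */
  static Branch1Server createDeclaring() throws InterruptedException {
    return new Branch1Server();
  }

  /** Fix 2: catch it, keep the interrupt flag set, and rethrow as an IOException subtype. */
  static Branch1Server createCatching() throws IOException {
    try {
      return new Branch1Server();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      InterruptedIOException iioe = new InterruptedIOException("interrupted while creating server");
      iioe.initCause(e);
      throw iioe;
    }
  }
}
```

Restoring the interrupt flag in the second option matters because otherwise the interrupt is silently swallowed and code further up the stack never learns that the thread was asked to stop.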
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500936#comment-14500936 ] Ted Yu commented on HBASE-13090:
Integrated the addendum into branch-1 and branch-1.1. Thanks for taking a look, Jonathan.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500913#comment-14500913 ] Hudson commented on HBASE-13090:
FAILURE: Integrated in HBase-1.2 #2 (See [https://builds.apache.org/job/HBase-1.2/2/])
HBASE-13090 Progress heartbeats for long running scanners (stack: rev a4f77d49a5ae347c78e3d5934c4fc005d3914cb1)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerHeartbeatMessages.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NoLimitScannerContext.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestPartialResultsFromClientSide.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScannerContext.java
* hbase-protocol/src/main/protobuf/Client.proto
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestStripeCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java
* hbase-common/src/main/resources/hbase-default.xml
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500911#comment-14500911 ] Hudson commented on HBASE-13090:
FAILURE: Integrated in HBase-1.1 #407 (See [https://builds.apache.org/job/HBase-1.1/407/])
HBASE-13090 Progress heartbeats for long running scanners (stack: rev 43f24db82566818d02062466ac421d86ddb735d8)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestStripeCompactionPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerHeartbeatMessages.java
* hbase-common/src/main/resources/hbase-default.xml
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-protocol/src/main/protobuf/Client.proto
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NoLimitScannerContext.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScannerContext.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestPartialResultsFromClientSide.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501023#comment-14501023 ] Hudson commented on HBASE-13090:
SUCCESS: Integrated in HBase-TRUNK #6387 (See [https://builds.apache.org/job/HBase-TRUNK/6387/])
HBASE-13090 Progress heartbeats for long running scanners (stack: rev abe3796a9907485c875932caa5f1c82071495c0f)
* hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* hbase-protocol/src/main/protobuf/Client.proto
* hbase-common/src/main/resources/hbase-default.xml
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestPartialResultsFromClientSide.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScannerContext.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestStripeCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/client/ClientSideRegionScanner.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NoLimitScannerContext.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java
* hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerHeartbeatMessages.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499079#comment-14499079 ] Hadoop QA commented on HBASE-13090:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725996/HBASE-13090-v7.patch against master branch at commit e08ef99e3042767eaf2d11adae783674acfdddeb.
ATTACHMENT ID: 12725996

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 16 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0).
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1911 checkstyle errors (more than the master's current 1910 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ new java.lang.String[] { "Region", "Scan", "ScannerId", "NumberOfRows", "CloseScanner", "NextCallSeq", "ClientHandlesPartials", "ClientHandlesHeartbeats", });
+ new java.lang.String[] { "CellsPerResult", "ScannerId", "MoreResults", "Ttl", "Results", "Stale", "PartialFlagPerResult", "MoreResultsInRegion", "HeartbeatMessage", });
+public synchronized boolean next(List<Cell> outResults, ScannerContext scannerContext) throws IOException {
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13727//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13727//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13727//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13727//console

This message is automatically generated.
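The line-length output above hints at the protocol change: one generated field list (apparently the scan request) gains `ClientHandlesHeartbeats`, and the other (apparently the scan response) gains `HeartbeatMessage` alongside `MoreResultsInRegion`. A client that opts in must treat a heartbeat response as "still working, ask again", even when it carries no results. The following is a hypothetical Java sketch of that loop, not `ClientScanner`'s actual logic: the `Response` and `Outcome` classes are assumptions that mirror only the two flags named above, and an iterator stands in for successive scan RPCs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a client loop that understands heartbeat responses. */
final class ClientHeartbeatSketch {

  /** Hypothetical response mirroring the HeartbeatMessage and MoreResultsInRegion flags. */
  static final class Response {
    final List<String> results;
    final boolean heartbeatMessage;
    final boolean moreResultsInRegion;

    Response(List<String> results, boolean heartbeatMessage, boolean moreResultsInRegion) {
      this.results = results;
      this.heartbeatMessage = heartbeatMessage;
      this.moreResultsInRegion = moreResultsInRegion;
    }
  }

  /** Outcome of scanning one region: collected results and how many heartbeats arrived. */
  static final class Outcome {
    final List<String> results = new ArrayList<>();
    int heartbeats;
  }

  /** Drains one region; each element of rpcs stands in for one scan RPC's response. */
  static Outcome scanRegion(Iterator<Response> rpcs) {
    Outcome outcome = new Outcome();
    while (rpcs.hasNext()) {
      Response r = rpcs.next();
      outcome.results.addAll(r.results);
      if (r.heartbeatMessage) {
        // Server is alive but hit its time limit; keep scanning even if results were empty.
        outcome.heartbeats++;
        continue;
      }
      if (!r.moreResultsInRegion) {
        break; // Region exhausted: stop before consuming further responses.
      }
    }
    return outcome;
  }
}
```

Without the heartbeat flag, the same client would simply block until the server produced a result or the RPC timed out, which is the problem this issue sets out to solve.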
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370484#comment-14370484 ] Hadoop QA commented on HBASE-13090: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705727/HBASE-13090-v6.patch against master branch at commit 0d766544166fc9630bb00ae14a4a34a69d93f127. ATTACHMENT ID: 12705727 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1918 checkstyle errors (more than the master's current 1917 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + new java.lang.String[] { Region, Scan, ScannerId, NumberOfRows, CloseScanner, NextCallSeq, ClientHandlesPartials, ClientHandlesHeartbeats, }); + new java.lang.String[] { CellsPerResult, ScannerId, MoreResults, Ttl, Results, Stale, PartialFlagPerResult, HeartbeatMessage, }); {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13322//console This message is automatically generated. 
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370017#comment-14370017 ] Hadoop QA commented on HBASE-13090: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12705648/HBASE-13090-v4.patch against master branch at commit 27cf749af884edae55454c885c7fb066f0a33c79. ATTACHMENT ID: 12705648 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 5 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1919 checkstyle errors (more than the master's current 1917 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + new java.lang.String[] { Region, Scan, ScannerId, NumberOfRows, CloseScanner, NextCallSeq, ClientHandlesPartials, ClientHandlesHeartbeats, }); + new java.lang.String[] { CellsPerResult, ScannerId, MoreResults, Ttl, Results, Stale, PartialFlagPerResult, HeartbeatMessage, }); +public NextState next(List<Cell> outResults, int batchLimit, long sizeLimit) throws IOException { {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestHRegion Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13319//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13319//console This message is automatically generated.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367025#comment-14367025 ] Eshcar Hillel commented on HBASE-13090: --- Could be useful to return a *non* empty result array even when the region is not exhausted. For example, if the scanner is async (HBASE-13071) the application can start iterating over the results instead of waiting for the server to collect the entire batch.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367402#comment-14367402 ] Jonathan Lawlor commented on HBASE-13090: - [~eshcar] Actually, that is how it works (sorry, I was not clear). When the time limit is reached, the server will return to the client whatever it has accumulated thus far in a heartbeat message. What I meant by #2 is that it is possible (in the case of aggressive filtering) that when the time limit is reached, the server hasn't had a chance to accumulate ANY Results. In such a case, the Result array returned to the client would be empty.
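To make the server-side behavior described in this comment concrete, here is a minimal Java sketch under stated assumptions: rows are plain `String`s, the filter and clock are injected, and `TimeLimitedScanSketch` / `ScanStep` are hypothetical names, not HBase classes. It only shows the control flow: stop at the batch limit, at the end of the region, or at the time limit, and in the last case flag the reply as a heartbeat even when aggressive filtering has left nothing to return.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical sketch, not HBase code: rows are plain Strings and the
// filter and clock are injected so the time limit can be driven in tests.
final class TimeLimitedScanSketch {

  /** What one next() call hands back to the client. */
  static final class ScanStep {
    final List<String> results;  // may be empty on a heartbeat
    final boolean heartbeat;     // true if the time limit cut this call short
    final boolean moreInRegion;  // false once the region is exhausted

    ScanStep(List<String> results, boolean heartbeat, boolean moreInRegion) {
      this.results = results;
      this.heartbeat = heartbeat;
      this.moreInRegion = moreInRegion;
    }
  }

  /**
   * Scans until the batch is full, the region runs out, or the time limit
   * expires. On expiry, whatever was accumulated (possibly nothing) is
   * returned flagged as a heartbeat.
   */
  static ScanStep next(Iterator<String> region, Predicate<String> filter,
      int batchLimit, long timeLimitMs, LongSupplier clockMs) {
    long deadline = clockMs.getAsLong() + timeLimitMs;
    List<String> out = new ArrayList<>();
    while (region.hasNext()) {
      if (out.size() >= batchLimit) {
        return new ScanStep(out, false, true);  // normal reply: batch is full
      }
      if (clockMs.getAsLong() >= deadline) {
        return new ScanStep(out, true, true);   // heartbeat: alive, region not done
      }
      String row = region.next();
      if (filter.test(row)) {
        out.add(row);                           // rows failing the filter are skipped
      }
    }
    return new ScanStep(out, false, false);     // region exhausted
  }

  private TimeLimitedScanSketch() {}
}
```

Injecting the clock as a `LongSupplier` is only a convenience here: it lets a test hit the deadline deterministically instead of relying on real elapsed time.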
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367423#comment-14367423 ] Jonathan Lawlor commented on HBASE-13090: - edit: was not* clear
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366036#comment-14366036 ] Lars Hofhansl commented on HBASE-13090: --- Great. +1
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365427#comment-14365427 ] Jonathan Lawlor commented on HBASE-13090: - A heartbeat message is very similar to the typical response from the server, with the following two exceptions:
1. The heartbeat message will be tagged with the heartbeat flag in the ScanResponse.
2. The heartbeat message may contain an empty Result array when the region on the server has not been exhausted (i.e. there are still elements to be scanned in the current region).
Scanners currently track their position by saving lastResult, and this mechanism will continue to work as expected with heartbeats, since heartbeats ensure that we receive a Result back from the server before we return anything to the application layer.
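A rough sketch of the client side of this contract, assuming a simplified response that carries only a Result list and the two flags: `HeartbeatClientSketch`, `Response` and `ScanRpc` are hypothetical stand-ins, not `ClientScanner` or the real `ScanResponse`. Heartbeats with an empty Result array are consumed inside the loop, and `lastResult` is only updated from real Results, which is why position tracking is unaffected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side sketch (not ClientScanner): heartbeat replies with
// no Results are absorbed, so the application only ever sees real Results.
final class HeartbeatClientSketch {

  /** Stand-in for ScanResponse: a Result list plus the two flags used here. */
  static final class Response {
    final List<String> results;
    final boolean heartbeatMessage;
    final boolean moreResultsInRegion;

    Response(List<String> results, boolean heartbeatMessage, boolean moreResultsInRegion) {
      this.results = results;
      this.heartbeatMessage = heartbeatMessage;
      this.moreResultsInRegion = moreResultsInRegion;
    }
  }

  /** Stand-in for one scan RPC round trip to the region server. */
  interface ScanRpc {
    Response next();
  }

  private final ScanRpc rpc;
  private String lastResult;  // last row handed to the application; used to resume
  private int rpcCount;

  HeartbeatClientSketch(ScanRpc rpc) {
    this.rpc = rpc;
  }

  /** Returns the next non-empty batch, or an empty list once the region is exhausted. */
  List<String> nextBatch() {
    while (true) {
      Response r = rpc.next();  // every reply, heartbeat or not, counts as progress
      rpcCount++;
      if (!r.results.isEmpty()) {
        lastResult = r.results.get(r.results.size() - 1);
        return r.results;
      }
      if (r.heartbeatMessage && r.moreResultsInRegion) {
        continue;  // server is alive but has filtered everything so far: ask again
      }
      return new ArrayList<>();  // empty non-heartbeat reply: region exhausted
    }
  }

  String lastResult() {
    return lastResult;
  }

  int rpcCount() {
    return rpcCount;
  }
}
```

For example, two empty heartbeats followed by a reply carrying `["a", "b"]` yield a single `nextBatch()` result of `["a", "b"]` after three RPCs, with `lastResult()` pointing at `"b"`.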
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364513#comment-14364513 ] He Liangliang commented on HBASE-13090: --- Similar to HBASE-13215. It's nice to piggyback the current scanner position in the heartbeat when the limit is reached.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361181#comment-14361181 ] Hadoop QA commented on HBASE-13090: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704509/HBASE-13090-v2.patch against master branch at commit e60cae0500daf3c146e10d808c5070c6cb24ecec. ATTACHMENT ID: 12704509 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1918 checkstyle errors (more than the master's current 1917 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: + result = result && (hasClientHandlesHeartbeatMessages() == other.hasClientHandlesHeartbeatMessages()); + new java.lang.String[] { Region, Scan, ScannerId, NumberOfRows, CloseScanner, NextCallSeq, ClientHandlesPartials, ClientHandlesHeartbeatMessages, }); + new java.lang.String[] { CellsPerResult, ScannerId, MoreResults, Ttl, Results, Stale, PartialFlagPerResult, HeartbeatMessage, }); + || !Bytes.equals(row, offset, length, matcher.row, matcher.rowOffset, matcher.rowLength)) { +public ScanResponse scan(RpcController controller, ScanRequest request) throws ServiceException { +public HeartbeatReversedKVHeap(List<? extends KeyValueScanner> scanners, KVComparator comparator) {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13232//console This message is automatically generated.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361439#comment-14361439 ] Hadoop QA commented on HBASE-13090: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704530/HBASE-13090-v3.patch against master branch at commit e60cae0500daf3c146e10d808c5070c6cb24ecec. ATTACHMENT ID: 12704530 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + result = result && (hasClientHandlesHeartbeatMessages() == other.hasClientHandlesHeartbeatMessages()); + new java.lang.String[] { Region, Scan, ScannerId, NumberOfRows, CloseScanner, NextCallSeq, ClientHandlesPartials, ClientHandlesHeartbeatMessages, }); + new java.lang.String[] { CellsPerResult, ScannerId, MoreResults, Ttl, Results, Stale, PartialFlagPerResult, HeartbeatMessage, }); {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. 
The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13237//console This message is automatically generated. 
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357903#comment-14357903 ] Jonathan Lawlor commented on HBASE-13090: - Thanks for the comments [~stack]
bq. If only timeout, then maybe premature for ScanLimit unless anything in current Scan structure that might sit better in ScanLimit?
I was thinking that we could combine the batch limit, size limit, and now the time limit into a ScannerLimit object. With this patch, the InternalScanner and RegionScanner interfaces now have a large cascading call structure that looks like this:
{code}
NextState next(List<Cell> result) throws IOException;
...
NextState next(List<Cell> result, int batchLimit) throws IOException;
...
NextState next(List<Cell> result, int batchLimit, long sizeLimit) throws IOException;
...
NextState next(List<Cell> result, int batchLimit, long sizeLimit, long timeLimit) throws IOException;
{code}
As more limits are added, it gets uglier and uglier. The idea with ScannerLimit would be to change it to this:
{code}
NextState next(List<Cell> result) throws IOException;
...
NextState next(List<Cell> result, ScannerLimit limit) throws IOException;
{code}
Where the ScannerLimit object can have as many limits specified as it wants (it may contain only a time limit, or a time limit, batch limit and size limit).
bq. What would be the downsides if default was to allow return of partials to clients?
So right now partial result support is on by default, but when the scan is specified to be a small scan we disable partial results server side. This means that for small scans we wouldn't allow heartbeat messages either, since they could potentially create partials. Outside of small scans, heartbeats would be supported.
bq. since you can't specify your own Scanner implementation serverside (you can't right?)
As far as I can tell there is no nice way to specify your own StoreScanner implementation, but upon further investigation it looks like I can specify my own KeyValueHeap implementation inside the RegionScanners. This would allow me to take this method out. Going to investigate further and see if this ugly postHeapNext method can be taken out.
bq. When do I call isHeartbeatMessage? At what point in the processing?
Currently it is used inside ClientScanner.java after the Result array comes back from the server. By checking it here, we can see if the most recent response from the server (the one that returned the Results array) was a heartbeat message.
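The ScannerLimit idea might look roughly like the following. This is a hypothetical sketch, not a class from any of the attached patches (the v7 QA report lists a `next(...)` signature taking a `ScannerContext` instead); `ScannerLimit`, `LimitedScanner` and the "value <= 0 means no limit" convention are assumptions. Each limit is optional, so adding a new kind of limit means adding a field rather than yet another `next(...)` overload.

```java
import java.util.List;

// Hypothetical sketch of the proposed ScannerLimit: any combination of
// batch, size and time limits in one object. A value <= 0 means "no limit".
final class ScannerLimit {
  static final ScannerLimit NONE = new ScannerLimit(0, 0L, 0L);

  private final int batchLimit;   // max cells per next() call
  private final long sizeLimit;   // max accumulated bytes per next() call
  private final long deadlineMs;  // wall-clock time by which next() should return

  private ScannerLimit(int batchLimit, long sizeLimit, long deadlineMs) {
    this.batchLimit = batchLimit;
    this.sizeLimit = sizeLimit;
    this.deadlineMs = deadlineMs;
  }

  ScannerLimit withBatchLimit(int cells) {
    return new ScannerLimit(cells, sizeLimit, deadlineMs);
  }

  ScannerLimit withSizeLimit(long bytes) {
    return new ScannerLimit(batchLimit, bytes, deadlineMs);
  }

  ScannerLimit withTimeLimit(long newDeadlineMs) {
    return new ScannerLimit(batchLimit, sizeLimit, newDeadlineMs);
  }

  boolean batchReached(int cells) {
    return batchLimit > 0 && cells >= batchLimit;
  }

  boolean sizeReached(long bytes) {
    return sizeLimit > 0 && bytes >= sizeLimit;
  }

  boolean timeReached(long nowMs) {
    return deadlineMs > 0 && nowMs >= deadlineMs;
  }
}

// Hypothetical scanner interface: two next() overloads however many limits exist.
interface LimitedScanner<T> {
  /** Fills result up to the given limits; returns true if more values remain. */
  boolean next(List<T> result, ScannerLimit limit);

  default boolean next(List<T> result) {
    return next(result, ScannerLimit.NONE);
  }
}
```

An immutable object with `with...` copies is just one option; a mutable builder or context object passed down the scanner stack would serve the same purpose.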
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357867#comment-14357867 ] stack commented on HBASE-13090: --- Nice writeup [~jonathan.lawlor] If only timeout, then maybe premature for ScanLimit unless anything in current Scan structure that might sit better in ScanLimit?
bq. if the client has specified that heartbeats are supported AND partial results are also supported
This might be OK for 1.1, but partials should be on all the time in 2.0. This feature should be on all the time in 2.0. What would be the downsides if the default was to allow return of partials to clients? On postHeapNext, yeah, ugly, but since you can't specify your own Scanner implementation serverside (you can't, right?), ugly injection is all you have... so yeah, ugly but we need it (can you make the scan latched rather than slowed?). When do I call isHeartbeatMessage? At what point in the processing? Your reasoning that new session or reset doesn't work makes sense to me. Will give review on patch later. Good stuff [~jonathan.lawlor]
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357997#comment-14357997 ] Hadoop QA commented on HBASE-13090: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704058/HBASE-13090-v1.patch against master branch at commit 9c83fa7b52188d6bdfebcba75272c5c11e8b8566.
ATTACHMENT ID: 12704058
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 20 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1926 checkstyle errors (more than the master's current 1924 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ result = result && (hasClientHandlesHeartbeatMessages() == other.hasClientHandlesHeartbeatMessages());
+ new java.lang.String[] { "Region", "Scan", "ScannerId", "NumberOfRows", "CloseScanner", "NextCallSeq", "ClientHandlesPartials", "ClientHandlesHeartbeatMessages", });
+ new java.lang.String[] { "CellsPerResult", "ScannerId", "MoreResults", "Ttl", "Results", "Stale", "PartialFlagPerResult", "HeartbeatMessage", });
+public NextState next(List<Cell> outResults, int batchLimit, long sizeLimit) throws IOException {
+ NextState next(List<Cell> result, int batchLimit, long sizeLimit, long timeLimit) throws IOException;
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13206//console
This message is automatically generated.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350782#comment-14350782 ] Jonathan Lawlor commented on HBASE-13090: -
bq. Why does RsRpcServices have to be involved? Could remaining scan time not be up in RegionScanner?
I think RSRpcServices needs to be involved because it has the global view of when the scan started. A single call to RegionScanner#nextRaw may not cause a timeout on its own, but forming a ScanResponse can require multiple calls to RegionScanner#nextRaw. In other words, a timeout may be caused not by any single call to RegionScanner#nextRaw but by the accumulated time of all the calls needed to form the ScanResponse.
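A rough sketch of why the time budget has to be tracked across calls rather than per call. RowSource, ScanRpcResult and RpcScanLoop are hypothetical names: RowSource.nextRaw only mimics the shape of RegionScanner#nextRaw (append one row's cells, report whether more rows remain), and the clock is injected so the example is deterministic.

```java
import java.util.*;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for RegionScanner#nextRaw. */
interface RowSource {
  /** Appends the next row's cells (none if the row was filtered out); returns true if more rows remain. */
  boolean nextRaw(List<String> out);
}

/** Simplified outcome of one scan RPC. */
final class ScanRpcResult {
  final List<List<String>> rows;
  final boolean moreRows;
  final boolean timeLimitReached; // true means "heartbeat": stopped early, scanner still alive

  ScanRpcResult(List<List<String>> rows, boolean moreRows, boolean timeLimitReached) {
    this.rows = rows;
    this.moreRows = moreRows;
    this.timeLimitReached = timeLimitReached;
  }
}

final class RpcScanLoop {
  /**
   * One deadline covers the whole RPC: no single nextRaw call needs to be slow for
   * the accumulated time of many calls to exceed the budget.
   */
  static ScanRpcResult scan(RowSource scanner, int maxRows, long timeLimitNanos, LongSupplier clock) {
    long deadline = clock.getAsLong() + timeLimitNanos;
    List<List<String>> rows = new ArrayList<>();
    boolean more = true;
    while (more && rows.size() < maxRows) {
      List<String> row = new ArrayList<>();
      more = scanner.nextRaw(row);
      if (!row.isEmpty()) {
        rows.add(row); // fully filtered rows add nothing to the response
      }
      if (more && clock.getAsLong() >= deadline) {
        return new ScanRpcResult(rows, true, true);
      }
    }
    return new ScanRpcResult(rows, more, false);
  }
}
```

If every row is filtered out and each nextRaw call takes 3 time units against a budget of 10, no call is individually slow, yet the fourth call pushes the accumulated time past the deadline and the loop returns an empty, heartbeat-style result.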
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350796#comment-14350796 ] stack commented on HBASE-13090: --- [~jonathan.lawlor] Doesn't RegionScanner get created when the scan starts? Can it not run the timer?
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350766#comment-14350766 ] stack commented on HBASE-13090: --- Thanks [~jonathan.lawlor] Why does RsRpcServices have to be involved? Could remaining scan time not be up in RegionScanner?
bq. This means that it would still be possible to timeout due to a single long running StoreScanner#next() call in the event that partial Results are not supported.
Dang. Can we flag these timeouts as "It's your own fault", or "don't use filter", or "don't short scan"? If you can do the heartbeat using ScanResponse rather than polluting Result, that'd be better. Looks good to me [~jonathan.lawlor] [~lhofhansl] Any input here honey?
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350768#comment-14350768 ] stack commented on HBASE-13090: --- Or mighty [~andrew.purt...@gmail.com]?
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350874#comment-14350874 ] Jonathan Lawlor commented on HBASE-13090: -
bq. Doesn't RegionScanner get created when the scan starts? Can it not run the timer?
That's true, the RegionScanner is created when the scan starts. It also remains open server-side as long as the client does not close it (as is the case in non-small scans). To keep all accounting of the timer within the RegionScanner, we could add a call to RegionScanner#updateTimeoutTimestamp (or something along those lines) in RSRpcServices each time we either create or retrieve the RegionScanner, which would avoid accounting for timeouts within RSRpcServices. The timestamp would need to be updated on each RSRpcServices#scan call so that we don't use a timeout timestamp left over from a previous call, which would be too restrictive by then. All timeout information would then be communicated back to RSRpcServices via the newly defined NextState from HBASE-11544.
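A minimal sketch of this proposal, using hypothetical names throughout: TimedRegionScanner stands in for a RegionScanner that lives across RPCs, updateTimeoutTimestamp follows the method name floated in the comment, and ScannerNextState is a simplified stand-in for NextState. It illustrates why the deadline has to be refreshed on every scan RPC: a deadline left over from the previous RPC would make the next one time out immediately.

```java
import java.util.*;
import java.util.function.LongSupplier;

/** Hypothetical, simplified stand-in for NextState. */
enum ScannerNextState { MORE_VALUES, NO_MORE_VALUES, TIME_LIMIT_REACHED }

/** Hypothetical server-side scanner that stays open between scan RPCs and owns its timer. */
final class TimedRegionScanner {
  private final Iterator<String> rows;
  private final LongSupplier clock;
  private long timeoutTimestamp = Long.MAX_VALUE; // no deadline until the RPC layer sets one

  TimedRegionScanner(List<String> data, LongSupplier clock) {
    this.rows = data.iterator();
    this.clock = clock;
  }

  /** Called by the RPC layer whenever it creates or retrieves this scanner for a new RPC. */
  void updateTimeoutTimestamp(long timeLimit) {
    timeoutTimestamp = clock.getAsLong() + timeLimit;
  }

  /** Returns one row, or reports why it could not. */
  ScannerNextState next(List<String> out) {
    if (!rows.hasNext()) {
      return ScannerNextState.NO_MORE_VALUES;
    }
    if (clock.getAsLong() >= timeoutTimestamp) {
      return ScannerNextState.TIME_LIMIT_REACHED;
    }
    out.add(rows.next());
    return rows.hasNext() ? ScannerNextState.MORE_VALUES : ScannerNextState.NO_MORE_VALUES;
  }
}
```

Naming the hook a more general reset or new-session call, as suggested in the follow-up comment, would let the scanner reset other per-RPC state in the same place.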
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350965#comment-14350965 ] stack commented on HBASE-13090: --- That sounds good. Maybe make it a reset or a new session, because there may be other state the RegionScanner wants to reset on session start besides timers.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351027#comment-14351027 ] Andrew Purtell commented on HBASE-13090: Sounds ok to me too, will be interested to see the details in a patch.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349215#comment-14349215 ] Jonathan Lawlor commented on HBASE-13090: - With HBASE-11544 now in, I was thinking of tackling this one next and was looking for some feedback on the thought process. Implementing the timeout server side would involve changes at three different levels:
* RSRpcServices
* RegionScannerImpl/ReversedRegionScannerImpl
* StoreScanner
RSRpcServices could maintain a variable, something along the lines of remainingScanTime. This value could be initialized to some fraction of the scanner timeout (maybe half would be good enough?). On each call to RegionScanner#nextRaw, RSRpcServices would communicate that the RegionScanner can take at most remainingScanTime to retrieve a Result; if a Result cannot be formed in that time, a timeout occurs. The RegionScanner would pass the same remainingScanTime down to the StoreScanner so that calls to InternalScanner#next() may also time out if they are taking too long. Note that if partial Results are NOT supported by the scan configuration (as is the case for small scans, and for scans with a filter that requires whole rows to be read before a filtering decision can be made), then the timeout would not be enforceable within StoreScanner but only within RegionScannerImpl and RSRpcServices. This means that it would still be possible to timeout due to a single long running StoreScanner#next() call in the event that partial Results are not supported. If a timeout does occur on the server, we would have to decide how this should be communicated back to the Client.
I was thinking it would be most appropriate to communicate this back to the client via fields in the ScanResponse rather than flags on the Results in the ScanResponse (there is already a lot of state information implied through the contents of the Results in the ScanResponse, and adding more seems like it would complicate things). Something along the lines of a timeoutOccurred boolean flag may be sufficient. Then on the Client side we could decide whether enough Results were accumulated prior to the timeout to service the application request, or whether we must make another RPC to retrieve enough Results. If anyone else has been thinking about how to approach this issue or has any other ideas, please chime in. Any feedback would be much appreciated.
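A client-side sketch of the ScanResponse-level flag, under stated assumptions: ScanResponseSketch, ScanRpc and HeartbeatAwareClient are hypothetical names, the heartbeat boolean plays the role of the proposed timeoutOccurred flag, and the server's per-RPC time budget is taken as half the scanner timeout, following the "maybe half" suggestion in the comment. A response that carries the flag and no Results is not treated as an error; the client simply issues another RPC until it has enough Results or the region is exhausted.

```java
import java.util.*;

/** Hypothetical stand-in for a ScanResponse with a response-level heartbeat flag. */
final class ScanResponseSketch {
  final List<String> results;
  final boolean heartbeat; // server stopped early because its time budget ran out
  final boolean moreResultsInRegion;

  ScanResponseSketch(List<String> results, boolean heartbeat, boolean moreResultsInRegion) {
    this.results = results;
    this.heartbeat = heartbeat;
    this.moreResultsInRegion = moreResultsInRegion;
  }
}

/** Hypothetical server side of one scan RPC, given a time budget in milliseconds. */
interface ScanRpc {
  ScanResponseSketch call(long timeLimitMillis);
}

final class HeartbeatAwareClient {
  private final ScanRpc rpc;
  private final long scannerTimeoutMillis;
  int heartbeatsSeen = 0;

  HeartbeatAwareClient(ScanRpc rpc, long scannerTimeoutMillis) {
    this.rpc = rpc;
    this.scannerTimeoutMillis = scannerTimeoutMillis;
  }

  /** Server-side budget per RPC: a fraction (here half) of the scanner timeout. */
  long timeLimitForRpc() {
    return scannerTimeoutMillis / 2;
  }

  /** Keeps issuing RPCs until `wanted` results are cached or the region is exhausted. */
  List<String> fetch(int wanted) {
    List<String> cache = new ArrayList<>();
    while (cache.size() < wanted) {
      ScanResponseSketch resp = rpc.call(timeLimitForRpc());
      cache.addAll(resp.results);
      if (resp.heartbeat) {
        heartbeatsSeen++; // progress signal only: the server scanner is alive, so ask again
      }
      if (!resp.moreResultsInRegion) {
        break;
      }
    }
    return cache;
  }
}
```

Because the server always answers within half the scanner timeout, the client's own timeout no longer has to be sized for the worst-case filter selectivity.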
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343108#comment-14343108 ] Eshcar Hillel commented on HBASE-13090: --- In addition to a timer, or as an alternative to it, one can consider capping prefetched data at the server side by counting the number of rows scanned at each prefetch step. A capping factor limits the maximum number of rows to be scanned before returning the result to the client. This way, when the limit is exceeded, the server sends whatever data it has gathered so far; if no data was found, it sends only a heartbeat. When it finishes scanning the region, the server signals that the region is exhausted. On the client side, the scanner continues to scan against the current region until it is exhausted.
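A sketch of the row-count cap described in this comment, assuming a region modelled as an in-memory list of integer row keys and a java.util.function.Predicate standing in for the scan's filter; CappedResponse, CappedRegionScan and CappedScanClient are hypothetical names. The server counts rows scanned (matched or not), so an RPC over a fully filtered stretch of the region returns an empty heartbeat instead of running until the client times out.

```java
import java.util.*;
import java.util.function.Predicate;

/** What one capped RPC sends back. */
final class CappedResponse {
  final List<Integer> matches;
  final boolean exhausted;

  CappedResponse(List<Integer> matches, boolean exhausted) {
    this.matches = matches;
    this.exhausted = exhausted;
  }

  /** Nothing matched but the region is not done: a pure progress heartbeat. */
  boolean isHeartbeat() {
    return matches.isEmpty() && !exhausted;
  }
}

/** Server side: scans at most maxRowsScanned rows per call, whether or not they match. */
final class CappedRegionScan {
  private final List<Integer> rows;
  private final Predicate<Integer> filter;
  private int position = 0;

  CappedRegionScan(List<Integer> rows, Predicate<Integer> filter) {
    this.rows = rows;
    this.filter = filter;
  }

  CappedResponse nextBatch(int maxRowsScanned) {
    List<Integer> matches = new ArrayList<>();
    int scanned = 0;
    while (position < rows.size() && scanned < maxRowsScanned) {
      int row = rows.get(position++);
      scanned++;
      if (filter.test(row)) {
        matches.add(row);
      }
    }
    return new CappedResponse(matches, position >= rows.size());
  }
}

/** Client side: keeps scanning the current region until the server says it is exhausted. */
final class CappedScanClient {
  int heartbeats = 0;

  /** The cap must be positive, or the region would never advance. */
  List<Integer> scanRegion(CappedRegionScan region, int cap) {
    List<Integer> results = new ArrayList<>();
    while (true) {
      CappedResponse resp = region.nextBatch(cap);
      if (resp.isHeartbeat()) {
        heartbeats++;
      }
      results.addAll(resp.matches);
      if (resp.exhausted) {
        return results;
      }
    }
  }
}
```

With ten rows, a cap of 3, and only rows 7 and 8 matching, the first two RPCs come back as heartbeats, the third carries both matches, and the fourth reports the region exhausted.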
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335320#comment-14335320 ] Lars Hofhansl commented on HBASE-13090: --- Related to discussion on HBASE-13082. There we've been discussing exiting the scan loop to be able to release the lock in order to give flushes/compactions a chance to finish. Exiting the scan loop after some time to react to slow scans seems a prerequisite for this.
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335140#comment-14335140 ] Andrew Purtell commented on HBASE-13090: Perhaps as simple as checking from a timer if any Results have been sent over the preceding interval, forcing back an empty one if none have been sent and no new results are available yet.
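A deterministic sketch of the timer check described in this comment, assuming a periodic task (for example one scheduled with java.util.concurrent.ScheduledExecutorService#scheduleAtFixedRate) calls tick() with the current time. HeartbeatTimer and its methods are hypothetical names, and times are plain long nanoseconds so the logic can be exercised without real threads.

```java
/** Hypothetical per-scanner bookkeeping consulted by a periodic timer task. */
final class HeartbeatTimer {
  private final long intervalNanos;
  private long lastSentNanos;

  HeartbeatTimer(long intervalNanos, long nowNanos) {
    this.intervalNanos = intervalNanos;
    this.lastSentNanos = nowNanos;
  }

  /** Record that a response (real Results or a heartbeat) went to the client. */
  void recordSent(long nowNanos) {
    lastSentNanos = nowNanos;
  }

  /** Force an empty response only if nothing was sent in the preceding interval and no new results are ready. */
  boolean shouldForceHeartbeat(long nowNanos, boolean resultsAvailable) {
    boolean quietForInterval = nowNanos - lastSentNanos >= intervalNanos;
    return quietForInterval && !resultsAvailable;
  }

  /** One timer tick: returns true if an empty heartbeat response should be sent now. */
  boolean tick(long nowNanos, boolean resultsAvailable) {
    if (shouldForceHeartbeat(nowNanos, resultsAvailable)) {
      recordSent(nowNanos); // the heartbeat itself counts as a send
      return true;
    }
    return false;
  }
}
```

Keeping the decision in a pure method like this separates the policy ("quiet for a full interval and nothing ready") from the threading needed to run it periodically.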