[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085881#comment-14085881 ]
Lars Hofhansl commented on HBASE-11667:
---
I also wonder now how this would behave with a filter that filters out most (or all) KVs for a region when we have a stale cache due to splits... Since the ClientScanner has nothing to go by, it would redo the scan of the entire prior region.

Simplify ClientScanner logic for NSREs.
---
Key: HBASE-11667
URL: https://issues.apache.org/jira/browse/HBASE-11667
Project: HBase
Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6
Attachments: 11667-0.94.txt, 11667-doc-0.94.txt, 11667-trunk.txt, HBASE-11667-0.98.patch, IntegrationTestBigLinkedListWithRegionMovement.patch

We ran into an issue with Phoenix where a RegionObserver coprocessor intercepts a scan and returns an aggregate (in this case a count) with a fake row key. It turns out this does not work when the {{ClientScanner}} encounters NSREs, as it uses the last key it saw to reset the scanner and try again (which in this case would be the fake key).
While this is arguably a rare case, and one could also argue that a region observer just shouldn't do this...
While looking at {{ClientScanner}}'s code, I found this logic to be unnecessary. An NSRE occurs because we contacted a region server with a key that it no longer hosts. This is the start key, so it is always correct to retry with this same key. That simplifies the ClientScanner logic and also makes this sort of coprocessor possible.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
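To make the reset rule in the description concrete, here is a minimal Java sketch of it as a toy model, not HBase's actual {{ClientScanner}} code. The helper names (`restartRow`, `inRegion`), the region bounds 'm' to 't', and the fake key `\0count` are all hypothetical. In this model, when the last row seen is a fake aggregate key, the reopen position falls outside the region, and when a filter has returned nothing, the retry starts again from the region's start, which is the concern raised in the comment above.

```java
/**
 * Toy model (not HBase code) of the reset rule described in the issue: after an
 * NSRE, reopen the scan at the last row the client saw, or at the scan's start
 * row if nothing has been seen yet. Rows are Strings compared lexicographically;
 * a region is the half-open key range [startKey, endKey).
 */
public class RestartRowSketch {

    /** Hypothetical helper: the row a retry would reopen the scan at. */
    static String restartRow(String scanStartRow, String lastSeenRow) {
        // Nothing returned yet (e.g. a filter dropped every row): start over at the scan's start row.
        return lastSeenRow == null ? scanStartRow : lastSeenRow;
    }

    /** True if key lies inside the region [startKey, endKey). */
    static boolean inRegion(String key, String startKey, String endKey) {
        return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
    }

    public static void main(String[] args) {
        String regionStart = "m";
        String regionEnd = "t";

        // Ordinary scan: the last row seen is a real row of the region.
        System.out.println(restartRow(regionStart, "p")); // p

        // An observer returned an aggregate under a made-up key (this value is hypothetical).
        String fakeKey = "\0count";
        System.out.println(inRegion(restartRow(regionStart, fakeKey), regionStart, regionEnd)); // false

        // A filter dropped every row: the retry rescans the region from its start.
        System.out.println(restartRow(regionStart, null)); // m
    }
}
```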
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085519#comment-14085519 ]
Lars Hofhansl commented on HBASE-11667:
---
[~jamestaylor], [~apurtell], FYI.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085626#comment-14085626 ]
Devaraj Das commented on HBASE-11667:
---
[~lhofhansl], HBASE-11665 is tracking failures in TestRegionReplicas. For scans, you should run the TestReplicasClient tests, which cover scans with replicas. I will have a look at your patch.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085660#comment-14085660 ]
chunhui shen commented on HBASE-11667:
---
The skipFirst logic handles the case where the client sends a 'next' request and the server returns an NSRE in the middle of the scan. I don't think it should simply be removed.
For example, take a region containing the rows 'aaa', 'bbb', 'ccc', 'ddd', and do the following:
1. The client opens a scanner on this region (empty start row).
2. The client calls next and gets row 'aaa'.
3. The region is moved to another server.
4. The client sends the next request to the old server and gets an NSRE, so it reopens the scanner with 'aaa' (the last result) as the start row.
5. The client must skip the first row, 'aaa'.
So, for the test case testScansWithSplits(), we should call TEST_UTIL.getHBaseAdmin().split(TABLE) while the scan is in progress rather than after it has completed.
Pardon me if I've got something wrong.
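The five steps above can be replayed in a small toy model, sketched here in Java; it is not HBase code, and `SkipFirstSketch`, `seek` and `scan` are hypothetical names. It assumes scanner caching of 1, an inclusive start row, and an NSRE on the RPC right after 'aaa' was returned. Reopening at the last result without skipping it hands 'aaa' to the client twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model of the 5-step example (not HBase code). The region holds sorted rows;
 * reopening a scanner at a start row resumes at the first row >= that start row,
 * because a scan's start row is inclusive.
 */
public class SkipFirstSketch {

    static final List<String> ROWS = Arrays.asList("aaa", "bbb", "ccc", "ddd");

    /** Index of the first row >= startRow (start rows are inclusive). */
    static int seek(String startRow) {
        int i = 0;
        while (i < ROWS.size() && ROWS.get(i).compareTo(startRow) < 0) {
            i++;
        }
        return i;
    }

    /** Scans with caching = 1; the RPC after the first row fails with a simulated NSRE. */
    static List<String> scan(boolean skipFirst) {
        List<String> seen = new ArrayList<>();
        int cursor = seek("");                     // step 1: open with an empty start row
        seen.add(ROWS.get(cursor++));              // step 2: next() returns 'aaa'
        // steps 3-4: the region moves, next() gets an NSRE, reopen at the last result
        cursor = seek(seen.get(seen.size() - 1));  // the reopened cursor points at 'aaa' again
        if (skipFirst) {
            cursor++;                              // step 5: drop the row already returned
        }
        while (cursor < ROWS.size()) {
            seen.add(ROWS.get(cursor++));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scan(true));   // [aaa, bbb, ccc, ddd]
        System.out.println(scan(false));  // [aaa, aaa, bbb, ccc, ddd]
    }
}
```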
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085704#comment-14085704 ]
Lars Hofhansl commented on HBASE-11667:
---
That's exactly the kind of discussion I wanted to have. In your example [~zjushch], each request to a region server involves a key (the Scan's start key). So after your step 3 there would be a new RPC with a new key ('bbb' if scanner caching is 1, 'ccc' if scanner caching is 2, etc.). In each case the scanner would correctly reset itself to retry only the part needed for the last start key used by the RPC that failed with the NSRE. As for the test, it runs/interleaves scans and splits in a loop (20 iterations) and tests for the scenario you mention. I still think the change is correct.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085727#comment-14085727 ]
chunhui shen commented on HBASE-11667:
---
For the test case, I mean moving the split action into the while block, like this:
{code}
while (rs.next() != null) {
  c++;
  // trigger a region split while the scan is still in progress
  if (c % 4333 == 1) {
    TEST_UTIL.getHBaseAdmin().split(TABLE);
  }
}
assertEquals(7733, c);
{code}
In our internal 0.94 branch, the test fails with the patch:
{noformat}
java.lang.AssertionError: expected:7733 but was:7743
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.hadoop.hbase.client.TestFromClientSide.testScansWithSplits(TestFromClientSide.java:5096)
{noformat}
[~lhofhansl] Could you apply the above change to the test case and try it?
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085731#comment-14085731 ]
Lars Hofhansl commented on HBASE-11667:
---
Sure. Split is not synchronous though. Will try.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085741#comment-14085741 ]
Lars Hofhansl commented on HBASE-11667:
---
Hmm... I always get this:
{code}
Tests in error:
  testScansWithSplits(org.apache.hadoop.hbase.client.TestFromClientSide): org.apache.hadoop.hbase.NotServingRegionException: Region is not online: testScansWithSplits,wWW4,1407209622238.f53974a70f899779b20750fadb5cf4bd.
{code}
But I do not get this without the patch to {{ClientScanner}}. Hmm...
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085764#comment-14085764 ]
Lars Hofhansl commented on HBASE-11667:
---
Actually, I do get this exception without the patch to {{ClientScanner}} too.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085770#comment-14085770 ]
Lars Hofhansl commented on HBASE-11667:
---
Ah... Now I got a run with:
{code}
Failed tests:
  testScansWithSplits(org.apache.hadoop.hbase.client.TestFromClientSide): expected:7733 but was:7743
{code}
OK... I buy that there is an issue, although I do not understand why. An RPC either fails or it doesn't; there is no notion of a partial RPC. Say the scan RPC with 'aaa' fails with an NSRE. Then no rows of that RPC will be returned, and it can be retried. With scanner caching = 1, the next RPC would then try with 'bbb', which would also either succeed or fail. The scanner cannot fail partially *during* an RPC.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085791#comment-14085791 ]
Lars Hofhansl commented on HBASE-11667:
---
Interestingly, it is always wrong by exactly one batch. When I set the caching to 100, I sometimes get 7833 instead of 7733. Hmm... So maybe it's not as simple as I thought. :( Thanks for catching this, [~zjushch]!
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085795#comment-14085795 ]
Andrew Purtell commented on HBASE-11667:
---
FWIW, that integration test passed with this change on 0.98, running for 20 minutes. So the test is not good enough.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085797#comment-14085797 ]
Hadoop QA commented on HBASE-11667:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12659813/IntegrationTestBigLinkedListWithRegionMovement.patch
against trunk revision .
ATTACHMENT ID: 12659813

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.procedure.TestProcedureManager
org.apache.hadoop.hbase.ipc.TestIPC
org.apache.hadoop.hbase.master.TestClockSkewDetection

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10290//console

This message is automatically generated.
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085803#comment-14085803 ]
Lars Hofhansl commented on HBASE-11667:
---
I think I see what the issue is now. Each RPC needs to set the correct startRow, as it might just hit the next region. My patch removes exactly that part. Hence an NSRE in any but the first round would redo scanning from the previous startKey, which might then rescan the last batch (and hence the number of extra rows scanned always equals the caching setting). Will close as Invalid unless somebody has more ideas.
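A rough way to see why the overcount equals the caching setting is the toy model below, sketched in Java; it is not the real {{ClientScanner}}, and the class, method and parameter names are hypothetical. It assumes one RPC fails with an NSRE after earlier batches were delivered, and compares two retry rules: resuming after the last returned row (the removed skipFirst logic), and reopening at the start row of the previous batch, as described in the comment above. With 7733 rows and caching of 100, the second rule yields 7833, the same count reported earlier in this thread; an NSRE on the very first RPC causes no duplicates.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not HBase code): a scan over numRows sorted rows fetches `caching`
 * rows per RPC, and RPC number `failingRpc` fails with a simulated NSRE.
 */
public class BatchRetrySketch {

    static int countRows(int numRows, int caching, int failingRpc, boolean resumeAfterLastRow) {
        List<Integer> returned = new ArrayList<>();
        int cursor = 0;          // index of the next row the server would return
        int lastBatchStart = 0;  // index at which the most recent successful batch began
        int rpc = 0;
        while (cursor < numRows) {
            rpc++;
            if (rpc == failingRpc) {
                // Simulated NSRE: this RPC returns nothing and the scanner is reopened.
                if (resumeAfterLastRow) {
                    int lastReturned = returned.isEmpty() ? -1 : returned.get(returned.size() - 1);
                    cursor = lastReturned + 1;  // reopen at the last row, then skip it
                } else {
                    cursor = lastBatchStart;    // reopen at the previous batch's start row
                }
                continue;
            }
            lastBatchStart = cursor;
            int end = Math.min(cursor + caching, numRows);
            for (int i = cursor; i < end; i++) {
                returned.add(i);
            }
            cursor = end;
        }
        return returned.size();
    }

    public static void main(String[] args) {
        // 7733 rows as in testScansWithSplits; the failing RPC number is an arbitrary choice.
        System.out.println(countRows(7733, 100, 5, true));   // 7733
        System.out.println(countRows(7733, 100, 5, false));  // 7833
        System.out.println(countRows(7733, 100, 1, false));  // 7733 (NSRE on the first RPC)
    }
}
```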
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085818#comment-14085818 ]
stack commented on HBASE-11667:
---
Can we at least document why we set the start row? I'd forgotten, and other fellas, smart fellas, couldn't see why without intervention from 12k miles away. What about the Phoenix issue?
[jira] [Commented] (HBASE-11667) Simplify ClientScanner logic for NSREs.
[ https://issues.apache.org/jira/browse/HBASE-11667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085842#comment-14085842 ]
Lars Hofhansl commented on HBASE-11667:
---
Yeah, I can add some Javadoc. James has another workaround for Phoenix, where he detects a stale cache and reloads for the situations where that is necessary (as in the count(*) query I mentioned in the description).