[ https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287949#comment-13287949 ]
Zhihong Yu commented on HBASE-5974:
-----------------------------------

HRegionInterface.java doesn't exist in trunk, so patch v2 wouldn't apply to trunk. I would suggest creating a patch for trunk and running it through Hadoop QA.
{code}
+ LOG.info("Seq number based scan API not present at RS side! Trying with API: "
{code}
I think the above log should be at warn level.
{code}
+ } else if (ioe instanceof CallSequenceOutOfOrderException) {
+ // The callSeq from the client not matched with the one expected at the RS side
+ // This means the RS might have done extra scanning of data which is not received by the
+ // client.Throw a DNRE so that we close the current scanner and opens a new one with RS.
+ throw new DoNotRetryIOException("Reset scanner", ioe);
{code}
Should we disclose a little more detail in the message of the DNRIOE? The above message is the same as the one used in response to NotServingRegionException and RegionServerStoppedException.
'not matched with' -> 'does not match'
'is not received' -> 'has not been received'
'opens a new' -> 'open a new'
{code}
+ // if callSeq do not match throw Exception straight away. This needs to be performed even
{code}
'do not match' -> 'does not match'
{code}
+public class TestClientScannerRPCTimesout {^M
{code}
Please add a short javadoc for the test class. I think it should be called TestClientScannerRPCTimeout.
Please use a utility such as dos2unix to remove the trailing ^M from the patch file.
{code}
+ public static class RegionServerWithScanTimesout extends MiniHBaseClusterRegionServer {^M
{code}
The above class can be made private. It should be named RegionServerWithScanTimeout.
{code}
+ * Thrown by a region server while scan related next() calls.
Both client and server maintain a^M
+ * callSequence and if the both do not match, RS will throw this exception.^M
+ */^M
+public class CallSequenceOutOfOrderException extends IOException {^M
{code}
CallSequenceOutOfOrderException should extend DoNotRetryIOException so that we don't need to create a separate DoNotRetryIOException instance (shown above).
'while scan related next()' -> 'while doing scan related next()'
'the both do not' -> 'they do not'

It would be nice for Todd to take a look at the patch.

> Scanner retry behavior with RPC timeout on next() seems incorrect
> -----------------------------------------------------------------
>
> Key: HBASE-5974
> URL: https://issues.apache.org/jira/browse/HBASE-5974
> Project: HBase
> Issue Type: Bug
> Components: client, regionserver
> Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
> Reporter: Todd Lipcon
> Assignee: Anoop Sam John
> Priority: Critical
> Fix For: 0.94.1
>
> Attachments: HBASE-5974_0.94.patch, HBASE-5974_94-V2.patch
>
>
> I'm seeing the following behavior:
> - set RPC timeout to a short value
> - call next() for some batch of rows, big enough so the client times out before the result is returned
> - the HConnectionManager stuff will retry the next() call to the same server.
> At this point, one of two things can happen: 1) the previous next() call will still be processing, in which case you get a LeaseException, because it was removed from the map during the processing, or 2) the next() call will succeed but skip the prior batch of rows.
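The call-sequence check under review can be modelled in a few lines. What follows is a hypothetical, simplified Java sketch, not the HBase patch: ServerScanner, ClientScanner and the local DoNotRetryIOException / CallSequenceOutOfOrderException stand-ins are assumptions made for illustration only. It shows why a retried next() that reuses an old callSeq is rejected instead of silently returning the following batch (outcome 2 in the description), and how making CallSequenceOutOfOrderException a subclass of DoNotRetryIOException lets the client handle it in its existing "do not retry" branch without wrapping.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the callSeq check; none of these classes are the real HBase API. */
public class CallSeqSketch {

  /** Local stand-in for HBase's DoNotRetryIOException. */
  static class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
  }

  /** Extends DoNotRetryIOException, as suggested in the review, so no wrapping is needed. */
  static class CallSequenceOutOfOrderException extends DoNotRetryIOException {
    CallSequenceOutOfOrderException(String msg) { super(msg); }
  }

  /** Hypothetical server-side scanner over rows [startRow, rowCount). */
  static class ServerScanner {
    private final int rowCount;
    private int nextRow;
    private long expectedCallSeq = 0;

    ServerScanner(int startRow, int rowCount) { this.nextRow = startRow; this.rowCount = rowCount; }

    List<Integer> next(long callSeq, int batch) throws CallSequenceOutOfOrderException {
      // Checked before any scanning: a retried call reuses an old callSeq and is rejected.
      if (callSeq != expectedCallSeq) {
        throw new CallSequenceOutOfOrderException(
            "Expected callSeq " + expectedCallSeq + " but got " + callSeq);
      }
      List<Integer> rows = new ArrayList<>();
      while (rows.size() < batch && nextRow < rowCount) {
        rows.add(nextRow++);
      }
      expectedCallSeq++;
      return rows;
    }
  }

  /** Hypothetical client that tracks callSeq and reopens the scanner on a mismatch. */
  static class ClientScanner {
    private final int rowCount;
    private ServerScanner server;
    private long callSeq = 0;
    private int lastReceivedRow = -1;
    int reopenCount = 0;

    ClientScanner(int rowCount) {
      this.rowCount = rowCount;
      this.server = new ServerScanner(0, rowCount);
    }

    /** If simulateTimeout is true, the server does the work but the reply never reaches us. */
    List<Integer> next(int batch, boolean simulateTimeout) throws IOException {
      if (simulateTimeout) {
        server.next(callSeq, batch); // server advances; the reply is "lost"
        // The client treats the timeout as retryable and calls again with the same callSeq.
      }
      List<Integer> rows;
      try {
        rows = server.next(callSeq, batch);
      } catch (DoNotRetryIOException e) {
        // Do not retry on this scanner: open a new one right after the last row received.
        // (A real client would also close the old scanner; the sketch just drops it.)
        server = new ServerScanner(lastReceivedRow + 1, rowCount);
        callSeq = 0;
        reopenCount++;
        rows = server.next(callSeq, batch);
      }
      callSeq++;
      if (!rows.isEmpty()) {
        lastReceivedRow = rows.get(rows.size() - 1);
      }
      return rows;
    }
  }

  public static void main(String[] args) throws IOException {
    ClientScanner scanner = new ClientScanner(10);
    List<Integer> all = new ArrayList<>();
    all.addAll(scanner.next(3, false));
    all.addAll(scanner.next(3, true)); // first attempt of this call "times out"
    all.addAll(scanner.next(3, false));
    all.addAll(scanner.next(3, false));
    System.out.println(all + " reopened=" + scanner.reopenCount);
  }
}
```

Running main prints `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] reopened=1`: rows 3..5, which the server produced for the timed-out call, are rescanned by the new scanner rather than skipped. Without the callSeq check, the retry would have returned rows 6..8 and the client would never see 3..5.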