[ https://issues.apache.org/jira/browse/PHOENIX-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436407#comment-15436407 ]
James Taylor commented on PHOENIX-1751:
---------------------------------------

It appears that in 0.98, a delay > rpcTimeout in the postScannerNext causes the client scan to fail after many retries, leading to a number of these exceptions from HRegionServer.scan():
{code}
rsh = scanners.get(scannerName);
if (rsh == null) {
  LOG.info("Client tried to access missing scanner " + scannerName);
  throw new UnknownScannerException(
    "Name: " + scannerName + ", already closed?");
}
{code}
and then finally this exception:
{code}
if (request.getNextCallSeq() != rsh.getNextCallSeq()) {
  throw new OutOfOrderScannerNextException(
    "Expected nextCallSeq: " + rsh.getNextCallSeq()
    + " But the nextCallSeq got from client: " + request.getNextCallSeq()
    + "; request=" + TextFormat.shortDebugString(request));
}
{code}
In 1.1, rsh.getNextCallSeq() is never > 1. I've tried to compare the execution path between the working 1.1 and 0.98, but it seems that the client RPC classes have changed a lot.

bq. Break up the work in small enough chunks so the RPC timeout is not exceeded

If Phoenix stats are enabled, this helps by doing just that, but there's no guarantee it'll finish quickly enough. Without stats enabled, the timeout will likely be exceeded unless users set the RPC timeout very high.

If we can't get this patch into our 0.98 branch, then we'll continue to see spurious renew lease issues. Should I file an HBase bug, [~apurtell]?

> Perform aggregations, sorting, etc, in the preScannerNext instead of postScannerOpen
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1751
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 1751-WIP-v2.txt, 1751-WIP-v2b.patch, 1751-WIP.txt,
> PHOENIX-1751-0.98.patch, PHOENIX-1751-v2c.patch, PHOENIX-1751.patch,
> PHOENIX-1751_v3.patch, PHOENIX-1751_v4.patch
>
>
> HBase retains a lease for every scanner.
> When the lease expires, the scan will no longer (be allowed to) work. The
> leases guard against the client going away, and allow cleaning up resources
> if that happens.
> At various points HBase "suspends" the lease while the region server is
> working on behalf of this scanner, so that the lease won't expire even though
> the server is working on it.
> HBase does that during the scanning process. Crucially, it suspends the lease
> after the scanner is opened, before next() is issued on it.
> The outcome of all this is that Phoenix executes aggregates, sorts, etc, with
> the lease in place, and hence if these take a while, the lease can expire even
> though the server was working on it.
> Phoenix should do this work in preScannerNext, being careful that the
> precalculation is only performed once.
> I'll attach a sample patch soon.
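
A minimal sketch of the "only performed once" idea from the description above, in plain Java rather than against the real coprocessor API. The names LazySortingScanner, next() and precomputeCount() are hypothetical and chosen for illustration; they are not HBase or Phoenix classes. Sorting a small in-memory list stands in for the aggregation and sorting work:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical stand-in for a server-side scanner wrapper.
 * Opening it is cheap; the expensive work is deferred to the first next().
 */
final class LazySortingScanner {
    private final List<Integer> rawRows;
    private Iterator<Integer> sortedRows; // null until the first next() call
    private int precomputeCount = 0;      // how often the expensive step ran

    LazySortingScanner(List<Integer> rawRows) {
        // "Open" only captures the input; no sorting or aggregation here.
        this.rawRows = new ArrayList<>(rawRows);
    }

    /** Returns the next row in sorted order, or null when exhausted. */
    Integer next() {
        if (sortedRows == null) {
            // First next(): run the expensive precalculation exactly once.
            List<Integer> copy = new ArrayList<>(rawRows);
            Collections.sort(copy);
            sortedRows = copy.iterator();
            precomputeCount++;
        }
        return sortedRows.hasNext() ? sortedRows.next() : null;
    }

    int precomputeCount() {
        return precomputeCount;
    }
}
```

The null check on sortedRows is what keeps the precalculation from running again on later next() calls. A real patch would presumably need an equivalent guard per scanner, and would have to consider how it interacts with client retries.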