[ 
https://issues.apache.org/jira/browse/HBASE-28595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846774#comment-17846774
 ] 

Duo Zhang commented on HBASE-28595:
-----------------------------------

[~MikaelSmith] A connection can break for any reason, so it is not always
reliable.

The problem here is that when the server hits an exception and closes the
scanner, it fails to return the exception to the client because of a network
issue.

The client just receives a connection error (connection closed, connection
reset, or similar), so it generates a retry RPC.

But on the server side, because of the compatibility code we added, we treat
the request as coming from an old client, which still issues a next or close
after the scanner is exhausted. So we just return an empty result, which causes
unexpected behavior on the client side.

I think this should be fixed on the server side. A simple fix would be to
record a closed scanner only when it is closed because it is exhausted. If it
is closed because of an exception, we should not record it, so that later
client requests get the correct UnknownScannerException.
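
To make the idea concrete, here is a minimal sketch of the proposed
bookkeeping. It is not the actual RSRpcServices code: the class
ScannerRegistrySketch, the CloseReason enum, the string return values, and the
local UnknownScannerException are all hypothetical stand-ins chosen for
illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the region server's scanner bookkeeping.
// All names here are illustrative and do not match the real HBase classes.
class ScannerRegistrySketch {
  enum CloseReason { EXHAUSTED, EXCEPTION }

  // Local stand-in for HBase's UnknownScannerException.
  static class UnknownScannerException extends RuntimeException {
    UnknownScannerException(long id) {
      super("Unknown scanner " + id);
    }
  }

  private final Set<Long> openScanners = new HashSet<>();
  // Scanners closed because they were exhausted. Old clients may still
  // send a next or close for these, so they get an empty result.
  private final Set<Long> closedExhausted = new HashSet<>();

  void open(long id) {
    openScanners.add(id);
  }

  void close(long id, CloseReason reason) {
    openScanners.remove(id);
    // The proposed fix: remember the scanner only if it was exhausted.
    // A scanner closed by an exception is deliberately not recorded.
    if (reason == CloseReason.EXHAUSTED) {
      closedExhausted.add(id);
    }
  }

  // Returns "rows" for a live scanner and "empty" for an exhausted one;
  // throws for anything else, including scanners closed by an exception.
  String next(long id) {
    if (openScanners.contains(id)) {
      return "rows";
    }
    if (closedExhausted.contains(id)) {
      return "empty";
    }
    throw new UnknownScannerException(id);
  }

  public static void main(String[] args) {
    ScannerRegistrySketch rs = new ScannerRegistrySketch();
    rs.open(1L);
    // Server-side failure; assume the error response is lost on the wire.
    rs.close(1L, CloseReason.EXCEPTION);
    try {
      rs.next(1L); // the client's retry of the failed RPC
    } catch (UnknownScannerException e) {
      System.out.println("retry rejected: " + e.getMessage());
    }
  }
}
```

With this split, a retry that arrives after an exception-driven close is
rejected instead of being answered with an empty result, so the client can
fall back to opening a new scanner.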

Thanks.

> Losing exception from scan RPC can lead to partial results
> ----------------------------------------------------------
>
>                 Key: HBASE-28595
>                 URL: https://issues.apache.org/jira/browse/HBASE-28595
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, regionserver, Scanners
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: pull-request-available
>
> This was discovered in Apache Impala, using an HBase client and server built
> from an HBase 2.2 based branch. It is not clear yet whether other branches
> are also affected.
> The issue happens if the server side of the scan throws an exception and 
> closes the scanner, but at the same time, the client gets an rpc connection 
> closed error and doesn't process the exception sent by the server. Client 
> then thinks it got a network error, which leads to retrying the RPC instead 
> of opening a new scanner. But then when the client retry reaches the server, 
> the server returns an empty ScanResponse instead of an error, leading to 
> closing the scanner on client side without returning any error.
> A few pointers to critical parts:
> region server:
> 1st call throws exception leading to closing (but not deleting) scanner:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3539]
> 2nd call (retry of 1st) returns empty results:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3403]
> client:
> some exceptions are handled as non-retriable at RPC level and are only 
> handled through opening a new scanner:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java#L214]
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java#L367]
> This mechanism in the client only works if it gets the exception from the 
> server. If there are connection issues during the RPC then the client won't 
> really know the state of the server.



