[ 
https://issues.apache.org/jira/browse/HBASE-28595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846804#comment-17846804
 ] 

Csaba Ringhofer commented on HBASE-28595:
-----------------------------------------

About the test scenario:
I would separate closing the connection and the other network issues.

Closed connections seem to be handled differently on the client and the 
server: the server throws an exception and closes the scanner if it detects 
that the connection is closed, while the client treats the closed connection 
as retriable and retries without starting a new scanner. This makes the issue 
easier to reproduce by closing the connection.
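
To make the closed-connection sequence concrete, here is a minimal toy model 
in Java. It is a sketch only: it is not HBase code and not the reproduction 
scenario referred to in this comment, and every name in it (ScanRetrySketch, 
ToyServer, ToyScanner, ScannerClosedException, ConnectionClosedException, 
runScan, the two-row batch size) is hypothetical. It assumes a server that 
returns an empty response when a closed scanner is called again, as the issue 
description states, and a client that treats an empty response as the end of 
the scan.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the retry sequence; all names are hypothetical, not HBase APIs.
class ScanRetrySketch {
    static final List<String> ROWS = List.of("r1", "r2", "r3", "r4");

    /** Server-side failure that should make the client open a new scanner. */
    static class ScannerClosedException extends Exception {}

    /** RPC-level error that the client treats as retriable. */
    static class ConnectionClosedException extends Exception {}

    static class ToyScanner {
        int pos;
        boolean closed;
        ToyScanner(int pos) { this.pos = pos; }
    }

    static class ToyServer {
        final List<ToyScanner> scanners = new ArrayList<>();

        int open(int startPos) {
            scanners.add(new ToyScanner(startPos));
            return scanners.size() - 1;
        }

        List<String> next(int id, boolean connectionBroken) throws ScannerClosedException {
            ToyScanner s = scanners.get(id);
            if (s.closed) {
                return List.of(); // retry on a closed scanner: empty response, no error
            }
            if (connectionBroken) {
                s.closed = true;  // closed but not deleted
                throw new ScannerClosedException();
            }
            int end = Math.min(s.pos + 2, ROWS.size());
            List<String> batch = new ArrayList<>(ROWS.subList(s.pos, end));
            s.pos = end;
            return batch;
        }
    }

    /** One RPC; if exceptionLost is set, the client sees only a connection error. */
    static List<String> rpc(ToyServer server, int id, boolean connectionBroken,
                            boolean exceptionLost)
            throws ScannerClosedException, ConnectionClosedException {
        try {
            return server.next(id, connectionBroken);
        } catch (ScannerClosedException e) {
            if (exceptionLost) {
                throw new ConnectionClosedException();
            }
            throw e;
        }
    }

    /** Runs a full scan with one injected failure on the second call. */
    static List<String> runScan(boolean exceptionLost) {
        ToyServer server = new ToyServer();
        List<String> seen = new ArrayList<>();
        int id = server.open(0);
        boolean breakOnce = true;
        while (true) {
            List<String> batch;
            try {
                batch = rpc(server, id, breakOnce && !seen.isEmpty(), exceptionLost);
            } catch (ScannerClosedException e) {
                id = server.open(seen.size()); // new scanner after the last row seen
                breakOnce = false;
                continue;
            } catch (ConnectionClosedException e) {
                breakOnce = false;             // retriable: call the same scanner again
                continue;
            }
            if (batch.isEmpty()) {
                return seen;                   // empty response is read as end of scan
            }
            seen.addAll(batch);
        }
    }

    public static void main(String[] args) {
        System.out.println(runScan(false)); // prints [r1, r2, r3, r4]
        System.out.println(runScan(true));  // prints [r1, r2]
    }
}
```

In this model the second run ends without any error although only half of the 
rows were returned, which is the partial-result symptom the issue describes.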

The "exception/results are lost due to network issues" case is more 
hypothetical: I don't have logs showing that it actually happens. It should 
be possible, but it needs very specific timing.

I used the reproduction scenario with closing connections by [~MikaelSmith] to 
investigate the issue and verify the fix.

> Losing exception from scan RPC can lead to partial results
> ----------------------------------------------------------
>
>                 Key: HBASE-28595
>                 URL: https://issues.apache.org/jira/browse/HBASE-28595
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, regionserver, Scanners
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Critical
>              Labels: pull-request-available
>
> This was discovered in Apache Impala, using an HBase client and server from 
> an HBase 2.2 based branch. It is not yet clear whether other branches are 
> also affected.
> The issue happens if the server side of the scan throws an exception and 
> closes the scanner, but at the same time the client gets an RPC connection 
> closed error and doesn't process the exception sent by the server. The 
> client then thinks it got a network error, so it retries the RPC instead of 
> opening a new scanner. When the retry reaches the server, the server returns 
> an empty ScanResponse instead of an error, and the client closes the scanner 
> on its side without returning any error.
> A few pointers to critical parts:
> region server:
> 1st call throws exception leading to closing (but not deleting) scanner:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3539]
> 2nd call (retry of 1st) returns empty results:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L3403]
> client:
> some exceptions are handled as non-retriable at RPC level and are only 
> handled through opening a new scanner:
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java#L214]
> [https://github.com/apache/hbase/blob/0c8607a35008b7dca15e9daaec41ec362d159d67/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java#L367]
> This mechanism in the client only works if it gets the exception from the 
> server. If there are connection issues during the RPC then the client won't 
> really know the state of the server.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
