[ 
https://issues.apache.org/jira/browse/HBASE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507628#comment-17507628
 ] 

chenglei edited comment on HBASE-26812 at 3/16/22, 2:01 PM:
------------------------------------------------------------

[~zhangduo], the scenario is this: while {{RSRpcServices.scan}} is serving a 
remote RPC call, a region CP hook such as {{RegionObserver.postScannerOpen}} 
may directly invoke {{RSRpcServices.scan}} or {{RSRpcServices.get}} on the same 
RegionServer through {{ShortCircuitingClusterConnection}} to scan other rows. 
Because an outer {{RpcContext}} is present, the {{RegionScanner}} created for 
the direct {{RSRpcServices.scan}} or {{RSRpcServices.get}} call cannot be 
closed until the outer RPC call completes. Even worse, the 
{{ServerCall.rpcCallback}} may be overridden, which would cause serious 
problems. 
A simple fix I can think of is in 
{{ShortCircuitingClusterConnection.getClient}}: when it returns 
{{ShortCircuitingClusterConnection.localHostClient}}, we could wrap that 
client in a wrapper class that surrounds the {{scan}} and {{get}} method calls 
with {{RpcUtil.setRpcContext(null)}} and {{RpcUtil.setRpcContext(oldRpcCall)}}.
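
A rough sketch of the save/clear/restore pattern described above. This is not 
HBase code: {{CallContext}}, {{LocalService}} and {{ContextClearingClient}} 
are hypothetical stand-ins for the per-thread RPC call context, the local 
server-side stub and the proposed wrapper, and only {{get}} is shown.

```java
import java.util.ArrayList;
import java.util.List;

public class ShortCircuitSketch {

    /** Hypothetical stand-in for the per-thread RPC call context (not an HBase class). */
    static final class CallContext {
        private static final ThreadLocal<CallContext> CURRENT = new ThreadLocal<>();
        /** Scanners whose close is deferred until this call completes. */
        final List<String> delayedClose = new ArrayList<>();

        static CallContext get() {
            return CURRENT.get();
        }

        static void set(CallContext ctx) {
            if (ctx == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(ctx);
            }
        }
    }

    /** Hypothetical local stub: defers the scanner close when a call context is present. */
    static final class LocalService {
        final List<String> closedImmediately = new ArrayList<>();

        String get(String row) {
            String scanner = "scanner-" + row;
            CallContext ctx = CallContext.get();
            if (ctx != null) {
                ctx.delayedClose.add(scanner);   // stays open until the outer call ends
            } else {
                closedImmediately.add(scanner);  // no context: close right away
            }
            return "value-" + row;
        }
    }

    /** Assumed wrapper: hides the outer context during the call, then restores it. */
    static final class ContextClearingClient {
        private final LocalService delegate;

        ContextClearingClient(LocalService delegate) {
            this.delegate = delegate;
        }

        String get(String row) {
            CallContext outer = CallContext.get();
            CallContext.set(null);
            try {
                return delegate.get(row);
            } finally {
                CallContext.set(outer);          // restore even if the call throws
            }
        }
    }

    public static void main(String[] args) {
        CallContext outer = new CallContext();
        CallContext.set(outer);                  // pretend we are inside an outer RPC handler
        LocalService service = new LocalService();

        service.get("row1");                                  // unwrapped call
        new ContextClearingClient(service).get("row2");       // wrapped call

        System.out.println("parked on outer call: " + outer.delayedClose);
        System.out.println("closed immediately:   " + service.closedImmediately);
    }
}
```

In this sketch only the unwrapped call leaves its scanner parked on the outer 
context; the wrapped call closes its scanner at once and the outer context is 
back in place afterwards. The {{try}}/{{finally}} is there so that a failing 
inner call cannot leave the handler thread without its context.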


was (Author: comnetwork):
[~zhangduo], the scenario is this: while {{RSRpcServices.scan}} or 
{{RSRpcServices.get}} is serving a remote RPC call, a region CP hook such as 
{{RegionObserver.postScannerOpen}} may directly invoke {{RSRpcServices.scan}} 
or {{RSRpcServices.get}} on the same RegionServer through 
{{ShortCircuitingClusterConnection}} to scan other rows. Because an outer 
{{RpcContext}} is present, the {{RegionScanner}} created for the direct 
{{RSRpcServices.scan}} or {{RSRpcServices.get}} call cannot be closed until 
the outer RPC call completes. Even worse, the {{ServerCall.rpcCallback}} may 
be overridden, which would cause serious problems. 
A simple fix I can think of is in 
{{ShortCircuitingClusterConnection.getClient}}: when it returns 
{{ShortCircuitingClusterConnection.localHostClient}}, we could wrap that 
client in a wrapper class that surrounds the {{scan}} and {{get}} method calls 
with {{RpcUtil.setRpcContext(null)}} and {{RpcUtil.setRpcContext(oldRpcCall)}}.

> ShortCircuitingClusterConnection fails to close RegionScanners when making 
> short-circuited calls
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26812
>                 URL: https://issues.apache.org/jira/browse/HBASE-26812
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.9
>            Reporter: Lars Hofhansl
>            Priority: Critical
>
> Just ran into this on the Phoenix side.
> We retrieve a Connection via 
> {{RegionCoprocessorEnvironment.createConnection... getTable(...)}} and 
> then call get on that table. The Get's key happens to be local. Now each call 
> to table.get() leaves an open StoreScanner around forever (verified with a 
> memory profiler).
> These references are held via 
> RegionScannerImpl.storeHeap.scannersForDelayedClose. Eventually the 
> RegionServer goes into a GC of death and can only be ended with kill -9.
> The reason appears to be that in this case there is no currentCall context. 
> Some time in 2.x the Rpc handler/call was made responsible for closing open 
> region scanners, but we forgot to handle {{ShortCircuitingClusterConnection}}.
> It's not immediately clear how to fix this. But it does make 
> ShortCircuitingClusterConnection useless and dangerous. If you use it, you 
> *will* create a giant memory leak.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
