[ https://issues.apache.org/jira/browse/HBASE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507628#comment-17507628 ]
chenglei edited comment on HBASE-26812 at 3/16/22, 2:01 PM:
------------------------------------------------------------

[~zhangduo], the scenario is this: while {{RSRpcServices.scan}} is serving a remote RPC call, a region coprocessor hook such as {{RegionObserver.postScannerOpen}} may directly invoke {{RSRpcServices.scan}} or {{RSRpcServices.get}} on the same RegionServer through {{ShortCircuitingClusterConnection}} to scan other rows. Because an outer {{RpcContext}} is present, the {{RegionScanner}} created for the direct {{RSRpcServices.scan}} or {{RSRpcServices.get}} call cannot be closed until the outer RPC call completes. Even worse, {{ServerCall.rpcCallback}} may be overridden, which could cause serious problems.

A simple fix I can think of is in {{ShortCircuitingClusterConnection.getClient}}: when it returns {{ShortCircuitingClusterConnection.localHostClient}}, we could wrap that client in a wrapper class that surrounds the {{scan}} and {{get}} method calls with {{RpcUtil.setRpcContext(null)}} and {{RpcUtil.setRpcContext(oldRpcCall)}}.

was (Author: comnetwork):
[~zhangduo], the scenario is this: while {{RSRpcServices.scan}} or {{RSRpcServices.get}} is serving a remote RPC call, a region coprocessor hook such as {{RegionObserver.postScannerOpen}} may directly invoke {{RSRpcServices.scan}} or {{RSRpcServices.get}} on the same RegionServer through {{ShortCircuitingClusterConnection}} to scan other rows. Because an outer {{RpcContext}} is present, the {{RegionScanner}} created for the direct {{RSRpcServices.scan}} or {{RSRpcServices.get}} call cannot be closed until the outer RPC call completes. Even worse, {{ServerCall.rpcCallback}} may be overridden, which could cause serious problems.
A simple fix I can think of is in {{ShortCircuitingClusterConnection.getClient}}: when it returns {{ShortCircuitingClusterConnection.localHostClient}}, we could wrap that client in a wrapper class that surrounds the {{scan}} and {{get}} method calls with {{RpcUtil.setRpcContext(null)}} and {{RpcUtil.setRpcContext(oldRpcCall)}}.

> ShortCircuitingClusterConnection fails to close RegionScanners when making short-circuited calls
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26812
>                 URL: https://issues.apache.org/jira/browse/HBASE-26812
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.9
>            Reporter: Lars Hofhansl
>            Priority: Critical
>
> Just ran into this on the Phoenix side.
> We retrieve a Connection via {{RegionCoprocessorEnvironment.createConnection... getTable(...)}} and then call get on that table. The Get's key happens to be local. Now each call to table.get() leaves an open StoreScanner around forever (verified with a memory profiler).
> The references are held via RegionScannerImpl.storeHeap.scannersForDelayedClose. Eventually the RegionServer goes into a GC of death and can only be ended with kill -9.
> The reason appears to be that in this case there is no currentCall context. Some time in 2.x the Rpc handler/call was made responsible for closing open region scanners, but we forgot to handle {{ShortCircuitingClusterConnection}}.
> It's not immediately clear how to fix this. But it does make ShortCircuitingClusterConnection useless and dangerous. If you use it, you *will* create a giant memory leak.
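
For illustration only, the "clear the context, make the local call, restore the context" idea from the comment can be sketched in plain Java as below. This is not HBase code: {{CallContext}} and {{ContextClearingInvoker}} are hypothetical names, and a bare {{ThreadLocal}} stands in for whatever per-thread RPC call state the RegionServer actually keeps. The sketch only assumes that the outer call is tracked per thread and that the short-circuited call runs on that same thread.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical stand-in for the per-thread RPC call context; not an HBase class. */
final class CallContext {
    private static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    /** Returns the call currently associated with this thread, if any. */
    static Optional<Object> current() {
        return Optional.ofNullable(CURRENT.get());
    }

    /** Associates a call with this thread; passing null clears the association. */
    static void set(Object call) {
        if (call == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(call);
        }
    }
}

/** Hypothetical wrapper that hides the outer call context from a short-circuited call. */
final class ContextClearingInvoker {
    static <T> T callWithoutContext(Supplier<T> localCall) {
        // Remember the outer call so it can be restored afterwards.
        Object outer = CallContext.current().orElse(null);
        CallContext.set(null);
        try {
            // The local call sees no outer context, so it has to clean up
            // its own resources instead of deferring to the outer call.
            return localCall.get();
        } finally {
            // Restore the outer call even if the local call throws.
            CallContext.set(outer);
        }
    }
}
```

A wrapper around the local client would route its {{scan}} and {{get}} methods through something like {{callWithoutContext}}. The {{try}}/{{finally}} matters here: without it, an exception in the inner call would leave the handler thread with no context for the rest of the outer call.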