[ https://issues.apache.org/jira/browse/HDFS-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035039#comment-16035039 ]
Mukul Kumar Singh commented on HDFS-11887:
------------------------------------------

Hi [~cheersyang], thanks for the review. My apologies for not updating the context on this bug earlier. While working on the issue named in the title, I found two problems.

Issues fixed in this patch
==========================

1) A StackOverflowError occurs when a client that still has references is re-enqueued on eviction. Eviction fires because the cache has run out of space, so adding the *removed* element back triggers the evict operation again, which re-adds it, and so on until the stack overflows. Re-adding an element to the cache from its own eviction handler is therefore not safe. For an example, please look at the following test results for TestBufferManager: https://builds.apache.org/job/PreCommit-HDFS-Build/19742/testReport/org.apache.hadoop.cblock/TestBufferManager/testRepeatedBlockWrites/

2) The second issue is the one described in the title of this bug: the client must be closed once there are no references on it.

Details about the patch
=======================

With the updated patch, a client session is closed only when both of the following hold:
1) it has been evicted from the cache, and
2) there are no references on the client.

This ensures that a client is not closed while:
a) there are pending operations on it (guarded by the reference count), or
b) it is still usable in the cache (eviction marks the element as unusable via the "evict cache" flag).

The combination of both checks ensures that:
a) no clients are leaked, and
b) no active reference from the cache ever operates on a closed client.

Comments
========

1) As explained above, if there are outstanding references on the client, it is closed only when all the references drop to 0.
2) I agree with the point; I also wanted to keep the interface the same as before, but releaseClient now needs to operate on RefCountedXceiverClient, hence the change.

3) The client still works at that point because there are active references on it, so it has not yet been closed. However, after the client is released on line#134, it is closed and the subsequent call fails.

In summary, this patch ensures that:
1) clients are not leaked,
2) the stack overflow error no longer occurs, and
3) clients are kept in the cache so they are not recreated and are reused effectively.

Please let me know if any more details are needed. I should have added them while updating the bug.

> XceiverClientManager should close XceiverClient on eviction from cache
> ----------------------------------------------------------------------
>
>         Key: HDFS-11887
>         URL: https://issues.apache.org/jira/browse/HDFS-11887
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>    Reporter: Mukul Kumar Singh
>    Assignee: Mukul Kumar Singh
> Attachments: HDFS-11887-HDFS-7240.001.patch, HDFS-11887-HDFS-7240.002.patch
>
> XceiverClientManager doesn't close the client on eviction, which can leak resources.
> {code}
>   public XceiverClientManager(Configuration conf) {
>     ...
>     public void onRemoval(
>         RemovalNotification<String, XceiverClientWithAccessInfo>
>             removalNotification) {
>       // If the reference count is not 0, this xceiver client should not
>       // be evicted, add it back to the cache.
>       XceiverClientWithAccessInfo info = removalNotification.getValue();
>       if (info.hasRefence()) {
>         synchronized (XceiverClientManager.this.openClient) {
>           XceiverClientManager.this
>               .openClient.put(removalNotification.getKey(), info);
>         }
>       }
> {code}
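The re-enqueue recursion behind issue 1 can be illustrated with a toy model (this is not the Ozone code; the class and method names here are hypothetical). The eviction handler re-puts a still-referenced victim, which pushes the cache over capacity again, evicting another referenced entry, whose handler re-puts it, and so on without bound. A depth guard stands in for the real StackOverflowError:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the re-enqueue bug (NOT the Ozone code; all names are
// hypothetical). Re-adding a referenced victim from the eviction handler
// keeps the cache over capacity, so eviction fires again recursively.
class ReEnqueueDemo {
  private final Map<String, Boolean> entries = new LinkedHashMap<>();
  private final int capacity;
  private final int depthLimit;
  private int depth = 0;
  int maxDepth = 0; // records how deep the mutual recursion went

  ReEnqueueDemo(int capacity, int depthLimit) {
    this.capacity = capacity;
    this.depthLimit = depthLimit;
  }

  void put(String key, boolean hasReference) {
    depth++;
    maxDepth = Math.max(maxDepth, depth);
    if (depth <= depthLimit) { // without this guard, recursion is unbounded
      entries.put(key, hasReference);
      while (entries.size() > capacity) {
        String victim = victimOtherThan(key);
        boolean victimHasReference = entries.remove(victim);
        if (victimHasReference) {
          put(victim, true); // the buggy "add it back" step recurses here
        }
      }
    }
    depth--;
  }

  // Evict some entry other than the one just inserted.
  private String victimOtherThan(String key) {
    for (String k : entries.keySet()) {
      if (!k.equals(key)) {
        return k;
      }
    }
    return entries.keySet().iterator().next();
  }
}
```

With capacity 1 and two referenced entries, the recursion alternates between them until the guard trips at depthLimit + 1, mirroring the unbounded recursion that produced the StackOverflowError in the TestBufferManager run linked above.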
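The close-on-eviction rule from the patch description can be sketched as follows (a minimal sketch with hypothetical names, not the actual patch): the wrapped client is closed only when it has BOTH been evicted from the cache AND its reference count has dropped to zero, whichever happens last.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the combined "evicted + zero references" close condition.
// All names are hypothetical; the real patch works on the Ozone classes.
class RefCountedClientSketch {
  private final AtomicInteger refCount = new AtomicInteger(0);
  private final AtomicBoolean evicted = new AtomicBoolean(false);
  private boolean closed = false;

  // Taken when a caller acquires the client from the cache.
  void acquire() {
    refCount.incrementAndGet();
  }

  // Called on release; closes only if the cache has already evicted us.
  synchronized void release() {
    if (refCount.decrementAndGet() == 0 && evicted.get()) {
      closeUnderlyingClient();
    }
  }

  // Called from the cache's removal listener instead of re-adding the
  // entry; closes only if no caller still holds a reference.
  synchronized void markEvicted() {
    evicted.set(true);
    if (refCount.get() == 0) {
      closeUnderlyingClient();
    }
  }

  boolean isClosed() {
    return closed;
  }

  private void closeUnderlyingClient() {
    closed = true; // the real code would close the XceiverClient here
  }
}
```

Because the removal listener only sets a flag instead of re-inserting the entry, the recursive eviction of issue 1 cannot occur, and a client with outstanding references stays usable until its last reference is released.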