[ https://issues.apache.org/jira/browse/HDFS-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036235#comment-16036235 ]
Weiwei Yang commented on HDFS-11887:
------------------------------------

Hi [~msingh]

Even if we assume every client calls {{releaseClient}}, the problem is that you can't predict when {{releaseClient}} will be called. Suppose a client acquires an XceiverClient and uses it for some time before releasing it, and in the meantime the cache reaches its maximum size (or some other eviction criterion is satisfied) and that XceiverClient is evicted. In that case, it is a leak. The client may also get a connection refused error if it continues to use the old instance.

I am concerned because in a big cluster, hundreds of clients may concurrently put XceiverClients into this cache and evict them from it at a high rate. If this is not handled correctly, it could cause a lot of weird problems for clients. That's why I was asking whether we can satisfy the following requirements:

# XceiverClientManager manages a set of XceiverClients (one per container) in a cache; each XceiverClient can be reused by multiple clients that want to access the same container.
# XceiverClientManager must make sure an XceiverClient is not removed from the cache as long as some client is still using it (this avoids the heavy operation of recreating a connection).
# If an XceiverClient is removed from the cache, it is guaranteed to be closed, to avoid a resource leak.

Does that make sense?

> XceiverClientManager should close XceiverClient on eviction from cache
> ----------------------------------------------------------------------
>
>                 Key: HDFS-11887
>                 URL: https://issues.apache.org/jira/browse/HDFS-11887
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>         Attachments: HDFS-11887-HDFS-7240.001.patch, HDFS-11887-HDFS-7240.002.patch
>
>
> XceiverClientManager doesn't close client on eviction which can leak
> resources.
> {code}
> public XceiverClientManager(Configuration conf) {
> .
> .
> .
> public void onRemoval(
>     RemovalNotification<String, XceiverClientWithAccessInfo>
>         removalNotification) {
>   // If the reference count is not 0, this xceiver client should not
>   // be evicted, add it back to the cache.
>   WithAccessInfo info = removalNotification.getValue();
>   if (info.hasRefence()) {
>     synchronized (XceiverClientManager.this.openClient) {
>       XceiverClientManager.this
>           .openClient.put(removalNotification.getKey(), info);
>     }
>   }
> {code}
> Also a stack overflow can be triggered because of putting the element back in
> the cache on eviction.
> {code}
> synchronized (XceiverClientManager.this.openClient) {
>   XceiverClientManager.this
>       .openClient.put(removalNotification.getKey(), info);
> }
> {code}
> This bug will try to fix both of these cases.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
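
The three requirements listed in the comment above (clients shared per container, no eviction while a client is in use, close on eviction) could be sketched roughly as follows. This is a minimal, self-contained illustration and not the HDFS-11887 patch: {{PooledClient}}, {{RefCountedClientCache}}, {{acquire}} and {{release}} are hypothetical names, and a plain {{LinkedHashMap}} stands in for whatever cache XceiverClientManager actually uses. The sketch assumes the size limit may be treated as soft, so the cache is allowed to grow past it while every entry is still referenced.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for an XceiverClient: an object that owns a connection. */
class PooledClient implements AutoCloseable {
  final String containerName;
  private boolean closed = false;

  PooledClient(String containerName) {
    this.containerName = containerName;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true; // a real client would release its connection here
  }
}

/** Hypothetical reference-counted cache holding one client per container. */
class RefCountedClientCache {
  /** A cached client together with the number of callers currently using it. */
  private static final class Slot {
    final PooledClient client;
    int refCount;

    Slot(PooledClient client) {
      this.client = client;
    }
  }

  private final int maxSize; // soft limit, see evictIdle()
  private final Function<String, PooledClient> factory;
  // accessOrder=true: iteration runs from least to most recently used.
  private final LinkedHashMap<String, Slot> slots =
      new LinkedHashMap<>(16, 0.75f, true);

  RefCountedClientCache(int maxSize, Function<String, PooledClient> factory) {
    this.maxSize = maxSize;
    this.factory = factory;
  }

  /** Requirement 1: callers asking for the same container share one client. */
  synchronized PooledClient acquire(String containerName) {
    Slot slot = slots.get(containerName);
    if (slot == null) {
      slot = new Slot(factory.apply(containerName));
      slots.put(containerName, slot);
    }
    slot.refCount++;
    evictIdle();
    return slot.client;
  }

  /** Drops one reference; the client becomes evictable once the count is 0. */
  synchronized void release(PooledClient client) {
    Slot slot = slots.get(client.containerName);
    if (slot == null || slot.client != client || slot.refCount == 0) {
      throw new IllegalStateException(
          "client was not acquired from this cache: " + client.containerName);
    }
    slot.refCount--;
    evictIdle();
  }

  synchronized int size() {
    return slots.size();
  }

  /**
   * Requirements 2 and 3: while over the limit, remove only entries that
   * nobody references, and close each entry that is removed.
   */
  private void evictIdle() {
    Iterator<Map.Entry<String, Slot>> it = slots.entrySet().iterator();
    while (slots.size() > maxSize && it.hasNext()) {
      Slot slot = it.next().getValue();
      if (slot.refCount == 0) {
        it.remove();
        slot.client.close();
      }
    }
  }
}
```

Because referenced entries are simply skipped during eviction, nothing is ever put back into the cache from a removal callback, so this sketch has no re-insertion path of the kind the issue description associates with the stack overflow. The trade-off is that the cache can temporarily hold more than {{maxSize}} entries when all of them are in use.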