[
https://issues.apache.org/jira/browse/ZOOKEEPER-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646912#comment-17646912
]
Rushabh Shah commented on ZOOKEEPER-4650:
-----------------------------------------
We don't have ZOOKEEPER-3112 in our build, but from the patch it looks like it
would not fix this issue.
The problematic code is
[this|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNIO.java#L258-L264]
{noformat}
void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException {
    sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
    boolean immediateConnect = sock.connect(addr);
    if (immediateConnect) {
        sendThread.primeConnection();
    }
}
{noformat}
bq. sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
This registers the socket (sun.nio.ch.SocketChannelImpl) with the selector.
bq. boolean immediateConnect = sock.connect(addr);
It then tries to connect, which throws UnresolvedAddressException because the address is unresolved.
{noformat}
@Override
void connect(InetSocketAddress addr) throws IOException {
    SocketChannel sock = createSock();
    try {
        registerAndConnect(sock, addr);
    } catch (UnresolvedAddressException | UnsupportedAddressTypeException |
             SecurityException | IOException e) {
        LOG.error("Unable to open socket to {}", addr);
        sock.close();
        throw e;
    }
    ...
    ...
}
{noformat}
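The register-then-connect-then-close sequence above can be reproduced outside ZooKeeper. The following is only a minimal sketch, not ZooKeeper code: the class name LingeringKeyDemo, the Observation holder and the host name zk.invalid are made up for illustration, and InetSocketAddress.createUnresolved is used so that no real DNS lookup is needed to get an unresolved address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical demo class; it only mirrors the call order used by the client.
class LingeringKeyDemo {

    /** State observed after a failed connect followed by close(). */
    public static final class Observation {
        public boolean connectThrew;
        public boolean channelOpen;
        public boolean keyValid;
        public boolean stillRegistered;
        public boolean keyStillInSelector;
    }

    public static Observation failOnce() throws IOException {
        Observation o = new Observation();
        try (Selector selector = Selector.open()) {
            SocketChannel sock = SocketChannel.open();
            sock.configureBlocking(false);
            // Same order as registerAndConnect: register first, then connect.
            SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);
            try {
                // createUnresolved performs no lookup, so the address stays unresolved.
                sock.connect(InetSocketAddress.createUnresolved("zk.invalid", 2181));
            } catch (UnresolvedAddressException e) {
                o.connectThrew = true;
                sock.close(); // cancels the key; no select has run yet
            }
            // Capture the state before the selector itself is closed.
            o.channelOpen = sock.isOpen();
            o.keyValid = key.isValid();
            o.stillRegistered = sock.isRegistered();
            o.keyStillInSelector = selector.keys().contains(key);
        }
        return o;
    }

    public static void main(String[] args) throws IOException {
        Observation o = failOnce();
        System.out.println("connect threw:          " + o.connectThrew);
        System.out.println("channel open:           " + o.channelOpen);
        System.out.println("key valid:              " + o.keyValid);
        System.out.println("channel registered:     " + o.stillRegistered);
        System.out.println("key still in selector:  " + o.keyStillInSelector);
    }
}
```

Going by the SelectableChannel and Selector javadoc, the channel should report closed and the key invalid, while the channel still counts as registered and the key is still in the selector's key set, because no selection operation has run in between.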
In the exception handler we do close the socket, but
AbstractSelectableChannel#implCloseChannel only cancels the key; it does not
deregister the channel from the selector.
{noformat}
protected final void implCloseChannel() throws IOException {
    implCloseSelectableChannel();
    // clone keys to avoid calling cancel when holding keyLock
    SelectionKey[] copyOfKeys = null;
    synchronized (keyLock) {
        if (keys != null) {
            copyOfKeys = keys.clone();
        }
    }
    if (copyOfKeys != null) {
        for (SelectionKey k : copyOfKeys) {
            if (k != null) {
                k.cancel(); // invalidates the key and adds it to the cancelledKeys set
            }
        }
    }
}
{noformat}
That's why the heap dump shows 17,120 elements in the cancelledKeys structure.
Cancelled keys are only processed and removed when a select happens.
{noformat}
@Override
protected int doSelect(Consumer<SelectionKey> action, long timeout)
    throws IOException
{
    ...
    processUpdateQueue();
    processDeregisterQueue();
    ...
    ...
    processDeregisterQueue();
    return processEvents(action);
}
{noformat}
{noformat}
protected final void processDeregisterQueue() throws IOException {
    ...
    Set<SelectionKey> cks = cancelledKeys();
    synchronized (cks) {
        if (!cks.isEmpty()) {
            Iterator<SelectionKey> i = cks.iterator();
            while (i.hasNext()) {
                SelectionKeyImpl ski = (SelectionKeyImpl)i.next();
                i.remove();
                // remove the key from the selector
                implDereg(ski);
                selectedKeys.remove(ski);
                keys.remove(ski);
                // remove from channel's key set
                deregister(ski);
                SelectableChannel ch = ski.channel();
                if (!ch.isOpen() && !ch.isRegistered())
                    ((SelChImpl)ch).kill();
            }
        }
    }
}
{noformat}
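The effect of that deregister pass can be observed through the public Selector API. This is a rough sketch under assumptions, not the client's code: the class CancelledKeyBacklog, the method accumulateThenSelect and the host zk.invalid are hypothetical, and the attempt count is arbitrary. It repeats the failed register/connect/close cycle on one selector and then runs a single selectNow().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical demo class, not part of ZooKeeper or the JDK.
class CancelledKeyBacklog {

    /** Returns { keys held before the select, keys held after the select }. */
    public static int[] accumulateThenSelect(int attempts) throws IOException {
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < attempts; i++) {
                SocketChannel sock = SocketChannel.open();
                sock.configureBlocking(false);
                sock.register(selector, SelectionKey.OP_CONNECT);
                try {
                    // Unresolved on purpose; no DNS lookup takes place.
                    sock.connect(InetSocketAddress.createUnresolved("zk.invalid", 2181));
                } catch (UnresolvedAddressException e) {
                    sock.close(); // key is cancelled but not yet deregistered
                }
            }
            int before = selector.keys().size();
            // Any selection operation processes the cancelled-key set.
            selector.selectNow();
            int after = selector.keys().size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = accumulateThenSelect(50);
        System.out.println("keys before select: " + counts[0]);
        System.out.println("keys after select:  " + counts[1]);
    }
}
```

Every failed attempt should leave one more key behind until the select runs, at which point the backlog is cleared in one pass. That matches the one-cancelled-key-per-failed-connect pattern seen in the heap dump.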
But because we never successfully resolve the destination host(s), ClientCnxn
never invokes select, so the cancelled keys and their sockets are never
released; hence the leak.
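One possible way to avoid leaving a cancelled key behind is to reject an unresolved address before the channel is ever registered. The sketch below is an assumption for discussion, not the project's fix and not existing ZooKeeper code: the class ResolveBeforeRegister, its method names and the host zk.invalid are hypothetical. It also covers only the unresolved-address case; an IOException thrown by connect() after registration would still need the key to be flushed by a select.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical variant of registerAndConnect, for illustration only.
class ResolveBeforeRegister {

    public static void registerAndConnect(Selector selector, SocketChannel sock,
                                          InetSocketAddress addr) throws IOException {
        if (addr.isUnresolved()) {
            // Fail before registering, so close() has no key to cancel.
            throw new UnresolvedAddressException();
        }
        sock.register(selector, SelectionKey.OP_CONNECT);
        sock.connect(addr);
    }

    /** Returns the number of keys the selector holds after one failed attempt. */
    public static int keysLeftAfterFailure() throws IOException {
        try (Selector selector = Selector.open()) {
            SocketChannel sock = SocketChannel.open();
            sock.configureBlocking(false);
            try {
                registerAndConnect(selector, sock,
                        InetSocketAddress.createUnresolved("zk.invalid", 2181));
            } catch (UnresolvedAddressException e) {
                sock.close(); // never registered, so nothing lingers in the selector
            }
            return selector.keys().size();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("keys left after failure: " + keysLeftAfterFailure());
    }
}
```

An alternative with the same goal would be to keep the current order and call selectNow() on the selector right after sock.close() in the catch block. Whether that is safe inside ClientCnxnSocketNIO would need to be checked against the rest of the send thread's logic.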
> Zookeeper client leaks file descriptor in case of UnknownHostException
> ----------------------------------------------------------------------
>
> Key: ZOOKEEPER-4650
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4650
> Project: ZooKeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.4.14, 3.7.1
> Reporter: Rushabh Shah
> Priority: Major
> Attachments: Screen Shot 2022-12-13 at 4.30.34 PM.png
>
>
> The ZooKeeper client leaks file descriptors when it is unable to
> properly reach the destination. In this case, the DNS lookup fails but still
> leaves an unbound TCP socket.
> A colleague took a heap dump of the application and saw 17,400
> sun.nio.ch.SocketChannelImpl objects reachable from root; the number
> matched the open file descriptors on the host.
> !Screen Shot 2022-12-13 at 4.30.34 PM.png!
>
> More analysis from the heap dump in the comment.
>