[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646912#comment-17646912
 ] 

Rushabh Shah commented on ZOOKEEPER-4650:
-----------------------------------------

We don't have ZOOKEEPER-3112 in our build, but from the patch it looks like it 
will not fix this issue.

The problematic code is 
[this|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNIO.java#L258-L264]
 

{noformat}
    void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException {
        sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
        boolean immediateConnect = sock.connect(addr);
        if (immediateConnect) {
            sendThread.primeConnection();
        }
    }
{noformat}

bq.    sockKey = sock.register(selector, SelectionKey.OP_CONNECT);

This registers the socket (sun.nio.ch.SocketChannelImpl) with the selector.

bq.         boolean immediateConnect = sock.connect(addr);

It then tries to connect, which throws UnresolvedAddressException.
 
{noformat}
    @Override
    void connect(InetSocketAddress addr) throws IOException {
        SocketChannel sock = createSock();
        try {
            registerAndConnect(sock, addr);
        } catch (UnresolvedAddressException | UnsupportedAddressTypeException | SecurityException | IOException e) {
            LOG.error("Unable to open socket to {}", addr);
            sock.close();
            throw e;
        }
       ...
       ...
    }
{noformat}

In the exception handler we do close the socket, but 
AbstractSelectableChannel#implCloseChannel only cancels the key.

{noformat}
    protected final void implCloseChannel() throws IOException {
        implCloseSelectableChannel();

        // clone keys to avoid calling cancel when holding keyLock
        SelectionKey[] copyOfKeys = null;
        synchronized (keyLock) {
            if (keys != null) {
                copyOfKeys = keys.clone();
            }
        }

        if (copyOfKeys != null) {
            for (SelectionKey k : copyOfKeys) {
                if (k != null) {
                    k.cancel();   // invalidates the key and adds it to the cancelledKeys set
                }
            }
        }
    }
{noformat}

That's why the heap dump shows 17,120 elements in the cancelledKeys structure.
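
The close-then-cancel behaviour can be checked outside ZooKeeper with plain java.nio. The following is only a minimal sketch: the class and method names (CancelledKeyDemo, closeRegisteredChannel) are made up for illustration and nothing here is ZooKeeper code. It registers a non-blocking SocketChannel, closes it, and inspects the key before and after a selectNow().

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical standalone demo, not part of ZooKeeper.
class CancelledKeyDemo {

    /**
     * Returns three flags:
     *   [0] key.isValid() right after the channel is closed
     *   [1] whether the selector's key set still holds the key after close
     *   [2] whether the selector's key set still holds the key after selectNow()
     */
    static boolean[] closeRegisteredChannel() throws IOException {
        try (Selector selector = Selector.open()) {
            SocketChannel sock = SocketChannel.open();
            sock.configureBlocking(false);
            SelectionKey key = sock.register(selector, SelectionKey.OP_CONNECT);

            // Closing the channel cancels its keys; it does not deregister them.
            sock.close();

            boolean validAfterClose = key.isValid();
            boolean trackedAfterClose = selector.keys().contains(key);

            // A selection operation is what flushes the cancelled-key set.
            selector.selectNow();
            boolean trackedAfterSelect = selector.keys().contains(key);

            return new boolean[] {validAfterClose, trackedAfterClose, trackedAfterSelect};
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = closeRegisteredChannel();
        System.out.println("valid after close:    " + r[0]);
        System.out.println("tracked after close:  " + r[1]);
        System.out.println("tracked after select: " + r[2]);
    }
}
```

Per the Selector javadoc, the key should be invalid but still present in the selector's key set right after close, and gone only after the selection operation.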

The cancelled keys will only be processed/removed when a select happens.

{noformat}
    @Override
    protected int doSelect(Consumer<SelectionKey> action, long timeout)
        throws IOException
    {
        ...
        processUpdateQueue();
        processDeregisterQueue();
        ...
        ...
        processDeregisterQueue();
        return processEvents(action);
    }
{noformat}


{noformat}
    protected final void processDeregisterQueue() throws IOException {
        ...
        Set<SelectionKey> cks = cancelledKeys();
        synchronized (cks) {
            if (!cks.isEmpty()) {
                Iterator<SelectionKey> i = cks.iterator();
                while (i.hasNext()) {
                    SelectionKeyImpl ski = (SelectionKeyImpl)i.next();
                    i.remove();

                    // remove the key from the selector
                    implDereg(ski);

                    selectedKeys.remove(ski);
                    keys.remove(ski);

                    // remove from channel's key set
                    deregister(ski);

                    SelectableChannel ch = ski.channel();
                    if (!ch.isOpen() && !ch.isRegistered())
                        ((SelChImpl)ch).kill();
                }
            }
        }
    }
{noformat}
 
But because we never successfully resolve the destination host(s), ClientCnxn 
never invokes select, so the cancelled keys are never processed, hence the leak.
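
To illustrate the accumulation, here is a small reproduction sketch under stated assumptions. It mimics the register/connect/close sequence quoted above in a loop with an unresolved address and never calls select in between. The class name, method name, host name "zk.example.invalid" and port 2181 are placeholders chosen for the example. The selectNow() at the end is only there to show that a selection operation drains the keys; it is not meant as the fix for this issue.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical standalone reproduction, not part of ZooKeeper.
class UnresolvedConnectLeakDemo {

    /**
     * Performs the given number of failed connects against an unresolved
     * address and returns {keys held before select, keys held after select}.
     */
    static int[] failedConnects(int attempts) throws IOException {
        // createUnresolved() performs no DNS lookup, so the address stays unresolved.
        InetSocketAddress addr =
                InetSocketAddress.createUnresolved("zk.example.invalid", 2181);

        try (Selector selector = Selector.open()) {
            for (int i = 0; i < attempts; i++) {
                SocketChannel sock = SocketChannel.open();
                sock.configureBlocking(false);
                try {
                    // Same order as registerAndConnect: register first, then connect.
                    sock.register(selector, SelectionKey.OP_CONNECT);
                    sock.connect(addr); // throws UnresolvedAddressException
                } catch (UnresolvedAddressException e) {
                    sock.close(); // only cancels the key
                }
            }

            int heldBeforeSelect = selector.keys().size();
            selector.selectNow(); // processes the cancelled keys
            int heldAfterSelect = selector.keys().size();
            return new int[] {heldBeforeSelect, heldAfterSelect};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = failedConnects(100);
        System.out.println("keys before select: " + counts[0]);
        System.out.println("keys after select:  " + counts[1]);
    }
}
```

With no select between attempts, the selector should still hold one cancelled key per failed connect, which is the same pattern as the cancelledKeys growth seen in the heap dump.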

> Zookeeper client leaks file descriptor in case of UnknownHostException
> ----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4650
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4650
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.14, 3.7.1
>            Reporter: Rushabh Shah
>            Priority: Major
>         Attachments: Screen Shot 2022-12-13 at 4.30.34 PM.png
>
>
> Zookeeper client is causing the file descriptor leak when it is unable to 
> properly reach the destination. In this case, the DNS lookup fails but still 
> leaves an unbounded TCP socket.
> A colleague took a heap dump of the application and saw 17400 
> sun.nio.ch.SocketChannelImpl objects reachable from root; the number 
> matched the open file descriptors on the host.
> !Screen Shot 2022-12-13 at 4.30.34 PM.png!
>  
> More analysis from the heap dump in the comment.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
