Greg Harris created KAFKA-16765:
-----------------------------------

             Summary: NioEchoServer leaks accepted SocketChannel instances due to race condition
                 Key: KAFKA-16765
                 URL: https://issues.apache.org/jira/browse/KAFKA-16765
             Project: Kafka
          Issue Type: Bug
          Components: core, unit tests
    Affects Versions: 3.8.0
            Reporter: Greg Harris

The NioEchoServer has an AcceptorThread that calls accept() to open new SocketChannel instances and insert them into the `newChannels` List, and a main thread that drains the `newChannels` List and moves its contents to the `socketChannels` List. During shutdown, the serverSocketChannel is closed, which causes both threads to exit their while loops.

It is possible for the NioEchoServer main thread to observe the serverSocketChannel close and terminate before the Acceptor thread does, and for the Acceptor thread to put a SocketChannel in `newChannels` before terminating. This instance is never closed by either thread, because it is never moved to `socketChannels`.

One precise execution order that produces this leak is:

1. NioEchoServer thread locks `newChannels`.
2. Acceptor thread accept() completes, and the SocketChannel is created.
3. Acceptor thread blocks waiting for the `newChannels` lock.
4. NioEchoServer thread releases the `newChannels` lock and does some processing.
5. NioEchoServer#close() is called, which closes the serverSocketChannel.
6. NioEchoServer thread checks serverSocketChannel.isOpen() and then terminates.
7. Acceptor thread acquires the `newChannels` lock and adds the SocketChannel to `newChannels`.
8. Acceptor thread checks serverSocketChannel.isOpen() and then terminates.
9. NioEchoServer#close() stops blocking now that both other threads have terminated.

The end result is that the leaked socket is left open in the `newChannels` list at the end of close(), which is incorrect.
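
One possible way to close this gap, offered only as a sketch, is for close() to drain `newChannels` itself once both threads have terminated, so that a channel handed off late (step 7) is still closed. The class `ChannelDrainSketch` and its methods `handOff` and `closeAll` are hypothetical names invented for illustration and are not taken from the Kafka code base; only the `newChannels` and `socketChannels` list names come from the description above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; this is not the actual NioEchoServer implementation. */
class ChannelDrainSketch {
    // List names follow the issue description; everything else is assumed.
    private final List<SocketChannel> newChannels = new ArrayList<>();
    private final List<SocketChannel> socketChannels = new ArrayList<>();

    /** Models the acceptor thread's hand-off after accept() returns. */
    void handOff(SocketChannel channel) {
        synchronized (newChannels) {
            newChannels.add(channel);
        }
    }

    /**
     * Meant to run at the end of close(), after both threads have been joined.
     * Closes channels in both lists, including any that were added to
     * newChannels after the main thread had already terminated.
     * Returns the number of channels that were closed.
     */
    int closeAll() throws IOException {
        List<SocketChannel> toClose = new ArrayList<>(socketChannels);
        socketChannels.clear();
        synchronized (newChannels) {
            toClose.addAll(newChannels);
            newChannels.clear();
        }
        for (SocketChannel channel : toClose) {
            channel.close();
        }
        return toClose.size();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                SocketChannel accepted = server.accept();
                ChannelDrainSketch sketch = new ChannelDrainSketch();
                // Simulate the late hand-off: the channel never reaches socketChannels.
                sketch.handOff(accepted);
                sketch.closeAll();
                System.out.println("accepted channel open: " + accepted.isOpen());
            }
        }
    }
}
```

In this sketch the late channel is closed even though it was never moved to `socketChannels`. Whether draining in close() or some other approach is the right fix for NioEchoServer is left open here.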