[jira] [Commented] (KAFKA-16768) SocketServer leaks accepted SocketChannel instances due to race condition

2024-05-15 Thread Greg Harris (Jira)



[ 
https://issues.apache.org/jira/browse/KAFKA-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846684#comment-17846684
 ] 

Greg Harris commented on KAFKA-16768:
-

[~muralibasani] Do you mean calling Acceptor#close? That might cause an infinite 
loop (because Acceptor#close calls Processor#close, which calls 
Processor#closeAll).
I think Processor#closeAll (where newConnections is drained) could instead just 
`join` the acceptor thread and get the same effect, but without the 
infinite loop.
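
A minimal sketch of the join-before-drain ordering suggested above, assuming a simplified model rather than the real SocketServer code. `FakeChannel`, `JoinBeforeDrainSketch`, and `openChannelsAfterShutdown` are hypothetical names; only `shouldRun` and `newConnections` come from the issue text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model: the "processor" joins the "acceptor" before draining.
class JoinBeforeDrainSketch {
    // Stand-in for SocketChannel (assumption, not the real class).
    static final class FakeChannel {
        volatile boolean open = true;
        void close() { open = false; }
    }

    // Returns how many accepted channels are still open after shutdown.
    static int openChannelsAfterShutdown() {
        ConcurrentLinkedQueue<FakeChannel> newConnections = new ConcurrentLinkedQueue<>();
        List<FakeChannel> accepted = Collections.synchronizedList(new ArrayList<>());
        AtomicBoolean shouldRun = new AtomicBoolean(true);

        // Acceptor: keeps handing channels to the processor until told to stop.
        Thread acceptor = new Thread(() -> {
            for (int i = 0; i < 10_000 && shouldRun.get(); i++) {
                FakeChannel ch = new FakeChannel();
                accepted.add(ch);
                newConnections.add(ch);
            }
        });

        // Processor: after its run loop exits, wait for the acceptor to finish
        // and only then drain, so no channel can be queued after the drain.
        Thread processor = new Thread(() -> {
            while (shouldRun.get()) {
                Thread.onSpinWait();
            }
            try {
                acceptor.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            FakeChannel ch;
            while ((ch = newConnections.poll()) != null) {
                ch.close();
            }
        });

        acceptor.start();
        processor.start();
        shouldRun.set(false); // shutdown signal, as set by Acceptor#close in the issue
        try {
            acceptor.join();
            processor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        int open = 0;
        for (FakeChannel ch : accepted) {
            if (ch.open) {
                open++;
            }
        }
        return open;
    }

    public static void main(String[] args) {
        System.out.println("open channels after shutdown: " + openChannelsAfterShutdown());
    }
}
```

In this model the join only waits for the acceptor thread and does not call back into any close method, which is why it avoids the loop described above. Whether the same holds in the real code would need to be checked against the actual shutdown path.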

> SocketServer leaks accepted SocketChannel instances due to race condition
> -
>
> Key: KAFKA-16768
> URL: https://issues.apache.org/jira/browse/KAFKA-16768
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 3.8.0
> Reporter: Greg Harris
> Priority: Major
>
> The SocketServer has threads for Acceptors and Processors. These threads 
> communicate via Processor#accept/Processor#configureNewConnections and the 
> `newConnections` queue.
> During shutdown, the Acceptor and Processors are each stopped by setting 
> shouldRun to false, and then shutdown proceeds asynchronously in all 
> instances together. This leads to a race condition where an Acceptor accepts 
> a SocketChannel and queues it to a Processor, but that Processor instance has 
> already started shutting down and has already drained the newConnections 
> queue.
> KAFKA-16765 is an analogous bug in NioEchoServer, which uses a completely 
> different implementation but has the same flaw.
> An example execution order that includes this leak:
> 1. Acceptor#accept() is called, and a new SocketChannel is accepted.
> 2. Acceptor#assignNewConnection() begins.
> 3. Acceptor#close() is called, which sets shouldRun to false in the Acceptor 
> and attached Processor instances.
> 4. Processor#run() checks the shouldRun variable, and exits the loop.
> 5. Processor#closeAll() executes, and drains the `newConnections` queue.
> 6. Processor#run() returns and the Processor thread terminates.
> 7. Acceptor#assignNewConnection() calls Processor#accept(), which adds the 
> SocketChannel to `newConnections`.
> 8. Acceptor#assignNewConnection() returns.
> 9. Acceptor#run() checks the shouldRun variable and exits the loop, and the 
> Acceptor thread terminates.
> 10. Acceptor#close() joins all of the terminated threads, and returns.
> At the end of this sequence, there are still open SocketChannel instances in 
> newConnections, which are then considered leaked.
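
The ordering in the quoted description can be replayed deterministically on a single thread. This is a rough sketch under stated assumptions, not the SocketServer implementation: `FakeChannel`, `FakeProcessor`, and `LeakOrderingSketch` are hypothetical stand-ins, and only steps 1, 5, and 7 of the sequence are modeled.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical single-threaded replay of the leaky ordering.
class LeakOrderingSketch {
    // Stand-in for SocketChannel (assumption, not the real class).
    static final class FakeChannel {
        boolean open = true;
        void close() { open = false; }
    }

    // Stand-in for Processor: a queue plus a drain-and-close step.
    static final class FakeProcessor {
        final ConcurrentLinkedQueue<FakeChannel> newConnections = new ConcurrentLinkedQueue<>();

        void accept(FakeChannel ch) {
            newConnections.add(ch);
        }

        void closeAll() {
            FakeChannel ch;
            while ((ch = newConnections.poll()) != null) {
                ch.close();
            }
        }
    }

    // Returns the number of channels left open in newConnections.
    static int replayLeakyOrdering() {
        FakeProcessor processor = new FakeProcessor();
        FakeChannel accepted = new FakeChannel(); // step 1: channel accepted
        processor.closeAll();                     // step 5: queue drained while empty
        processor.accept(accepted);               // step 7: channel queued after the drain

        int leaked = 0;
        for (FakeChannel ch : processor.newConnections) {
            if (ch.open) {
                leaked++;
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println("leaked channels: " + replayLeakyOrdering());
    }
}
```

Nothing drains the queue after step 7, so the channel stays open with no thread left to close it.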



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (KAFKA-16768) SocketServer leaks accepted SocketChannel instances due to race condition

2024-05-15 Thread Muralidhar Basani (Jira)



[ 
https://issues.apache.org/jira/browse/KAFKA-16768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846607#comment-17846607
 ] 

Muralidhar Basani commented on KAFKA-16768:
---

To fix this, when Processor#closeAll() is called, should we close all the 
acceptors in dataPlaneAcceptors, so that no new connections are accepted?

 




