[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-2743:
-----------------------------------
    Description: 
This is a tricky issue found while debugging failure of "flaky" watcher test 
(ZOOKEEPER-2686). When closing a Netty connection, depend on timing the 
connection bean registered when the connection was provisioned might not get 
unregistered, leading to leaked Java beans. 

The race happens at the time when the client is in the process of finalizing 
the session. As part of session finalization, a connection bean will be 
registered [1]. But right before the registering bean, the connection might 
gets closed, in cases for example the server that the client is connecting to 
is shutdown. As part of connection close, the bean will be un-registered, as 
expected [2], however the problem is when we execute at [2], the connection 
bean might not finish registering at [1], so the unregister of bean is a NOP. 
What's worse, as part of connection close, we remove this connection from 
connection factory [3], so future connection close call will get short 
circuited and directly return; in other words the bean unregister code in 
connection close call will only get executed once. Depends on luck, the bean 
might not get unregistered, as previously illustrated.


[1] 
https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L700
[2] 
https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L114
[3] 
https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L96

> Netty connection leaks JMX connection bean upon connection close in certain 
> race conditions.
> --------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2743
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2743
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.10, 3.5.2
>            Reporter: Michael Han
>            Assignee: Michael Han
>              Labels: netty
>
> This is a tricky issue found while debugging failure of "flaky" watcher test 
> (ZOOKEEPER-2686). When closing a Netty connection, depend on timing the 
> connection bean registered when the connection was provisioned might not get 
> unregistered, leading to leaked Java beans. 
> The race happens at the time when the client is in the process of finalizing 
> the session. As part of session finalization, a connection bean will be 
> registered [1]. But right before the registering bean, the connection might 
> gets closed, in cases for example the server that the client is connecting to 
> is shutdown. As part of connection close, the bean will be un-registered, as 
> expected [2], however the problem is when we execute at [2], the connection 
> bean might not finish registering at [1], so the unregister of bean is a NOP. 
> What's worse, as part of connection close, we remove this connection from 
> connection factory [3], so future connection close call will get short 
> circuited and directly return; in other words the bean unregister code in 
> connection close call will only get executed once. Depends on luck, the bean 
> might not get unregistered, as previously illustrated.
> [1] 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L700
> [2] 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L114
> [3] 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L96



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to