[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mutu updated ZOOKEEPER-4817:
----------------------------
    Attachment:     (was: system-20s.log)

> Client disconnection warning is missed in system log sometimes.
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4817
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4817
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.10.0
>            Reporter: mutu
>            Priority: Major
>         Attachments: system1_20s.log, system1_25s.log, system2_20s.log, 
> system2_25s.log, system3_20s.log, system3_25s.log
>
>
> We recently hit a confusing issue: the client disconnection warning is 
> sometimes missing from the system log, while at other times it appears. 
> The cluster consists of three nodes. A client sends many creation requests 
> and then reads the node created by the first request. The read operation 
> fails because the node is missing. When we check the system log, the client 
> disconnection warning is sometimes there and sometimes not. This incomplete 
> system log misleads the client-side diagnosis of the problem.
> After investigating, we found that when NIOServerCnxn.doIO is stuck at any 
> I/O point in the function for more than 20s, the client disconnection 
> warning does not appear. If it is stuck for less than 20s, the warning 
> appears in the system log. 
> We found that the root cause is that the selectorThread is set as a daemon 
> thread. When one node has a fail-slow NIC, the client disconnects from that 
> node. If NIOServerCnxn.doIO is stuck for more than 20s, the corresponding 
> selectorThread is killed by the JVM. Hence, the client disconnection 
> warning is missed.
> Are there any suggestions for resolving this issue and improving the 
> diagnosability of ZooKeeper? I would very much appreciate them.
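
For readers unfamiliar with daemon threads, here is a minimal Java sketch of
the general JVM behavior the description relies on: a daemon thread does not
keep the JVM alive, so anything it would have logged after a long stall is
lost if the JVM exits first. This is not ZooKeeper code and it does not
reproduce the 20s threshold reported above. The class name DaemonLogDemo, the
helpers startWorker and warnedAfterJoin, the thread name, the warning text,
and the use of Thread.sleep as a stand-in for a stuck doIO call are all
assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DaemonLogDemo {

    // Hypothetical stand-in for a selector thread: it stalls, then "logs" a warning.
    static Thread startWorker(boolean daemon, long stallMillis, AtomicBoolean warned) {
        Thread t = new Thread(() -> {
            try {
                // Assumption: sleep models the time spent stuck in a blocking I/O call.
                Thread.sleep(stallMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Reached only if the thread is still running after the stall.
            warned.set(true);
            System.out.println("WARN client disconnected (hypothetical message)");
        }, "demo-selector");
        t.setDaemon(daemon); // must be called before start()
        t.start();
        return t;
    }

    // Waits for the worker to finish and reports whether the warning was emitted.
    static boolean warnedAfterJoin(boolean daemon, long stallMillis) {
        AtomicBoolean warned = new AtomicBoolean(false);
        Thread t = startWorker(daemon, stallMillis, warned);
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return warned.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Pass "false" as the first argument to run the worker as a non-daemon thread.
        boolean daemon = args.length == 0 || Boolean.parseBoolean(args[0]);
        AtomicBoolean warned = new AtomicBoolean(false);
        startWorker(daemon, 2000, warned);
        Thread.sleep(200); // main finishes long before the worker's stall ends
        System.out.println("main exiting, daemon=" + daemon + ", warned=" + warned.get());
        // daemon=true:  the JVM exits once main returns, so the warning is never printed.
        // daemon=false: the JVM waits for the worker, so the warning is printed later.
    }
}
```

Running it with no arguments and then with "false" shows the two outcomes
side by side. When nothing else ends the process early, as in
warnedAfterJoin, the warning is emitted even for a daemon thread.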



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
