[ https://issues.apache.org/jira/browse/ZOOKEEPER-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155715#comment-17155715 ]
Aishwarya Soni commented on ZOOKEEPER-3828:
-------------------------------------------

Upgrading this to a major bug, since multiple people are now affected and it is a blocker for version upgrades and HA.

> zookeeper clients gets connection timeout when the leader node is restarted
> ---------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3828
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3828
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.6.1, 3.5.8
>            Reporter: Aishwarya Soni
>            Priority: Major
>         Attachments: debug_logs.zip, node1.txt, node2.txt, node3.txt, node4.txt, node5.txt
>
>
> I have configured a 5-node ZooKeeper cluster using version 3.6.1 in a Docker
> containerized environment. As part of some destructive testing, I restarted
> the ZooKeeper leader. Re-election happened, and all 5 nodes (containers) are
> back in a good state with a new leader. But when I log in to one of the
> containers, start the ZooKeeper CLI (./zkCli.sh), and run the command *ls /*,
> I see the error below:
>
> [zk: localhost:2181(CONNECTING) 1]
> [zk: localhost:2181(CONNECTING) 1] ls /
> 2020-05-14 23:48:26,556 [myid:localhost:2181] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1229] - Client session timed out, have not heard from server in 30001ms for session id 0x0
> 2020-05-14 23:48:26,556 [myid:localhost:2181] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1272] - Session 0x0 for sever localhost/127.0.0.1:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
> org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 30001ms for session id 0x0
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1230)
> KeeperErrorCode = ConnectionLoss for /
> [zk: localhost:2181(CONNECTING) 2] 2020-05-14 23:48:28,089 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1154] - Opening socket connection to server localhost/127.0.0.1:2181.
> 2020-05-14 23:48:28,089 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1156] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
> 2020-05-14 23:48:28,090 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@986] - Socket connection established, initiating session, client: /127.0.0.1:60384, server: localhost/127.0.0.1:2181
> 2020-05-14 23:48:58,119 [myid:localhost:2181] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1229] - Client session timed out, have not heard from server in 30030ms for session id 0x0
> 2020-05-14 23:48:58,120 [myid:localhost:2181] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1272] - Session 0x0 for sever localhost/127.0.0.1:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
> org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 30030ms for session id 0x0
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1230)
> 2020-05-14 23:49:00,003 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1154] - Opening socket connection to server localhost/127.0.0.1:2181.
> 2020-05-14 23:49:00,004 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1156] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
> 2020-05-14 23:49:00,004 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@986] - Socket connection established, initiating session, client: /127.0.0.1:32936, server: localhost/127.0.0.1:2181
> 2020-05-14 23:49:30,032 [myid:localhost:2181] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1229] - Client session timed out, have not heard from server in 30029ms for session id 0x0
> 2020-05-14 23:49:30,033 [myid:localhost:2181] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1272] - Session 0x0 for sever localhost/127.0.0.1:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
> org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 30029ms for session id 0x0
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1230)
> 2020-05-14 23:49:31,230 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1154] - Opening socket connection to server localhost/127.0.0.1:2181.
> 2020-05-14 23:49:31,230 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1156] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
> 2020-05-14 23:49:31,230 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@986] - Socket connection established, initiating session, client: /127.0.0.1:33766, server: localhost/127.0.0.1:2181
>
> Does anyone know what could possibly be wrong? For reference:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2164
>
> This behavior is observed on all the nodes when the leader is restarted. All
> is good when a follower is restarted.
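
For anyone trying to reproduce this, it may help to check what each server itself reports after the leader restart, independently of zkCli.sh. The following is only a rough sketch: it sends the `srvr` four-letter-word command to each node over a plain socket and pulls out the `Mode:` line. The function names and the host names in the usage comment are made up for illustration and are not from this report. It also assumes `srvr` is allowed by `4lw.commands.whitelist` on the servers.

```python
import socket


def four_letter_word(host, port, cmd="srvr", timeout=5.0):
    """Send a four-letter-word command and return the server's text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection once it has replied.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def check_ensemble(nodes):
    """Map each (host, port) to its reported mode, or to an error description."""
    results = {}
    for host, port in nodes:
        try:
            results[(host, port)] = parse_mode(four_letter_word(host, port))
        except OSError as exc:  # refused, timed out, or unresolvable host
            results[(host, port)] = f"unreachable ({exc})"
    return results


# Usage (hypothetical host names, adjust to the actual containers):
#   for node, mode in check_ensemble([(f"zk{i}", 2181) for i in range(1, 6)]).items():
#       print(node, mode)
```

A result of `None` means the node accepted the connection but its reply had no `Mode:` line, which would be worth comparing against the server logs attached to this issue.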