[ https://issues.apache.org/jira/browse/ZOOKEEPER-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541588#comment-14541588 ]
Rakesh R commented on ZOOKEEPER-2188:
-------------------------------------

[~haitao-tony], IIUC {{zkclient#isAlive}} is used to check whether the client is dead or not. In your case, the cluster is down. When a client tries to connect to the server, it fails to get a socket connection and keeps retrying indefinitely. This means the client is still alive and will establish a connection once the ZK quorum is available.

There is a way to get out of this infinite loop, but it has to be implemented in an application thread. That thread would apply connection timeout logic using the {{zk.getState().isConnected();}} status or a connection watcher event of {{Event#SyncConnected}}/{{Event#SaslAuthenticated}}.

Does this satisfy your case?

> client connection hung up because of dead loop
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-2188
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2188
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.5.0
>            Reporter: sunhaitao
>
> There is something wrong with the client code in ClientCnxn.java: it keeps
> trying to connect to the server in a dead loop.
> My test steps: shut down the zookeeper cluster, then execute the zkCli.sh
> script to connect to the cluster. The client keeps trying to connect to the
> zookeeper server without stopping.
> public void run() {
>     clientCnxnSocket.introduce(this, sessionId, outgoingQueue);
>     clientCnxnSocket.updateNow();
>     clientCnxnSocket.updateLastSendAndHeard();
>     int to;
>     long lastPingRwServer = Time.currentElapsedTime();
>     final int MAX_SEND_PING_INTERVAL = 10000; //10 seconds
>     while (state.isAlive()) {
>         try {
>             if (!clientCnxnSocket.isConnected()) {
>                 // don't re-establish connection if we are closing
>                 if (closing) {
>                     break;
>                 }
>                 startConnect();
>                 clientCnxnSocket.updateLastSendAndHeard();
>             }
>
> public boolean isAlive() {
>     return this != CLOSED && this != AUTH_FAILED;
> }
>
> Because the state is CONNECTING at the beginning, isAlive always returns
> true, which leads to the dead loop.
> We should add some retry limit to stop this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
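
The application-side connection timeout suggested in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: {{ConnectTimeout}} and {{awaitConnected}} are hypothetical names and not part of the ZooKeeper API. The only ZooKeeper call it relies on is {{zk.getState().isConnected()}}, which appears in a comment so that the snippet compiles without the ZooKeeper jar.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for an application thread; not part of ZooKeeper. */
final class ConnectTimeout {
    private ConnectTimeout() {}

    /**
     * Polls the supplied connection check until it returns true or the
     * timeout elapses. Returns true if the check succeeded within the
     * deadline, false on timeout or if the waiting thread was interrupted.
     */
    static boolean awaitConnected(BooleanSupplier isConnected,
                                  long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!isConnected.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

// Assumed wiring with a real client (requires the ZooKeeper jar):
//
//   ZooKeeper zk = new ZooKeeper(connectString, sessionTimeoutMs, watcher);
//   if (!ConnectTimeout.awaitConnected(() -> zk.getState().isConnected(), 30_000, 200)) {
//       zk.close();  // give up instead of letting the client retry forever
//   }
```

If the helper returns false, closing the handle should move the client state to CLOSED, so {{isAlive()}} returns false and the retry loop quoted above exits. A watcher that counts down a latch on {{Event#SyncConnected}} would avoid polling, at the cost of a little more wiring.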