[ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohammad Arshad reassigned ZOOKEEPER-2775: ------------------------------------------ Assignee: Mohammad Arshad > ZK Client not able to connect with Xid out of order error > ---------------------------------------------------------- > > Key: ZOOKEEPER-2775 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775 > Project: ZooKeeper > Issue Type: Bug > Components: java client > Reporter: Bhupendra Kumar Jain > Assignee: Mohammad Arshad > Priority: Critical > > During Network unreachable scenario in one of the cluster, we observed Xid > out of order and Nothing in the queue error continously. And ZK client it > finally not able to connect successully to ZK server. > *Logs:* > unexpected error, closing socket connection and attempting reconnect | > org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447) > java.io.IOException: Xid out of order. Got Xid 52 with err 0 expected Xid 53 > for a packet with details: clientPath:null serverPath:null finished:false > header:: 53,101 replyHeader:: 0,0,-4 request:: > 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes} > response:: null > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426) > unexpected error, closing socket connection and attempting reconnect > java.io.IOException: Nothing in the queue, but got 1 > at > org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101) > at > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426) > > *Analysis:* > 1) First time Client fails to do SASL login due to network unreachable > problem. > 2017-03-29 10:03:59,377 | WARN | [main-SendThread(192.168.130.8:24002)] | > SASL configuration failed: javax.security.auth.login.LoginException: Network > is unreachable (sendto failed) Will continue connection to Zookeeper server > without SASL authentication, if Zookeeper server allows it. | > org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307) > Here the boolean saslLoginFailed becomes true. > 2) After some time network connection is recovered and client is successully > able to login but still the boolean saslLoginFailed is not reset to false. > 3) Now SASL negotiation between client and server start happening and during > this time no user request will be sent. ( As the socket channel will be > closed for write till sasl negotiation complets) > 4) Now response from server for SASL packet will be processed by the client > and client assumes that tunnelAuthInProgress() is finished ( method checks > for saslLoginFailed boolean Since the boolean is true it assumes its done.) > and tries to process the packet as a other packet and will result in above > errors. > *Solution:* Reset the saslLoginFailed boolean every time before client login -- This message was sent by Atlassian JIRA (v6.3.15#6346)