[ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340172#comment-16340172 ]
ASF GitHub Bot commented on ZOOKEEPER-2775:
-------------------------------------------

GitHub user jiajunwang opened a pull request:

    https://github.com/apache/helix/pull/131

    Bump up ZOOKEEPER version to 3.4.11.

    There is a zk connection related bug (ZOOKEEPER-2775) fixed in 3.4.11. Bump up version to get the fix.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiajunwang/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #131

----
commit 22cc4a4d4819b211b30094de7b3b0944cb5c033b
Author: jiajunwang <ericwang1985@...>
Date:   2018-01-25T22:04:13Z

    Bump up ZOOKEEPER version to 3.4.11.

    There is a zk connection related bug (ZOOKEEPER-2775) fixed in 3.4.11. Bump up version to get the fix.

----


> ZK Client not able to connect with Xid out of order error
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2775
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.10, 3.5.3, 3.6.0
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Mohammad Arshad
>            Priority: Critical
>             Fix For: 3.4.11, 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-2775-01.patch
>
>
> During a network-unreachable scenario in one of the clusters, we observed
> "Xid out of order" and "Nothing in the queue" errors continuously, and the
> ZK client was finally not able to connect successfully to the ZK server.
> *Logs:*
> unexpected error, closing socket connection and attempting reconnect |
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447)
> java.io.IOException: Xid out of order.
> Got Xid 52 with err 0 expected Xid 53
> for a packet with details: clientPath:null serverPath:null finished:false
> header:: 53,101 replyHeader:: 0,0,-4 request::
> 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes}
> response:: null
>     at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
> unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Nothing in the queue, but got 1
>     at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
>
> *Analysis:*
> 1) The first time, the client fails to do the SASL login because the network
> is unreachable.
> 2017-03-29 10:03:59,377 | WARN | [main-SendThread(192.168.130.8:24002)] |
> SASL configuration failed: javax.security.auth.login.LoginException: Network
> is unreachable (sendto failed) Will continue connection to Zookeeper server
> without SASL authentication, if Zookeeper server allows it. |
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307)
> Here the boolean saslLoginFailed becomes true.
> 2) After some time the network connection recovers and the client logs in
> successfully, but the boolean saslLoginFailed is not reset to false.
> 3) SASL negotiation between client and server now starts, and during
> this time no user request is sent.
> (The socket channel is closed for writes until SASL negotiation completes.)
> 4) The client then processes the server's response to the SASL packet. Because
> tunnelAuthInProgress() checks the saslLoginFailed boolean, and that boolean is
> still true, the method reports that tunneled authentication is already done.
> The client therefore handles the SASL response as an ordinary packet, which
> results in the errors above.
>
> *Solution:* Reset the saslLoginFailed boolean every time before client login.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
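The analysis above can be illustrated with a small Java model. This is a hypothetical, heavily simplified sketch, assuming a single FIFO queue of pending request xids and a one-round-trip SASL exchange; only `saslLoginFailed` and `tunnelAuthInProgress()` are names taken from the report, while the class `Zk2775Model`, its `connect`/`send`/`onReply` methods and the `resetBeforeLogin` switch are invented for illustration. It is not the real `org.apache.zookeeper.ClientCnxn` code and not the committed patch.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of the ZOOKEEPER-2775 failure. Only saslLoginFailed and
 * tunnelAuthInProgress() come from the bug report; everything else is illustrative.
 */
class Zk2775Model {
    private final boolean resetBeforeLogin;  // false = reported behaviour, true = proposed fix
    private boolean saslLoginFailed = false; // set when a SASL login attempt fails
    private boolean saslComplete = false;    // true once SASL negotiation has finished
    private final Deque<Integer> pending = new ArrayDeque<>(); // xids awaiting replies, FIFO

    Zk2775Model(boolean resetBeforeLogin) {
        this.resetBeforeLogin = resetBeforeLogin;
    }

    /** One (re)connect attempt; loginSucceeds == false models "Network is unreachable". */
    void connect(boolean loginSucceeds) {
        if (resetBeforeLogin) {
            saslLoginFailed = false; // proposed fix: forget earlier failures before logging in
        }
        if (!loginSucceeds) {
            saslLoginFailed = true;  // continue without SASL authentication
        }
        saslComplete = !loginSucceeds; // a successful login must negotiate SASL first
        pending.clear();
    }

    /** A failed login is treated as "no SASL", so negotiation counts as finished. */
    boolean tunnelAuthInProgress() {
        return !saslLoginFailed && !saslComplete;
    }

    /** Record an ordinary request that was written to the socket. */
    void send(int xid) {
        pending.addLast(xid);
    }

    /** Route one reply from the server, either to SASL handling or to the pending queue. */
    String onReply(int xid) throws IOException {
        if (tunnelAuthInProgress()) {
            saslComplete = true; // consume the SASL reply (single round trip, for simplicity)
            return "sasl";
        }
        Integer expected = pending.pollFirst();
        if (expected == null) {
            throw new IOException("Nothing in the queue, but got " + xid);
        }
        if (expected.intValue() != xid) {
            throw new IOException("Xid out of order. Got Xid " + xid + " expected Xid " + expected);
        }
        return "ok";
    }
}
```

With `resetBeforeLogin = false`, calling `connect(false)` and then `connect(true)` leaves `saslLoginFailed` set, so the server's SASL reply (for example `onReply(1)`) is matched against the empty pending queue and throws "Nothing in the queue, but got 1". With the flag reset before each login, the same reply is consumed as SASL and a later `send(2)`/`onReply(2)` pair succeeds. The messages are shortened versions of the logged ones.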