[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015523#comment-16015523
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2775:
-------------------------------------------

Github user rakeshadr commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/254#discussion_r117207576
  
    --- Diff: src/java/main/org/apache/zookeeper/ClientCnxn.java ---
    @@ -1054,6 +1054,8 @@ private void sendPing() {
             private boolean saslLoginFailed = false;
     
             private void startConnect() throws IOException {
    +            // initializing it for new connection
    +            saslLoginFailed = false;
    --- End diff --
    
    Thanks @arshadmohammad  for the details.
    
    Yes, only `SendThread` updates the flag. But during the SASL login retry 
period, the flag is also read through `tunnelAuthInProgress()` by the packet 
processing thread, so multiple threads access the flag. The code looks a 
little tricky, and a null `zooKeeperSaslClient` represents auth in progress. 
I'm almost OK with the change, and I'm making one more attempt to avoid any 
compatibility issues for users, as this would go to stable branches :-). 
    
    Earlier the behavior was that once the flag was set to true, 
`tunnelAuthInProgress()` would always return false. Now, with the 
proposed fix, it would sometimes return false and sometimes return 
true, right? Will this result in any consistency issues later?
    
    Assume a case where a successful login takes several retries.
    (1) Immediately after the login failure the flag will be true. During this 
time `tunnelAuthInProgress()` returns false to the callers.
    (2) Assume the `startConnect()` retries have started. During this time, 
`tunnelAuthInProgress()` returns true to the callers.
    
    My previous suggestion was meant to avoid this situation by having 
`tunnelAuthInProgress()` consistently return false until the next successful 
login. Does this make sense to you?
    
    @hanm, your thoughts are welcome. Thanks!
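
    A minimal sketch of the two behaviours compared above, assuming a 
single-threaded stand-in for the client. The class `ResetPolicySketch`, the 
`Policy` enum, and the `saslClientReady` field (standing in for a non-null 
`zooKeeperSaslClient`) are hypothetical names for illustration and are not 
taken from `ClientCnxn`. Negotiation is treated as finishing instantly once 
login succeeds. Each recorded value is what a caller of 
`tunnelAuthInProgress()` would observe in the middle of a login attempt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model for comparing two reset policies; not the real ClientCnxn. */
class ResetPolicySketch {
    enum Policy { RESET_AT_CONNECT, RESET_ON_SUCCESS }

    private final Policy policy;
    private boolean saslLoginFailed = false;
    private boolean saslClientReady = false; // stands in for zooKeeperSaslClient != null

    ResetPolicySketch(Policy policy) {
        this.policy = policy;
    }

    /** Simplified check: a failed login means "not in progress". */
    boolean tunnelAuthInProgress() {
        if (saslLoginFailed) {
            return false;
        }
        return !saslClientReady; // no client object yet: auth counts as in progress
    }

    /** One connect attempt; returns what a concurrent caller would see mid-login. */
    boolean connectAttempt(boolean loginSucceeds) {
        saslClientReady = false;
        if (policy == Policy.RESET_AT_CONNECT) {
            saslLoginFailed = false; // proposed fix: clear before the login attempt
        }
        boolean observedMidLogin = tunnelAuthInProgress();
        if (loginSucceeds) {
            saslLoginFailed = false; // both policies clear the flag on success
            saslClientReady = true;
        } else {
            saslLoginFailed = true;
        }
        return observedMidLogin;
    }

    /** Runs the attempts in order and records the mid-login observation of each. */
    static List<Boolean> observe(Policy policy, boolean... outcomes) {
        ResetPolicySketch model = new ResetPolicySketch(policy);
        List<Boolean> seen = new ArrayList<>();
        for (boolean ok : outcomes) {
            seen.add(model.connectAttempt(ok));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Two failed logins followed by a successful one.
        System.out.println(observe(Policy.RESET_AT_CONNECT, false, false, true));
        // prints [true, true, true]
        System.out.println(observe(Policy.RESET_ON_SUCCESS, false, false, true));
        // prints [true, false, false]
    }
}
```

    In this model, resetting at connect makes the result alternate between 
false (after a failure) and true (during each retry), while clearing the flag 
only on success keeps it false across the retries. Whether that difference 
matters to real callers is the open question in the comment.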


> ZK Client not able to connect with Xid out of order error 
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2775
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.10, 3.5.3, 3.6.0
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Mohammad Arshad
>            Priority: Critical
>         Attachments: ZOOKEEPER-2775-01.patch
>
>
> During a network unreachable scenario in one of the clusters, we observed Xid 
> out of order and Nothing in the queue errors continuously, and the ZK client 
> was finally not able to connect successfully to the ZK server. 
> *Logs:*
> unexpected error, closing socket connection and attempting reconnect | 
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447) 
> java.io.IOException: Xid out of order. Got Xid 52 with err 0 expected Xid 53 
> for a packet with details: clientPath:null serverPath:null finished:false 
> header:: 53,101  replyHeader:: 0,0,-4  request:: 
> 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes}
>   response:: null
>       at 
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996)
>       at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>       at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
> unexpected error, closing socket connection and attempting reconnect 
> java.io.IOException: Nothing in the queue, but got 1
>       at 
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983)
>       at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
>       at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
>       
> *Analysis:* 
> 1) The first time, the client fails to do the SASL login because the network 
> is unreachable.
> 2017-03-29 10:03:59,377 | WARN  | [main-SendThread(192.168.130.8:24002)] | 
> SASL configuration failed: javax.security.auth.login.LoginException: Network 
> is unreachable (sendto failed) Will continue connection to Zookeeper server 
> without SASL authentication, if Zookeeper server allows it. | 
> org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307) 
>       Here the boolean saslLoginFailed becomes true.
> 2) After some time the network connection recovers and the client is able to 
> log in successfully, but the boolean saslLoginFailed is still not reset to 
> false. 
> 3) Now SASL negotiation between client and server starts, and during this 
> time no user request will be sent (the socket channel is closed for write 
> until the SASL negotiation completes).
> 4) Now the response from the server for the SASL packet is processed by the 
> client, and the client assumes that tunnel authentication is finished 
> (tunnelAuthInProgress() checks the saslLoginFailed boolean; since the boolean 
> is true, it assumes authentication is done). The client then tries to process 
> the packet as an ordinary packet, which results in the above errors. 
> *Solution:*  Reset the saslLoginFailed boolean every time before client login
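
The analysis and solution quoted above can be sketched roughly as follows. 
This is a simplified, single-threaded model under stated assumptions, not the 
actual `ClientCnxn` code: the class `SaslFlagSketch`, the `resetBeforeLogin` 
switch, the `negotiationDone` field, and `classifyReply()` are hypothetical 
names introduced only for illustration.

```java
/** Hypothetical, simplified model of the flag handling; not the real ClientCnxn. */
class SaslFlagSketch {
    private final boolean resetBeforeLogin; // true models the proposed solution
    private boolean saslLoginFailed = false;
    private boolean negotiationDone = false;

    SaslFlagSketch(boolean resetBeforeLogin) {
        this.resetBeforeLogin = resetBeforeLogin;
    }

    /** Models one connection attempt, including the SASL login. */
    void startConnect(boolean loginSucceeds) {
        if (resetBeforeLogin) {
            saslLoginFailed = false; // initialize the flag for the new connection
        }
        negotiationDone = false;
        if (!loginSucceeds) {
            saslLoginFailed = true; // step 1: login failed, e.g. network unreachable
        }
    }

    /** Marks the SASL exchange as complete (assumed hook for this model). */
    void negotiationFinished() {
        negotiationDone = true;
    }

    /** Simplified check: a failed login is treated as "no SASL exchange pending". */
    boolean tunnelAuthInProgress() {
        if (saslLoginFailed) {
            return false;
        }
        return !negotiationDone;
    }

    /** How this model would classify a server reply arriving during negotiation. */
    String classifyReply() {
        return tunnelAuthInProgress() ? "SASL packet" : "ordinary reply (Xid checked)";
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            SaslFlagSketch client = new SaslFlagSketch(fix);
            client.startConnect(false); // network unreachable: login fails
            client.startConnect(true);  // network recovered: login succeeds
            System.out.println("reset=" + fix + " -> " + client.classifyReply());
        }
        // prints:
        // reset=false -> ordinary reply (Xid checked)
        // reset=true -> SASL packet
    }
}
```

Without the reset, the stale flag makes the model treat the SASL response as 
an ordinary reply, which corresponds to the point where the Xid check in the 
logs fails. With the reset, the response is recognized as part of the SASL 
exchange.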



