[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928556#comment-15928556
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2722:
-------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/191
  
    >> if that creation is failing due to connection loss, shouldn't the places 
that check the watcher connection fail there instead of in your check?
    
    ConnectionLossException can happen *after* a connection between the ZooKeeper 
client and server has been established, right? So having the check only in the 
watcher is not enough. A pass in the watcher does not guarantee that 
ConnectionLossException will not occur at a later point in time. Imagine an 
extreme case where a network partition happens between client and server 
after session establishment: the client will first get a connected event, 
and the watcher happily reports that everything is fine, then subsequent 
operations (e.g. create) will fail with ConnectionLossException until the 
network heals. 
    
    >> I think it's worth understanding why we are getting a connection event 
in the watcher that should be waiting for connection, but still failing by not 
connecting, instead of fixing this with additional waiting.
    
    Yes, I'd like to know what causes this, though I have had a hard time 
reproducing this failure locally or in an internal Jenkins. So far it is only 
reproducible on Apache Jenkins. I can add some logging to capture more context 
when the failure happens on Apache Jenkins, but even then the retry logic in 
create is still needed, unless we can prove it is not possible to get a 
ConnectionLossException after session establishment.
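
To make the "retry logic in create" concrete, here is a minimal sketch of a bounded retry wrapper. It is not the code from the pull request. The class name, the `retryOnConnectionLoss` helper, and its `maxAttempts` / `backoffMs` parameters are hypothetical, and the nested `ConnectionLossException` is only a stand-in for `org.apache.zookeeper.KeeperException.ConnectionLossException` so that the example compiles without the ZooKeeper jar.

```java
import java.util.concurrent.Callable;

public class RetryOnConnectionLoss {

    /**
     * Stand-in for org.apache.zookeeper.KeeperException.ConnectionLossException,
     * declared here only so the sketch is self-contained (assumption, not the real class).
     */
    static class ConnectionLossException extends Exception {
        ConnectionLossException(String path) {
            super("KeeperErrorCode = ConnectionLoss for " + path);
        }
    }

    /**
     * Hypothetical helper: runs op, retrying only when it fails with a
     * connection loss, up to maxAttempts times with a fixed pause in between.
     * Any other exception is propagated immediately.
     */
    static <T> T retryOnConnectionLoss(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLossException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                last = e; // remember the failure in case we run out of attempts
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs); // give the connection time to recover
                }
            }
        }
        throw last; // every attempt failed with a connection loss
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated create: the first two calls lose the connection, the third succeeds.
        String path = retryOnConnectionLoss(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException("/test");
            }
            return "/test";
        }, 5, 10L);
        System.out.println("created " + path + " after " + calls[0] + " attempts");
    }
}
```

In the real test the lambda would wrap the `zk.create(...)` call. One caveat: a create that fails with a connection loss may still have been applied on the server, so a retry should probably also be prepared for a "node exists" error on a later attempt.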



> Flaky Test: 
> org.apache.zookeeper.test.ReadOnlyModeTest.testSessionEstablishment
> -------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2722
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2722
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: 3.4.9, 3.5.2
>            Reporter: Michael Han
>            Assignee: Michael Han
>              Labels: flaky, flaky-build, flaky-test
>             Fix For: 3.5.3, 3.6.0
>
>
> {noformat}
> Error Message
> KeeperErrorCode = ConnectionLoss for /test
> Stacktrace
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /test
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1423)
>       at org.apache.zookeeper.test.ReadOnlyModeTest.testSessionEstablishment(ReadOnlyModeTest.java:238)
>       at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looks like we should retry before giving up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
