[ 
https://issues.apache.org/jira/browse/CURATOR-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874656#comment-13874656
 ] 

Evaristo commented on CURATOR-72:
---------------------------------

Hi there:

I have been comparing the code in CURATOR-72 (current code) with Curator 2.1.0, 
and these are my conclusions:

- In both releases there is a race condition between background and foreground 
errors and KeeperState events that can break the ConnectionState listener 
status (as demonstrated by the attached tests).
- In CURATOR-72 (I think from Curator 2.3.0 onward) there is an additional 
issue: in some cases the number of retries is huge and out of control, which 
makes the previous race condition much more evident. (This can also be seen 
with the attached tests using LeaderElection.)

So in fact there are two issues here. The first one (the race condition) was, I 
think, already there before 2.3; the second one is new.
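To make the first issue concrete, here is a minimal, single-threaded Java sketch that replays one possible interleaving. It is an illustrative model only, not Curator code: `RaceModel`, its `State` enum and the `RecordingListener` are hypothetical, and the late ConnectionLoss is mapped to SUSPENDED following the CURATOR-72 logic described below (under the 2.1.0 logic it would be LOST instead). If the handling of a ConnectionLoss error is delayed until after the RECONNECTED event has been delivered, the listener ends up believing the connection is suspended while it is actually connected.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only (not Curator source): replays one fixed interleaving. */
public class RaceModel {
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical listener that records every state it is told about
    static final class RecordingListener {
        final List<State> seen = new ArrayList<>();

        void onState(State s) {
            seen.add(s);
        }
    }

    // Event-thread notifications and a delayed background error handler, in one order
    static List<State> simulateLateConnectionLoss() {
        RecordingListener listener = new RecordingListener();
        listener.onState(State.CONNECTED);   // session established
        listener.onState(State.SUSPENDED);   // KeeperState.Disconnected event
        // A background operation fails with ConnectionLoss here, but the
        // handling of that error is delayed...
        listener.onState(State.RECONNECTED); // KeeperState.SyncConnected event
        // ...and the stale error is handled late (SUSPENDED, per CURATOR-72 rules)
        listener.onState(State.SUSPENDED);
        return listener.seen;
    }

    public static void main(String[] args) {
        List<State> seen = simulateLateConnectionLoss();
        System.out.println(seen); // [CONNECTED, SUSPENDED, RECONNECTED, SUSPENDED]
        System.out.println("listener believes: " + seen.get(seen.size() - 1)); // SUSPENDED
    }
}
```

In the real client the two sources run on different threads, so which ordering occurs depends on timing.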

On the other hand, I suggest having a clear definition of the different 
ConnectionState values and when each one is triggered, because the 
implementations in Curator 2.1 and CURATOR-72 differ and, in my opinion, follow 
different specifications (e.g. in how ConnectionLoss exceptions are handled).

In CURATOR-72 this is the logic:
- CONNECTED and RECONNECTED are clear to me (they depend only on KeeperState 
events)
- SUSPENDED can be triggered by a KeeperState.Disconnected event or by a 
ConnectionLoss or OperationTimeout error (whichever happens first)
- LOST can be triggered directly by KeeperState.Expired events and 
SessionExpired exceptions. In addition, when a ConnectionLoss or 
OperationTimeout is received, a background operation is started to check 
whether it is possible to connect to another ZK server, and in that case LOST 
is also triggered
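The CURATOR-72 rules above can be summarised as a small lookup. This is a hedged sketch of the described behaviour, not Curator's implementation: `Curator72StateModel`, `Input`, `Reaction` and the `connectedBefore` flag are hypothetical names, and the outcome of the background check is deliberately left out because it happens asynchronously and later.

```java
/** Model (not Curator source) of how ConnectionState values arise in CURATOR-72. */
public class Curator72StateModel {
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical names for the inputs listed in the comment
    enum Input {
        KEEPER_SYNC_CONNECTED, // KeeperState.SyncConnected event
        KEEPER_DISCONNECTED,   // KeeperState.Disconnected event
        KEEPER_EXPIRED,        // KeeperState.Expired event
        CONNECTION_LOSS,       // ConnectionLoss error from an operation
        OPERATION_TIMEOUT,     // OperationTimeout error from an operation
        SESSION_EXPIRED        // SessionExpired error from an operation
    }

    // State posted immediately, and whether a background connectivity check starts
    static final class Reaction {
        final ConnectionState state;
        final boolean startsBackgroundCheck;

        Reaction(ConnectionState state, boolean startsBackgroundCheck) {
            this.state = state;
            this.startsBackgroundCheck = startsBackgroundCheck;
        }
    }

    static Reaction react(Input input, boolean connectedBefore) {
        switch (input) {
            case KEEPER_SYNC_CONNECTED:
                // CONNECTED on the first connection, RECONNECTED afterwards
                return new Reaction(connectedBefore ? ConnectionState.RECONNECTED
                                                    : ConnectionState.CONNECTED, false);
            case KEEPER_DISCONNECTED:
                return new Reaction(ConnectionState.SUSPENDED, false);
            case CONNECTION_LOSS:
            case OPERATION_TIMEOUT:
                // SUSPENDED right away; the background check may post LOST later
                return new Reaction(ConnectionState.SUSPENDED, true);
            case KEEPER_EXPIRED:
            case SESSION_EXPIRED:
                return new Reaction(ConnectionState.LOST, false);
            default:
                throw new IllegalArgumentException("unhandled input: " + input);
        }
    }

    public static void main(String[] args) {
        for (Input in : Input.values()) {
            Reaction r = react(in, true);
            System.out.println(in + " -> " + r.state
                    + (r.startsBackgroundCheck ? " + background check" : ""));
        }
    }
}
```

Because SUSPENDED can come from either an event or an error, which source the listener hears from first depends on timing, which is where the race in the first point comes in.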

In Curator 2.1.0 the logic is:
- CONNECTED and RECONNECTED are clear to me (they depend only on KeeperState 
events)
- SUSPENDED can only be triggered by KeeperState.Disconnected
- LOST is triggered by any ConnectionLoss exception
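A minimal Java sketch of the 2.1.0 rules as described above. `Curator210StateModel`, `Input` and `connectedBefore` are hypothetical names, not Curator API. Only the three inputs mentioned for 2.1.0 are modelled; how 2.1.0 reacts to other inputs (e.g. OperationTimeout or session expiry) is not covered by this comment, so the sketch does not guess.

```java
/** Model (not Curator source) of how ConnectionState values arise in 2.1.0. */
public class Curator210StateModel {
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical names; only the inputs mentioned for 2.1.0 are modelled
    enum Input {
        KEEPER_SYNC_CONNECTED, // KeeperState.SyncConnected event
        KEEPER_DISCONNECTED,   // KeeperState.Disconnected event
        CONNECTION_LOSS        // ConnectionLoss error from any operation
    }

    static ConnectionState react(Input input, boolean connectedBefore) {
        switch (input) {
            case KEEPER_SYNC_CONNECTED:
                // CONNECTED on the first connection, RECONNECTED afterwards
                return connectedBefore ? ConnectionState.RECONNECTED
                                       : ConnectionState.CONNECTED;
            case KEEPER_DISCONNECTED:
                // the only source of SUSPENDED in this model
                return ConnectionState.SUSPENDED;
            case CONNECTION_LOSS:
                // a ConnectionLoss error maps to LOST; it never yields SUSPENDED itself
                return ConnectionState.LOST;
            default:
                throw new IllegalArgumentException("unhandled input: " + input);
        }
    }

    public static void main(String[] args) {
        for (Input in : Input.values()) {
            System.out.println(in + " -> " + react(in, true));
        }
    }
}
```

The key difference from CURATOR-72 is the ConnectionLoss row: here it maps straight to LOST, whereas in CURATOR-72 it first produces SUSPENDED and starts a background check.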

I am trying to provide a patch for the code, but I am struggling. However, if we 
agree on the definitions, I can help with test cases.

> Background operations don't wait for connection timeout
> -------------------------------------------------------
>
>                 Key: CURATOR-72
>                 URL: https://issues.apache.org/jira/browse/CURATOR-72
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.3.0
>            Reporter: Evaristo Camarero
>            Assignee: Jordan Zimmerman
>             Fix For: 2.4.0
>
>         Attachments: TestListener.java, TestListenerConnectedAtStart.java, 
> TestListenerSequence.java, TestListenerWithLeaderSelector.java, 
> TestListenerWithLeaderSelectorBis.java, test.java
>
>
> Background operations don't wait for the configured connection timeout before 
> failing. Attached test shows the problem.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
