[
https://issues.apache.org/jira/browse/CURATOR-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874656#comment-13874656
]
Evaristo commented on CURATOR-72:
---------------------------------
Hi there:
I have been comparing the code in CURATOR-72 (current code) with Curator 2.1.0,
and these are my conclusions:
- In both releases there is a race condition between background and foreground
errors and KeeperState events that can break the ConnectionState listener
status (as demonstrated by the attached tests).
- In CURATOR-72 (I think since Curator 2.3.0) there is an issue where the number
of retries in some cases is huge and out of control, which makes the previous
race condition much more evident. (This can also be seen with the attached
tests using LeaderElection.)
So in fact there are 2 issues here: the first one (the race condition) I think
was already there before 2.3, and the second one is new.
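To make the first issue concrete, here is a minimal sketch of why the order of processing matters. It does not use Curator at all: State and StateHolder are hypothetical stand-ins, and the holder simply trusts whichever signal is processed last, which is an assumption made for illustration and not a description of the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerRaceSketch {
    // Hypothetical stand-in for the ConnectionState values discussed here.
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Hypothetical state holder: records every change it would report to a
    // listener and keeps whichever signal was processed last.
    static final class StateHolder {
        private State current = State.CONNECTED;
        final List<State> notified = new ArrayList<>();

        void apply(State next) {
            if (next != current) {
                current = next;
                notified.add(next);
            }
        }

        State current() {
            return current;
        }
    }

    // Applies the signals in the given order and returns the final state.
    static State finalState(State... signals) {
        StateHolder holder = new StateHolder();
        for (State signal : signals) {
            holder.apply(signal);
        }
        return holder.current();
    }

    public static void main(String[] args) {
        // A disconnect event arrives first in both runs. Then two signals
        // compete: a stale error from an operation that failed during the
        // outage (SUSPENDED) and the reconnect event (RECONNECTED).

        // Stale error processed before the reconnect event.
        System.out.println(finalState(
                State.SUSPENDED, State.SUSPENDED, State.RECONNECTED));
        // prints "RECONNECTED"

        // Same signals, but the stale error is processed last.
        System.out.println(finalState(
                State.SUSPENDED, State.RECONNECTED, State.SUSPENDED));
        // prints "SUSPENDED"
    }
}
```

In the second run the listener is left in SUSPENDED although the last event from the connection itself was a reconnect. The more retries there are, the more stale errors can arrive late, which is how I understand the second issue making the first one more visible.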
On the other hand, I suggest having a clear definition of the different values
of ConnectionState and of when they are triggered, because the implementations
in Curator 2.1.0 and CURATOR-72 are different and in my opinion they follow
different specifications (e.g. in how ConnectionLoss exceptions are managed).
In CURATOR-72 this is the logic:
- CONNECTED and RECONNECTED are clear to me (they only depend on KeeperState
events)
- SUSPENDED can be triggered by a KeeperState.Disconnected event or by a
ConnectionLoss or OperationTimeout exception (whichever happens first)
- LOST can be triggered directly by KeeperState.Expired and SessionExpired
exceptions. In addition, when a ConnectionLoss or OperationTimeout is
received, a background operation is started to check whether it is possible to
connect to another ZK server, and LOST can also be triggered as a result of
that check
In Curator 2.1.0 the logic is:
- CONNECTED and RECONNECTED are clear to me (they only depend on KeeperState
events)
- SUSPENDED can only be triggered by KeeperState.Disconnected
- LOST is triggered by any ConnectionLoss exception
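The two rule sets above can be put side by side in a small table. This is only a sketch of my reading of both versions, not code taken from Curator: the Signal and State names are hypothetical labels, and a signal is left out of a map when I did not describe it for that version. The background check that can later lead to LOST in CURATOR-72 is not modelled.

```java
import java.util.EnumMap;
import java.util.Map;

public class ConnectionStateRulesSketch {
    // Only the two states whose triggers differ between the versions.
    enum State { SUSPENDED, LOST }

    // Hypothetical labels for the events and exceptions named above.
    enum Signal {
        KEEPER_DISCONNECTED,
        KEEPER_EXPIRED,
        CONNECTION_LOSS_ERROR,
        OPERATION_TIMEOUT_ERROR,
        SESSION_EXPIRED_ERROR
    }

    // Immediate reaction to each signal as described for CURATOR-72.
    static Map<Signal, State> curator72Rules() {
        Map<Signal, State> rules = new EnumMap<>(Signal.class);
        rules.put(Signal.KEEPER_DISCONNECTED, State.SUSPENDED);
        rules.put(Signal.CONNECTION_LOSS_ERROR, State.SUSPENDED);
        rules.put(Signal.OPERATION_TIMEOUT_ERROR, State.SUSPENDED);
        rules.put(Signal.KEEPER_EXPIRED, State.LOST);
        rules.put(Signal.SESSION_EXPIRED_ERROR, State.LOST);
        return rules;
    }

    // Reaction to each signal as described for Curator 2.1.0.
    static Map<Signal, State> curator210Rules() {
        Map<Signal, State> rules = new EnumMap<>(Signal.class);
        rules.put(Signal.KEEPER_DISCONNECTED, State.SUSPENDED);
        rules.put(Signal.CONNECTION_LOSS_ERROR, State.LOST);
        return rules;
    }

    public static void main(String[] args) {
        Map<Signal, State> v72 = curator72Rules();
        Map<Signal, State> v210 = curator210Rules();
        for (Signal signal : Signal.values()) {
            State a = v72.get(signal);
            State b = v210.get(signal);
            System.out.println(signal
                    + ": CURATOR-72=" + (a == null ? "not described" : a)
                    + ", 2.1.0=" + (b == null ? "not described" : b));
        }
    }
}
```

The row for CONNECTION_LOSS_ERROR is the one that shows the disagreement: SUSPENDED in one version and LOST in the other. A written definition would need to settle that row first.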
I am trying to provide a patch for the code, but I am struggling. If we can
agree on a definition, I can help with test cases.
> Background operations don't wait for connection timeout
> -------------------------------------------------------
>
> Key: CURATOR-72
> URL: https://issues.apache.org/jira/browse/CURATOR-72
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 2.3.0
> Reporter: Evaristo Camarero
> Assignee: Jordan Zimmerman
> Fix For: 2.4.0
>
> Attachments: TestListener.java, TestListenerConnectedAtStart.java,
> TestListenerSequence.java, TestListenerWithLeaderSelector.java,
> TestListenerWithLeaderSelectorBis.java, test.java
>
>
> Background operations don't wait for the configured connection timeout before
> failing. Attached test shows the problem.