XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r970541350
##########
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##########
@@ -586,6 +586,9 @@ private void checkLeadership(List<String> children) throws
Exception
final String localOurPath = ourPath.get();
List<String> sortedChildren =
LockInternals.getSortedChildren(LOCK_NAME, sorter, children);
int ourIndex = (localOurPath != null) ?
sortedChildren.indexOf(ZKPaths.getNodeFromPath(localOurPath)) : -1;
+
+ log.debug("checkLeadership with ourPath: {}, children: {}",
localOurPath, sortedChildren);
Review Comment:
```suggestion
log.debug("[{}] checkLeadership with ourPath: {}, children: {}", id,
localOurPath, sortedChildren);
```
##########
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##########
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws
Exception
}
}
+ @Test
+ public void
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws
Exception
+ {
+ final String latchPath = "/foo/bar";
+ final Timing2 timing = new Timing2();
+ try (CuratorFramework client =
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(),
timing.connection(), new RetryOneTime(1)))
+ {
+ client.start();
+ LeaderLatch latchInitialLeader = new LeaderLatch(client,
latchPath, "initial-leader");
+ LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath,
"candidate-0");
+ LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath,
"candidate-1");
+
+ try
+ {
+ latchInitialLeader.start();
+
+ // we want to make sure that the leader gets leadership before
other instances are joining the party
+ waitForALeader(Collections.singletonList(latchInitialLeader),
new Timing());
+
+ // candidate #0 will wait for the leader to go away - this
should happen after the child nodes are retrieved by candidate #0
+ latchCandidate0.debugCheckLeaderShipLatch = new
CountDownLatch(1);
+
+ latchCandidate0.start();
+ timing.sleepABit();
+
+ // no extra CountDownLatch needs to be set here because
candidate #1 will rely on candidate #0
+ latchCandidate1.start();
+ timing.sleepABit();
Review Comment:
You need to add `Awaitility` to the pom of the `curator-recipes` module:
```
<dependency>
<groupId>org.awaitility</groupId>
<artifactId>awaitility</artifactId>
<scope>test</scope>
</dependency>
```
##########
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##########
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws
Exception
}
}
+ @Test
+ public void
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws
Exception
+ {
+ final String latchPath = "/foo/bar";
+ final Timing2 timing = new Timing2();
+ try (CuratorFramework client =
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(),
timing.connection(), new RetryOneTime(1)))
+ {
+ client.start();
+ LeaderLatch latchInitialLeader = new LeaderLatch(client,
latchPath, "initial-leader");
+ LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath,
"candidate-0");
+ LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath,
"candidate-1");
+
+ try
+ {
+ latchInitialLeader.start();
+
+ // we want to make sure that the leader gets leadership before
other instances are joining the party
+ waitForALeader(Collections.singletonList(latchInitialLeader),
new Timing());
+
+ // candidate #0 will wait for the leader to go away - this
should happen after the child nodes are retrieved by candidate #0
+ latchCandidate0.debugCheckLeaderShipLatch = new
CountDownLatch(1);
+
+ latchCandidate0.start();
+ timing.sleepABit();
+
+ // no extra CountDownLatch needs to be set here because
candidate #1 will rely on candidate #0
+ latchCandidate1.start();
+ timing.sleepABit();
Review Comment:
```suggestion
// we want to make sure that the leader gets leadership
before other instances are going to join the party
waitForALeader(Collections.singletonList(latchInitialLeader), new Timing());
// candidate #0 will wait for the leader to go away - this
should happen after the child nodes are retrieved by candidate #0
latchCandidate0.debugCheckLeaderShipLatch = new
CountDownLatch(1);
latchCandidate0.start();
final int expectedChildrenAfterCandidate0Joins = 2;
Awaitility.await("There should be " +
expectedChildrenAfterCandidate0Joins + " child nodes created after candidate #0
joins the leader election.")
.pollInterval(Duration.ofMillis(100))
.pollInSameThread()
.until(() ->
client.getChildren().forPath(latchPath).size() ==
expectedChildrenAfterCandidate0Joins);
// no extra CountDownLatch needs to be set here because
candidate #1 will rely on candidate #0
latchCandidate1.start();
final int expectedChildrenAfterCandidate1Joins = 3;
Awaitility.await("There should be " +
expectedChildrenAfterCandidate1Joins + " child nodes created after candidate #1
joins the leader election.")
.pollInterval(Duration.ofMillis(100))
.pollInSameThread()
.until(() ->
client.getChildren().forPath(latchPath).size() ==
expectedChildrenAfterCandidate1Joins);
```
Initially, I played around with checking the `ourPath` of each `LeaderLatch`
instance. This didn't work, because that field is set in a `BackgroundCallback`
(see
[LeaderLatch:540](https://github.com/apache/curator/blob/2775855fc80fdea8433d5dedbb746f58ddd443c4/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L540)).
The background tasks are handled in the `EventThread` of the ZK client. That
one is already blocked by the `CountdownLatch` of candidate #0.
But we could still use the `SendThread` which is used for outgoing commands.
That's why calling `.getChildren` on the client itself still works and is
sufficient enough to verify that the child nodes are actually created.
##########
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##########
@@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState)
{
try
{
- if (
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED)
|| !hasLeadership.get() )
+ if (
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) )
{
- reset();
+ getChildren();
Review Comment:
I'm not convinced anymore that this change is correct. Here, we're covering
the case where `ConnectionState.SUSPENDED` is considered an error state, i.e.
the leadership is lost if the connection was suspended. In that case, we should
do a reset because we're considering the leadership being lost already due to
the connection instability having caused an error.
Instead, we should add an `else` branch that does the `getChildren` call in
case of a reconnect and the policy is allowing reconnects after suspension.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]