XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969519573
##########
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##########
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws
Exception
}
}
+ @Test
+ public void
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws
Exception
+ {
+ final String latchPath = "/foo/bar";
+ final Timing2 timing = new Timing2();
+ try (CuratorFramework client =
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(),
timing.connection(), new RetryOneTime(1)))
+ {
+ client.start();
+ LeaderLatch latchInitialLeader = new LeaderLatch(client,
latchPath, "initial-leader");
+ LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath,
"candidate-0");
+ LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath,
"candidate-1");
+
+ try
+ {
+ latchInitialLeader.start();
+
+ // we want to make sure that the leader gets leadership before
other instances joining the party
+ waitForALeader(Collections.singletonList(latchInitialLeader),
new Timing());
+
+ // candidate #0 will wait for the leader to go away - this
should happen after the child nodes are retrieved by candidate #0
+ latchCandidate0.debugCheckLeaderShipLatch = new
CountDownLatch(1);
+
+ latchCandidate0.start();
+ timing.sleepABit();
Review Comment:
tbh, I'm not really happy with the sleep here and in [line
248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248)
because they are a cause for instabilities: The `close` in [line
251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251)
has to happen after the child nodes for `candidate #0` and `candidate #1` are
created. AFAIU, we cannot ensure that with the sleep calls due to the
asynchronous nature of the `start` command that is triggered right before each
sleep.
I tried to add a `waitForCondition` instead, when coming up with this test
first, that would wait for the child to be created. Unfortunately, this
resulted in the test blocking forever because (I guess) the await on the
`latchCandidate0.debugCheckLeaderShipLatch` is called in the main thread which
makes any subsequent operation (including the check for children nodes) being
blocked.
I hoped that somebody else could come up with a better approach here.
:innocent:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]