XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969519573
##########
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##########
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception
         }
     }
+    @Test
+    public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception
+    {
+        final String latchPath = "/foo/bar";
+        final Timing2 timing = new Timing2();
+        try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1)))
+        {
+            client.start();
+            LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader");
+            LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0");
+            LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1");
+
+            try
+            {
+                latchInitialLeader.start();
+
+                // we want to make sure that the leader gets leadership before other instances joining the party
+                waitForALeader(Collections.singletonList(latchInitialLeader), new Timing());
+
+                // candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0
+                latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1);
+
+                latchCandidate0.start();
+                timing.sleepABit();

Review Comment:
tbh, I'm not really happy with the sleep here and in [line 248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248) because they are a source of instability: the `close` in [line 251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251) has to happen after the child nodes for `candidate #0` and `candidate #1` are created. AFAIU, the sleep calls cannot guarantee that, because the `start` call triggered right before each sleep is asynchronous.
When I first wrote this test, I tried using a `waitForCondition` instead that would wait for the corresponding child node to be created. Unfortunately, the test then blocked forever because (I guess) the await on `latchCandidate0.debugCheckLeaderShipLatch` is executed in the main thread, which blocks any subsequent operation (including the check for child nodes). I'm hoping somebody else can come up with a better approach here. :thinking:
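
For illustration only, here is a minimal sketch of a bounded, condition-based wait of the kind that could stand in for a fixed sleep. `ConditionWait` and `awaitCondition` are hypothetical names, not part of Curator or its test utilities, and the counter merely simulates a child node appearing some time after an asynchronous `start`. In the real test the condition would presumably inspect the children of `latchPath` instead. The timeout only turns a hang into a failed wait; it does not by itself resolve the blocking described above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Curator: polls a condition until it holds or a deadline passes.
final class ConditionWait {

    /**
     * Returns true as soon as the condition is observed to hold,
     * false if the timeout elapses or the calling thread is interrupted.
     */
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMs, long pollMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of blocking forever
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for the number of child nodes under the latch path (assumption):
        // one child already exists for the initial leader.
        AtomicInteger childCount = new AtomicInteger(1);

        // Stand-in for an asynchronous start(): the second child shows up a bit later.
        CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            childCount.incrementAndGet();
        });

        // Wait for the child instead of sleeping for a fixed amount of time.
        boolean created = awaitCondition(() -> childCount.get() == 2, 2_000, 10);
        System.out.println("second child observed: " + created);
    }
}
```

The helper returns a boolean rather than throwing, so a test can assert on the result and fail with a clear message once the deadline passes.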