[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r976161611 ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -586,6 +608,9 @@ private void checkLeadership(List children) throws Exception final String localOurPath = ourPath.get(); List sortedChildren = LockInternals.getSortedChildren(LOCK_NAME, sorter, children); int ourIndex = (localOurPath != null) ? sortedChildren.indexOf(ZKPaths.getNodeFromPath(localOurPath)) : -1; + +log.debug("checkLeadership with id: {},ourPath: {}, children: {}", id, localOurPath, sortedChildren); Review Comment: ```suggestion log.debug("checkLeadership with id: {}, ourPath: {}, children: {}", id, localOurPath, sortedChildren); ``` nit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r974062828 ## curator-test-zk35/pom.xml: ## @@ -166,6 +166,12 @@ slf4j-log4j12 test + + +org.awaitility +awaitility +test + Review Comment: I see, sorry that I missed that :+1: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r973954972 ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -190,6 +190,12 @@ public void close() throws IOException close(closeMode); } +// for testing +void closeOnDemand() throws IOException +{ +internalClose(closeMode, false); +} Review Comment: For now, this is a test-only method. I'm a bit reluctant to add test-related code in production code rather than providing a utilily class instead (to be fair: we do that in curator code in other locations like the `debug*CountDownLatches` in the `LeaderLatch` implementation). I see a risk if convenient methods like this are available in production: They might make it easier to write less-consistent production code. :thinking: ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -190,6 +190,12 @@ public void close() throws IOException close(closeMode); } +// for testing Review Comment: ```suggestion @VisibleForTesting ``` Isn't this annotation the better way of documenting this here? ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -190,6 +190,12 @@ public void close() throws IOException close(closeMode); } +// for testing +void closeOnDemand() throws IOException +{ +internalClose(closeMode, false); +} Review Comment: ...but if you want to stick to it, I would vote for coming up with a more descriptive name for it: Something like `closeIfStillRunning` ## curator-test-zk35/pom.xml: ## @@ -166,6 +166,12 @@ slf4j-log4j12 test + + +org.awaitility +awaitility +test + Review Comment: Why is this change necessary? ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -198,9 +204,25 @@ public void close() throws IOException * @param closeMode allows the default close mode to be overridden at the time the latch is closed. * @throws IOException errors */ -public synchronized void close(CloseMode closeMode) throws IOException +public void close(CloseMode closeMode) throws IOException { -Preconditions.checkState(state.compareAndSet(State.STARTED, State.CLOSED), "Already closed or has not been started"); +internalClose(closeMode, true); +} + +private synchronized void internalClose(CloseMode closeMode, boolean failOnClosed) throws IOException +{ +if (!state.compareAndSet(State.STARTED, State.CLOSED)) +{ +if (failOnClosed) +{ +throw new IllegalStateException("Already closed or has not been started"); +} +else +{ +return; +} Review Comment: ```suggestion return; ``` nit: Is the `else` branch required based on code guidelines? If not, I'd propose removing it to shorten the code. The code format (having opening brackets on a new line) makes the code long, anyway already. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r970793303 ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState) { try { -if ( client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) || !hasLeadership.get() ) +if ( client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) ) { -reset(); +getChildren(); Review Comment: Ok, after revisiting your response in [that thread earlier in this PR](https://github.com/apache/curator/pull/430#discussion_r967852427), I notice again that `getChildren` is the proper way of doing it: * either the corresponding child node is still around: Then we would pick up the leadership again. * if it's not around, `reset()` will be called, anyway Sorry for the confusion. ...to many threads -.- -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r970541350 ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -586,6 +586,9 @@ private void checkLeadership(List children) throws Exception final String localOurPath = ourPath.get(); List sortedChildren = LockInternals.getSortedChildren(LOCK_NAME, sorter, children); int ourIndex = (localOurPath != null) ? sortedChildren.indexOf(ZKPaths.getNodeFromPath(localOurPath)) : -1; + +log.debug("checkLeadership with ourPath: {}, children: {}", localOurPath, sortedChildren); Review Comment: ```suggestion log.debug("[{}] checkLeadership with ourPath: {}, children: {}", id, localOurPath, sortedChildren); ``` ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances are joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); + +// no extra CountDownLatch needs to be set here because candidate #1 will rely on candidate #0 +latchCandidate1.start(); +timing.sleepABit(); Review Comment: You need to add `Awaitility` to the pom of the `curator-recipes` module: ``` org.awaitility awaitility test ``` ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances are joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); + +// no extra CountDownLatch needs to be set here because candidate #1 will rely on candidate #0 +latchCandidate1.start(); +timing.sleepABit(); Review Comment: ```suggestion // we want to make sure that the leader gets leadership before other instances are going to join the party waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); // candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r970539512 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: I found a better solution for that by introducing another ZK client that is used for monitoring the children. That way, we don't end up being blocked on concurrent calls in the actual test. I'm gonna propose the change in a separate review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r970539512 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: I found a better solution for that by introducing another ZK client that is used for monitoring the children. That way, we don't end up being blocked on concurrent calls in the actual test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969519573 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: tbh, I'm not really happy with the sleep here and in [line 248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248) because they are a cause for instabilities: The `close` in [line 251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251) has to happen after the child nodes for `candidate #0` and `candidate #1` are created. AFAIU, we cannot ensure that with the sleep calls due to the asynchronous nature of the `start` command that is triggered right before each sleep. I tried to add a `waitForCondition` instead, when coming up with this test, initially, that would wait for corresponding child to be created. Unfortunately, this resulted in the test blocking forever because (I guess) the await on the `latchCandidate0.debugCheckLeaderShipLatch` is blocking the thread that executes the `CuratorFrameworkImpl#backgroundOperations` queue which makes any subsequent operation (including the check for children nodes) on the same client being blocked as well. I hoped that somebody else could come up with a better approach here. :thinking: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969519573 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: tbh, I'm not really happy with the sleep here and in [line 248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248) because they are a cause for instabilities: The `close` in [line 251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251) has to happen after the child nodes for `candidate #0` and `candidate #1` are created. AFAIU, we cannot ensure that with the sleep calls due to the asynchronous nature of the `start` command that is triggered right before each sleep. I tried to add a `waitForCondition` instead, when coming up with this test, initially, that would wait for corresponding child to be created. Unfortunately, this resulted in the test blocking forever because (I guess) the await on the `latchCandidate0.debugCheckLeaderShipLatch` is executed in the main thread which makes any subsequent operation (including the check for children nodes) being blocked. I hoped that somebody else could come up with a better approach here. :thinking: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969546342 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: There's a proposal to work around that issue in [the comment below](https://github.com/apache/curator/pull/430/files#r969545913) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969545913 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); + +// no extract CountDownLatch needs to be set here because candidate #1 will rely on candidate #0 +latchCandidate1.start(); +timing.sleepABit(); Review Comment: ```suggestion assertNull(latchCandidate0.getOurPath()); assertNull(latchCandidate1.getOurPath()); latchCandidate0.start(); latchCandidate1.start(); Awaitility.await("Wait for the candidates to have a path assigned to them.") .pollInterval(Duration.ofMillis(100)) .pollInSameThread() .until(() -> latchCandidate0.getOurPath() != null && latchCandidate1.getOurPath() != null); ``` Would that work? The `LeaderLatch#ourPath` is set after the child is created. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969519573 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: tbh, I'm not really happy with the sleep here and in [line 248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248) because they are a cause for instabilities: The `close` in [line 251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251) has to happen after the child nodes for `candidate #0` and `candidate #1` are created. AFAIU, we cannot ensure that with the sleep calls due to the asynchronous nature of the `start` command that is triggered right before each sleep. I tried to add a `waitForCondition` instead, when coming up with this test, initially, that would wait for corresponding child to be created. Unfortunately, this resulted in the test blocking forever because (I guess) the await on the `latchCandidate0.debugCheckLeaderShipLatch` is executed in the main thread which makes any subsequent operation (including the check for children nodes) being blocked. I hoped that somebody else could come up with a better approach here. :innocent: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969519573 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: tbh, I'm not really happy with the sleep here and in [line 248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248) because they are a cause for instabilities: The `close` in [line 251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251) has to happen after the child nodes for `candidate #0` and `candidate #1` are created. AFAIU, we cannot ensure that with the sleep calls due to the asynchronous nature of the `start` command that is triggered right before each sleep. I tried to add a `waitForCondition` instead, when coming up with this test first, that would wait for the child to be created. Unfortunately, this resulted in the test blocking forever because (I guess) the await on the `latchCandidate0.debugCheckLeaderShipLatch` is called in the main thread which makes any subsequent operation (including the check for children nodes) being blocked. I hoped that somebody else could come up with a better approach here. :innocent: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969519573 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: tbh, I'm not really happy with the sleep here and in [line 248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248) because they are a cause for instabilities: The `close` in [line 251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251) has to happen after the child nodes for `candidate #0` and `candidate #1` are created. AFAIU, we cannot ensure that with the sleep calls due to the asynchronous nature of the `start` command. I tried to add a `waitForCondition` instead, when coming up with this test first, that would wait for the child to be created. Unfortunately, this resulted in the test blocking forever because (I guess) the await on the `latchCandidate0.debugCheckLeaderShipLatch` is called in the main thread which makes any subsequent operation (including the check for children nodes) being blocked. I hoped that somebody else could come up with a better approach here. :innocent: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r969507748 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party Review Comment: ```suggestion // we want to make sure that the leader gets leadership before other instances are joining the party ``` and another typo ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); + +// candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 +latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); + +latchCandidate0.start(); +timing.sleepABit(); Review Comment: tbh, I'm not really happy with the sleep here and in line 248 because they are a cause for instabilities: The `close` in line 251 has to happen after the child nodes for `candidate #0` and `candidate #1` are created. AFAIU, we cannot ensure that with the sleep calls due to the asynchronous nature of the `start` command. I tried to add a `waitForCondition` instead, when coming up with this test first, that would wait for the child to be created. Unfortunately, this resulted in the test blocking forever because (I guess) the await on the `latchCandidate0.debugCheckLeaderShipLatch` is called in the main thread which makes any subsequent operation (including the check for children nodes) being blocked. I hoped that somebody else could come up with a better approach here. :innocent: ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception +{ +final String latchPath = "/foo/bar"; +final Timing2 timing = new Timing2(); +try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) +{ +client.start(); +LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); +LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); +LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); + +try +{ +latchInitialLeader.start(); + +// we want to make sure that the leader gets leadership before other instances joining the party +waitForALeader(Collections.singletonList(latchInitialLe
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r958456254 ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -717,6 +720,7 @@ else if ( !oldValue && newValue ) private void setNode(String newValue) throws Exception { String oldPath = ourPath.getAndSet(newValue); +log.debug("setNode with oldPath: {}, newValue: {}", oldPath, newValue); Review Comment: ```suggestion log.debug("setNode in {}, with oldPath: {}, newValue: {}", id, oldPath, newValue); ``` Adding the ID helps when debugging tests that run multiple `LeaderLatch` instances locally. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r958163715 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testOurPathDeletedOnReconnect() throws Exception Review Comment: What do you think of the following test (I updated the test a bit): ``` @Test public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception { final String latchPath = "/foo/bar"; final Timing2 timing = new Timing2(); try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) { client.start(); LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); try { latchInitialLeader.start(); // we want to make sure that the leader gets leadership before other instances joining the party waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); // candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); latchCandidate0.start(); timing.sleepABit(); // no extract CountDownLatch needs to be set here because candidate #1 will rely on candidate #0 latchCandidate1.start(); timing.sleepABit(); // triggers the removal of the corresponding child node after candidate #0 retrieved the children latchInitialLeader.close(); latchCandidate0.debugCheckLeaderShipLatch.countDown(); waitForALeader(Arrays.asList(latchCandidate0, latchCandidate1), new Timing()); assertTrue(latchCandidate0.hasLeadership() ^ latchCandidate1.hasLeadership()); } finally { for (LeaderLatch latchToClose : Arrays.asList(latchInitialLeader, latchCandidate0, latchCandidate1)) { if ( latchToClose.getState() != LeaderLatch.State.CLOSED ) { latchToClose.close(); } } } } ``` This test case describes the situation for [CURATOR-645](https://issues.apache.org/jira/browse/CURATOR-645). I checked in a local test run, that it will run forever when still using `reset()` in [LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633) but would succeed when using `getChildren()`, instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r958163715 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testOurPathDeletedOnReconnect() throws Exception Review Comment: What do you think of the following test (I updated the test a bit): ``` @Test public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception { final String latchPath = "/foo/bar"; final Timing2 timing = new Timing2(); try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) { client.start(); LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); try { latchInitialLeader.start(); // we want to make sure that the leader gets leadership before other instances joining the party waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); // candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); latchCandidate0.start(); timing.sleepABit(); // no extract CountDownLatch needs to be set here because candidate #1 will rely on candidate #0 latchCandidate1.start(); timing.sleepABit(); // triggers the removal of the corresponding child node after candidate #0 retrieved the children latchInitialLeader.close(); latchCandidate0.debugCheckLeaderShipLatch.countDown(); waitForALeader(Arrays.asList(latchCandidate0, latchCandidate1), new Timing()); assertTrue(latchCandidate0.hasLeadership() ^ latchCandidate1.hasLeadership()); } finally { for (LeaderLatch latchToClose : Arrays.asList(latchInitialLeader, latchCandidate0, latchCandidate1)) { if ( latchToClose.getState() != LeaderLatch.State.CLOSED ) { latchToClose.close(); } } } } ``` This test case describes the situation for CURATOR-645. I checked in a local test run, that it will run forever when still using `reset()` in [LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633) but would succeed when using `getChildren()`, instead. This covers the use-case that made me identify CURATOR-645. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r958163715 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testOurPathDeletedOnReconnect() throws Exception Review Comment: What do you think of the following test (I updated the test a bit) ``` @Test public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception { final String latchPath = "/foo/bar"; final Timing2 timing = new Timing2(); try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) { client.start(); LeaderLatch latchInitialLeader = new LeaderLatch(client, latchPath, "initial-leader"); LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, "candidate-0"); LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, "candidate-1"); try { latchInitialLeader.start(); // we want to make sure that the leader gets leadership before other instances joining the party waitForALeader(Collections.singletonList(latchInitialLeader), new Timing()); // candidate #0 will wait for the leader to go away - this should happen after the child nodes are retrieved by candidate #0 latchCandidate0.debugCheckLeaderShipLatch = new CountDownLatch(1); latchCandidate0.start(); timing.sleepABit(); // no extract CountDownLatch needs to be set here because candidate #1 will rely on candidate #0 latchCandidate1.start(); timing.sleepABit(); // triggers the removal of the corresponding child node after candidate #0 retrieved the children latchInitialLeader.close(); latchCandidate0.debugCheckLeaderShipLatch.countDown(); waitForALeader(Arrays.asList(latchCandidate0, latchCandidate1), new Timing()); assertTrue(latchCandidate0.hasLeadership() ^ latchCandidate1.hasLeadership()); } finally { for (LeaderLatch latchToClose : Arrays.asList(latchInitialLeader, latchCandidate0, latchCandidate1)) { if ( latchToClose.getState() != LeaderLatch.State.CLOSED ) { latchToClose.close(); } } } } ``` This test case describes the situation for CURATOR-645. I checked in a local test run, that it will run forever when still using `reset()` in [LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633) but would succeed when using `getChildren()`, instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r958163715 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testOurPathDeletedOnReconnect() throws Exception Review Comment: What do you think of the following test: ``` @Test public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception { final String latchPath = "/foo/bar"; final Timing2 timing = new Timing2(); try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) { client.start(); LeaderLatch latch0 = new LeaderLatch(client, latchPath, "0"); latch0.start(); waitForALeader(Collections.singletonList(latch0), new Timing()); LeaderLatch latch1 = new LeaderLatch(client, latchPath, "1"); LeaderLatch latch2 = new LeaderLatch(client, latchPath, "2"); latch1.debugCheckLeaderShipLatch = new CountDownLatch(1); latch1.start(); timing.sleepABit(); latch2.start(); timing.sleepABit(); latch0.close(); latch1.debugCheckLeaderShipLatch.countDown(); waitForALeader(Arrays.asList(latch1, latch2), new Timing()); latch1.close(); latch2.close(); } } ``` This test case describes the situation for CURATOR-645. I checked in a local test run, that it will run forever when still using `reset()` in [LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633) but would succeed when using `getChildren()`, instead. ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testOurPathDeletedOnReconnect() throws Exception Review Comment: What do you think of the following test: ``` @Test public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception { final String latchPath = "/foo/bar"; final Timing2 timing = new Timing2(); try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) { client.start(); LeaderLatch latch0 = new LeaderLatch(client, latchPath, "0"); latch0.start(); waitForALeader(Collections.singletonList(latch0), new Timing()); LeaderLatch latch1 = new LeaderLatch(client, latchPath, "1"); LeaderLatch latch2 = new LeaderLatch(client, latchPath, "2"); latch1.debugCheckLeaderShipLatch = new CountDownLatch(1); latch1.start(); timing.sleepABit(); latch2.start(); timing.sleepABit(); latch0.close(); latch1.debugCheckLeaderShipLatch.countDown(); waitForALeader(Arrays.asList(latch1, latch2), new Timing()); latch1.close(); latch2.close(); } } ``` This test case describes the situation for CURATOR-645. I checked in a local test run, that it will run forever when still using `reset()` in [LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633) but would succeed when using `getChildren()`, instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r958163715 ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testOurPathDeletedOnReconnect() throws Exception Review Comment: What do you think of the following test: ``` @Test public void testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws Exception { final String latchPath = "/foo/bar"; final Timing2 timing = new Timing2(); try (CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), timing.connection(), new RetryOneTime(1))) { client.start(); LeaderLatch latch0 = new LeaderLatch(client, latchPath, "0"); latch0.start(); waitForALeader(Collections.singletonList(latch0), new Timing()); LeaderLatch latch1 = new LeaderLatch(client, latchPath, "1"); LeaderLatch latch2 = new LeaderLatch(client, latchPath, "2"); latch1.debugCheckLeaderShipLatch = new CountDownLatch(1); latch1.start(); timing.sleepABit(); latch2.start(); timing.sleepABit(); latch0.close(); latch1.debugCheckLeaderShipLatch.countDown(); waitForALeader(Arrays.asList(latch1, latch2), new Timing()); latch1.close(); latch2.close(); } } ``` This test case describes the situation for CURATOR-645. I checked in a local test run, that it will run forever when still using `reset()` in [LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633) but would succeed when using `getChildren()`, instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch
XComp commented on code in PR #430: URL: https://github.com/apache/curator/pull/430#discussion_r957341820 ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState) { try { -if ( client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) || !hasLeadership.get() ) Review Comment: I'm wondering whether we should remove this `if` entirely: Let's assume the current instance doesn't have the leadership but watches the child node referring to the current leader. If the connection was temporarily suspended it could be that the actual leader lost its leadership in the meantime. In that case, we want the current instance to check the leadership. :thinking: ## curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java: ## @@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws Exception } } +@Test +public void testOurPathDeletedOnReconnect() throws Exception Review Comment: The test is not suitable for the change. I reverted the change and ran the test. The test still succeeded. ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState) { try { -if ( client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) || !hasLeadership.get() ) Review Comment: But thinking about it once more, this might not be an issue because the current LeaderLatch would have a watcher on the current leader's node which would have been triggered when the leader's child would have been deleted. :thinking: ## curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java: ## @@ -604,7 +604,7 @@ else if ( ourIndex == 0 ) @Override public void process(WatchedEvent event) { -if ( (state.get() == State.STARTED) && (event.getType() == Event.EventType.NodeDeleted) && (localOurPath != null) ) Review Comment: Good point. Due to the condition in [LeaderLatch:588](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L588), `ourIndex` must be `-1` if `localOurPath` is actually `null` and, therefore, would prevent the execution from reaching the `else` branch (instead the if condition in [LeaderLatch:592](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L592) applies) where the watcher is set. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org