[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-21 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r976161611


##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -586,6 +608,9 @@ private void checkLeadership(List children) throws 
Exception
 final String localOurPath = ourPath.get();
 List sortedChildren = 
LockInternals.getSortedChildren(LOCK_NAME, sorter, children);
 int ourIndex = (localOurPath != null) ? 
sortedChildren.indexOf(ZKPaths.getNodeFromPath(localOurPath)) : -1;
+
+log.debug("checkLeadership with id: {},ourPath: {}, children: {}", id, 
localOurPath, sortedChildren);

Review Comment:
   ```suggestion
   log.debug("checkLeadership with id: {}, ourPath: {}, children: {}", 
id, localOurPath, sortedChildren);
   ```
   nit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-19 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r974062828


##
curator-test-zk35/pom.xml:
##
@@ -166,6 +166,12 @@
 slf4j-log4j12
 test
 
+
+
+org.awaitility
+awaitility
+test
+

Review Comment:
   I see, sorry that I missed that :+1: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-19 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r973954972


##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -190,6 +190,12 @@ public void close() throws IOException
 close(closeMode);
 }
 
+// for testing
+void closeOnDemand() throws IOException
+{
+internalClose(closeMode, false);
+}

Review Comment:
   For now, this is a test-only method. I'm a bit reluctant to add test-related 
code in production code rather than providing a utilily class instead (to be 
fair: we do that in curator code in other locations like the 
`debug*CountDownLatches` in the `LeaderLatch` implementation). I see a risk if 
convenient methods like this are available in production: They might make it 
easier to write less-consistent production code. :thinking: 



##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -190,6 +190,12 @@ public void close() throws IOException
 close(closeMode);
 }
 
+// for testing

Review Comment:
   ```suggestion
   @VisibleForTesting
   ```
   Isn't this annotation the better way of documenting this here?



##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -190,6 +190,12 @@ public void close() throws IOException
 close(closeMode);
 }
 
+// for testing
+void closeOnDemand() throws IOException
+{
+internalClose(closeMode, false);
+}

Review Comment:
   ...but if you want to stick to it, I would vote for coming up with a more 
descriptive name for it: Something like `closeIfStillRunning`



##
curator-test-zk35/pom.xml:
##
@@ -166,6 +166,12 @@
 slf4j-log4j12
 test
 
+
+
+org.awaitility
+awaitility
+test
+

Review Comment:
   Why is this change necessary?



##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -198,9 +204,25 @@ public void close() throws IOException
  * @param closeMode allows the default close mode to be overridden at the 
time the latch is closed.
  * @throws IOException errors
  */
-public synchronized void close(CloseMode closeMode) throws IOException
+public void close(CloseMode closeMode) throws IOException
 {
-Preconditions.checkState(state.compareAndSet(State.STARTED, 
State.CLOSED), "Already closed or has not been started");
+internalClose(closeMode, true);
+}
+
+private synchronized void internalClose(CloseMode closeMode, boolean 
failOnClosed) throws IOException
+{
+if (!state.compareAndSet(State.STARTED, State.CLOSED))
+{
+if (failOnClosed)
+{
+throw new IllegalStateException("Already closed or has not 
been started");
+}
+else
+{
+return;
+}

Review Comment:
   ```suggestion
   return;
   ```
   nit: Is the `else` branch required based on code guidelines? If not, I'd 
propose removing it to shorten the code. The code format (having opening 
brackets on a new line) makes the code long, anyway already.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-14 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r970793303


##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState)
 {
 try
 {
-if ( 
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) 
|| !hasLeadership.get() )
+if ( 
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) )
 {
-reset();
+getChildren();

Review Comment:
   Ok, after revisiting your response in [that thread earlier in this 
PR](https://github.com/apache/curator/pull/430#discussion_r967852427), I notice 
again that `getChildren` is the proper way of doing it:
   * either the corresponding child node is still around: Then we would pick up 
the leadership again.
   * if it's not around, `reset()` will be called, anyway
   
   Sorry for the confusion. ...to many threads -.-



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-14 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r970541350


##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -586,6 +586,9 @@ private void checkLeadership(List children) throws 
Exception
 final String localOurPath = ourPath.get();
 List sortedChildren = 
LockInternals.getSortedChildren(LOCK_NAME, sorter, children);
 int ourIndex = (localOurPath != null) ? 
sortedChildren.indexOf(ZKPaths.getNodeFromPath(localOurPath)) : -1;
+
+log.debug("checkLeadership with ourPath: {}, children: {}", 
localOurPath, sortedChildren);

Review Comment:
   ```suggestion
   log.debug("[{}] checkLeadership with ourPath: {}, children: {}", id, 
localOurPath, sortedChildren);
   ```



##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances are joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();
+
+// no extra CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
+latchCandidate1.start();
+timing.sleepABit();

Review Comment:
   You need to add `Awaitility` to the pom of the `curator-recipes` module:
   ```
   
   org.awaitility
   awaitility
   test
   
   ```



##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances are joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();
+
+// no extra CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
+latchCandidate1.start();
+timing.sleepABit();

Review Comment:
   ```suggestion
   // we want to make sure that the leader gets leadership 
before other instances are going to join the party
   
waitForALeader(Collections.singletonList(latchInitialLeader), new Timing());
   
   // candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
   latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(

[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-14 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r970539512


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   I found a better solution for that by introducing another ZK client that is 
used for monitoring the children. That way, we don't end up being blocked on 
concurrent calls in the actual test. I'm gonna propose the change in a separate 
review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-14 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r970539512


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   I found a better solution for that by introducing another ZK client that is 
used for monitoring the children. That way, we don't end up being blocked on 
concurrent calls in the actual test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969519573


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   tbh, I'm not really happy with the sleep here and in [line 
248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248)
 because they are a cause for instabilities: The `close` in [line 
251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251)
 has to happen after the child nodes for `candidate #0` and `candidate #1` are 
created. AFAIU, we cannot ensure that with the sleep calls due to the 
asynchronous nature of the `start` command that is triggered right before each 
sleep.
   I tried to add a `waitForCondition` instead, when coming up with this test, 
initially, that would wait for corresponding child to be created. 
Unfortunately, this resulted in the test blocking forever because (I guess) the 
await on the `latchCandidate0.debugCheckLeaderShipLatch` is blocking the thread 
that executes the `CuratorFrameworkImpl#backgroundOperations` queue which makes 
any subsequent operation (including the check for children nodes) on the same 
client being blocked as well.
   
   I hoped that somebody else could come up with a better approach here. 
:thinking: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969519573


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   tbh, I'm not really happy with the sleep here and in [line 
248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248)
 because they are a cause for instabilities: The `close` in [line 
251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251)
 has to happen after the child nodes for `candidate #0` and `candidate #1` are 
created. AFAIU, we cannot ensure that with the sleep calls due to the 
asynchronous nature of the `start` command that is triggered right before each 
sleep.
   I tried to add a `waitForCondition` instead, when coming up with this test, 
initially, that would wait for corresponding child to be created. 
Unfortunately, this resulted in the test blocking forever because (I guess) the 
await on the `latchCandidate0.debugCheckLeaderShipLatch` is executed in the 
main thread which makes any subsequent operation (including the check for 
children nodes) being blocked.
   
   I hoped that somebody else could come up with a better approach here. 
:thinking: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969546342


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   There's a proposal to work around that issue in [the comment 
below](https://github.com/apache/curator/pull/430/files#r969545913)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969545913


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();
+
+// no extract CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
+latchCandidate1.start();
+timing.sleepABit();

Review Comment:
   ```suggestion
   assertNull(latchCandidate0.getOurPath());
   assertNull(latchCandidate1.getOurPath());
   
   latchCandidate0.start();
   latchCandidate1.start();
   
   Awaitility.await("Wait for the candidates to have a path 
assigned to them.")
   .pollInterval(Duration.ofMillis(100))
   .pollInSameThread()
   .until(() -> latchCandidate0.getOurPath() != null && 
latchCandidate1.getOurPath() != null);
   ```
   Would that work? The `LeaderLatch#ourPath` is set after the child is 
created. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969519573


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   tbh, I'm not really happy with the sleep here and in [line 
248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248)
 because they are a cause for instabilities: The `close` in [line 
251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251)
 has to happen after the child nodes for `candidate #0` and `candidate #1` are 
created. AFAIU, we cannot ensure that with the sleep calls due to the 
asynchronous nature of the `start` command that is triggered right before each 
sleep.
   I tried to add a `waitForCondition` instead, when coming up with this test, 
initially, that would wait for corresponding child to be created. 
Unfortunately, this resulted in the test blocking forever because (I guess) the 
await on the `latchCandidate0.debugCheckLeaderShipLatch` is executed in the 
main thread which makes any subsequent operation (including the check for 
children nodes) being blocked.
   
   I hoped that somebody else could come up with a better approach here. 
:innocent: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969519573


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   tbh, I'm not really happy with the sleep here and in [line 
248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248)
 because they are a cause for instabilities: The `close` in [line 
251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251)
 has to happen after the child nodes for `candidate #0` and `candidate #1` are 
created. AFAIU, we cannot ensure that with the sleep calls due to the 
asynchronous nature of the `start` command that is triggered right before each 
sleep.
   I tried to add a `waitForCondition` instead, when coming up with this test 
first, that would wait for the child to be created. Unfortunately, this 
resulted in the test blocking forever because (I guess) the await on the 
`latchCandidate0.debugCheckLeaderShipLatch` is called in the main thread which 
makes any subsequent operation (including the check for children nodes) being 
blocked.
   
   I hoped that somebody else could come up with a better approach here. 
:innocent: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969519573


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   tbh, I'm not really happy with the sleep here and in [line 
248](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR248)
 because they are a cause for instabilities: The `close` in [line 
251](https://github.com/apache/curator/pull/430/files#diff-75966280cab1f9788b771d244e889731ba35c7918d365c070565e070d5b801ebR251)
 has to happen after the child nodes for `candidate #0` and `candidate #1` are 
created. AFAIU, we cannot ensure that with the sleep calls due to the 
asynchronous nature of the `start` command.
   I tried to add a `waitForCondition` instead, when coming up with this test 
first, that would wait for the child to be created. Unfortunately, this 
resulted in the test blocking forever because (I guess) the await on the 
`latchCandidate0.debugCheckLeaderShipLatch` is called in the main thread which 
makes any subsequent operation (including the check for children nodes) being 
blocked.
   
   I hoped that somebody else could come up with a better approach here. 
:innocent: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-09-13 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r969507748


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party

Review Comment:
   ```suggestion
   // we want to make sure that the leader gets leadership 
before other instances are joining the party
   ```
   and another typo



##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+// candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+latchCandidate0.start();
+timing.sleepABit();

Review Comment:
   tbh, I'm not really happy with the sleep here and in line 248 because they 
are a cause for instabilities: The `close` in line 251 has to happen after the 
child nodes for `candidate #0` and `candidate #1` are created. AFAIU, we cannot 
ensure that with the sleep calls due to the asynchronous nature of the `start` 
command.
   I tried to add a `waitForCondition` instead, when coming up with this test 
first, that would wait for the child to be created. Unfortunately, this 
resulted in the test blocking forever because (I guess) the await on the 
`latchCandidate0.debugCheckLeaderShipLatch` is called in the main thread which 
makes any subsequent operation (including the check for children nodes) being 
blocked.
   
   I hoped that somebody else could come up with a better approach here. 
:innocent: 



##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+{
+final String latchPath = "/foo/bar";
+final Timing2 timing = new Timing2();
+try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+{
+client.start();
+LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+try
+{
+latchInitialLeader.start();
+
+// we want to make sure that the leader gets leadership before 
other instances joining the party
+waitForALeader(Collections.singletonList(latchInitialLe

[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-08-30 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r958456254


##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -717,6 +720,7 @@ else if ( !oldValue && newValue )
 private void setNode(String newValue) throws Exception
 {
 String oldPath = ourPath.getAndSet(newValue);
+log.debug("setNode with oldPath: {}, newValue: {}", oldPath, newValue);

Review Comment:
   ```suggestion
   log.debug("setNode in {}, with oldPath: {}, newValue: {}", id, 
oldPath, newValue);
   ```
   Adding the ID helps when debugging tests that run multiple `LeaderLatch` 
instances locally.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-08-30 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r958163715


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void testOurPathDeletedOnReconnect() throws Exception

Review Comment:
   What do you think of the following test (I updated the test a bit):
   ```
   @Test
   public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
   {
   final String latchPath = "/foo/bar";
   final Timing2 timing = new Timing2();
   try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
   {
   client.start();
   LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
   LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
   LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
   
   try
   {
   latchInitialLeader.start();
   
   // we want to make sure that the leader gets leadership 
before other instances joining the party
   
waitForALeader(Collections.singletonList(latchInitialLeader), new Timing());
   
   // candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
   latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
   
   latchCandidate0.start();
   timing.sleepABit();
   
   // no extract CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
   latchCandidate1.start();
   timing.sleepABit();
   
   // triggers the removal of the corresponding child node 
after candidate #0 retrieved the children
   latchInitialLeader.close();
   
   latchCandidate0.debugCheckLeaderShipLatch.countDown();
   
   waitForALeader(Arrays.asList(latchCandidate0, 
latchCandidate1), new Timing());
   
   assertTrue(latchCandidate0.hasLeadership() ^ 
latchCandidate1.hasLeadership());
   } finally
   {
   for (LeaderLatch latchToClose : 
Arrays.asList(latchInitialLeader, latchCandidate0, latchCandidate1))
   {
   if ( latchToClose.getState() != LeaderLatch.State.CLOSED 
)
   {
   latchToClose.close();
   }
   }
   }
   }
   
   ```
   
   This test case describes the situation for 
[CURATOR-645](https://issues.apache.org/jira/browse/CURATOR-645). I checked in 
a local test run, that it will run forever when still using `reset()` in 
[LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633)
 but would succeed when using `getChildren()`, instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-08-30 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r958163715


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void testOurPathDeletedOnReconnect() throws Exception

Review Comment:
   What do you think of the following test (I updated the test a bit):
   ```
   @Test
   public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
   {
   final String latchPath = "/foo/bar";
   final Timing2 timing = new Timing2();
   try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
   {
   client.start();
   LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
   LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
   LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
   
   try
   {
   latchInitialLeader.start();
   
   // we want to make sure that the leader gets leadership 
before other instances joining the party
   
waitForALeader(Collections.singletonList(latchInitialLeader), new Timing());
   
   // candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
   latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
   
   latchCandidate0.start();
   timing.sleepABit();
   
   // no extract CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
   latchCandidate1.start();
   timing.sleepABit();
   
   // triggers the removal of the corresponding child node 
after candidate #0 retrieved the children
   latchInitialLeader.close();
   
   latchCandidate0.debugCheckLeaderShipLatch.countDown();
   
   waitForALeader(Arrays.asList(latchCandidate0, 
latchCandidate1), new Timing());
   
   assertTrue(latchCandidate0.hasLeadership() ^ 
latchCandidate1.hasLeadership());
   } finally
   {
   for (LeaderLatch latchToClose : 
Arrays.asList(latchInitialLeader, latchCandidate0, latchCandidate1))
   {
   if ( latchToClose.getState() != LeaderLatch.State.CLOSED 
)
   {
   latchToClose.close();
   }
   }
   }
   }
   
   ```
   
   This test case describes the situation for CURATOR-645. I checked in a local 
test run, that it will run forever when still using `reset()` in 
[LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633)
 but would succeed when using `getChildren()`, instead.
   
   This covers the use-case that made me identify CURATOR-645.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-08-30 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r958163715


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void testOurPathDeletedOnReconnect() throws Exception

Review Comment:
   What do you think of the following test (I updated the test a bit)
   ```
   @Test
   public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
   {
   final String latchPath = "/foo/bar";
   final Timing2 timing = new Timing2();
   try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
   {
   client.start();
   LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
   LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
   LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
   
   try
   {
   latchInitialLeader.start();
   
   // we want to make sure that the leader gets leadership 
before other instances joining the party
   
waitForALeader(Collections.singletonList(latchInitialLeader), new Timing());
   
   // candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
   latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
   
   latchCandidate0.start();
   timing.sleepABit();
   
   // no extract CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
   latchCandidate1.start();
   timing.sleepABit();
   
   // triggers the removal of the corresponding child node 
after candidate #0 retrieved the children
   latchInitialLeader.close();
   
   latchCandidate0.debugCheckLeaderShipLatch.countDown();
   
   waitForALeader(Arrays.asList(latchCandidate0, 
latchCandidate1), new Timing());
   
   assertTrue(latchCandidate0.hasLeadership() ^ 
latchCandidate1.hasLeadership());
   } finally
   {
   for (LeaderLatch latchToClose : 
Arrays.asList(latchInitialLeader, latchCandidate0, latchCandidate1))
   {
   if ( latchToClose.getState() != LeaderLatch.State.CLOSED 
)
   {
   latchToClose.close();
   }
   }
   }
   }
   
   ```
   
   This test case describes the situation for CURATOR-645. I checked in a local 
test run, that it will run forever when still using `reset()` in 
[LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633)
 but would succeed when using `getChildren()`, instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-08-30 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r958163715


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void testOurPathDeletedOnReconnect() throws Exception

Review Comment:
   What do you think of the following test:
   ```
  @Test
   public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
   {
   final String latchPath = "/foo/bar";
   final Timing2 timing = new Timing2();
   try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
   {
   client.start();
   LeaderLatch latch0 = new LeaderLatch(client, latchPath, "0");
   latch0.start();
   
   waitForALeader(Collections.singletonList(latch0), new Timing());
   
   LeaderLatch latch1 = new LeaderLatch(client, latchPath, "1");
   LeaderLatch latch2 = new LeaderLatch(client, latchPath, "2");
   
   latch1.debugCheckLeaderShipLatch = new CountDownLatch(1);
   
   latch1.start();
   timing.sleepABit();
   
   latch2.start();
   timing.sleepABit();
   
   latch0.close();
   
   latch1.debugCheckLeaderShipLatch.countDown();
   
   waitForALeader(Arrays.asList(latch1, latch2), new Timing());
   
   latch1.close();
   latch2.close();
   }
   }
   ```
   
   This test case describes the situation for CURATOR-645. I checked in a local 
test run, that it will run forever when still using `reset()` in 
[LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633)
 but would succeed when using `getChildren()`, instead.



##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void testOurPathDeletedOnReconnect() throws Exception

Review Comment:
   What do you think of the following test:
   ```
   @Test
   public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
   {
   final String latchPath = "/foo/bar";
   final Timing2 timing = new Timing2();
   try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
   {
   client.start();
   LeaderLatch latch0 = new LeaderLatch(client, latchPath, "0");
   latch0.start();
   
   waitForALeader(Collections.singletonList(latch0), new Timing());
   
   LeaderLatch latch1 = new LeaderLatch(client, latchPath, "1");
   LeaderLatch latch2 = new LeaderLatch(client, latchPath, "2");
   
   latch1.debugCheckLeaderShipLatch = new CountDownLatch(1);
   
   latch1.start();
   timing.sleepABit();
   
   latch2.start();
   timing.sleepABit();
   
   latch0.close();
   
   latch1.debugCheckLeaderShipLatch.countDown();
   
   waitForALeader(Arrays.asList(latch1, latch2), new Timing());
   
   latch1.close();
   latch2.close();
   }
   }
   ```
   
   This test case describes the situation for CURATOR-645. I checked in a local 
test run, that it will run forever when still using `reset()` in 
[LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633)
 but would succeed when using `getChildren()`, instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-08-30 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r958163715


##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void testOurPathDeletedOnReconnect() throws Exception

Review Comment:
   What do you think of the following test:
   ```
@Test
   public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
   {
   final String latchPath = "/foo/bar";
   final Timing2 timing = new Timing2();
   try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
   {
   client.start();
   LeaderLatch latch0 = new LeaderLatch(client, latchPath, "0");
   latch0.start();
   
   waitForALeader(Collections.singletonList(latch0), new Timing());
   
   LeaderLatch latch1 = new LeaderLatch(client, latchPath, "1");
   LeaderLatch latch2 = new LeaderLatch(client, latchPath, "2");
   
   latch1.debugCheckLeaderShipLatch = new CountDownLatch(1);
   
   latch1.start();
   timing.sleepABit();
   
   latch2.start();
   timing.sleepABit();
   
   latch0.close();
   
   latch1.debugCheckLeaderShipLatch.countDown();
   
   waitForALeader(Arrays.asList(latch1, latch2), new Timing());
   
   latch1.close();
   latch2.close();
   }
   }
   ```
   
   This test case describes the situation for CURATOR-645. I checked in a local 
test run, that it will run forever when still using `reset()` in 
[LeaderLatch:633](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L633)
 but would succeed when using `getChildren()`, instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [curator] XComp commented on a diff in pull request #430: CURATOR-644. CURATOR-645. Fix livelock in LeaderLatch

2022-08-29 Thread GitBox


XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r957341820


##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState)
 {
 try
 {
-if ( 
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) 
|| !hasLeadership.get() )

Review Comment:
   I'm wondering whether we should remove this `if` entirely: Let's assume the 
current instance doesn't have the leadership but watches the child node 
referring to the current leader. If the connection was temporarily suspended it 
could be that the actual leader lost its leadership in the meantime. In that 
case, we want the current instance to check the leadership. :thinking: 



##
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##
@@ -218,6 +218,30 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
 }
 }
 
+@Test
+public void testOurPathDeletedOnReconnect() throws Exception

Review Comment:
   The test is not suitable for the change. I reverted the change and ran the 
test. The test still succeeded.



##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState)
 {
 try
 {
-if ( 
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) 
|| !hasLeadership.get() )

Review Comment:
   But thinking about it once more, this might not be an issue because the 
current LeaderLatch would have a watcher on the current leader's node which 
would have been triggered when the leader's child would have been deleted. 
:thinking: 



##
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##
@@ -604,7 +604,7 @@ else if ( ourIndex == 0 )
 @Override
 public void process(WatchedEvent event)
 {
-if ( (state.get() == State.STARTED) && (event.getType() == 
Event.EventType.NodeDeleted) && (localOurPath != null) )

Review Comment:
   Good point. Due to the condition in 
[LeaderLatch:588](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L588),
 `ourIndex` must be `-1` if `localOurPath` is actually `null` and, therefore, 
would prevent the execution from reaching the `else` branch (instead the if 
condition in 
[LeaderLatch:592](https://github.com/apache/curator/blob/425598cb6bf6a5b227a5fdd293fe46c7978d6578/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L592)
 applies) where the watcher is set.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org