XComp commented on code in PR #430:
URL: https://github.com/apache/curator/pull/430#discussion_r970541350


##########
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##########
@@ -586,6 +586,9 @@ private void checkLeadership(List<String> children) throws 
Exception
         final String localOurPath = ourPath.get();
         List<String> sortedChildren = 
LockInternals.getSortedChildren(LOCK_NAME, sorter, children);
         int ourIndex = (localOurPath != null) ? 
sortedChildren.indexOf(ZKPaths.getNodeFromPath(localOurPath)) : -1;
+
+        log.debug("checkLeadership with ourPath: {}, children: {}", 
localOurPath, sortedChildren);

Review Comment:
   ```suggestion
           log.debug("[{}] checkLeadership with ourPath: {}, children: {}", id, 
localOurPath, sortedChildren);
   ```



##########
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##########
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
         }
     }
 
+    @Test
+    public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+    {
+        final String latchPath = "/foo/bar";
+        final Timing2 timing = new Timing2();
+        try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+        {
+            client.start();
+            LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+            LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+            LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+            try
+            {
+                latchInitialLeader.start();
+
+                // we want to make sure that the leader gets leadership before 
other instances are joining the party
+                waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+                // candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+                latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+                latchCandidate0.start();
+                timing.sleepABit();
+
+                // no extra CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
+                latchCandidate1.start();
+                timing.sleepABit();

Review Comment:
   You need to add `Awaitility` to the pom of the `curator-recipes` module:
   ```
           <dependency>
               <groupId>org.awaitility</groupId>
               <artifactId>awaitility</artifactId>
               <scope>test</scope>
           </dependency>
   ```



##########
curator-recipes/src/test/java/org/apache/curator/framework/recipes/leader/TestLeaderLatch.java:
##########
@@ -218,6 +218,56 @@ public void testWatchedNodeDeletedOnReconnect() throws 
Exception
         }
     }
 
+    @Test
+    public void 
testLeadershipElectionWhenNodeDisappearsAfterChildrenAreRetrieved() throws 
Exception
+    {
+        final String latchPath = "/foo/bar";
+        final Timing2 timing = new Timing2();
+        try (CuratorFramework client = 
CuratorFrameworkFactory.newClient(server.getConnectString(), timing.session(), 
timing.connection(), new RetryOneTime(1)))
+        {
+            client.start();
+            LeaderLatch latchInitialLeader = new LeaderLatch(client, 
latchPath, "initial-leader");
+            LeaderLatch latchCandidate0 = new LeaderLatch(client, latchPath, 
"candidate-0");
+            LeaderLatch latchCandidate1 = new LeaderLatch(client, latchPath, 
"candidate-1");
+
+            try
+            {
+                latchInitialLeader.start();
+
+                // we want to make sure that the leader gets leadership before 
other instances are joining the party
+                waitForALeader(Collections.singletonList(latchInitialLeader), 
new Timing());
+
+                // candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
+                latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
+
+                latchCandidate0.start();
+                timing.sleepABit();
+
+                // no extra CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
+                latchCandidate1.start();
+                timing.sleepABit();

Review Comment:
   ```suggestion
                   // we want to make sure that the leader gets leadership 
before other instances are going to join the party
                   
waitForALeader(Collections.singletonList(latchInitialLeader), new Timing());
   
                   // candidate #0 will wait for the leader to go away - this 
should happen after the child nodes are retrieved by candidate #0
                   latchCandidate0.debugCheckLeaderShipLatch = new 
CountDownLatch(1);
                   latchCandidate0.start();
                   
                   final int expectedChildrenAfterCandidate0Joins = 2;
                   Awaitility.await("There should be " + 
expectedChildrenAfterCandidate0Joins + " child nodes created after candidate #0 
joins the leader election.")
                       .pollInterval(Duration.ofMillis(100))
                       .pollInSameThread()
                       .until(() -> 
client.getChildren().forPath(latchPath).size() == 
expectedChildrenAfterCandidate0Joins);
   
                   // no extra CountDownLatch needs to be set here because 
candidate #1 will rely on candidate #0
                   latchCandidate1.start();
                   
                   final int expectedChildrenAfterCandidate1Joins = 3;
                   Awaitility.await("There should be " + 
expectedChildrenAfterCandidate1Joins + " child nodes created after candidate #1 
joins the leader election.")
                       .pollInterval(Duration.ofMillis(100))
                       .pollInSameThread()
                       .until(() -> 
client.getChildren().forPath(latchPath).size() == 
expectedChildrenAfterCandidate1Joins);
   ```
   Initially, I played around with checking the `ourPath` of each `LeaderLatch` 
instance. This didn't work, because that field is set in a `BackgroundCallback` 
(see 
[LeaderLatch:540](https://github.com/apache/curator/blob/2775855fc80fdea8433d5dedbb746f58ddd443c4/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java#L540)).
 The background tasks are handled in the `EventThread` of the ZK client. That 
one is already blocked by the `CountdownLatch` of candidate #0. 
   
   But we could still use the `SendThread` which is used for outgoing commands. 
That's why calling `.getChildren` on the client itself still works and is 
sufficient enough to verify that the child nodes are actually created.



##########
curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderLatch.java:
##########
@@ -667,9 +670,9 @@ protected void handleStateChange(ConnectionState newState)
             {
                 try
                 {
-                    if ( 
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) 
|| !hasLeadership.get() )
+                    if ( 
client.getConnectionStateErrorPolicy().isErrorState(ConnectionState.SUSPENDED) )
                     {
-                        reset();
+                        getChildren();

Review Comment:
   I'm not convinced anymore that this change is correct. Here, we're covering 
the case where `ConnectionState.SUSPENDED` is considered an error state, i.e. 
the leadership is lost if the connection was suspended. In that case, we should 
do a reset because we're considering the leadership being lost already due to 
the connection instability having caused an error.
   
   Instead, we should add an `else` branch that does the `getChildren` call in 
case of a reconnect and the policy is allowing reconnects after suspension.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@curator.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to