XComp commented on code in PR #24132:
URL: https://github.com/apache/flink/pull/24132#discussion_r1467558102


##########
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/resources/KubernetesLeaderElector.java:
##########
@@ -86,12 +117,38 @@ public KubernetesLeaderElector(
                                                         newLeader,
                                                         
leaderConfig.getConfigMapName())))
                         .build();
+        this.executorService = executorService;
+
+        LOG.info(
+                "Create KubernetesLeaderElector on lock {}.",
+                leaderElectionConfig.getLock().describe());
+    }
+
+    @GuardedBy("lock")
+    private void resetInternalLeaderElector() {
+        stopLeaderElectionCycle();
+
         internalLeaderElector =
                 new LeaderElector(kubernetesClient, leaderElectionConfig, 
executorService);
+        currentLeaderElectionSession = internalLeaderElector.start();
+
         LOG.info(
-                "Create KubernetesLeaderElector {} with lock identity {}.",
-                leaderConfig.getConfigMapName(),
-                leaderConfig.getLockIdentity());
+                "Triggered leader election on lock {}.", 
leaderElectionConfig.getLock().describe());
+    }
+
+    @GuardedBy("lock")
+    private void stopLeaderElectionCycle() {
+        if (internalLeaderElector != null) {
+            Preconditions.checkNotNull(currentLeaderElectionSession);
+
+            // the current leader election cycle needs to be cancelled before 
releasing the lock to
+            // avoid retrying
+            currentLeaderElectionSession.cancel(true);
+            currentLeaderElectionSession = null;
+
+            internalLeaderElector.release();

Review Comment:
   How will the `LeaderElector#release()` call trigger another `notLeader()` 
callback?
   
   `KubernetesLeaderElector#stopLeaderElectionCycle` is called in two places:
   * `KubernetesLeaderElector#stop`
     * State `leadership acquired`: The renew cycle of the 
`internalLeaderElector` will be cancelled which triggers the 
`LeaderCallbackHandler#notLeader` callback in 
[LeaderElector:96](https://github.com/fabric8io/kubernetes-client/blob/0f6c696509935a6a86fdb4620caea023d8e680f1/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/extended/leaderelection/LeaderElector.java#L96).
 The v6.6.2 implementation of `LeaderElector#stopLeading` either calls release 
(if we enable `ReleaseOnCall`) or calls the `notLeader` callback. The bugfix 
for this is https://github.com/fabric8io/kubernetes-client/commit/0f6c6965 and 
ended in v6.9.0. Without upgrading the k8s client dependency, we have to call 
release explicitly to trigger the cleanup.
     * State `leadership lost` (assuming that a `KubernetesLeaderElector#run` 
happened and the elector is initialized): The `#release` call won't have any 
affect because of the if condition in 
[LeaderElector:136](https://github.com/fabric8io/kubernetes-client/blob/v6.6.2/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/extended/leaderelection/LeaderElector.java#L136).
   * `KubernetesLeaderElector#resetInternalLeaderElector` which is called as 
part of the `KubernetesLeaderElector#run` method
     * State `leadership acquired`: That case shouldn't happen because we only 
call `#run` after the leadership is lost. If we would do it anyway, it would 
trigger the cancellation of leadership analogously to what happens for the 
`KubernetesLeaderElectorl#stop` call with leadership acquired described above. 
A new leaderelection lifecycle will be initiated afterwards in 
`KubernetesLeaderElector#resetInternalLeaderElector` call.
     * State `leadership lost`: The 
`KubernetesLeaderElector#currentLeaderElectionSession` is already completed. 
Therefore, cancelling this future doesn't have any effect. The subsequent 
`LeaderElector#release` call won't have any effect as well because of the if 
condition in 
[LeaderElector:136](https://github.com/fabric8io/kubernetes-client/blob/v6.6.2/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/extended/leaderelection/LeaderElector.java#L136)
 again.
   
   I created FLINK-34243 and will add a comment to cover this
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to