kevin-wu24 commented on code in PR #20859:
URL: https://github.com/apache/kafka/pull/20859#discussion_r2570363560


##########
raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:
##########
@@ -500,11 +501,31 @@ public void initialize(
             kafkaRaftMetrics,
             externalKRaftMetrics
         );
+
+        // Set up listener to track voter set changes
+        partitionState.setVoterSetChangeListener((offset, voterSet) -> {
+            // We dont need to check high watermark here since it already check by
+            // hasJoined is not empty.
+            if (nodeId.isPresent() && hasJoined.isPresent()) {
+                ReplicaKey localReplicaKey = ReplicaKey.of(nodeId.getAsInt(), nodeDirectoryId);
+                if (voterSet.isVoter(localReplicaKey) && !hasJoined.get()) {
+                    logger.error("Detected that local node {} has been added to voter set at offset {}",
+                               localReplicaKey, offset);
+                    hasJoined = Optional.of(true);
+                }
+            }
+        });
+
         // Read the entire log
         logger.info("Reading KRaft snapshot and log as part of the initialization");
         partitionState.updateState();
         logger.info("Starting voters are {}", partitionState.lastVoterSet());
 
+        if (nodeId.isPresent() && canBecomeVoter && quorumConfig.autoJoin()
+                && isVoter(ReplicaKey.of(nodeId.getAsInt(), nodeDirectoryId))) {
+            hasJoined = Optional.of(true);
+        }
+

Review Comment:
   You need to wait for a fetch to complete, so that the starting voters can 
complete a fetch loop and transition from `UNKNOWN -> HAS_JOINED`.
   
   In `testBootstrapVoterSetDoesNotSendAddVoterAfterRemove`, I am pretty sure 
that `advanceTimeAndCompleteFetch` just does an empty fetch. You need to 
actually perform a fetch whose response contains the starting voters. As 
written, the test sets up a state where the initial local voter set defined by 
`withStartingVoters` is "out-of-date" relative to the first fetch loop, so the 
node thinks it is in `HAS_NOT_JOINED` and tries to auto-join, which is the 
intended behavior of the design.
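   
   To make the transition I am describing concrete, here is a rough sketch of 
the tri-state idea. Everything in it (`JoinStateSketch`, `JoinState`, 
`onFetchCompleted`, `shouldAutoJoin`) is a hypothetical name for illustration 
and not code from `KafkaRaftClient`; voters are reduced to plain integer ids. 
The point is only that the state stays `UNKNOWN` until a fetch completes, and 
that the outcome depends on the voter set known at that moment.

```java
import java.util.Set;

// Hypothetical sketch; none of these names are taken from KafkaRaftClient.
public class JoinStateSketch {
    public enum JoinState { UNKNOWN, HAS_JOINED, HAS_NOT_JOINED }

    private final int localId;
    // Nothing is known about membership until the first fetch completes.
    private JoinState state = JoinState.UNKNOWN;

    public JoinStateSketch(int localId) {
        this.localId = localId;
    }

    // Assumed hook: called when a fetch round trip completes, with the
    // voter ids the node knows about at that point.
    public void onFetchCompleted(Set<Integer> knownVoterIds) {
        state = knownVoterIds.contains(localId)
            ? JoinState.HAS_JOINED
            : JoinState.HAS_NOT_JOINED;
    }

    // Auto-join is only attempted once the node has positively
    // determined that it is not in the voter set.
    public boolean shouldAutoJoin() {
        return state == JoinState.HAS_NOT_JOINED;
    }

    public JoinState state() {
        return state;
    }
}
```

   In this sketch, a fetch that completes while the known voter set lacks the 
local id lands in `HAS_NOT_JOINED` and enables auto-join, while a fetch that 
completes with the local id present lands in `HAS_JOINED`. That is why the 
test needs a fetch that actually carries the starting voters.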



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.