ahuang98 commented on code in PR #18240:
URL: https://github.com/apache/kafka/pull/18240#discussion_r1899341945
##########
raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:
##########
@@ -2935,14 +3014,18 @@ private long pollResigned(long currentTimeMs) {
// until either the shutdown expires or an election bumps the epoch
stateTimeoutMs = shutdown.remainingTimeMs();
} else if (state.hasElectionTimeoutExpired(currentTimeMs)) {
- if (quorum.isVoter()) {
- transitionToCandidate(currentTimeMs);
- } else {
+// if (quorum.isVoter()) {
+ // canElectNewLeaderAfterOldLeaderPartitioned fails if we do not bump epoch since it is possible
+ // that the replica ends up as follower in the same epoch.
+ // resigned(leaderId=local) -> prospective(leaderId=local) -> follower(leaderId=local) which is illegal
+// transitionToProspective(quorum.epoch() + 1, currentTimeMs);
+// transitionToCandidate(currentTimeMs);
+// } else {
Review Comment:
The existing raft event simulation tests caught a new bug in pollResigned.
If we simply replace transitionToCandidate(currentTimeMs) with
transitionToProspective(currentTimeMs), a cordoned leader in epoch 5 could
resign in epoch 5, transition to prospective in epoch 5 (with
leaderId=localId), fail the election, and then attempt to become a follower
of itself in epoch 5, which is an illegal transition.
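
The sequence above can be sketched with a deliberately simplified model. Everything here is hypothetical (the `ResignedTransitionSketch` class, its `State` type, and the method names are not Kafka's real classes or APIs); it only assumes, as the comment describes, that a prospective replica that loses its election falls back to following the leader it remembers, and that following yourself is rejected.

```java
import java.util.OptionalInt;

// Hypothetical, simplified model of the transitions discussed in this comment.
// None of these types or methods exist in Kafka; they only illustrate the bug.
final class ResignedTransitionSketch {
    enum Role { RESIGNED, PROSPECTIVE, FOLLOWER, UNATTACHED }

    static final class State {
        final Role role;
        final int epoch;
        final OptionalInt leaderId;

        State(Role role, int epoch, OptionalInt leaderId) {
            this.role = role;
            this.epoch = epoch;
            this.leaderId = leaderId;
        }
    }

    // A resigned leader still records itself as the leader of its epoch.
    static State resigned(int localId, int epoch) {
        return new State(Role.RESIGNED, epoch, OptionalInt.of(localId));
    }

    // Problematic path: keep the same epoch, and therefore the same leaderId.
    static State toProspectiveSameEpoch(State resignedState) {
        return new State(Role.PROSPECTIVE, resignedState.epoch, resignedState.leaderId);
    }

    // Epoch-bumping path: the new epoch has no known leader yet.
    static State toUnattachedNextEpoch(State resignedState) {
        return new State(Role.UNATTACHED, resignedState.epoch + 1, OptionalInt.empty());
    }

    // Assumption: after a lost election, a prospective replica follows the
    // leader it remembers, or becomes unattached if it remembers none.
    static State onElectionLost(int localId, State prospective) {
        if (!prospective.leaderId.isPresent()) {
            return new State(Role.UNATTACHED, prospective.epoch, OptionalInt.empty());
        }
        int leaderId = prospective.leaderId.getAsInt();
        if (leaderId == localId) {
            throw new IllegalStateException(
                "replica " + localId + " cannot follow itself in epoch " + prospective.epoch);
        }
        return new State(Role.FOLLOWER, prospective.epoch, prospective.leaderId);
    }

    public static void main(String[] args) {
        State old = resigned(1, 5);
        try {
            onElectionLost(1, toProspectiveSameEpoch(old));
        } catch (IllegalStateException e) {
            System.out.println("illegal transition: " + e.getMessage());
        }
        State next = toUnattachedNextEpoch(old);
        System.out.println(next.role + " in epoch " + next.epoch);
    }
}
```

In this model the same-epoch path ends in the rejected self-follow, while bumping the epoch drops the stale leaderId so the problem cannot arise.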
There are a few alternatives, each with pros and cons:
- A resigned voter in epoch X transitions to prospective in epoch X+1.
  - cons: this needs a special code path just for this case, to allow
    becoming prospective in epoch X+1 (it would also add trivial complexity
    for determining whether votedKey or leaderId should be kept from the
    prior transition). Transitioning to prospective in epoch X+1 is almost
    as disruptive as transitioning directly to candidate, since it involves
    an epoch bump.
  - pro: this is probably the option that follows the intent of the past
    logic most closely.
- A resigned voter in epoch X simply transitions to unattached in epoch X+1
  (current version).
  - con: the resigned replica has to wait two election timeouts after
    resignation to become prospective.
  - pro: simplified logic. Unless this is the only replica eligible for
    leadership in the quorum (e.g. due to network partitioning), the impact
    of waiting two election timeouts after resignation is small, because all
    other replicas should be starting their own elections within a single
    fetch timeout/election timeout.
- A resigned voter in epoch X instead waits a smaller backoffTimeMs before
  transitioning to unattached in epoch X+1.
  - con: scope creep, with additional changes to resignedState.
  - pro: the resigned voter waits less time before becoming eligible to
    start a new election.
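
To make the waiting-time trade-off concrete, here is a rough arithmetic sketch. It is not Kafka code: the `ResignedDelaySketch` class and its method names are hypothetical, it treats the election timeout as a fixed value (ignoring any randomization), and for the third option it assumes the replica still waits one election timeout in unattached after the shorter backoff.

```java
// Hypothetical arithmetic sketch of how long a resigned voter waits before it
// can become prospective under each alternative. Not part of Kafka.
final class ResignedDelaySketch {
    // Option 1: resigned -> prospective (epoch X+1) after one election timeout.
    static long prospectiveNextEpochDelayMs(long electionTimeoutMs) {
        return electionTimeoutMs;
    }

    // Option 2: resigned -> unattached (epoch X+1) after one election timeout,
    // then unattached -> prospective after a second one.
    static long unattachedNextEpochDelayMs(long electionTimeoutMs) {
        return 2 * electionTimeoutMs;
    }

    // Option 3 (assumed shape): resigned -> unattached after a shorter backoff,
    // then unattached -> prospective after one election timeout.
    static long backoffThenUnattachedDelayMs(long backoffTimeMs, long electionTimeoutMs) {
        if (backoffTimeMs < 0 || backoffTimeMs > electionTimeoutMs) {
            throw new IllegalArgumentException("backoff must be within [0, electionTimeoutMs]");
        }
        return backoffTimeMs + electionTimeoutMs;
    }

    public static void main(String[] args) {
        long electionTimeoutMs = 1000; // illustrative value only
        long backoffTimeMs = 200;      // illustrative value only
        System.out.println("option 1: " + prospectiveNextEpochDelayMs(electionTimeoutMs) + " ms");
        System.out.println("option 2: " + unattachedNextEpochDelayMs(electionTimeoutMs) + " ms");
        System.out.println("option 3: "
            + backoffThenUnattachedDelayMs(backoffTimeMs, electionTimeoutMs) + " ms");
    }
}
```

Under these assumptions the third option sits between the other two, which is the benefit it buys in exchange for the extra changes to resignedState.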