mumrah commented on code in PR #14593: URL: https://github.com/apache/kafka/pull/14593#discussion_r1368626043
########## metadata/src/main/java/org/apache/kafka/controller/PartitionChangeBuilder.java: ########## @@ -93,6 +93,9 @@ public enum Election { private boolean eligibleLeaderReplicasEnabled; private int minISR; + // Whether allow electing last known leader when ELR is enabled. Note, the last known leader will be stored in the + // lastKnownElr field. + private boolean lastKnownLeaderEnabled = true; Review Comment: When will this be set to false? Will it be based on the different types of unclean recovery? ########## metadata/src/main/java/org/apache/kafka/controller/PartitionChangeBuilder.java: ########## @@ -253,17 +269,30 @@ private ElectionResult electAnyLeader() { return new ElectionResult(NO_LEADER, false); } + private boolean canElectLastKnownLeader() { + return eligibleLeaderReplicasEnabled && lastKnownLeaderEnabled && targetElr.isEmpty() && targetIsr.isEmpty() + && partition.lastKnownElr.length == 1 && isAcceptableLeader.test(partition.lastKnownElr[0]); + } + private boolean isValidNewLeader(int replica) { - return targetIsr.contains(replica) && isAcceptableLeader.test(replica); + // The valid new leader should be in either ISR or in ELR when ISR is empty. + return (targetIsr.contains(replica) || (targetIsr.isEmpty() && targetElr.contains(replica))) && isAcceptableLeader.test(replica); } private void tryElection(PartitionChangeRecord record) { ElectionResult electionResult = electLeader(); if (electionResult.node != partition.leader) { // generating log messages for partition elections can get expensive on large clusters, // so only log clean elections at TRACE level; log unclean elections at INFO level - // to ensure the message is emitted since an unclean election can lead to data loss. - if (electionResult.unclean) { + // to ensure the message is emitted since an unclean election can lead to data loss; + if (targetElr.contains(electionResult.node)) { + targetIsr = Collections.singletonList(electionResult.node); + targetElr = targetElr.stream().filter(replica -> replica != electionResult.node).collect(Collectors.toList()); + targetLastKnownElr = targetLastKnownElr.stream() + .filter(replica -> replica != electionResult.node).collect(Collectors.toList()); Review Comment: Are these used anywhere? ########## metadata/src/main/java/org/apache/kafka/controller/PartitionChangeBuilder.java: ########## @@ -253,17 +269,30 @@ private ElectionResult electAnyLeader() { return new ElectionResult(NO_LEADER, false); } + private boolean canElectLastKnownLeader() { Review Comment: Not necessarily for this PR, but I wonder how we can surface more of our logic results to the operator. In this case, we're putting a lot of logic behind one boolean. It might be nice if we could attach more information to ElectionResult throughout the `tryElection` code. For example, if a user expects the last known leader to be elected based on their configs, but a LKL is _not_ elected -- how can they debug the situation? ########## metadata/src/main/java/org/apache/kafka/controller/PartitionChangeBuilder.java: ########## @@ -253,17 +269,30 @@ private ElectionResult electAnyLeader() { return new ElectionResult(NO_LEADER, false); } + private boolean canElectLastKnownLeader() { + return eligibleLeaderReplicasEnabled && lastKnownLeaderEnabled && targetElr.isEmpty() && targetIsr.isEmpty() + && partition.lastKnownElr.length == 1 && isAcceptableLeader.test(partition.lastKnownElr[0]); Review Comment: Making sure I follow this. We can elect the last known leader if: * ELR is enabled * Last known leader election is enabled (how does this get set?) * target ELR and ISR is empty (meaning the last replica with the HWM has gone away?) * previous ELR had just a single member Do we have an invariant that the ELR can only change by a single replica in one PartitionChangeRecord? For example, could we have an empty ISR and empty ELR, but have two replicas in `lastKnownElr`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org