ahuang98 commented on code in PR #18987:
URL: https://github.com/apache/kafka/pull/18987#discussion_r1968254346
##########
raft/src/main/java/org/apache/kafka/raft/VoterSet.java:
##########
@@ -343,6 +343,10 @@ public ReplicaKey voterKey() {
return voterKey;
}
+ public int voterId() {
Review Comment:
Could this be derived from `voterKey().id()` instead?
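For illustration, a minimal sketch of the derived accessor. `ReplicaKeySketch` and `VoterNodeSketch` are hypothetical stand-ins written for this comment, not the real `ReplicaKey` and `VoterSet.VoterNode`. The only assumption carried over is that the key exposes `id()`, as the question above presumes.

```java
// Hypothetical stand-in for ReplicaKey, reduced to the one accessor this sketch needs.
final class ReplicaKeySketch {
    private final int id;

    ReplicaKeySketch(int id) {
        this.id = id;
    }

    int id() {
        return id;
    }
}

// Hypothetical stand-in for the voter node class touched by this hunk.
final class VoterNodeSketch {
    private final ReplicaKeySketch voterKey;

    VoterNodeSketch(ReplicaKeySketch voterKey) {
        this.voterKey = voterKey;
    }

    ReplicaKeySketch voterKey() {
        return voterKey;
    }

    // Derived from the key, so there is no second field to keep in sync.
    int voterId() {
        return voterKey.id();
    }
}
```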
##########
raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java:
##########
@@ -487,6 +555,12 @@ private EventScheduler schedulerWithDefaultInvariants(Cluster cluster) {
return scheduler;
}
+ private EventScheduler schedulerWithKip853Invariants(Cluster cluster) {
+ EventScheduler scheduler = schedulerWithDefaultInvariants(cluster);
+ scheduler.addInvariant(new AtMostOneUncommittedVoterSet(cluster));
Review Comment:
Doesn't this invariant apply to all the tests? If there is a reason it can't
be part of the default invariant set, let's add a comment explaining why.
##########
raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java:
##########
@@ -1127,14 +1331,75 @@ private MajorityReachedHighWatermark(Cluster cluster) {
@Override
public void verify() {
- cluster.leaderHighWatermark().ifPresent(highWatermark -> {
- long numReachedHighWatermark = cluster.nodes.entrySet().stream()
- .filter(entry -> cluster.voters.containsKey(entry.getKey()))
- .filter(entry -> entry.getValue().log.endOffset().offset() >= highWatermark)
- .count();
- assertTrue(
- numReachedHighWatermark >= cluster.majoritySize(),
- "Insufficient nodes have reached current high watermark");
+ if (cluster.withKip853) {
+ /*
+ * For clusters running in KIP-853 mode, we check that a majority of at least one of:
+ * 1. the leader's voter set at the HWM
+ * 2. the leader's lastVoterSet()
+ * has reached the HWM. We need to perform a more elaborate check here because in clusters where
+ * an Add/RemoveVoter request increases/decreases the majority of voters value by 1, the leader
+ * could have used either majority value to update its HWM value. This is because depending on
+ * whether the leader read the most recent VotersRecord prior to updating its HWM value, the number
+ * of nodes (the majority) used to calculate that HWM value is different. This matters for invariant
+ * checking because we perform this verification on every message delivery.
+ * */
+ cluster.leaderWithMaxEpoch().ifPresent(leaderNode -> {
+ leaderNode.client.highWatermark().ifPresent(highWatermark -> {
+ VoterSet voterSet = leaderNode.client.partitionState().lastVoterSet();
+ long numReachedHighWatermark = numReachedHighWatermark(highWatermark, voterSet.voterIds());
+ if (numReachedHighWatermark < cluster.majoritySize(voterSet.size())) {
+ leaderNode.client.partitionState().voterSetAtOffset(highWatermark - 1).ifPresent(otherVoterSet -> {
Review Comment:
I'm having a hard time understanding when this fallback would be necessary.
We call verify on every "event" (e.g. a new record written to the leader; what
else qualifies?). Is the concern that we might check whether a majority has
reached the HWM before the HWM reflects the current voter set?
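To make sure I'm reading the new check correctly, here is a standalone sketch of it under my assumptions. All names are hypothetical, `majoritySize` is assumed to be `size / 2 + 1`, and log end offsets are passed in as a plain map instead of being read from the cluster.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two-voter-set check; not the code in this PR.
final class HwmMajoritySketch {
    // Assumed definition of a majority for a voter set of the given size.
    static int majoritySize(int voterCount) {
        return voterCount / 2 + 1;
    }

    // Counts the voters whose log end offset is at or past the high watermark.
    static long numReachedHighWatermark(
        long highWatermark,
        Set<Integer> voterIds,
        Map<Integer, Long> logEndOffsets
    ) {
        return voterIds.stream()
            .filter(id -> logEndOffsets.getOrDefault(id, 0L) >= highWatermark)
            .count();
    }

    // Passes if a majority of either candidate voter set has reached the HWM.
    static boolean majorityReached(
        long highWatermark,
        Set<Integer> lastVoterSet,
        Set<Integer> voterSetAtHwm,
        Map<Integer, Long> logEndOffsets
    ) {
        if (numReachedHighWatermark(highWatermark, lastVoterSet, logEndOffsets)
                >= majoritySize(lastVoterSet.size())) {
            return true;
        }
        return numReachedHighWatermark(highWatermark, voterSetAtHwm, logEndOffsets)
            >= majoritySize(voterSetAtHwm.size());
    }
}
```

With voters {1, 2, 3} at the HWM, a pending AddVoter for 4, and only nodes 1 and 2 at offset 10, the first set fails (2 of 4, majority 3) while the second passes (2 of 3, majority 2). If that is the scenario this branch is meant to cover, it would help to spell it out in the comment.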
##########
raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java:
##########
@@ -1127,14 +1331,75 @@ private MajorityReachedHighWatermark(Cluster cluster) {
@Override
public void verify() {
- cluster.leaderHighWatermark().ifPresent(highWatermark -> {
- long numReachedHighWatermark = cluster.nodes.entrySet().stream()
- .filter(entry -> cluster.voters.containsKey(entry.getKey()))
- .filter(entry -> entry.getValue().log.endOffset().offset() >= highWatermark)
- .count();
- assertTrue(
- numReachedHighWatermark >= cluster.majoritySize(),
- "Insufficient nodes have reached current high watermark");
+ if (cluster.withKip853) {
+ /*
+ * For clusters running in KIP-853 mode, we check that a majority of at least one of:
+ * 1. the leader's voter set at the HWM
+ * 2. the leader's lastVoterSet()
+ * has reached the HWM. We need to perform a more elaborate check here because in clusters where
+ * an Add/RemoveVoter request increases/decreases the majority of voters value by 1, the leader
+ * could have used either majority value to update its HWM value. This is because depending on
+ * whether the leader read the most recent VotersRecord prior to updating its HWM value, the number
Review Comment:
Correct me if I'm wrong, but wouldn't this always be the case, i.e. the leader
has read its most recent VotersRecord before updating the HWM?
`appendAsLeader -> updateState -> maybeLoadLog (will catch the latest
votersRecord)`
is called before
`flushLeaderLog -> maybeUpdateHighWatermark`
##########
raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java:
##########
@@ -1127,14 +1331,75 @@ private MajorityReachedHighWatermark(Cluster cluster) {
@Override
public void verify() {
- cluster.leaderHighWatermark().ifPresent(highWatermark -> {
- long numReachedHighWatermark = cluster.nodes.entrySet().stream()
- .filter(entry -> cluster.voters.containsKey(entry.getKey()))
- .filter(entry -> entry.getValue().log.endOffset().offset() >= highWatermark)
- .count();
- assertTrue(
- numReachedHighWatermark >= cluster.majoritySize(),
- "Insufficient nodes have reached current high watermark");
+ if (cluster.withKip853) {
+ /*
+ * For clusters running in KIP-853 mode, we check that a majority of at least one of:
+ * 1. the leader's voter set at the HWM
Review Comment:
nit: the wording is a bit unclear. For example, reading it as "check that a
majority of at least one of the leader's voter set at the HWM" doesn't make
sense to me.