[GitHub] [kafka] jsancio commented on a change in pull request #10309: KAFKA-12181; Loosen raft fetch offset validation of remote replicas

2021-03-12 Thread GitBox


jsancio commented on a change in pull request #10309:
URL: https://github.com/apache/kafka/pull/10309#discussion_r593492361



##
File path: raft/src/main/java/org/apache/kafka/raft/LeaderState.java
##
@@ -247,7 +267,7 @@ private boolean isVoter(int remoteNodeId) {
         return voterReplicaStates.containsKey(remoteNodeId);
     }
 
-    private static class ReplicaState implements Comparable<ReplicaState> {
+    private static abstract class ReplicaState implements Comparable<ReplicaState> {

Review comment:
   Do we really need to distinguish between `VoterState` and `ObserverState`? For example, the only difference is `hasAcknowledgedLeader`. I would argue that we could just move this field to `ReplicaState` and say that for observers the value is either always false or simply ignored.
   
   I am leaning towards just updating the value irrespective of whether it is a voter or an observer. This will probably be useful when we implement quorum reassignment. We can document whatever semantics you decide on as a comment on this type.
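   Roughly something like this, to make the suggestion concrete (just a sketch, not the actual patch; the fields other than `hasAcknowledgedLeader` and the ordering in `compareTo` are assumptions):

```java
// Sketch of a single ReplicaState with the acknowledgement flag folded in.
// Field names other than hasAcknowledgedLeader are illustrative assumptions.
private static class ReplicaState implements Comparable<ReplicaState> {
    final int nodeId;
    Optional<LogOffsetMetadata> endOffset = Optional.empty();
    // Moved up from VoterState. For observers we either keep this always
    // false or simply ignore it; the chosen semantics get documented here.
    boolean hasAcknowledgedLeader = false;

    ReplicaState(int nodeId) {
        this.nodeId = nodeId;
    }

    @Override
    public int compareTo(ReplicaState that) {
        // Illustrative ordering: larger end offsets first, ties broken by node id.
        long thisOffset = this.endOffset.map(offsetMetadata -> offsetMetadata.offset).orElse(-1L);
        long thatOffset = that.endOffset.map(offsetMetadata -> offsetMetadata.offset).orElse(-1L);
        if (thisOffset != thatOffset)
            return Long.compare(thatOffset, thisOffset);
        return Integer.compare(this.nodeId, that.nodeId);
    }
}
```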

##
File path: raft/src/test/java/org/apache/kafka/raft/LeaderStateTest.java
##
@@ -140,6 +160,22 @@ public void testUpdateHighWatermarkQuorumSizeThree() {
         assertEquals(Optional.of(new LogOffsetMetadata(20L)), state.highWatermark());
     }
 
+    @Test
+    public void testNonMonotonicHighWatermarkUpdate() {
+        MockTime time = new MockTime();
+        int node1 = 1;
+        LeaderState state = newLeaderState(mkSet(localId, node1), 0L);
+        state.updateLocalState(time.milliseconds(), new LogOffsetMetadata(10L));
+        state.updateReplicaState(node1, time.milliseconds(), new LogOffsetMetadata(10L));
+        assertEquals(Optional.of(new LogOffsetMetadata(10L)), state.highWatermark());
+
+        // Follower crashes and disk is lost. It fetches an earlier offset to rebuild state.
+        state.updateReplicaState(node1, time.milliseconds(), new LogOffsetMetadata(5L));

Review comment:
   Let's check that this call returns `false`.
   
   Let's also add a test that calls `getVoterEndOffsets` and checks that the returned map is correct.
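   Something along these lines, assuming `updateReplicaState` returns a boolean and `getVoterEndOffsets` returns a `Map<Integer, Long>` keyed by voter id (the expected values and the `mkMap`/`mkEntry` helpers are illustrative):

```java
// Sketch of the extra assertions; the exact return types and expected
// values here are assumptions based on the discussion above.
assertFalse(state.updateReplicaState(node1, time.milliseconds(), new LogOffsetMetadata(5L)));
assertEquals(Optional.of(new LogOffsetMetadata(10L)), state.highWatermark());

// The leader should still record the follower's earlier fetch offset.
assertEquals(mkMap(mkEntry(localId, 10L), mkEntry(node1, 5L)), state.getVoterEndOffsets());
```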





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka] jsancio commented on a change in pull request #10309: KAFKA-12181; Loosen raft fetch offset validation of remote replicas

2021-03-14 Thread GitBox


jsancio commented on a change in pull request #10309:
URL: https://github.com/apache/kafka/pull/10309#discussion_r594031740



##
File path: raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java
##
@@ -401,6 +401,51 @@ private void checkBackToBackLeaderFailures(QuorumConfig config) {
         }
     }
 
+    @Test
+    public void checkSingleNodeCommittedDataLossQuorumSizeThree() {
+        checkSingleNodeCommittedDataLoss(new QuorumConfig(3, 0));
+    }
+
+    private void checkSingleNodeCommittedDataLoss(QuorumConfig config) {
+        assertTrue(config.numVoters > 2,
+            "This test requires the cluster to be able to recover from one failed node");
+
+        for (int seed = 0; seed < 100; seed++) {
+            // We run this test without the `MonotonicEpoch` and `MajorityReachedHighWatermark`
+            // invariants since the loss of committed data on one node can violate them.
+
+            Cluster cluster = new Cluster(config, seed);
+            EventScheduler scheduler = new EventScheduler(cluster.random, cluster.time);
+            scheduler.addInvariant(new MonotonicHighWatermark(cluster));
+            scheduler.addInvariant(new SingleLeader(cluster));
+            scheduler.addValidation(new ConsistentCommittedData(cluster));
+
+            MessageRouter router = new MessageRouter(cluster);
+
+            cluster.startAll();
+            schedulePolling(scheduler, cluster, 3, 5);
+            scheduler.schedule(router::deliverAll, 0, 2, 5);
+            scheduler.schedule(new SequentialAppendAction(cluster), 0, 2, 3);
+            scheduler.runUntil(() -> cluster.anyReachedHighWatermark(10));
+
+            RaftNode node = cluster.randomRunning().orElseThrow(() ->
+                new AssertionError("Failed to find running node")
+            );
+
+            // Kill a random node and drop all of its persistent state. The Raft
+            // protocol guarantees should still ensure we lose no committed data
+            // as long as a new leader is elected before the failed node is restarted.
+            cluster.kill(node.nodeId);
+            cluster.deletePersistentState(node.nodeId);
+            scheduler.runUntil(() -> !cluster.hasLeader(node.nodeId) && cluster.hasConsistentLeader());

Review comment:
   Hmm. I was looking at the implementation of `hasConsistentLeader`. It checks that all of the `LeaderState`s match, which means that all of the replicas need to vote for the same leader. That is not strictly required for having a consistent leader.
   
   Maybe this works in this test only because the number of voters is 3 and one of the nodes was killed.

##
File path: raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java
##
@@ -401,6 +401,51 @@ private void checkBackToBackLeaderFailures(QuorumConfig config) {
         }
     }
 
+    @Test
+    public void checkSingleNodeCommittedDataLossQuorumSizeThree() {
+        checkSingleNodeCommittedDataLoss(new QuorumConfig(3, 0));
+    }
+
+    private void checkSingleNodeCommittedDataLoss(QuorumConfig config) {
+        assertTrue(config.numVoters > 2,
+            "This test requires the cluster to be able to recover from one failed node");
+
+        for (int seed = 0; seed < 100; seed++) {
+            // We run this test without the `MonotonicEpoch` and `MajorityReachedHighWatermark`
+            // invariants since the loss of committed data on one node can violate them.
+
+            Cluster cluster = new Cluster(config, seed);
+            EventScheduler scheduler = new EventScheduler(cluster.random, cluster.time);
+            scheduler.addInvariant(new MonotonicHighWatermark(cluster));
+            scheduler.addInvariant(new SingleLeader(cluster));
+            scheduler.addValidation(new ConsistentCommittedData(cluster));
+
+            MessageRouter router = new MessageRouter(cluster);
+
+            cluster.startAll();
+            schedulePolling(scheduler, cluster, 3, 5);
+            scheduler.schedule(router::deliverAll, 0, 2, 5);
+            scheduler.schedule(new SequentialAppendAction(cluster), 0, 2, 3);
+            scheduler.runUntil(() -> cluster.anyReachedHighWatermark(10));
+
+            RaftNode node = cluster.randomRunning().orElseThrow(() ->
+                new AssertionError("Failed to find running node")
+            );
+
+            // Kill a random node and drop all of its persistent state. The Raft
+            // protocol guarantees should still ensure we lose no committed data
+            // as long as a new leader is elected before the failed node is restarted.
+            cluster.kill(node.nodeId);
+            cluster.deletePersistentState(node.nodeId);

Review comment:
   The implementation of `deletePersistentState` assumes that `kill` was or will be called for the change to take effect. Should we instead have tests call a single method called `killAndDeletePersistentState`?
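   Something like the following on the simulation's `Cluster` would make that ordering explicit (a sketch only; the method body is a guess at what the combined helper would do):

```java
// Hypothetical helper for the simulation Cluster: always kill the node
// before wiping its persistent state so the ordering cannot be violated.
void killAndDeletePersistentState(int nodeId) {
    kill(nodeId);
    deletePersistentState(nodeId);
}
```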





[GitHub] [kafka] jsancio commented on a change in pull request #10309: KAFKA-12181; Loosen raft fetch offset validation of remote replicas

2021-03-15 Thread GitBox


jsancio commented on a change in pull request #10309:
URL: https://github.com/apache/kafka/pull/10309#discussion_r594638669



##
File path: raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java
##
@@ -401,6 +401,51 @@ private void checkBackToBackLeaderFailures(QuorumConfig config) {
         }
     }
 
+    @Test
+    public void checkSingleNodeCommittedDataLossQuorumSizeThree() {
+        checkSingleNodeCommittedDataLoss(new QuorumConfig(3, 0));
+    }
+
+    private void checkSingleNodeCommittedDataLoss(QuorumConfig config) {
+        assertTrue(config.numVoters > 2,
+            "This test requires the cluster to be able to recover from one failed node");
+
+        for (int seed = 0; seed < 100; seed++) {
+            // We run this test without the `MonotonicEpoch` and `MajorityReachedHighWatermark`
+            // invariants since the loss of committed data on one node can violate them.
+
+            Cluster cluster = new Cluster(config, seed);
+            EventScheduler scheduler = new EventScheduler(cluster.random, cluster.time);
+            scheduler.addInvariant(new MonotonicHighWatermark(cluster));
+            scheduler.addInvariant(new SingleLeader(cluster));
+            scheduler.addValidation(new ConsistentCommittedData(cluster));
+
+            MessageRouter router = new MessageRouter(cluster);
+
+            cluster.startAll();
+            schedulePolling(scheduler, cluster, 3, 5);
+            scheduler.schedule(router::deliverAll, 0, 2, 5);
+            scheduler.schedule(new SequentialAppendAction(cluster), 0, 2, 3);
+            scheduler.runUntil(() -> cluster.anyReachedHighWatermark(10));
+
+            RaftNode node = cluster.randomRunning().orElseThrow(() ->
+                new AssertionError("Failed to find running node")
+            );
+
+            // Kill a random node and drop all of its persistent state. The Raft
+            // protocol guarantees should still ensure we lose no committed data
+            // as long as a new leader is elected before the failed node is restarted.
+            cluster.kill(node.nodeId);
+            cluster.deletePersistentState(node.nodeId);
+            scheduler.runUntil(() -> !cluster.hasLeader(node.nodeId) && cluster.hasConsistentLeader());

Review comment:
   Yes. I meant to say `ElectionState` instead of `LeaderState`. `ElectionState` has a field called `votedIdOpt` that `equals` includes in its comparison. Matching on it is not strictly required for having a "consistent" leader. I think that for the leader to be consistent for an epoch, only the `epoch` and `leaderIdOpt` need to match across all of the replicas.
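   For example, a weaker check could look something like this (a sketch only; how the per-replica `ElectionState`s are collected is elided, and the accessors used are assumptions):

```java
// Hypothetical relaxation of the consistency check: given the ElectionState
// read from each running replica, require agreement only on (epoch, leaderId)
// for the latest epoch and ignore votedIdOpt entirely.
static boolean leadersAreConsistent(Collection<ElectionState> electionStates) {
    int latestEpoch = -1;
    OptionalInt leaderId = OptionalInt.empty();
    for (ElectionState election : electionStates) {
        if (!election.hasLeader())
            continue;
        if (election.epoch > latestEpoch) {
            latestEpoch = election.epoch;
            leaderId = OptionalInt.of(election.leaderId());
        } else if (election.epoch == latestEpoch && election.leaderId() != leaderId.getAsInt()) {
            // Two replicas claim different leaders for the same epoch.
            return false;
        }
    }
    return true;
}
```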









[GitHub] [kafka] jsancio commented on a change in pull request #10309: KAFKA-12181; Loosen raft fetch offset validation of remote replicas

2021-03-15 Thread GitBox


jsancio commented on a change in pull request #10309:
URL: https://github.com/apache/kafka/pull/10309#discussion_r594787145



##
File path: raft/src/test/java/org/apache/kafka/raft/RaftEventSimulationTest.java
##
@@ -401,6 +401,51 @@ private void checkBackToBackLeaderFailures(QuorumConfig config) {
         }
     }
 
+    @Test
+    public void checkSingleNodeCommittedDataLossQuorumSizeThree() {
+        checkSingleNodeCommittedDataLoss(new QuorumConfig(3, 0));
+    }
+
+    private void checkSingleNodeCommittedDataLoss(QuorumConfig config) {
+        assertTrue(config.numVoters > 2,
+            "This test requires the cluster to be able to recover from one failed node");
+
+        for (int seed = 0; seed < 100; seed++) {
+            // We run this test without the `MonotonicEpoch` and `MajorityReachedHighWatermark`
+            // invariants since the loss of committed data on one node can violate them.
+
+            Cluster cluster = new Cluster(config, seed);
+            EventScheduler scheduler = new EventScheduler(cluster.random, cluster.time);
+            scheduler.addInvariant(new MonotonicHighWatermark(cluster));
+            scheduler.addInvariant(new SingleLeader(cluster));
+            scheduler.addValidation(new ConsistentCommittedData(cluster));
+
+            MessageRouter router = new MessageRouter(cluster);
+
+            cluster.startAll();
+            schedulePolling(scheduler, cluster, 3, 5);
+            scheduler.schedule(router::deliverAll, 0, 2, 5);
+            scheduler.schedule(new SequentialAppendAction(cluster), 0, 2, 3);
+            scheduler.runUntil(() -> cluster.anyReachedHighWatermark(10));
+
+            RaftNode node = cluster.randomRunning().orElseThrow(() ->
+                new AssertionError("Failed to find running node")
+            );
+
+            // Kill a random node and drop all of its persistent state. The Raft
+            // protocol guarantees should still ensure we lose no committed data
+            // as long as a new leader is elected before the failed node is restarted.
+            cluster.kill(node.nodeId);
+            cluster.deletePersistentState(node.nodeId);
+            scheduler.runUntil(() -> !cluster.hasLeader(node.nodeId) && cluster.hasConsistentLeader());

Review comment:
   Got it. I missed that `votedIdOpt` is set to `empty` by the leader and 
the followers.




