[ https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329364#comment-17329364 ]
Jose Armando Garcia Sancio commented on KAFKA-10800:
----------------------------------------------------

1) What does "the state machine" mean here? I assume it's the KafkaRaftClient? And "attempts to create a snapshot writer", I assume this refers to `log.createSnapshot(snapshotId)`?

Sorry, by state machine I mean the users of `interface RaftClient`. In other words, this issue covers snapshots created through `SnapshotWriter`. In general there are two ways of creating a snapshot. One is by the state machine, through `RaftClient::createSnapshot` and `SnapshotWriter`. The other is by the `KafkaRaftClient` itself, when it downloads the snapshot from the quorum leader. In the second case we want to trust the leader's snapshot and not perform the validation described in this issue.

2) "The end offset and epoch of the snapshot is less than the high-watermark", does the "high-watermark" refer to the leader's highwatermark or the follower's highwatermark? If it is the former, shouldn't it be the leader's responsibility to satisfy this? If it's the latter, then I think the snapshotId can actually be larger than itself's highwatermark, say the follower has been lagged too much, and its highwatermark == its logEndOffset, which is smaller than the leader's logStartOffset, in this case, the follower's highwatermark will be updated to the snapshotId's endOffset when the snapshot fetching has completed, did I miss anything?

See my answer to 1): in this issue we are only concerned with snapshots created locally, by either the leader or a follower. Note that both the leader and the followers are responsible for creating snapshots based on the state of their local log. Having said that, "high watermark" here means the local high watermark, that is, the high watermark reported by the quorum state object.
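
To make the two answers above concrete, here is a minimal sketch of the check for locally created snapshots. It is not the Kafka implementation: every name in it (`SnapshotIdValidation`, `Origin`, `startEpoch`, the toy epoch map) is hypothetical, the leader epoch cache is modeled as a plain map from epoch to first offset, and treating "less than the high-watermark" as "end offset not greater than the local high watermark" is an assumption about the boundary case.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from the Kafka code base.
class SnapshotIdValidation {

    /** End offset (exclusive) and epoch that identify a snapshot. */
    static final class SnapshotId {
        final long endOffset;
        final int epoch;

        SnapshotId(long endOffset, int epoch) {
            this.endOffset = endOffset;
            this.epoch = epoch;
        }
    }

    /** Who asked for the snapshot writer. */
    enum Origin { STATE_MACHINE, DOWNLOADED_FROM_LEADER }

    // Toy stand-in for the leader epoch cache: epoch -> first offset written in that epoch.
    // Assumes start offsets increase with the epoch.
    private final TreeMap<Integer, Long> epochStartOffsets = new TreeMap<>();
    private final long localHighWatermark;

    SnapshotIdValidation(long localHighWatermark) {
        this.localHighWatermark = localHighWatermark;
    }

    void startEpoch(int epoch, long startOffset) {
        epochStartOffsets.put(epoch, startOffset);
    }

    /** Returns the epoch of the record at the given offset, or null if the cache does not cover it. */
    private Integer epochForOffset(long offset) {
        Integer result = null;
        for (Map.Entry<Integer, Long> entry : epochStartOffsets.entrySet()) {
            if (entry.getValue() <= offset) {
                result = entry.getKey();
            } else {
                break;
            }
        }
        return result;
    }

    /** Throws IllegalArgumentException if a locally created snapshot id is not acceptable. */
    void validate(SnapshotId id, Origin origin) {
        if (origin == Origin.DOWNLOADED_FROM_LEADER) {
            return; // the local log is out of date, so the leader's snapshot id is trusted
        }
        // Assumption: an end offset equal to the high watermark is still allowed.
        if (id.endOffset > localHighWatermark) {
            throw new IllegalArgumentException("snapshot end offset " + id.endOffset
                + " is above the local high watermark " + localHighWatermark);
        }
        // The snapshot epoch should match the epoch of the last record it includes.
        Integer expected = epochForOffset(id.endOffset - 1);
        if (expected == null || expected != id.epoch) {
            throw new IllegalArgumentException("snapshot epoch " + id.epoch
                + " does not match the epoch cache entry " + expected);
        }
    }

    public static void main(String[] args) {
        SnapshotIdValidation validation = new SnapshotIdValidation(10L);
        validation.startEpoch(1, 0L);
        validation.startEpoch(2, 6L);

        validation.validate(new SnapshotId(6L, 1), Origin.STATE_MACHINE);
        System.out.println("state machine snapshot (6, 1) accepted");

        validation.validate(new SnapshotId(50L, 7), Origin.DOWNLOADED_FROM_LEADER);
        System.out.println("downloaded snapshot (50, 7) accepted without validation");

        try {
            validation.validate(new SnapshotId(11L, 2), Origin.STATE_MACHINE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The `Origin` flag is only there to show that the same snapshot id can be rejected on the state machine path and accepted on the download path, which is the distinction drawn in the answer to 1).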

3) "validation should not be performed when the raft client creates the snapshot writer ", if my assumption in Question 1) is correct, then this seems to be in conflict with 1)

The KafkaRaftClient can download a snapshot from the leader when it is too far behind. In that case the snapshot doesn't need to be validated against the local quorum state and the local log. When the KafkaRaftClient downloads a snapshot from the leader, the snapshotId will always be greater than the local LEO (and high-watermark). Instead of validating, the KafkaRaftClient will write the snapshot to local disk, fully truncate the local log, and update the high watermark accordingly.

> Validate the snapshot id when the state machine creates a snapshot
> ------------------------------------------------------------------
>
>                 Key: KAFKA-10800
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10800
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: replication
>            Reporter: Jose Armando Garcia Sancio
>            Assignee: Haoran Xuan
>            Priority: Major
>
> When the state machine attempts to create a snapshot writer we should
> validate that the following is true:
> # The end offset and epoch of the snapshot is less than the high-watermark.
> # The end offset and epoch of the snapshot is valid based on the leader
> epoch cache.
> Note that this validation should not be performed when the raft client
> creates the snapshot writer because in that case the local log is out of date
> and the follower should trust the snapshot id sent by the partition leader.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)