[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329364#comment-17329364
 ] 

Jose Armando Garcia Sancio commented on KAFKA-10800:
----------------------------------------------------

1) What does "the state machine" mean here?  I assume it's the KafkaRaftClient? 
And "attempts to create a snapshot writer", I assume this refers to 
`log.createSnapshot(snapshotId)`?

Sorry, by state machine, I mean users of `interface RaftClient`. This basically 
means snapshots created through `SnapshotWriter`.

In general there are two ways of creating a snapshot. One is by the state 
machine through `RaftClient::createSnapshot` and `SnapshotWriter`. Another way 
is by the `KafkaRaftClient` itself downloading the snapshot from the quorum 
leader. In the second case we want to trust the leader's snapshot and not 
perform the validation described in this issue.

2) "The end offset and epoch of the snapshot is less than the high-watermark": 
does "high-watermark" refer to the leader's high watermark or the follower's? 
If it is the former, shouldn't it be the leader's responsibility to satisfy 
this? If it is the latter, then I think the snapshotId can actually be larger 
than the follower's own high watermark. Say the follower has lagged so far 
behind that its high watermark == its logEndOffset, which is smaller than the 
leader's logStartOffset. In this case, the follower's high watermark will be 
updated to the snapshotId's endOffset once the snapshot fetch has completed. 
Did I miss anything?

See my answer to 1). In this issue we are only concerned with snapshots created 
locally, by either the leader or a follower. Note that both the leader and the 
followers are responsible for creating snapshots based on the state of the 
local log. Having said that, "high watermark" here means the local high 
watermark, that is, the high watermark reported by the quorum state object.

3) "validation should not be performed when the raft client creates the 
snapshot writer ", if my assumption in Question 1) is correct, then this seems 
to be in conflict with 1)

The KafkaRaftClient can download a snapshot from the leader when it is too far 
behind. In this case, those snapshots don't need to be validated against the 
local quorum state and the local log. When KafkaRaftClient downloads a snapshot 
from the leader, the snapshotId will always be greater than the local LEO (and 
high watermark). Instead, the KafkaRaftClient will write the snapshot to local 
disk, fully truncate the local log and update the high watermark accordingly.
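
As a rough illustration of the two checks listed in the issue description, here 
is a minimal sketch of how a locally created snapshot id could be validated. 
Everything in it is an assumption made for illustration: `SnapshotIdValidator`, 
`SnapshotId` and the epoch-to-start-offset map are hypothetical stand-ins, not 
Kafka classes. It reads "less than the high-watermark" literally, so whether an 
end offset equal to the high watermark should be accepted is left open. It also 
assumes the snapshot's end offset is exclusive and that its epoch is the epoch 
of the last record included in the snapshot.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not part of Kafka's API. */
final class SnapshotIdValidator {
    /** Hypothetical snapshot id: exclusive end offset plus epoch of the last included record. */
    static final class SnapshotId {
        final long endOffset;
        final int epoch;

        SnapshotId(long endOffset, int epoch) {
            this.endOffset = endOffset;
            this.epoch = epoch;
        }
    }

    // Simplified stand-in for the leader epoch cache: epoch -> first offset of that epoch.
    private final TreeMap<Integer, Long> epochStartOffsets;
    // The local high watermark, as it would be reported by the quorum state.
    private final long highWatermark;

    SnapshotIdValidator(long highWatermark, Map<Integer, Long> epochStartOffsets) {
        this.highWatermark = highWatermark;
        this.epochStartOffsets = new TreeMap<>(epochStartOffsets);
    }

    /**
     * Intended only for snapshots created by the state machine. Snapshots
     * downloaded from the leader would skip this check entirely.
     */
    boolean isValid(SnapshotId id) {
        // Check 1: the end offset must be below the local high watermark
        // (literal reading of the issue text; the boundary is an assumption).
        if (id.endOffset >= highWatermark) {
            return false;
        }
        // Check 2: the epoch must be known locally...
        Long epochStart = epochStartOffsets.get(id.epoch);
        if (epochStart == null) {
            return false;
        }
        // ...and the last included record (endOffset - 1) must fall inside that epoch.
        Map.Entry<Integer, Long> next = epochStartOffsets.higherEntry(id.epoch);
        long epochEnd = (next == null) ? Long.MAX_VALUE : next.getValue();
        return id.endOffset > epochStart && id.endOffset <= epochEnd;
    }
}
```

For example, with a local high watermark of 100, epoch 1 starting at offset 0 
and epoch 2 starting at offset 50, the id (endOffset=40, epoch=1) passes, while 
(endOffset=60, epoch=1) is rejected because offset 59 belongs to epoch 2.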

 

> Validate the snapshot id when the state machine creates a snapshot
> ------------------------------------------------------------------
>
>                 Key: KAFKA-10800
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10800
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: replication
>            Reporter: Jose Armando Garcia Sancio
>            Assignee: Haoran Xuan
>            Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot are less than the high-watermark.
>  # The end offset and epoch of the snapshot are valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
