[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-25 Thread Haoran Xuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331498#comment-17331498
 ] 

Haoran Xuan commented on KAFKA-10800:
-

[~jagsancio] The PR [GitHub Pull Request 
#10593|https://github.com/apache/kafka/pull/10593] is ready for review now, 
thank you!

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Haoran Xuan
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-24 Thread Haoran Xuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331232#comment-17331232
 ] 

Haoran Xuan commented on KAFKA-10800:
-

I got you~

And yeah, I was also thinking that we might need to make some change to the 
BatchAccumulator to make this happen. And since this is independent of the 
Jira, I think we can cover it in another separate task and I'm happy to take 
it. 

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Haoran Xuan
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-24 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331194#comment-17331194
 ] 

Jose Armando Garcia Sancio commented on KAFKA-10800:


{quote}I think it's easy to understand 3, but I'm curious why would we need to 
do 1 and 2? I guess there should be some benefit or restriction which I'm not 
realized? Thanks!{quote}

While operating this feature it may be possible that users may have the content 
of the snapshot on disk but not the original file name. In that case we may 
want to identify the snapshot id for that file. We can do that if we have the 
`endOffset - 1` as the baseOffset of every batch and the `epoch` on every 
batch. Actually, this may not be easy to do with the current code since we use 
`BatchAccumulator`. I am okay doing this as part of another PR.

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Haoran Xuan
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-23 Thread Haoran Xuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331149#comment-17331149
 ] 

Haoran Xuan commented on KAFKA-10800:
-

[~jagsancio]  Thanks for the clarification, now I'm clear about the validation 
part. :)

About these:
{code:java}
1. Set the baseOffset to the snapshotId's endOffset - 1 for every record batch.
2. Set the relative offset for every record to 0.
3. Se the epoch of every batch to the snapshotId's epoch.{code}
I think it's easy to understand 3, but I'm curious why would we need to do 1 
and 2? I guess there should be some benefit or restriction which I'm not 
realized? Thanks!

 

 

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Haoran Xuan
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-22 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329364#comment-17329364
 ] 

Jose Armando Garcia Sancio commented on KAFKA-10800:


1) What does "the state machine" mean here?  I assume it's the KafkaRaftClient? 
And "attempts to create a snapshot writer", I assume this refers to 
`log.createSnapshot(snapshotId)`?

Sorry, by state machine, I mean users of `interface RaftClient`. This basically 
means snapshots created through `SnapshotWriter`.

In general there are two ways of creating a snapshot. One is by the state 
machine through `RaftClient::createSnapshot` and `SnapshotWriter`. Another way 
is by the `KafkaRaftClient` itself downloading the snapshot from the quorum 
leader. In the second case we want to trust the leader's snapshot and not 
perform the validation described in this issue.

2) "The end offset and epoch of the snapshot is less than the high-watermark", 
does the "high-watermark" refer to the leader's highwatermark or the follower's 
highwatermark? If it is the former, shouldn't it be the leader's responsibility 
to satisfy this ? If it's the latter, then I think the snapshotId can actually 
be larger than itself's highwatermark, say the follower has been lagged too 
much, and its highwatermark == its logEndOffset, which is smaller than the 
leader's logStartOffset, in this case, the follower's highwatermark will be 
updated to the snapshotId's endOffset when the snapshot fetching has completed, 
did I miss anything?

See my answer to 1) but in this issue we are only concern with snapshot created 
locally by either the leader or the follower. Note that both the leader and the 
followers are responsible for creating snapshot based on the state of the local 
log. Having said that, high watermark means the local high watermark this is 
the high watermark reported by the quorum state object.

3) "validation should not be performed when the raft client creates the 
snapshot writer ", if my assumption in Question 1) is correct, then this seems 
to be in conflict with 1)

The KafkaRaftClient can download a snapshot from the leader when it is too far 
behind. In this case, those snapshots don't need to get validated against the 
local quorum  state and the local log. When KafkaRaftClient downloads snapshots 
from the leader the snapshotId will always be greater than the local LEO (and 
high-watermark). Instead the KafkaRaftClient will write the snapshot to local 
disk, fully truncate the local log and update the high watermark accordingly.

 

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Haoran Xuan
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-21 Thread Haoran Xuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326611#comment-17326611
 ] 

Haoran Xuan commented on KAFKA-10800:
-

Hi, [~jagsancio], I just started working on this task, and want to make sure I 
have the correct understanding before going too far.

-

"When the state machine attempts to create a snapshot writer we should validate 
that the following is true:
 # The end offset and epoch of the snapshot is less than the high-watermark.
 # The end offset and epoch of the snapshot is valid based on the leader epoch 
cache.

Note that this validation should not be performed when the raft client creates 
the snapshot writer because in that case the local log is out of date and the 
follower should trust the snapshot id sent by the partition leader."



Questions are:

1) What does "the state machine" mean here?  I assume it's the KafkaRaftClient? 
And "attempts to create a snapshot writer", I assume this refers to 
`log.createSnapshot(snapshotId)`?

2) "The end offset and epoch of the snapshot is less than the high-watermark", 
does the "high-watermark" refer to the leader's highwatermark or the follower's 
highwatermark? If it is the former, shouldn't it be the leader's responsibility 
to satisfy this ? If it's the latter, then I think the snapshotId can actually 
be larger than itself's highwatermark, say the follower has been lagged too 
much, and its highwatermark == its logEndOffset, which is smaller than the 
leader's logStartOffset, in this case, the follower's highwatermark will be 
updated to the snapshotId's endOffset when the snapshot fetching has completed, 
did I miss anything?

3) "validation should not be performed when the raft client creates the 
snapshot writer ", if my assumption in Question 1) is correct, then this seems 
to be in conflict with 1)

 

Thanks!

 

 

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Haoran Xuan
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-19 Thread Haoran Xuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325423#comment-17325423
 ] 

Haoran Xuan commented on KAFKA-10800:
-

[~jagsancio] Thanks so much for the additional information, I will make sure to 
take care of these points you mentioned. 

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Assignee: Haoran Xuan
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-19 Thread Jose Armando Garcia Sancio (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325140#comment-17325140
 ] 

Jose Armando Garcia Sancio commented on KAFKA-10800:


Yes. Please take it. Can you also look into the following when the snapshot 
gets created using {{SnaphsotWriter}}:
1. Set the {{baseOffset}} to the snapshotId's {{endOffset - 1}} for every 
record batch.
2. Set the relative offset for every record to 0.
3. Se the epoch of every batch to the snapshotId's epoch.

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10800) Validate the snapshot id when the state machine creates a snapshot

2021-04-18 Thread Haoran Xuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324536#comment-17324536
 ] 

Haoran Xuan commented on KAFKA-10800:
-

Hi, [~jagsancio], seems that the dependencies have been done, can I take this 
JIRA? Thanks!

> Validate the snapshot id when the state machine creates a snapshot
> --
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
>  Issue Type: Sub-task
>  Components: replication
>Reporter: Jose Armando Garcia Sancio
>Priority: Major
>
> When the state machine attempts to create a snapshot writer we should 
> validate that the following is true:
>  # The end offset and epoch of the snapshot is less than the high-watermark.
>  # The end offset and epoch of the snapshot is valid based on the leader 
> epoch cache.
> Note that this validation should not be performed when the raft client 
> creates the snapshot writer because in that case the local log is out of date 
> and the follower should trust the snapshot id sent by the partition leader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)