[
https://issues.apache.org/jira/browse/KAFKA-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Gustafson resolved KAFKA-7164.
------------------------------------
Resolution: Fixed
Fix Version/s: 2.1.0, 2.0.1, 1.1.2
> Follower should truncate after every leader epoch change
> --------------------------------------------------------
>
> Key: KAFKA-7164
> URL: https://issues.apache.org/jira/browse/KAFKA-7164
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Assignee: Bob Barrett
> Priority: Major
> Fix For: 1.1.2, 2.0.1, 2.1.0
>
>
> Currently, followers skip log truncation when a LeaderAndIsr request is
> received but the leader does not change. This can lead to log divergence if
> the follower missed a leader change before the current known leader was
> reelected. The problem is that the leader may have truncated its own log
> before becoming leader again, so the follower needs to reconcile its log
> with the leader's once more.
> For example, suppose that we have three replicas: r1, r2, and r3. Initially,
> r1 is the leader in epoch 0 and writes one record at offset 0. r3 replicates
> this successfully.
> {code}
> r1:
> status: leader
> epoch: 0
> log: [{id: 0, offset: 0, epoch:0}]
> r2:
> status: follower
> epoch: 0
> log: []
> r3:
> status: follower
> epoch: 0
> log: [{id: 0, offset: 0, epoch:0}]
> {code}
> Suppose then that r2 becomes leader in epoch 1. r1 notices the leader change
> and truncates, but r3, for whatever reason, does not.
> {code}
> r1:
> status: follower
> epoch: 1
> log: []
> r2:
> status: leader
> epoch: 1
> log: []
> r3:
> status: follower
> epoch: 0
> log: [{id: 0, offset: 0, epoch:0}]
> {code}
> Now suppose that r2 fails and r1 becomes the leader in epoch 2. Immediately
> it writes a new record:
> {code}
> r1:
> status: leader
> epoch: 2
> log: [{id: 1, offset: 0, epoch:2}]
> r2:
> status: follower
> epoch: 2
> log: []
> r3:
> status: follower
> epoch: 0
> log: [{id: 0, offset: 0, epoch:0}]
> {code}
> If r3 continues fetching with the old epoch, we can have log
> divergence as noted in KAFKA-6880. However, if r3 successfully receives the
> new LeaderAndIsr request, which updates its epoch to 2, but skips the
> truncation, then the logs will stay inconsistent.
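
The final step of the scenario above can be replayed with a small model. This is only an illustrative sketch, not Kafka's implementation: `Replica`, `Record`, `handle_leader_and_isr`, `end_offset_for_epoch`, and the `truncate_on_epoch_change` flag are hypothetical names, and truncation is assumed to work by asking the leader for the end offset of the follower's latest epoch and dropping everything at or beyond it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Record:
    id: int
    offset: int
    epoch: int


@dataclass
class Replica:
    name: str
    epoch: int
    leader: str  # name of the leader this replica currently knows about
    log: List[Record] = field(default_factory=list)

    def log_end_offset(self) -> int:
        return self.log[-1].offset + 1 if self.log else 0

    def end_offset_for_epoch(self, epoch: int) -> int:
        """Leader side: first offset written under a later epoch, else the log end."""
        for rec in self.log:
            if rec.epoch > epoch:
                return rec.offset
        return self.log_end_offset()


def handle_leader_and_isr(follower: Replica, leader: Replica, new_epoch: int,
                          truncate_on_epoch_change: bool) -> None:
    """Hypothetical follower-side handling of a LeaderAndIsr update."""
    leader_changed = follower.leader != leader.name
    epoch_changed = new_epoch > follower.epoch
    follower.leader, follower.epoch = leader.name, new_epoch

    # Assumed old behavior: truncate only when the leader itself changed.
    # Assumed fixed behavior: also truncate whenever the leader epoch changed.
    must_truncate = leader_changed or (truncate_on_epoch_change and epoch_changed)
    if must_truncate and follower.log:
        end = leader.end_offset_for_epoch(follower.log[-1].epoch)
        follower.log = [r for r in follower.log if r.offset < end]

    # Resume fetching from the follower's own log end offset.
    start = follower.log_end_offset()
    follower.log.extend(r for r in leader.log if r.offset >= start)


def replay(truncate_on_epoch_change: bool):
    """State after r1 is reelected in epoch 2; r3 still believes r1 leads epoch 0."""
    r1 = Replica("r1", epoch=2, leader="r1", log=[Record(id=1, offset=0, epoch=2)])
    r3 = Replica("r3", epoch=0, leader="r1", log=[Record(id=0, offset=0, epoch=0)])
    handle_leader_and_isr(r3, r1, new_epoch=2,
                          truncate_on_epoch_change=truncate_on_epoch_change)
    return r1.log, r3.log
```

In this model, `replay(False)` leaves r3 holding the epoch-0 record at offset 0 while r1 holds the epoch-2 record at the same offset, because r3 resumes fetching at offset 1 and never sees the conflict. `replay(True)` makes r3 drop its stale record and refetch, so the two logs end up identical.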