[
https://issues.apache.org/jira/browse/KAFKA-14704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Jacot resolved KAFKA-14704.
---------------------------------
Fix Version/s: 3.5.0, 3.4.1, 3.3.3
Reviewer: Jason Gustafson
Resolution: Fixed
> Follower should truncate before incrementing high watermark
> -----------------------------------------------------------
>
> Key: KAFKA-14704
> URL: https://issues.apache.org/jira/browse/KAFKA-14704
> Project: Kafka
> Issue Type: Bug
> Reporter: David Jacot
> Assignee: David Jacot
> Priority: Major
> Fix For: 3.5.0, 3.4.1, 3.3.3
>
>
> When a leader becomes a follower, it is likely to have uncommitted
> records in its log. When it fetches from the new leader, the leader detects
> that the logs have diverged and returns the diverging epoch and offset.
> The follower truncates its log based on this.
> There is a small caveat in this process. When the leader returns the diverging
> epoch and offset, it also includes its high watermark, low watermark, start
> offset and end offset. The current code in `AbstractFetcherThread` works
> as follows. First it processes the partition data, and then it checks whether
> there is a diverging epoch/offset. The former step may accidentally expose
> uncommitted records, as it updates the local high watermark to whatever is
> received from the leader. As the follower, that is, the former leader, may have
> uncommitted records, it is able to update its high watermark to a
> larger offset if the leader has a higher watermark than the current local
> one. This results in uncommitted records being exposed until the log is
> finally truncated. The time window is short, but a fetch request arriving at
> the follower at the right time could read those records. This is especially
> true for clients that use recent versions of the fetch request without
> implementing KIP-320.
> When this happens, the follower logs the following message: `Non-monotonic
> update of high watermark from (offset=21437 segment=[20998:98390]) to
> (offset=21434 segment=[20998:97843])`.
> This patch proposes to mitigate the issue by first checking whether
> a diverging epoch/offset is provided by the leader and, if it is, skipping
> the processing of the partition data. This means that the first fetch request
> will result in truncating the log, and a subsequent fetch request will update
> the log/high watermarks.
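
The ordering issue described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: `FetchResponse`, `FollowerLog`, `handle_old_order`, `handle_fixed_order` and `simulate` are hypothetical names invented for illustration, not the Scala code in `AbstractFetcherThread`. The log is reduced to two offsets, and the model assumes the follower caps its high watermark at its local log end offset.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FetchResponse:
    """Hypothetical, simplified per-partition response from the leader."""
    high_watermark: int
    diverging_offset: Optional[int] = None  # set when the leader detects divergence


class FollowerLog:
    """Toy follower state: only offsets are tracked, no record data."""

    def __init__(self, log_end_offset: int, high_watermark: int) -> None:
        self.log_end_offset = log_end_offset
        self.high_watermark = high_watermark
        # Highest high watermark ever visible to consumers of this replica.
        self.max_exposed_offset = high_watermark

    def update_high_watermark(self, leader_hw: int) -> None:
        # Assumption of this model: the follower cannot expose past its own log end.
        self.high_watermark = min(leader_hw, self.log_end_offset)
        self.max_exposed_offset = max(self.max_exposed_offset, self.high_watermark)

    def truncate_to(self, offset: int) -> None:
        self.log_end_offset = min(self.log_end_offset, offset)
        self.high_watermark = min(self.high_watermark, self.log_end_offset)


def handle_old_order(log: FollowerLog, resp: FetchResponse) -> None:
    # Old order: process partition data first (record appends omitted here),
    # which updates the high watermark, and only then check for divergence.
    log.update_high_watermark(resp.high_watermark)
    if resp.diverging_offset is not None:
        log.truncate_to(resp.diverging_offset)


def handle_fixed_order(log: FollowerLog, resp: FetchResponse) -> None:
    # Proposed order: check for divergence first and skip the partition data.
    if resp.diverging_offset is not None:
        log.truncate_to(resp.diverging_offset)
        return  # a subsequent fetch updates the high watermark
    log.update_high_watermark(resp.high_watermark)


def simulate(handler: Callable[[FollowerLog, FetchResponse], None]) -> FollowerLog:
    # Former leader: offsets 0-2 committed, offsets 3-4 uncommitted.
    log = FollowerLog(log_end_offset=5, high_watermark=3)
    # New leader: diverges at offset 3 and has a high watermark of 5.
    handler(log, FetchResponse(high_watermark=5, diverging_offset=3))
    return log
```

In this model, `simulate(handle_old_order)` briefly raises the high watermark to 5 before truncation pulls it back to 3, which corresponds to the non-monotonic update in the log message, while `simulate(handle_fixed_order)` never moves it past 3.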