[
https://issues.apache.org/jira/browse/KAFKA-20554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Bergner updated KAFKA-20554:
-----------------------------------
Description:
With acks=1, Kafka can lose producer-acknowledged records during planned leader
transitions, not only during hard leader failure.
I reproduced this on Kafka 4.1.2. In the affected runs, the producer received
successful acknowledgements, but some of the acknowledged records were missing
when the topic was later consumed from the beginning. Running the same harness
with acks=all produced no record loss.
This appears to be separate from
[KAFKA-19148|https://issues.apache.org/jira/browse/KAFKA-19148]. The new leader
is not necessarily unclean in Kafka's ISR sense. The issue is that acks=1
allows the current leader to acknowledge records after writing them only to its
local log. If those records have not reached the high watermark, a later clean
leader election can still choose an ISR replica that does not contain that
local suffix. Once leadership moves, the demoted broker follows the new leader
epoch and truncates the divergent tail.
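
To make the sequence above concrete, here is a minimal toy model in plain Java.
It is not Kafka code and uses no Kafka classes: the class, the method
lostAfterLeaderChange, the broker names, and the record names are all
hypothetical, and truncation is simplified to "cut back to the new leader's log
end", which only matches the real leader-epoch based truncation because the
follower's log is a strict prefix here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition with a leader and one in-sync follower. Not Kafka code. */
public class Acks1TruncationSketch {

    /** A replica reduced to its name and its local log (hypothetical type). */
    static final class Replica {
        final String name;
        final List<String> log = new ArrayList<>();
        Replica(String name) { this.name = name; }
    }

    /**
     * Appends `produced` records to the leader with acks=1 semantics, lets the
     * follower replicate only the first `replicated` of them, then moves
     * leadership to the follower. Returns the acknowledged records that are
     * missing from the partition afterwards.
     */
    static List<String> lostAfterLeaderChange(int produced, int replicated) {
        Replica oldLeader = new Replica("broker-1");
        Replica follower = new Replica("broker-2");
        List<String> acked = new ArrayList<>();

        for (int i = 0; i < produced; i++) {
            String record = "r" + i;
            oldLeader.log.add(record);
            // acks=1: the producer is acknowledged after the leader's local append only.
            acked.add(record);
        }

        // The follower has fetched only a prefix, so the remaining records sit
        // above the high watermark on the old leader.
        for (int i = 0; i < Math.min(replicated, produced); i++) {
            follower.log.add(oldLeader.log.get(i));
        }

        // Planned, clean leader change: the follower is in the ISR and takes over.
        Replica newLeader = follower;

        // The demoted broker becomes a follower and drops its divergent tail.
        while (oldLeader.log.size() > newLeader.log.size()) {
            oldLeader.log.remove(oldLeader.log.size() - 1);
        }

        List<String> lost = new ArrayList<>(acked);
        lost.removeAll(newLeader.log);
        return lost;
    }

    public static void main(String[] args) {
        // 5 records acknowledged, 3 replicated before the leader change.
        System.out.println(lostAfterLeaderChange(5, 3)); // prints [r3, r4]
    }
}
```

In this model no record is lost when the follower has caught up before the
transition (for example lostAfterLeaderChange(5, 5) returns an empty list),
which matches the acks=all observation only in spirit, since the toy does not
model acknowledgement timing.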
In other words, unclean.leader.election.enable=false does not protect acks=1
acknowledgements, and min.insync.replicas affects whether a produce request
succeeds only when the producer uses acks=all.
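
As an illustration of the configuration point above, the following sketch
builds producer settings with java.util.Properties and the standard config keys
"bootstrap.servers", "acks" and "enable.idempotence". The class name, the
helper minIsrApplies, and the broker address localhost:9092 are assumptions made
for the example; min.insync.replicas itself is a broker or topic setting and is
not set from the producer.

```java
import java.util.Properties;

/** Sketch of producer settings for the durability trade-off discussed above. */
public class DurableProducerConfigSketch {

    /** Builds producer properties; pass "1" or "all" (equivalently "-1") for acks. */
    static Properties producerProps(String bootstrapServers, String acks) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // acks=all: the leader waits for the in-sync replicas before acknowledging.
        // acks=1: the leader acknowledges after its own local append.
        props.setProperty("acks", acks);
        // Idempotence requires acks=all, so enable it only in that case.
        props.setProperty("enable.idempotence", String.valueOf(minIsrApplies(acks)));
        return props;
    }

    /**
     * Hypothetical helper: reports whether the topic's min.insync.replicas
     * can cause a produce request to fail for the given acks value.
     */
    static boolean minIsrApplies(String acks) {
        return "all".equals(acks) || "-1".equals(acks);
    }

    public static void main(String[] args) {
        // localhost:9092 is a placeholder address.
        Properties durable = producerProps("localhost:9092", "all");
        Properties leaderOnly = producerProps("localhost:9092", "1");
        System.out.println("acks=all -> min.insync.replicas applies: "
                + minIsrApplies(durable.getProperty("acks")));
        System.out.println("acks=1   -> min.insync.replicas applies: "
                + minIsrApplies(leaderOnly.getProperty("acks")));
    }
}
```

The resulting Properties object could be passed to a KafkaProducer constructor
together with serializer settings, which are omitted here to keep the sketch
free of external dependencies.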
The purpose of this ticket is to clarify the expected behavior and discuss the
appropriate project follow-up. Possible outcomes include documenting this more
explicitly as expected acks=1 behavior during planned leader transitions, or
considering separately whether planned leader-transition semantics should be
changed.
was:
With acks=1, Kafka can lose producer-acknowledged records during planned leader
transitions, not only during hard leader failure.
I reproduced this on Kafka 4.1.2. In the affected cases, the producer received
successful acknowledgements, but some acknowledged records were not later
readable from the topic when consuming from the beginning. Running the same
harness with acks=all produced no record loss.
This appears to be separate from [#KAFKA-19148]. The new leader is not
necessarily unclean in Kafka's ISR sense. The issue is that acks=1 allows the
current leader to acknowledge records after writing them only to its local log.
If those records have not reached the high watermark, a later clean leader
election can still choose an ISR replica that does not contain that local
suffix. Once leadership moves, the demoted broker follows the new leader epoch
and truncates the divergent tail.
In other words, unclean.leader.election.enable=false does not protect acks=1
acknowledgements, and min.insync.replicas only affects producer success
semantics when the producer uses acks=all.
The purpose of this ticket is to clarify the expected behavior and discuss the
appropriate project follow-up. Possible outcomes may include documenting this
more explicitly as expected acks=1 behavior during planned leader transitions,
or considering whether planned leader-transition semantics should be changed
separately.
> Document or clarify acks=1 durability during planned leader transitions
> -----------------------------------------------------------------------
>
> Key: KAFKA-20554
> URL: https://issues.apache.org/jira/browse/KAFKA-20554
> Project: Kafka
> Issue Type: Bug
> Components: core, documentation, replication
> Reporter: Julian Bergner
> Assignee: Julian Bergner
> Priority: Major