[ 
https://issues.apache.org/jira/browse/KAFKA-20554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Bergner updated KAFKA-20554:
-----------------------------------
    Description: 
With acks=1, Kafka can lose producer-acknowledged records during planned leader 
transitions, not only during hard leader failure.

I reproduced this on Kafka 4.1.2. In the affected cases, the producer received 
successful acknowledgements, but some acknowledged records were not later 
readable from the topic when consuming from the beginning. Running the same 
harness with acks=all produced no record loss.

This appears to be separate from 
[KAFKA-19148|https://issues.apache.org/jira/browse/KAFKA-19148]. The new leader 
is not necessarily unclean in Kafka's ISR sense. The issue is that acks=1 
allows the current leader to acknowledge records after writing them only to its 
local log. If those records have not reached the high watermark, a later clean 
leader election can still choose an ISR replica that does not contain that 
local suffix. Once leadership moves, the demoted broker follows the new leader 
epoch and truncates the divergent tail.
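
The sequence above can be illustrated with a toy model. This is a minimal sketch, not Kafka's replication code: the Replica class, the helper functions, the broker names, and the record names are all hypothetical, and truncation is simplified to "the demoted leader adopts the new leader's log".

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica holding only an ordered local log (hypothetical model)."""
    name: str
    log: list = field(default_factory=list)


def produce_acks_1(leader, record):
    """acks=1: acknowledge as soon as the leader has appended locally."""
    leader.log.append(record)
    return True  # acknowledgement is sent before any follower has fetched


def replicate(leader, follower, up_to):
    """Follower copies the leader's log up to offset `up_to` (exclusive)."""
    follower.log = list(leader.log[:up_to])


def clean_election(old_leader, new_leader):
    """Leadership moves to an in-sync replica; the demoted leader drops its
    divergent tail and follows the new leader's log (simplified)."""
    old_leader.log = list(new_leader.log)


a = Replica("broker-a")  # current leader
b = Replica("broker-b")  # follower, still in the ISR

acked = []
for rec in ["r0", "r1", "r2", "r3"]:
    if produce_acks_1(a, rec):
        acked.append(rec)

# The follower has fetched only the first two records, so the high
# watermark is 2; r2 and r3 exist only in the leader's local log.
replicate(a, b, up_to=2)
high_watermark = min(len(a.log), len(b.log))

# Planned leadership move to broker-b, a clean (in-ISR) election.
clean_election(old_leader=a, new_leader=b)

lost = [r for r in acked if r not in b.log]
print("high watermark:", high_watermark)
print("acknowledged but lost:", lost)
```

In this model the two records above the high watermark were acknowledged to the producer and are gone after the clean election, which matches the observation reported here.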

In other words, unclean.leader.election.enable=false does not protect acks=1 
acknowledgements, and min.insync.replicas only affects producer success 
semantics when the producer uses acks=all.
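
As a rough illustration of that last point, the leader's handling of a produce request can be modelled as below. This is a simplified sketch under stated assumptions: leader_response is a hypothetical function, not a Kafka API, and the returned strings are descriptive labels rather than real protocol values.

```python
def leader_response(acks, isr_size, min_insync_replicas):
    """Hypothetical model of how a partition leader answers a produce request."""
    if acks == "all":
        # min.insync.replicas is only checked on this path.
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "ack after all in-sync replicas have the record"
    if acks == "1":
        # The ISR size and min.insync.replicas play no part here.
        return "ack after local append"
    return "no ack expected"  # acks=0


for acks in ("1", "all"):
    for isr_size in (1, 2, 3):
        print(acks, isr_size, leader_response(acks, isr_size, min_insync_replicas=2))
```

With min.insync.replicas=2 and a single in-sync replica, the acks=all request is rejected while the acks=1 request is still acknowledged after the local append.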

The purpose of this ticket is to clarify the expected behavior and discuss the 
appropriate project follow-up. Possible outcomes include documenting this 
more explicitly as expected acks=1 behavior during planned leader transitions, 
or separately considering whether planned leader-transition semantics should 
be changed.

  was:
With acks=1, Kafka can lose producer-acknowledged records during planned leader 
transitions, not only during hard leader failure.

I reproduced this on Kafka 4.1.2. In the affected cases, the producer received 
successful acknowledgements, but some acknowledged records were not later 
readable from the topic when consuming from the beginning. Running the same 
harness with acks=all produced no record loss.

This appears to be separate from [#KAFKA-19148]. The new leader is not 
necessarily unclean in Kafka's ISR sense. The issue is that acks=1 allows the 
current leader to acknowledge records after writing them only to its local log. 
If those records have not reached the high watermark, a later clean leader 
election can still choose an ISR replica that does not contain that local 
suffix. Once leadership moves, the demoted broker follows the new leader epoch 
and truncates the divergent tail.

In other words, unclean.leader.election.enable=false does not protect acks=1 
acknowledgements, and min.insync.replicas only affects producer success 
semantics when the producer uses acks=all.

The purpose of this ticket is to clarify the expected behavior and discuss the 
appropriate project follow-up. Possible outcomes may include documenting this 
more explicitly as expected acks=1 behavior during planned leader transitions, 
or considering whether planned leader-transition semantics should be changed 
separately.


> Document or clarify acks=1 durability during planned leader transitions
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-20554
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20554
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, documentation, replication
>            Reporter: Julian Bergner
>            Assignee: Julian Bergner
>            Priority: Major
>
> With acks=1, Kafka can lose producer-acknowledged records during planned 
> leader transitions, not only during hard leader failure.
> I reproduced this on Kafka 4.1.2. In the affected cases, the producer 
> received successful acknowledgements, but some acknowledged records were not 
> later readable from the topic when consuming from the beginning. Running the 
> same harness with acks=all produced no record loss.
> This appears to be separate from 
> [KAFKA-19148|https://issues.apache.org/jira/browse/KAFKA-19148]. The new 
> leader is not necessarily unclean in Kafka's ISR sense. The issue is that 
> acks=1 allows the current leader to acknowledge records after writing them 
> only to its local log. If those records have not reached the high watermark, 
> a later clean leader election can still choose an ISR replica that does not 
> contain that local suffix. Once leadership moves, the demoted broker follows 
> the new leader epoch and truncates the divergent tail.
> In other words, unclean.leader.election.enable=false does not protect acks=1 
> acknowledgements, and min.insync.replicas only affects producer success 
> semantics when the producer uses acks=all.
> The purpose of this ticket is to clarify the expected behavior and discuss 
> the appropriate project follow-up. Possible outcomes include documenting 
> this more explicitly as expected acks=1 behavior during planned leader 
> transitions, or separately considering whether planned leader-transition 
> semantics should be changed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
