[jira] [Commented] (KAFKA-13574) NotLeaderOrFollowerException thrown for a successful send

Laurenceau Julien (Jira) Thu, 27 Jun 2024 13:08:15 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860637#comment-17860637
 ]


Laurenceau Julien commented on KAFKA-13574:
-------------------------------------------

Two years have passed since this bug report, which indicates that exactly-once 
processing is broken.

I see some ideas, but no beginning of a solution. Is there any workaround 
or fix?

Do you guys think that this is not important?

Maybe a warning should be added to the documentation, because people 
who choose to pay the price of exactly-once generally care a lot about 
consistency!

> NotLeaderOrFollowerException thrown for a successful send
> ---------------------------------------------------------
>
>                 Key: KAFKA-13574
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13574
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 3.0.0
>         Environment: openjdk version "11.0.13" 2021-10-19
>            Reporter: Kyle Kingsbury
>            Priority: Minor
>              Labels: error-handling
>
> With org.apache.kafka/kafka-clients 3.0.0, under rare circumstances involving 
> multiple node and network failures, I've observed a call to `producer.send()` 
> throw `NotLeaderOrFollowerException` for a message which later appears in 
> `consumer.poll()` return values.
> I don't have a reliable repro case for this yet, but the case I hit involved 
> retries=1000, acks=all, and idempotence enabled. I suspect what might be 
> happening here is that an initial attempt to send the message makes it to the 
> server and is committed, but the acknowledgement is lost e.g. due to timeout; 
> the Kafka producer then automatically retries the send attempt, and on that 
> retry hits a NotLeaderOrFollowerException, which is thrown back to the 
> caller. If we interpret NotLeaderOrFollowerException as a definite failure, 
> then this would constitute an aborted read.
> I've seen issues like this in a number of databases around client or 
> server-internal retry mechanisms, and I think the thing to do is: rather than 
> throwing the most *recent* error, throw the *most indefinite*. That way 
> clients know that their request may have actually succeeded, and they won't 
> (e.g.) attempt to re-submit a non-idempotent request again.
> As a side note: is there... perhaps documentation on which errors in Kafka 
> are supposed to be definite vs indefinite? NotLeaderOrFollowerException is a 
> subclass of RetriableException, but it looks like RetriableException is more 
> about transient vs permanent errors than whether it's safe to retry.
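
The "throw the most indefinite error" idea from the quoted report can be 
sketched in plain Java. This is only an illustration under stated assumptions, 
not how kafka-clients behaves: the `Outcome` enum, the `Attempt` class and the 
`select` method are hypothetical names, and an `IllegalStateException` stands 
in for `NotLeaderOrFollowerException` so that the example runs without the 
Kafka library. Treating the timeout as indefinite and the leadership error as 
definite follows the interpretation discussed in the report and is itself an 
assumption.

```java
import java.util.List;
import java.util.concurrent.TimeoutException;

public class MostIndefiniteError {
    /** Hypothetical classification; kafka-clients does not expose this. */
    enum Outcome { DEFINITE_FAILURE, INDEFINITE }

    /** One failed send attempt and whether the write may still have been applied. */
    static final class Attempt {
        final Exception error;
        final Outcome outcome;

        Attempt(Exception error, Outcome outcome) {
            this.error = error;
            this.outcome = outcome;
        }
    }

    /**
     * Returns the first indefinite error if any attempt produced one,
     * otherwise the most recent error.
     */
    static Exception select(List<Attempt> attempts) {
        if (attempts.isEmpty()) {
            throw new IllegalArgumentException("at least one failed attempt is required");
        }
        for (Attempt a : attempts) {
            if (a.outcome == Outcome.INDEFINITE) {
                return a.error;
            }
        }
        return attempts.get(attempts.size() - 1).error;
    }

    public static void main(String[] args) {
        // First attempt: acknowledgement lost, so the write may have succeeded.
        // Retry: rejected; IllegalStateException stands in for NotLeaderOrFollowerException.
        List<Attempt> attempts = List.of(
            new Attempt(new TimeoutException("ack lost"), Outcome.INDEFINITE),
            new Attempt(new IllegalStateException("not leader"), Outcome.DEFINITE_FAILURE));
        System.out.println(select(attempts).getClass().getSimpleName()); // prints TimeoutException
    }
}
```

With this policy the caller sees the timeout rather than the later leadership 
error, so it knows the message may already be in the log and can avoid 
re-submitting a non-idempotent request.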



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
