[ https://issues.apache.org/jira/browse/KAFKA-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860637#comment-17860637 ]
Laurenceau Julien commented on KAFKA-13574:
-------------------------------------------

It has been two years since this bug report, which shows that exactly-once processing is broken. I see some ideas, but no beginning of a solution. Is there any workaround or fix? Do you think this is not important? Maybe a warning notice should be added to the documentation, because people who choose to pay the price of exactly-once generally care a lot about consistency!

> NotLeaderOrFollowerException thrown for a successful send
> ---------------------------------------------------------
>
>                 Key: KAFKA-13574
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13574
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 3.0.0
>         Environment: openjdk version "11.0.13" 2021-10-19
>            Reporter: Kyle Kingsbury
>            Priority: Minor
>              Labels: error-handling
>
> With org.apache.kafka/kafka-clients 3.0.0, under rare circumstances involving
> multiple node and network failures, I've observed a call to `producer.send()`
> throw `NotLeaderOrFollowerException` for a message which later appears in
> `consumer.poll()` return values.
> I don't have a reliable repro case for this yet, but the case I hit involved
> retries=1000, acks=all, and idempotence enabled. I suspect what might be
> happening here is that an initial attempt to send the message makes it to the
> server and is committed, but the acknowledgement is lost e.g. due to timeout;
> the Kafka producer then automatically retries the send attempt, and on that
> retry hits a NotLeaderOrFollowerException, which is thrown back to the
> caller. If we interpret NotLeaderOrFollowerException as a definite failure,
> then this would constitute an aborted read.
> I've seen issues like this in a number of databases around client or
> server-internal retry mechanisms, and I think the thing to do is: rather than
> throwing the most *recent* error, throw the *most indefinite*. That way
> clients know that their request may have actually succeeded, and they won't
> (e.g.) attempt to re-submit a non-idempotent request again.
> As a side note: is there... perhaps documentation on which errors in Kafka
> are supposed to be definite vs indefinite? NotLeaderOrFollowerException is a
> subclass of RetriableException, but it looks like RetriableException is more
> about transient vs permanent errors than whether it's safe to retry.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
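The reporter's proposal (surface the most indefinite error seen across retries, not the most recent one) can be sketched in Java. This is a minimal illustration under stated assumptions, not the Kafka producer's actual retry code: `Certainty`, `SendFailedException`, `RetryingSender.sendWithRetries`, and `RetryDemo` are hypothetical names, and each attempt is assumed to arrive already classified as a definite failure or an indefinite outcome.

```java
import java.util.concurrent.Callable;

// Hypothetical classification of a failed attempt (not a Kafka API).
enum Certainty {
    DEFINITE_FAILURE, // the request was certainly not applied
    INDEFINITE        // the request may or may not have been applied (e.g. lost ack)
}

// Hypothetical exception type carrying that classification.
class SendFailedException extends Exception {
    final Certainty certainty;

    SendFailedException(Certainty certainty, String message) {
        super(message);
        this.certainty = certainty;
    }
}

class RetryingSender {
    // Runs `attempt` up to maxAttempts times. If every attempt fails, the error
    // surfaced to the caller is INDEFINITE whenever ANY attempt was INDEFINITE,
    // because that earlier attempt may have been committed on the server.
    static <T> T sendWithRetries(Callable<T> attempt, int maxAttempts) throws SendFailedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SendFailedException last = null;
        boolean anyIndefinite = false;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (SendFailedException e) {
                last = e;
            } catch (Exception e) {
                // Assumption: an unclassified error might have reached the server.
                last = new SendFailedException(Certainty.INDEFINITE, e.toString());
            }
            if (last.certainty == Certainty.INDEFINITE) {
                anyIndefinite = true;
            }
        }
        if (anyIndefinite && last.certainty != Certainty.INDEFINITE) {
            // Report the most indefinite outcome, not merely the most recent one.
            throw new SendFailedException(Certainty.INDEFINITE,
                "outcome unknown (an earlier attempt may have succeeded); last error: "
                    + last.getMessage());
        }
        throw last;
    }
}

class RetryDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        try {
            // Attempt 1: ack lost (indefinite). Attempt 2: a definite-looking error.
            RetryingSender.sendWithRetries(() -> {
                calls[0]++;
                if (calls[0] == 1) {
                    throw new SendFailedException(Certainty.INDEFINITE, "request timed out");
                }
                throw new SendFailedException(Certainty.DEFINITE_FAILURE, "not leader or follower");
            }, 2);
        } catch (SendFailedException e) {
            System.out.println(e.certainty + ": " + e.getMessage());
        }
    }
}
```

In the demo, the second attempt's error looks definite, but the caller still receives an INDEFINITE failure because the first attempt's acknowledgement was lost. The design choice is simply to remember, across all retries, whether any outcome was uncertain, so a later definite-looking error cannot hide an earlier possible commit.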