Hey Greg, thanks for the feedback.
I can understand your concern about atomicity, but what does a user do after a transaction is failed due to a `too-large-record `exception? They will submit the same batch without the problematic record again. What we are providing is actually a better/more convenient way of doing that. Regarding your solution to solve the issue application-side: I am a bit hesitant to keep all sent records in memory since I think buffering records twice (both in Streams and Producer) would not be an efficient solution. Cheers, Alieh On Fri, Jul 12, 2024 at 10:07 PM Greg Harris <greg.har...@aiven.io.invalid> wrote: > Hi all, > > Alieh, thanks for the KIP! And everyone else, thanks for the robust > discussion. > > I understand that there are situations in which users desire that the > pipeline "just keep working" and skip errors. However, I question whether > it is appropriate to support/encourage this behavior via inclusion in the > Producer API. > This feature is essentially a "non-atomic transaction", as it allows > commits in which not all records passed to send() ultimately get committed. > As atomicity is one of the most important semantics associated with > transactions, I question whether there are users other than Streams that > would choose non-atomic transactions over a traditional/idempotent > producer. > Some cursory research shows that non-atomic transactions may be present in > other databases, but is actively discouraged due to the complexity they add > to error-handling. [1] > > I'd like to invoke the End-to-End Arguments in System Design [2] here, and > recommend that this behavior may be present in Streams, but should not be > in the Producer. > 1. Dropping records that cause errors is already expressible via the > current Producer API. You can store the records in-memory after calling > send(), wait for a successful no-error flush() before calling > commitTransaction() and allowing the record to be garbage collected. If > errors occur, abortTransaction() and re-submit the records. > 2. Implementing this inside the Producer API is complex and difficult to > holistically define in a way that we won't regret or need to change later. > I think some of the disagreement in this thread originates from this, and I > don't find the proposed API satisfactory. > 3. The performance improvement of including this change in the lower level > needs to be quantified in order to be a justification, and I don't see any > analysis about this. > > I imagine that the alternative implementation I suggested in (1) would also > enable more expressive error handlers in Streams, if such a thing was > desired. Keeping the record around until after the transaction is committed > would enable a DLQ or passing the erroneous record to the error handler. > > I think that the current pattern of the application being responsible for > providing good data to the producer is very reasonable; Having the producer > responsible for implementing the application's error handling of bad data > is not something I can support. > > Thanks, > Greg > > [1] https://www.sommarskog.se/error_handling/Part1.html > [2] https://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf > > On Fri, Jul 12, 2024 at 8:52 AM Justine Olshan > <jols...@confluent.io.invalid> > wrote: > > > Can we update the KIP to clearly document these decisions? > > > > Thanks, > > > > Justine > > > > On Tue, Jul 9, 2024 at 9:25 AM Andrew Schofield < > andrew_schofi...@live.com > > > > > wrote: > > > > > Hi Chris, > > > As it stands, the error handling for transactions in KafkaProducer is > not > > > ideal. There’s no reason why a failed operation should fail a > transaction > > > provided that the application can tell that the operation was not > > included > > > in the transaction and then make its own decision whether to continue > or > > > back out. So, I think I disagree with the original premise of a > > client-side > > > error state for a transaction, but we are where we are. > > > > > > When I voted, I did not expect the KIP to handle ALL errors which could > > > conceivably be handled. I did expect it to handle client-side send > errors > > > that would cause a record to be rejected from a batch before sending > to a > > > broker. I think that it does make the KafkaProducer interface very > > slightly > > > more complicated, but the new option is a clear improvement and I > > > don’t see anyone getting into a mess using it. > > > > > > I think broker-side errors are more tricky and I don’t think an > overload > > > on the send() method is going to do the job. I don’t see that as a > > problem > > > with the KIP, just that the underlying RPCs and behaviour is not very > > > amenable to record-specific error handling. The Produce RPC is a > > > complicated beast which can include a set of records for mutiple > > > topic-partitions. Although ProduceResponse v10 does include record > > > errors, I don’t believe this is surfaced in the client. Let’s imagine > > > something > > > like broker-side record validation which barfs on one record. Failing > an > > > entire batch is easier, but less useful if the problem is related to > one > > > record. > > > > > > In summary, I’m happy that my vote stands, and I am happy with the KIP > > > only supporting client-side record errors. > > > > > > Thanks, > > > Andrew > > > > > > > On 8 Jul 2024, at 16:37, Chris Egerton <chr...@aiven.io.INVALID> > > wrote: > > > > > > > > Hi Alieh, > > > > > > > > Can you clarify why broker-side errors shouldn't be covered? The only > > > real > > > > rationale I can come up with is that it's easier to implement. > > > > > > > > "Things were better for Kafka Streams before KAFKA-9279 was fixed" > > isn't > > > > very convincing, because Kafka Streams is not the only user of the > Java > > > > producer client. And for others, especially new users, I doubt that > > this > > > > new API we're proposing would make sense without having to consult a > > lot > > > of > > > > historical context. > > > > > > > > I also don't think that most users will know or even care about the > > > > distinction between errors that cause a record to fail before it's > > added > > > to > > > > a batch vs. after. If you were writing a producer application of your > > > own, > > > > and you wanted to handle RecordTooLargeException instances by > dropping > > a > > > > record without aborting a transaction, would you care about whether > it > > > was > > > > your client or your broker that balked? Would you be happy if you > wrote > > > > logic expecting that that problem was solved once and for all, only > to > > > > learn that it could still affect you in other circumstances? Or, > > > > alternatively, would you be happy if you wanted to solve that problem > > and > > > > found an API that seemed to do exactly what you wanted, but after > > reading > > > > the fine print, realized you'd have to do it yourself instead? > > > > > > > > Ultimately, the more I think about this, the more I believe that > we're > > > > adding noise to the API (with the new overloaded variant of send) > for a > > > > feature that will likely bring confusion and even frustration to > anyone > > > > besides maintainers of Kafka Streams who tries to use it. > > > > > > > > If the only concern about covering broker-side errors is that it > would > > be > > > > more difficult to implement, I believe we should strongly reconsider > > that > > > > alternative. That said, if there is a straightforward way to explain > > this > > > > feature to new users that won't mislead them or require them to do > > > research > > > > on producer internals, then I can still live with it. > > > > > > > > Regarding a list of recoverable vs. irrecoverable errors, this is > > > actually > > > > the subject of another recently-introduced KIP: > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions > > > > > > > > Finally, I'd also like to ask the people who have already voted > > (Andrew, > > > > Matthias) if, at the time they voted, they believed that the API > would > > > > handle all errors, or only the subset of errors that would cause a > > record > > > > to be rejected from a batch before it can be sent to a broker. > > > > > > > > Best, > > > > > > > > Chris > > > > > > > > On Thu, Jul 4, 2024 at 12:43 PM Alieh Saeedi > > > <asae...@confluent.io.invalid> > > > > wrote: > > > > > > > >> Salut from the KIP’s author > > > >> > > > >> > > > >> Clarifying two points: > > > >> > > > >> > > > >> 1) broker side errors: > > > >> > > > >> As far as I remember we are not going to cover the errors > originating > > > from > > > >> the broker! > > > >> > > > >> A historical fact: One of the debate points in KIP-1038 was that by > > > >> defining a producer custom handler, the user may assume that > > broker-side > > > >> errors must be covered as well. They may define a handler for > handling > > > >> `RecordTooLargeException` and still see such errors not being > handled > > as > > > >> they wish. > > > >> > > > >> > > > >> 2) Regarding irrecoverable/recoverable errors: > > > >> > > > >> Before the fix of `KAFKA-9279`, errors such as > > > `RecordTooLargeException` > > > >> or errors related to missing meta data (both originating from > Producer > > > >> `send()`) were considered as recoverable but after that they turned > > into > > > >> being irrecoverable without changing any Javadocs or having any KIP. > > > All > > > >> the effort made in this KIP and the former one have been towards > > > returning > > > >> to the former state. > > > >> > > > >> > > > >> I am sure that it is clear for you that which sort of errors we are > > > going > > > >> to cover: A single record may happen to NOT get added to the batch > due > > > to > > > >> the issues with the record or its corresponding topic. The point was > > > that > > > >> if the record is not added to the batch let ’s don’t fail the whole > > > batch > > > >> because of that non-existing record. We never intended to do sth in > > > broker > > > >> side or ignore more important errors. But I agree with you Chris. > If > > we > > > >> are adding a new API, we must have good documentation for that. The > > > >> sentence `all irrecoverable transactional errors will still be > fatal` > > as > > > >> you suggested is good. What do you think? I am totally against > > > enumerating > > > >> errors in Javadocs since these sort of errors can be changing during > > > >> time. More > > > >> over, have you ever seen any list of recoverable or irrecoverable > > errors > > > >> somewhere so far? > > > >> > > > >> > > > >> Bests, > > > >> > > > >> Alieh > > > >> > > > >> On Wed, Jul 3, 2024 at 6:07 PM Chris Egerton > <chr...@aiven.io.invalid > > > > > > >> wrote: > > > >> > > > >>> Hi Justine, > > > >>> > > > >>> I agree that enumerating a list of errors that should be covered by > > the > > > >> KIP > > > >>> is difficult; I was thinking it might be easier if we list the > errors > > > >> that > > > >>> should _not_ be covered by the KIP, and only if we can't define a > > > >>> reasonable heuristic that would cover them without having to > > explicitly > > > >>> list them. Could it be enough to say "all irrecoverable > transactional > > > >>> errors will still be fatal", or even just "all transactional errors > > (as > > > >>> opposed to errors related to this specific record) will still be > > > fatal"? > > > >>> > > > >>> Cheers, > > > >>> > > > >>> Chris > > > >>> > > > >>> On Wed, Jul 3, 2024 at 11:56 AM Justine Olshan > > > >>> <jols...@confluent.io.invalid> > > > >>> wrote: > > > >>> > > > >>>> Hey Chris, > > > >>>> > > > >>>> I think what you say makes sense. I agree that defining the > behavior > > > >>> based > > > >>>> on code that can possibly change is not a good idea, and I was > > trying > > > >> to > > > >>>> get a clearer definition from the KIP's author :) > > > >>>> > > > >>>> I think it can always be hard to ensure that only specific errors > > are > > > >>>> handled unless they are explicitly enumerated in code as the code > > can > > > >>>> change and can be changed by folks who are not aware of this KIP > or > > > >>>> conversation. > > > >>>> I personally don't have the bandwidth to do this > > > definition/enumeration > > > >>> of > > > >>>> errors, so hopefully Alieh can expand upon this. > > > >>>> > > > >>>> Justine > > > >>>> > > > >>>> On Wed, Jul 3, 2024 at 8:28 AM Chris Egerton > > <chr...@aiven.io.invalid > > > > > > > >>>> wrote: > > > >>>> > > > >>>>> Hi Alieh, > > > >>>>> > > > >>>>> I don't love defining the changes for this KIP in terms of a > catch > > > >>> clause > > > >>>>> in the KafkaProducer class, for two reasons. First, the set of > > errors > > > >>>> that > > > >>>>> are handled by that clause may shift over time as the code base > is > > > >>>>> modified, and second, it would be fairly opaque to users who want > > to > > > >>>>> understand whether an error would be affected by using this API > or > > > >> not. > > > >>>>> > > > >>>>> It also seems strange that we'd handle some types of > > > >>>>> RecordTooLargeException (i.e., ones reported client-side) with > this > > > >>> API, > > > >>>>> but not others (i.e., ones reported by a broker). > > > >>>>> > > > >>>>> I think this kind of API would be most powerful, most intuitive > to > > > >>> users, > > > >>>>> and easiest to document if we expanded the scope to all > > > >>>> record-send-related > > > >>>>> errors, except anything indicating issues with exactly-once > > > >> semantics. > > > >>>> That > > > >>>>> would include records that are too large (when caught both > client- > > > >> and > > > >>>>> server-side), records that can't be sent due to authorization > > > >> failures, > > > >>>>> records sent to nonexistent topics/topic partitions, and keyless > > > >>> records > > > >>>>> sent to compacted topics. It would not include > > > >>>>> ProducerFencedException, InvalidProducerEpochException, > > > >>>>> UnsupportedVersionException, > > > >>>>> and possibly others. > > > >>>>> > > > >>>>> @Justine -- do you think it would be possible to develop either a > > > >>> better > > > >>>>> definition for the kinds of "excluded" errors that should not be > > > >>> covered > > > >>>> by > > > >>>>> this API, or, barring that, a comprehensive list of exact error > > > >> types? > > > >>>> And > > > >>>>> do you think this would be acceptable in terms of risk and > > > >> complexity? > > > >>>>> > > > >>>>> Cheers, > > > >>>>> > > > >>>>> Chris > > > >>>>> > > > >>>>> On Tue, Jul 2, 2024 at 5:05 PM Alieh Saeedi > > > >>> <asae...@confluent.io.invalid > > > >>>>> > > > >>>>> wrote: > > > >>>>> > > > >>>>>> Hey Justine, > > > >>>>>> > > > >>>>>> About the consequences: the consequences will be like when we > did > > > >> not > > > >>>>> have > > > >>>>>> the fix made in `KAFKA-9279`: silent loss of data! Obviously, > when > > > >>> the > > > >>>>> user > > > >>>>>> intentionally chose to ignore errors, that would not be silent > any > > > >>>> more. > > > >>>>>> Right? > > > >>>>>> Of course, considering all types of `ApiException`s would be too > > > >>> broad. > > > >>>>> But > > > >>>>>> are the exceptions caught in `catch(ApiException e)` of the > > > >>> `doSend()` > > > >>>>>> method also too broad? > > > >>>>>> > > > >>>>>> -Alieh > > > >>>>>> > > > >>>>>> On Tue, Jul 2, 2024 at 9:45 PM Justine Olshan > > > >>>>> <jols...@confluent.io.invalid > > > >>>>>>> > > > >>>>>> wrote: > > > >>>>>> > > > >>>>>>> Hey Alieh, > > > >>>>>>> > > > >>>>>>> If we want to allow any error to be ignored we should probably > > > >> run > > > >>>>>> through > > > >>>>>>> all the errors to make sure they make sense. > > > >>>>>>> I just want to feel confident that we aren't just making a > > > >> decision > > > >>>>>> without > > > >>>>>>> considering the consequences carefully. > > > >>>>>>> > > > >>>>>>> Justine > > > >>>>>>> > > > >>>>>>> On Tue, Jul 2, 2024 at 12:30 PM Alieh Saeedi > > > >>>>>> <asae...@confluent.io.invalid > > > >>>>>>>> > > > >>>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> Hey Justine, > > > >>>>>>>> > > > >>>>>>>> yes we talked about `RecordTooLargeException` as an example, > > > >> but > > > >>>> did > > > >>>>> we > > > >>>>>>>> ever limit ourselves to only this specific exception? I think > > > >>>> neither > > > >>>>>> in > > > >>>>>>>> the KIP nor in the PR. As Chris mentioned, this KIP is going > > > >> to > > > >>>> undo > > > >>>>>>> what > > > >>>>>>>> we have done in `KAFKA-9279` in case 1) the user is in a > > > >>>> transaction > > > >>>>>> and > > > >>>>>>> 2) > > > >>>>>>>> he decides to ignore the errors in which the record was not > > > >> even > > > >>>>> added > > > >>>>>> to > > > >>>>>>>> the batch. Yes, and we suggested some methods for undoing or, > > > >> in > > > >>>>> fact, > > > >>>>>>>> moving back the transaction from the error state in `flush` or > > > >> in > > > >>>>>>>> `commitTnx` and we finally came to the idea of not even doing > > > >> the > > > >>>>>> changes > > > >>>>>>>> (better than undoing) in `send`. > > > >>>>>>>> > > > >>>>>>>> Bests, > > > >>>>>>>> Alieh > > > >>>>>>>> > > > >>>>>>>> On Tue, Jul 2, 2024 at 8:03 PM Justine Olshan > > > >>>>>>> <jols...@confluent.io.invalid > > > >>>>>>>>> > > > >>>>>>>> wrote: > > > >>>>>>>> > > > >>>>>>>>> Hey folks, > > > >>>>>>>>> > > > >>>>>>>>> I understand where you are coming from by asking for specific > > > >>> use > > > >>>>>>> cases. > > > >>>>>>>> My > > > >>>>>>>>> understanding based on previous conversations was that there > > > >>>> were a > > > >>>>>> few > > > >>>>>>>>> different errors that have been seen. > > > >>>>>>>>> One example I heard some information about was when the > > > >> record > > > >>>> was > > > >>>>>> too > > > >>>>>>>>> large and it fails the batch. Besides that, I'm not really > > > >> sure > > > >>>> if > > > >>>>>>> there > > > >>>>>>>>> are cases in mind, though it is fair to ask on those and > > > >> bring > > > >>>> them > > > >>>>>> up. > > > >>>>>>>>> > > > >>>>>>>>>> Does a record qualify as a poison pill if it targets a > > > >> topic > > > >>>> that > > > >>>>>>>>> doesn't exist? Or if it targets a topic that the producer > > > >>>> principal > > > >>>>>>> lacks > > > >>>>>>>>> ACLs for? What if it fails broker-side validation (e.g., has > > > >> a > > > >>>> null > > > >>>>>> key > > > >>>>>>>> for > > > >>>>>>>>> a compacted topic)? > > > >>>>>>>>> > > > >>>>>>>>> I think there was some parallel work with addressing the > > > >>>>>>>>> UnknownTopicOrPartitionError in another way. As for the other > > > >>>>> checks, > > > >>>>>>>> acls, > > > >>>>>>>>> validation etc. I am not aware of that being in Alieh's > > > >> scope, > > > >>>> but > > > >>>>> we > > > >>>>>>>>> should be clear about exactly what we are doing. > > > >>>>>>>>> > > > >>>>>>>>> All errors that fall into ApiException seems too broad to me. > > > >>>>>>>>> > > > >>>>>>>>> Justine > > > >>>>>>>>> > > > >>>>>>>>> On Tue, Jul 2, 2024 at 10:51 AM Alieh Saeedi > > > >>>>>>>> <asae...@confluent.io.invalid > > > >>>>>>>>>> > > > >>>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>>> Hey Chris, > > > >>>>>>>>>> thanks for sharing your concerns. > > > >>>>>>>>>> > > > >>>>>>>>>> 1) About the language of KIP (or maybe later in Javadocs): > > > >> Is > > > >>>>> that > > > >>>>>>>>> alright > > > >>>>>>>>>> if I write all errors that fall into the `ApiException` > > > >>>> category > > > >>>>>>> thrown > > > >>>>>>>>>> (actually returned) by Producer? > > > >>>>>>>>>> 2) About future expansion: do you have any better > > > >> suggestions > > > >>>> for > > > >>>>>>> enum > > > >>>>>>>>>> names? Do you think `IGNORE_API_EXEPTIONS` or something > > > >> like > > > >>>> that > > > >>>>>> is > > > >>>>>>> a > > > >>>>>>>>>> "better/more accurate" one? > > > >>>>>>>>>> > > > >>>>>>>>>> Bests, > > > >>>>>>>>>> Alieh > > > >>>>>>>>>> > > > >>>>>>>>>> On Tue, Jul 2, 2024 at 7:29 PM Chris Egerton > > > >>>>>> <chr...@aiven.io.invalid > > > >>>>>>>> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>> > > > >>>>>>>>>>> Hi Alieh and Justine, > > > >>>>>>>>>>> > > > >>>>>>>>>>> I'm concerned that we're settling on a definition of > > > >>> "poison > > > >>>>>> pill" > > > >>>>>>>>> that's > > > >>>>>>>>>>> easiest to tackle right now but may lead to shortcomings > > > >>> down > > > >>>>> the > > > >>>>>>>>> road. I > > > >>>>>>>>>>> understand the relationship between this KIP and > > > >>> KAFKA-9279, > > > >>>>> and > > > >>>>>> I > > > >>>>>>>> can > > > >>>>>>>>>>> totally get behind the desire to keep things small, > > > >>> focused, > > > >>>>> and > > > >>>>>>>> simple > > > >>>>>>>>>> in > > > >>>>>>>>>>> the name of avoiding bugs. However, what I don't think is > > > >>>> clear > > > >>>>>> at > > > >>>>>>>> all > > > >>>>>>>>> is > > > >>>>>>>>>>> what the "specific circumstances" are that Justine > > > >>>> mentioned. I > > > >>>>>>> had a > > > >>>>>>>>>>> drastically different idea of what the intended > > > >> behavioral > > > >>>>> change > > > >>>>>>>> would > > > >>>>>>>>>> be > > > >>>>>>>>>>> before looking at the draft PR. > > > >>>>>>>>>>> > > > >>>>>>>>>>> I would like 1) for us to be clearer about the categories > > > >>> of > > > >>>>>> errors > > > >>>>>>>>> that > > > >>>>>>>>>> we > > > >>>>>>>>>>> want to cover with this new API (especially since we'll > > > >>> have > > > >>>> to > > > >>>>>>> find > > > >>>>>>>> a > > > >>>>>>>>>>> clear, succinct way to document this for users), and 2) > > > >> to > > > >>>> make > > > >>>>>>> sure > > > >>>>>>>>> that > > > >>>>>>>>>>> if we do try to expand this API in the future, that we > > > >>> won't > > > >>>> be > > > >>>>>>>> painted > > > >>>>>>>>>>> into a corner. > > > >>>>>>>>>>> > > > >>>>>>>>>>> For item 1, hopefully we can agree that the language in > > > >> the > > > >>>> KIP > > > >>>>>>>>>>> for IGNORE_SEND_ERRORS ("The records causing > > > >> irrecoverable > > > >>>>> errors > > > >>>>>>> are > > > >>>>>>>>>>> excluded from the batch and the transaction is committed > > > >>>>>>>>> successfully.") > > > >>>>>>>>>> is > > > >>>>>>>>>>> pretty vague. If we start using the phrase "poison pill > > > >>>> record" > > > >>>>>>> that > > > >>>>>>>>>> could > > > >>>>>>>>>>> help, but IMO more detail would still be needed. We know > > > >>> that > > > >>>>> we > > > >>>>>>> want > > > >>>>>>>>> to > > > >>>>>>>>>>> include records that are so large that they can be > > > >>>> immediately > > > >>>>>>>> rejected > > > >>>>>>>>>> by > > > >>>>>>>>>>> the producer. But there are other cases that users might > > > >>>> expect > > > >>>>>> to > > > >>>>>>> be > > > >>>>>>>>>>> handled. Does a record qualify as a poison pill if it > > > >>>> targets a > > > >>>>>>> topic > > > >>>>>>>>>> that > > > >>>>>>>>>>> doesn't exist? Or if it targets a topic that the producer > > > >>>>>> principal > > > >>>>>>>>> lacks > > > >>>>>>>>>>> ACLs for? What if it fails broker-side validation (e.g., > > > >>> has > > > >>>> a > > > >>>>>> null > > > >>>>>>>> key > > > >>>>>>>>>> for > > > >>>>>>>>>>> a compacted topic)? > > > >>>>>>>>>>> > > > >>>>>>>>>>> For item 2, this really depends on how narrow the scope > > > >> of > > > >>>> what > > > >>>>>>> we're > > > >>>>>>>>>> doing > > > >>>>>>>>>>> right now is. If we only handle a subset of the examples > > > >> I > > > >>>> laid > > > >>>>>> out > > > >>>>>>>>> above > > > >>>>>>>>>>> that could possibly be considered poison pills with this > > > >>> KIP, > > > >>>>> do > > > >>>>>> we > > > >>>>>>>>> want > > > >>>>>>>>>> to > > > >>>>>>>>>>> lock ourselves in to never addressing more in the future, > > > >>> or > > > >>>>> can > > > >>>>>> we > > > >>>>>>>>>> choose > > > >>>>>>>>>>> an API (probably just enum names would be the only > > > >>> important > > > >>>>>>> decision > > > >>>>>>>>>> here) > > > >>>>>>>>>>> that leaves room for more later? > > > >>>>>>>>>>> > > > >>>>>>>>>>> Best, > > > >>>>>>>>>>> > > > >>>>>>>>>>> Chris > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Tue, Jul 2, 2024 at 12:28 PM Justine Olshan > > > >>>>>>>>>>> <jols...@confluent.io.invalid> > > > >>>>>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> Chris and Alieh, > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> My understanding is that this KIP is really only trying > > > >>> to > > > >>>>>> solve > > > >>>>>>> an > > > >>>>>>>>>> issue > > > >>>>>>>>>>>> of a "poison pill" record that fails send(). > > > >>>>>>>>>>>> We've talked a lot about having a generic framework for > > > >>> all > > > >>>>>>> errors, > > > >>>>>>>>>> but I > > > >>>>>>>>>>>> don't think that is what this KIP is trying to do. > > > >>>>> Essentially > > > >>>>>>> the > > > >>>>>>>>>>> request > > > >>>>>>>>>>>> is to undo the change from KAFKA-9279 > > > >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-9279> but > > > >>>> under > > > >>>>>>>>> specific > > > >>>>>>>>>>>> circumstances that are controlled. I really am > > > >> concerned > > > >>>>> about > > > >>>>>>>>> opening > > > >>>>>>>>>>> new > > > >>>>>>>>>>>> avenues for bugs with EOS and hesitate to handle any > > > >>> other > > > >>>>>> types > > > >>>>>>> of > > > >>>>>>>>>>> errors. > > > >>>>>>>>>>>> I think if we all agree on the problem that we are > > > >> trying > > > >>>> to > > > >>>>>>> solve, > > > >>>>>>>>> it > > > >>>>>>>>>> is > > > >>>>>>>>>>>> easier to agree on solutions. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Justine > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On Mon, Jul 1, 2024 at 2:20 AM Alieh Saeedi > > > >>>>>>>>>> <asae...@confluent.io.invalid > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> Hi Matthias, > > > >>>>>>>>>>>>> Thanks for the valid points you mentioned. I updated > > > >>> the > > > >>>>> KIP > > > >>>>>>> and > > > >>>>>>>>> the > > > >>>>>>>>>> PR > > > >>>>>>>>>>>>> with: > > > >>>>>>>>>>>>> 1) mentioning that the new overloaded `send` throws > > > >>>>>>>>>>>> `IllegalStateException` > > > >>>>>>>>>>>>> if the user tries to ignore `send()` errors outside > > > >> of > > > >>> a > > > >>>>>>>>> transaction. > > > >>>>>>>>>>>>> 2) the default implementation in `Producer` interface > > > >>>>> throws > > > >>>>>> an > > > >>>>>>>>>>>>> `UnsupportedOperationException` > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Hi Chris, > > > >>>>>>>>>>>>> Thanks for the feedback. I tried to clarify the > > > >> points > > > >>>> you > > > >>>>>>>> listed: > > > >>>>>>>>>>>>> -------> we've narrowed the scope from any error that > > > >>>> might > > > >>>>>>> take > > > >>>>>>>>>> place > > > >>>>>>>>>>>> with > > > >>>>>>>>>>>>> producing a record to Kafka, to only the ones that > > > >> are > > > >>>>> thrown > > > >>>>>>>>>> directly > > > >>>>>>>>>>>> from > > > >>>>>>>>>>>>> Producer::send; > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> From the very beginning and even since KIP-1038, the > > > >>> main > > > >>>>>>> purpose > > > >>>>>>>>> was > > > >>>>>>>>>>> to > > > >>>>>>>>>>>>> have "more flexibility," or, in other words, "giving > > > >>> the > > > >>>>> user > > > >>>>>>> the > > > >>>>>>>>>>>>> authority" to handle some specific exceptions thrown > > > >>> from > > > >>>>> the > > > >>>>>>>>>>> `Producer`. > > > >>>>>>>>>>>>> Due to the specific cases we had in mind, KIP-1038 > > > >> was > > > >>>>>>> discarded > > > >>>>>>>>> and > > > >>>>>>>>>> we > > > >>>>>>>>>>>>> decided to not define a `CustomExceptionHandler` for > > > >>>>>> `Producer` > > > >>>>>>>> and > > > >>>>>>>>>>>> instead > > > >>>>>>>>>>>>> treat the `send` failures in a different way. The > > > >> main > > > >>>>> issue > > > >>>>>> is > > > >>>>>>>>> that > > > >>>>>>>>>>>> `send` > > > >>>>>>>>>>>>> makes a transition to error state, which is undoable. > > > >>> In > > > >>>>>> fact, > > > >>>>>>>> one > > > >>>>>>>>>>> single > > > >>>>>>>>>>>>> poison pill record makes the whole batch fail. The > > > >>> former > > > >>>>>>>>> suggestions > > > >>>>>>>>>>>> that > > > >>>>>>>>>>>>> you agreed with have been all about un-doing this > > > >>>>> transition > > > >>>>>> in > > > >>>>>>>>>> `flush` > > > >>>>>>>>>>>> or > > > >>>>>>>>>>>>> `commit`. The new suggestion is to un-do (or better, > > > >>> NOT > > > >>>>> do) > > > >>>>>> in > > > >>>>>>>>>> `send` > > > >>>>>>>>>>>> due > > > >>>>>>>>>>>>> to the reasons listed in the discussions above. > > > >>>>>>>>>>>>> Moreover, I would say that having such a large scope > > > >> as > > > >>>> you > > > >>>>>>>>> mentioned > > > >>>>>>>>>>> is > > > >>>>>>>>>>>>> impossible. In the best case, we may have control > > > >> over > > > >>>> the > > > >>>>>>>>>> `Producer`. > > > >>>>>>>>>>>> What > > > >>>>>>>>>>>>> shall we do with the broker? The `any error that > > > >> might > > > >>>> take > > > >>>>>>> place > > > >>>>>>>>>> with > > > >>>>>>>>>>>>> producing a record to Kafka` is too much, I think. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> -------> is this all we want to handle, and will it > > > >>>> prevent > > > >>>>>> us > > > >>>>>>>> from > > > >>>>>>>>>>>>> handling more in the future in an intuitive way? > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> I think yes. This is all we want. Other sorts of > > > >> errors > > > >>>>> such > > > >>>>>> as > > > >>>>>>>>>> having > > > >>>>>>>>>>>>> problem with partition addition, producer fenced > > > >>>> exception, > > > >>>>>> etc > > > >>>>>>>>> seem > > > >>>>>>>>>> to > > > >>>>>>>>>>>> be > > > >>>>>>>>>>>>> more serious issues. The intention was to handle > > > >>> problems > > > >>>>>>> created > > > >>>>>>>>> by > > > >>>>>>>>>>>>> (maybe) a single poison pill record. BTW, I do not > > > >> see > > > >>>> any > > > >>>>>>>>> obstacles > > > >>>>>>>>>> to > > > >>>>>>>>>>>>> future changes. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Bests, > > > >>>>>>>>>>>>> Alieh > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> On Sat, Jun 29, 2024 at 3:03 AM Chris Egerton > > > >>>>>>>>>> <chr...@aiven.io.invalid > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Ah, sorry--spoke too soon. The PR doesn't show that > > > >>>>> errors > > > >>>>>>>> thrown > > > >>>>>>>>>>> from > > > >>>>>>>>>>>>>> Producer::send are handled, but instead, > > > >> ApiException > > > >>>>>>> instances > > > >>>>>>>>>> that > > > >>>>>>>>>>>> are > > > >>>>>>>>>>>>>> caught inside KafkaProducer::doSend and are handled > > > >>> by > > > >>>>>>>> returning > > > >>>>>>>>> an > > > >>>>>>>>>>>>>> already-failed future are. I think the same > > > >> question > > > >>>>> still > > > >>>>>>>>> applies > > > >>>>>>>>>>> (is > > > >>>>>>>>>>>>> this > > > >>>>>>>>>>>>>> all we want to handle, and will it prevent us from > > > >>>>> handling > > > >>>>>>>> more > > > >>>>>>>>> in > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>> future in an intuitive way), though. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Fri, Jun 28, 2024 at 8:57 PM Chris Egerton < > > > >>>>>>> chr...@aiven.io > > > >>>>>>>>> > > > >>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Hi Alieh, > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> This KIP has evolved a lot since I last looked at > > > >>> it, > > > >>>>> but > > > >>>>>>> the > > > >>>>>>>>>>> changes > > > >>>>>>>>>>>>>> seem > > > >>>>>>>>>>>>>>> well thought-out both in semantics and API. One > > > >>>>>> clarifying > > > >>>>>>>>>>> question I > > > >>>>>>>>>>>>>> have > > > >>>>>>>>>>>>>>> is that it looks based on the draft PR that we've > > > >>>>>> narrowed > > > >>>>>>>> the > > > >>>>>>>>>>> scope > > > >>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>> any error that might take place with producing a > > > >>>> record > > > >>>>>> to > > > >>>>>>>>> Kafka, > > > >>>>>>>>>>> to > > > >>>>>>>>>>>>> only > > > >>>>>>>>>>>>>>> the ones that are thrown directly from > > > >>>> Producer::send; > > > >>>>> is > > > >>>>>>>> that > > > >>>>>>>>>> the > > > >>>>>>>>>>>>>> intended > > > >>>>>>>>>>>>>>> behavior here? And if so, do you have thoughts on > > > >>> how > > > >>>>> we > > > >>>>>>>> might > > > >>>>>>>>>>>> design a > > > >>>>>>>>>>>>>>> follow-up KIP that would catch all errors > > > >>> (including > > > >>>>> ones > > > >>>>>>>>>> reported > > > >>>>>>>>>>>>>>> asynchronously instead of synchronously)? I'd > > > >> like > > > >>> it > > > >>>>> if > > > >>>>>> we > > > >>>>>>>>> could > > > >>>>>>>>>>>> leave > > > >>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>> door open for that without painting ourselves > > > >> into > > > >>>> too > > > >>>>>> much > > > >>>>>>>> of > > > >>>>>>>>> a > > > >>>>>>>>>>>> corner > > > >>>>>>>>>>>>>>> with the API design for this KIP. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Cheers, > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Chris > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> On Fri, Jun 28, 2024 at 6:31 PM Matthias J. Sax < > > > >>>>>>>>>> mj...@apache.org> > > > >>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Thanks Alieh, > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> it seems this KIP can just pick between a couple > > > >>> of > > > >>>>>>>> tradeoffs. > > > >>>>>>>>>>>> Adding > > > >>>>>>>>>>>>> an > > > >>>>>>>>>>>>>>>> overloaded `send()` as the KIP propose makes > > > >> sense > > > >>>> to > > > >>>>> me > > > >>>>>>> and > > > >>>>>>>>>> seems > > > >>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>> provides the cleanest solution compare to there > > > >>>>> options > > > >>>>>> we > > > >>>>>>>>>>>> discussed. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Given the explicit name of the passed-in option > > > >>> that > > > >>>>>>>>> highlights > > > >>>>>>>>>>> that > > > >>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>> option is for TX only make is pretty clear and > > > >>>> avoids > > > >>>>>> the > > > >>>>>>>>> issue > > > >>>>>>>>>> of > > > >>>>>>>>>>>>>>>> `flush()` ambiguity. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Nit: We should make clear on the KIP though, > > > >> that > > > >>>> the > > > >>>>>> new > > > >>>>>>>>>> `send()` > > > >>>>>>>>>>>>>>>> overload would throw an `IllegalStateException` > > > >> if > > > >>>> TX > > > >>>>>> are > > > >>>>>>>> not > > > >>>>>>>>>> used > > > >>>>>>>>>>>>>>>> (similar to other TX methods like initTx(), etc) > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> About the `Producer` interface, I am not sure > > > >> how > > > >>>> this > > > >>>>>> was > > > >>>>>>>>> done > > > >>>>>>>>>> in > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>> past (eg, KIP-266 added > > > >> `Consumer.poll(Duration)` > > > >>>> w/o > > > >>>>> a > > > >>>>>>>>> default > > > >>>>>>>>>>>>>>>> implementation), if we need a default > > > >>> implementation > > > >>>>> for > > > >>>>>>>>>> backward > > > >>>>>>>>>>>>>>>> compatibility or not? If we do want to add one, > > > >> I > > > >>>>> think > > > >>>>>> it > > > >>>>>>>>> would > > > >>>>>>>>>>> be > > > >>>>>>>>>>>>>>>> appropriate to throw an > > > >>>>> `UnsupportedOperationException` > > > >>>>>> by > > > >>>>>>>>>>> default, > > > >>>>>>>>>>>>>>>> instead of just keeping the default impl empty? > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> My points are rather minor, and should not block > > > >>>> this > > > >>>>>> KIP > > > >>>>>>>>>> though. > > > >>>>>>>>>>>>>>>> Overall LGTM. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> -Matthias > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> On 6/27/24 1:28 PM, Alieh Saeedi wrote: > > > >>>>>>>>>>>>>>>>> Hi Justine, > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Thanks for the suggestion. > > > >>>>>>>>>>>>>>>>> Making applications to validate every single > > > >>>> record > > > >>>>> is > > > >>>>>>> not > > > >>>>>>>>> the > > > >>>>>>>>>>>> best > > > >>>>>>>>>>>>>> way, > > > >>>>>>>>>>>>>>>>> from an efficiency point of view. > > > >>>>>>>>>>>>>>>>> Moreover, between changing the behavior of the > > > >>>>>> Producer > > > >>>>>>> in > > > >>>>>>>>>>> `send` > > > >>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>> `commitTnx`, the former seems more reasonable > > > >>> and > > > >>>>>> clean. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Bests, > > > >>>>>>>>>>>>>>>>> Alieh > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> On Thu, Jun 27, 2024 at 8:14 PM Justine Olshan > > > >>>>>>>>>>>>>>>> <jols...@confluent.io.invalid> > > > >>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Hey Alieh, > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> I see there are two options now. So folks > > > >> will > > > >>> be > > > >>>>>>>>> discussing > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>>>> approaches > > > >>>>>>>>>>>>>>>>>> and deciding the best way forward before we > > > >>> vote? > > > >>>>>>>>>>>>>>>>>> I do think there could be a problem with the > > > >>>>> approach > > > >>>>>>> on > > > >>>>>>>>>> commit > > > >>>>>>>>>>>> if > > > >>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>> get > > > >>>>>>>>>>>>>>>>>> stuck on an earlier error and have more > > > >> records > > > >>>>>>>>> (potentially > > > >>>>>>>>>> on > > > >>>>>>>>>>>> new > > > >>>>>>>>>>>>>>>>>> partitions) to commit as the current PR is > > > >>>>>> implemented. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> I guess this takes us back to the question of > > > >>>>> whether > > > >>>>>>> the > > > >>>>>>>>>> error > > > >>>>>>>>>>>>>> should > > > >>>>>>>>>>>>>>>> be > > > >>>>>>>>>>>>>>>>>> cleared on send. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> (And I guess at the back of my mind, I'm > > > >>>> wondering > > > >>>>> if > > > >>>>>>>> there > > > >>>>>>>>>> is > > > >>>>>>>>>>> a > > > >>>>>>>>>>>>> way > > > >>>>>>>>>>>>>>>> we can > > > >>>>>>>>>>>>>>>>>> validate the "posion pill" records > > > >> application > > > >>>> side > > > >>>>>>>> before > > > >>>>>>>>> we > > > >>>>>>>>>>>> even > > > >>>>>>>>>>>>>> try > > > >>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>> send them) > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Justine > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi > > > >>>>>>>>>>>>>>>> <asae...@confluent.io.invalid > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Hi Justine, > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> I did not update the KIP with > > > >> `TxnSendOption` > > > >>>>> since > > > >>>>>> I > > > >>>>>>>>>> thought > > > >>>>>>>>>>>> it'd > > > >>>>>>>>>>>>>> be > > > >>>>>>>>>>>>>>>>>>> better discussed here beforehand. > > > >>>>>>>>>>>>>>>>>>> right now, there are 2 PRs: > > > >>>>>>>>>>>>>>>>>>> - the PR that implements the current version > > > >>> of > > > >>>>> the > > > >>>>>>> KIP: > > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/16332 > > > >>>>>>>>>>>>>>>>>>> - the POC PR that clarifies the > > > >>> `TxnSendOption`: > > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/16465 > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Bests, > > > >>>>>>>>>>>>>>>>>>> Alieh > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> On Thu, Jun 27, 2024 at 12:42 AM Justine > > > >>> Olshan > > > >>>>>>>>>>>>>>>>>>> <jols...@confluent.io.invalid> wrote: > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Hey Alieh, > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> I think I am a little confused. Are the 3 > > > >>>> points > > > >>>>>>> above > > > >>>>>>>>>>>> addressed > > > >>>>>>>>>>>>> by > > > >>>>>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>> KIP > > > >>>>>>>>>>>>>>>>>>>> or did something change? The PR seems to > > > >> not > > > >>>>>> include > > > >>>>>>>> this > > > >>>>>>>>>>>> change > > > >>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>> still > > > >>>>>>>>>>>>>>>>>>>> has the CommitOption as well. > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> Thanks, > > > >>>>>>>>>>>>>>>>>>>> Justine > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> On Wed, Jun 26, 2024 at 2:15 PM Alieh > > > >> Saeedi > > > >>>>>>>>>>>>>>>>>>> <asae...@confluent.io.invalid > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Hi all, > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Looking at the PR < > > > >>>>>>>>>>> https://github.com/apache/kafka/pull/16332 > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> corresponding to the KIP, there are some > > > >>>> points > > > >>>>>>> worthy > > > >>>>>>>>> of > > > >>>>>>>>>>>>> mention: > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> 1) clearing the error ends up dirty/messy > > > >>> code > > > >>>>> in > > > >>>>>>>>>>>>>>>>>> `TransactionManager`. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> 2) By clearing the error, we are actually > > > >>>> doing > > > >>>>> an > > > >>>>>>>>> illegal > > > >>>>>>>>>>>>>>>> transition > > > >>>>>>>>>>>>>>>>>>>> from > > > >>>>>>>>>>>>>>>>>>>>> `ABORTABLE_ERROR` to `IN_TRANSACTION` > > > >> which > > > >>> is > > > >>>>>>>>>> conceptually > > > >>>>>>>>>>>> not > > > >>>>>>>>>>>>>>>>>>>> acceptable. > > > >>>>>>>>>>>>>>>>>>>>> This can be the root cause of some issues, > > > >>>> with > > > >>>>>>>> perhaps > > > >>>>>>>>>>>> further > > > >>>>>>>>>>>>>>>>>> future > > > >>>>>>>>>>>>>>>>>>>>> changes by others. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> 3) If the poison pill record `r1` causes a > > > >>>>>>> transition > > > >>>>>>>> to > > > >>>>>>>>>> the > > > >>>>>>>>>>>>> error > > > >>>>>>>>>>>>>>>>>>> state > > > >>>>>>>>>>>>>>>>>>>>> and then the next record `r2` requires > > > >>> adding > > > >>>> a > > > >>>>>>>>> partition > > > >>>>>>>>>> to > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>>>> transaction, the action fails due to being > > > >>> in > > > >>>>> the > > > >>>>>>>> error > > > >>>>>>>>>>> state. > > > >>>>>>>>>>>>> In > > > >>>>>>>>>>>>>>>>>> this > > > >>>>>>>>>>>>>>>>>>>>> case, clearing errors during > > > >>>>>>>>> `commitTnx(CLEAR_SEND_ERROR)` > > > >>>>>>>>>>> is > > > >>>>>>>>>>>>> too > > > >>>>>>>>>>>>>>>>>> late. > > > >>>>>>>>>>>>>>>>>>>>> However, this case can NOT be the main > > > >>> concern > > > >>>>> as > > > >>>>>>> soon > > > >>>>>>>>> as > > > >>>>>>>>>>>>> KIP-890 > > > >>>>>>>>>>>>>> is > > > >>>>>>>>>>>>>>>>>>>> fully > > > >>>>>>>>>>>>>>>>>>>>> implemented. > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> My suggestion is to solve the problem > > > >> where > > > >>> it > > > >>>>>>> arises. > > > >>>>>>>>> If > > > >>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>>> transition > > > >>>>>>>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>>>> the error state does not happen during > > > >>>> `send()`, > > > >>>>>> we > > > >>>>>>> do > > > >>>>>>>>> not > > > >>>>>>>>>>>> need > > > >>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>>> clear > > > >>>>>>>>>>>>>>>>>>>>> the error later. Therefore, instead of > > > >>>>>>> `CommitOption`, > > > >>>>>>>>> we > > > >>>>>>>>>>> can > > > >>>>>>>>>>>>>> define > > > >>>>>>>>>>>>>>>>>> a > > > >>>>>>>>>>>>>>>>>>>>> `TxnSendOption` and prevent the `send()` > > > >>>> method > > > >>>>>> from > > > >>>>>>>>> going > > > >>>>>>>>>>> to > > > >>>>>>>>>>>>> the > > > >>>>>>>>>>>>>>>>>> error > > > >>>>>>>>>>>>>>>>>>>>> state in case 1) we're in a transaction > > > >> and > > > >>> 2) > > > >>>>> the > > > >>>>>>>> user > > > >>>>>>>>>>> asked > > > >>>>>>>>>>>>> for > > > >>>>>>>>>>>>>>>>>>>>> IGONRE_SEND_ERRORS. For more clarity, you > > > >>> can > > > >>>>>> take a > > > >>>>>>>>> look > > > >>>>>>>>>> at > > > >>>>>>>>>>>> the > > > >>>>>>>>>>>>>> POC > > > >>>>>>>>>>>>>>>>>> PR > > > >>>>>>>>>>>>>>>>>>>>> < > > > >> https://github.com/apache/kafka/pull/16465 > > > >>>> . > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> Cheers, > > > >>>>>>>>>>>>>>>>>>>>> Alieh > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > > > >