Hey Greg,

thanks for the feedback.

I can understand your concern about atomicity, but what does a user do
after a transaction is failed due to a `too-large-record `exception? They
will submit the same batch without the problematic record again. What we
are providing is actually a better/more convenient way of doing that.

Regarding your solution to solve the issue application-side:  I am a
bit hesitant to keep all sent records in memory since I think buffering
records twice (both in Streams and Producer) would not be an efficient
solution.

Cheers,
Alieh

On Fri, Jul 12, 2024 at 10:07 PM Greg Harris <greg.har...@aiven.io.invalid>
wrote:

> Hi all,
>
> Alieh, thanks for the KIP! And everyone else, thanks for the robust
> discussion.
>
> I understand that there are situations in which users desire that the
> pipeline "just keep working" and skip errors. However, I question whether
> it is appropriate to support/encourage this behavior via inclusion in the
> Producer API.
> This feature is essentially a "non-atomic transaction", as it allows
> commits in which not all records passed to send() ultimately get committed.
> As atomicity is one of the most important semantics associated with
> transactions, I question whether there are users other than Streams that
> would choose non-atomic transactions over a traditional/idempotent
> producer.
> Some cursory research shows that non-atomic transactions may be present in
> other databases, but is actively discouraged due to the complexity they add
> to error-handling. [1]
>
> I'd like to invoke the End-to-End Arguments in System Design [2] here, and
> recommend that this behavior may be present in Streams, but should not be
> in the Producer.
> 1. Dropping records that cause errors is already expressible via the
> current Producer API. You can store the records in-memory after calling
> send(), wait for a successful no-error flush() before calling
> commitTransaction() and allowing the record to be garbage collected. If
> errors occur, abortTransaction() and re-submit the records.
> 2. Implementing this inside the Producer API is complex and difficult to
> holistically define in a way that we won't regret or need to change later.
> I think some of the disagreement in this thread originates from this, and I
> don't find the proposed API satisfactory.
> 3. The performance improvement of including this change in the lower level
> needs to be quantified in order to be a justification, and I don't see any
> analysis about this.
>
> I imagine that the alternative implementation I suggested in (1) would also
> enable more expressive error handlers in Streams, if such a thing was
> desired. Keeping the record around until after the transaction is committed
> would enable a DLQ or passing the erroneous record to the error handler.
>
> I think that the current pattern of the application being responsible for
> providing good data to the producer is very reasonable; Having the producer
> responsible for implementing the application's error handling of bad data
> is not something I can support.
>
> Thanks,
> Greg
>
> [1] https://www.sommarskog.se/error_handling/Part1.html
> [2] https://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
>
> On Fri, Jul 12, 2024 at 8:52 AM Justine Olshan
> <jols...@confluent.io.invalid>
> wrote:
>
> > Can we update the KIP to clearly document these decisions?
> >
> > Thanks,
> >
> > Justine
> >
> > On Tue, Jul 9, 2024 at 9:25 AM Andrew Schofield <
> andrew_schofi...@live.com
> > >
> > wrote:
> >
> > > Hi Chris,
> > > As it stands, the error handling for transactions in KafkaProducer is
> not
> > > ideal. There’s no reason why a failed operation should fail a
> transaction
> > > provided that the application can tell that the operation was not
> > included
> > > in the transaction and then make its own decision whether to continue
> or
> > > back out. So, I think I disagree with the original premise of a
> > client-side
> > > error state for a transaction, but we are where we are.
> > >
> > > When I voted, I did not expect the KIP to handle ALL errors which could
> > > conceivably be handled. I did expect it to handle client-side send
> errors
> > > that would cause a record to be rejected from a batch before sending
> to a
> > > broker. I think that it does make the KafkaProducer interface very
> > slightly
> > > more complicated, but the new option is a clear improvement and I
> > > don’t see anyone getting into a mess using it.
> > >
> > > I think broker-side errors are more tricky and I don’t think an
> overload
> > > on the send() method is going to do the job. I don’t see that as a
> > problem
> > > with the KIP, just that the underlying RPCs and behaviour is not very
> > > amenable to record-specific error handling. The Produce RPC is a
> > > complicated beast which can include a set of records for mutiple
> > > topic-partitions. Although ProduceResponse v10 does include record
> > > errors, I don’t believe this is surfaced in the client. Let’s imagine
> > > something
> > > like broker-side record validation which barfs on one record. Failing
> an
> > > entire batch is easier, but less useful if the problem is related to
> one
> > > record.
> > >
> > > In summary, I’m happy that my vote stands, and I am happy with the KIP
> > > only supporting client-side record errors.
> > >
> > > Thanks,
> > > Andrew
> > >
> > > > On 8 Jul 2024, at 16:37, Chris Egerton <chr...@aiven.io.INVALID>
> > wrote:
> > > >
> > > > Hi Alieh,
> > > >
> > > > Can you clarify why broker-side errors shouldn't be covered? The only
> > > real
> > > > rationale I can come up with is that it's easier to implement.
> > > >
> > > > "Things were better for Kafka Streams before KAFKA-9279 was fixed"
> > isn't
> > > > very convincing, because Kafka Streams is not the only user of the
> Java
> > > > producer client. And for others, especially new users, I doubt that
> > this
> > > > new API we're proposing would make sense without having to consult a
> > lot
> > > of
> > > > historical context.
> > > >
> > > > I also don't think that most users will know or even care about the
> > > > distinction between errors that cause a record to fail before it's
> > added
> > > to
> > > > a batch vs. after. If you were writing a producer application of your
> > > own,
> > > > and you wanted to handle RecordTooLargeException instances by
> dropping
> > a
> > > > record without aborting a transaction, would you care about whether
> it
> > > was
> > > > your client or your broker that balked? Would you be happy if you
> wrote
> > > > logic expecting that that problem was solved once and for all, only
> to
> > > > learn that it could still affect you in other circumstances? Or,
> > > > alternatively, would you be happy if you wanted to solve that problem
> > and
> > > > found an API that seemed to do exactly what you wanted, but after
> > reading
> > > > the fine print, realized you'd have to do it yourself instead?
> > > >
> > > > Ultimately, the more I think about this, the more I believe that
> we're
> > > > adding noise to the API (with the new overloaded variant of send)
> for a
> > > > feature that will likely bring confusion and even frustration to
> anyone
> > > > besides maintainers of Kafka Streams who tries to use it.
> > > >
> > > > If the only concern about covering broker-side errors is that it
> would
> > be
> > > > more difficult to implement, I believe we should strongly reconsider
> > that
> > > > alternative. That said, if there is a straightforward way to explain
> > this
> > > > feature to new users that won't mislead them or require them to do
> > > research
> > > > on producer internals, then I can still live with it.
> > > >
> > > > Regarding a list of recoverable vs. irrecoverable errors, this is
> > > actually
> > > > the subject of another recently-introduced KIP:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions
> > > >
> > > > Finally, I'd also like to ask the people who have already voted
> > (Andrew,
> > > > Matthias) if, at the time they voted, they believed that the API
> would
> > > > handle all errors, or only the subset of errors that would cause a
> > record
> > > > to be rejected from a batch before it can be sent to a broker.
> > > >
> > > > Best,
> > > >
> > > > Chris
> > > >
> > > > On Thu, Jul 4, 2024 at 12:43 PM Alieh Saeedi
> > > <asae...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > >> Salut from the KIP’s author
> > > >>
> > > >>
> > > >> Clarifying two points:
> > > >>
> > > >>
> > > >> 1) broker side errors:
> > > >>
> > > >> As far as I remember we are not going to cover the errors
> originating
> > > from
> > > >> the broker!
> > > >>
> > > >> A historical fact: One of the debate points in KIP-1038 was that by
> > > >> defining a producer custom handler, the user may assume that
> > broker-side
> > > >> errors must be covered as well. They may define a handler for
> handling
> > > >> `RecordTooLargeException` and still see such errors not being
> handled
> > as
> > > >> they wish.
> > > >>
> > > >>
> > > >> 2) Regarding irrecoverable/recoverable errors:
> > > >>
> > > >> Before the fix of `KAFKA-9279`,  errors such as
> > > `RecordTooLargeException`
> > > >> or errors related to missing meta data (both originating from
> Producer
> > > >> `send()`) were considered as recoverable but after that they turned
> > into
> > > >> being irrecoverable without changing any Javadocs or having any KIP.
> > > All
> > > >> the effort made in this KIP and the former one have been towards
> > > returning
> > > >> to the former state.
> > > >>
> > > >>
> > > >> I am sure that it is clear for you that which sort of errors we are
> > > going
> > > >> to cover: A single record may happen to NOT get added to the batch
> due
> > > to
> > > >> the issues with the record or its corresponding topic. The point was
> > > that
> > > >> if the record is not added to the batch let ’s don’t fail the whole
> > > batch
> > > >> because of that non-existing record. We never intended to do sth in
> > > broker
> > > >> side or ignore more important errors.  But I agree with you Chris.
> If
> > we
> > > >> are adding a new API, we must have good documentation for that. The
> > > >> sentence `all irrecoverable transactional errors will still be
> fatal`
> > as
> > > >> you suggested is good. What do you think? I am totally against
> > > enumerating
> > > >> errors in Javadocs since these sort of errors can be changing during
> > > >> time.  More
> > > >> over, have you ever seen any list of recoverable or irrecoverable
> > errors
> > > >> somewhere so far?
> > > >>
> > > >>
> > > >> Bests,
> > > >>
> > > >> Alieh
> > > >>
> > > >> On Wed, Jul 3, 2024 at 6:07 PM Chris Egerton
> <chr...@aiven.io.invalid
> > >
> > > >> wrote:
> > > >>
> > > >>> Hi Justine,
> > > >>>
> > > >>> I agree that enumerating a list of errors that should be covered by
> > the
> > > >> KIP
> > > >>> is difficult; I was thinking it might be easier if we list the
> errors
> > > >> that
> > > >>> should _not_ be covered by the KIP, and only if we can't define a
> > > >>> reasonable heuristic that would cover them without having to
> > explicitly
> > > >>> list them. Could it be enough to say "all irrecoverable
> transactional
> > > >>> errors will still be fatal", or even just "all transactional errors
> > (as
> > > >>> opposed to errors related to this specific record) will still be
> > > fatal"?
> > > >>>
> > > >>> Cheers,
> > > >>>
> > > >>> Chris
> > > >>>
> > > >>> On Wed, Jul 3, 2024 at 11:56 AM Justine Olshan
> > > >>> <jols...@confluent.io.invalid>
> > > >>> wrote:
> > > >>>
> > > >>>> Hey Chris,
> > > >>>>
> > > >>>> I think what you say makes sense. I agree that defining the
> behavior
> > > >>> based
> > > >>>> on code that can possibly change is not a good idea, and I was
> > trying
> > > >> to
> > > >>>> get a clearer definition from the KIP's author :)
> > > >>>>
> > > >>>> I think it can always be hard to ensure that only specific errors
> > are
> > > >>>> handled unless they are explicitly enumerated in code as the code
> > can
> > > >>>> change and can be changed by folks who are not aware of this KIP
> or
> > > >>>> conversation.
> > > >>>> I personally don't have the bandwidth to do this
> > > definition/enumeration
> > > >>> of
> > > >>>> errors, so hopefully Alieh can expand upon this.
> > > >>>>
> > > >>>> Justine
> > > >>>>
> > > >>>> On Wed, Jul 3, 2024 at 8:28 AM Chris Egerton
> > <chr...@aiven.io.invalid
> > > >
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi Alieh,
> > > >>>>>
> > > >>>>> I don't love defining the changes for this KIP in terms of a
> catch
> > > >>> clause
> > > >>>>> in the KafkaProducer class, for two reasons. First, the set of
> > errors
> > > >>>> that
> > > >>>>> are handled by that clause may shift over time as the code base
> is
> > > >>>>> modified, and second, it would be fairly opaque to users who want
> > to
> > > >>>>> understand whether an error would be affected by using this API
> or
> > > >> not.
> > > >>>>>
> > > >>>>> It also seems strange that we'd handle some types of
> > > >>>>> RecordTooLargeException (i.e., ones reported client-side) with
> this
> > > >>> API,
> > > >>>>> but not others (i.e., ones reported by a broker).
> > > >>>>>
> > > >>>>> I think this kind of API would be most powerful, most intuitive
> to
> > > >>> users,
> > > >>>>> and easiest to document if we expanded the scope to all
> > > >>>> record-send-related
> > > >>>>> errors, except anything indicating issues with exactly-once
> > > >> semantics.
> > > >>>> That
> > > >>>>> would include records that are too large (when caught both
> client-
> > > >> and
> > > >>>>> server-side), records that can't be sent due to authorization
> > > >> failures,
> > > >>>>> records sent to nonexistent topics/topic partitions, and keyless
> > > >>> records
> > > >>>>> sent to compacted topics. It would not include
> > > >>>>> ProducerFencedException, InvalidProducerEpochException,
> > > >>>>> UnsupportedVersionException,
> > > >>>>> and possibly others.
> > > >>>>>
> > > >>>>> @Justine -- do you think it would be possible to develop either a
> > > >>> better
> > > >>>>> definition for the kinds of "excluded" errors that should not be
> > > >>> covered
> > > >>>> by
> > > >>>>> this API, or, barring that, a comprehensive list of exact error
> > > >> types?
> > > >>>> And
> > > >>>>> do you think this would be acceptable in terms of risk and
> > > >> complexity?
> > > >>>>>
> > > >>>>> Cheers,
> > > >>>>>
> > > >>>>> Chris
> > > >>>>>
> > > >>>>> On Tue, Jul 2, 2024 at 5:05 PM Alieh Saeedi
> > > >>> <asae...@confluent.io.invalid
> > > >>>>>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hey Justine,
> > > >>>>>>
> > > >>>>>> About the consequences: the consequences will be like when we
> did
> > > >> not
> > > >>>>> have
> > > >>>>>> the fix made in `KAFKA-9279`: silent loss of data! Obviously,
> when
> > > >>> the
> > > >>>>> user
> > > >>>>>> intentionally chose to ignore errors, that would not be silent
> any
> > > >>>> more.
> > > >>>>>> Right?
> > > >>>>>> Of course, considering all types of `ApiException`s would be too
> > > >>> broad.
> > > >>>>> But
> > > >>>>>> are the exceptions caught in `catch(ApiException e)` of the
> > > >>> `doSend()`
> > > >>>>>> method also too broad?
> > > >>>>>>
> > > >>>>>> -Alieh
> > > >>>>>>
> > > >>>>>> On Tue, Jul 2, 2024 at 9:45 PM Justine Olshan
> > > >>>>> <jols...@confluent.io.invalid
> > > >>>>>>>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hey Alieh,
> > > >>>>>>>
> > > >>>>>>> If we want to allow any error to be ignored we should probably
> > > >> run
> > > >>>>>> through
> > > >>>>>>> all the errors to make sure they make sense.
> > > >>>>>>> I just want to feel confident that we aren't just making a
> > > >> decision
> > > >>>>>> without
> > > >>>>>>> considering the consequences carefully.
> > > >>>>>>>
> > > >>>>>>> Justine
> > > >>>>>>>
> > > >>>>>>> On Tue, Jul 2, 2024 at 12:30 PM Alieh Saeedi
> > > >>>>>> <asae...@confluent.io.invalid
> > > >>>>>>>>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hey Justine,
> > > >>>>>>>>
> > > >>>>>>>> yes we talked about `RecordTooLargeException` as an example,
> > > >> but
> > > >>>> did
> > > >>>>> we
> > > >>>>>>>> ever limit ourselves to only this specific exception? I think
> > > >>>> neither
> > > >>>>>> in
> > > >>>>>>>> the KIP nor in the PR.  As Chris mentioned, this KIP is going
> > > >> to
> > > >>>> undo
> > > >>>>>>> what
> > > >>>>>>>> we have done in `KAFKA-9279` in case 1) the user is in a
> > > >>>> transaction
> > > >>>>>> and
> > > >>>>>>> 2)
> > > >>>>>>>> he decides to ignore the errors in which the record was not
> > > >> even
> > > >>>>> added
> > > >>>>>> to
> > > >>>>>>>> the batch. Yes, and we suggested some methods for undoing or,
> > > >> in
> > > >>>>> fact,
> > > >>>>>>>> moving back the transaction from the error state in `flush` or
> > > >> in
> > > >>>>>>>> `commitTnx` and we finally came to the idea of not even doing
> > > >> the
> > > >>>>>> changes
> > > >>>>>>>> (better than undoing) in `send`.
> > > >>>>>>>>
> > > >>>>>>>> Bests,
> > > >>>>>>>> Alieh
> > > >>>>>>>>
> > > >>>>>>>> On Tue, Jul 2, 2024 at 8:03 PM Justine Olshan
> > > >>>>>>> <jols...@confluent.io.invalid
> > > >>>>>>>>>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hey folks,
> > > >>>>>>>>>
> > > >>>>>>>>> I understand where you are coming from by asking for specific
> > > >>> use
> > > >>>>>>> cases.
> > > >>>>>>>> My
> > > >>>>>>>>> understanding based on previous conversations was that there
> > > >>>> were a
> > > >>>>>> few
> > > >>>>>>>>> different errors that have been seen.
> > > >>>>>>>>> One example I heard some information about was when the
> > > >> record
> > > >>>> was
> > > >>>>>> too
> > > >>>>>>>>> large and it fails the batch. Besides that, I'm not really
> > > >> sure
> > > >>>> if
> > > >>>>>>> there
> > > >>>>>>>>> are cases in mind, though it is fair to ask on those and
> > > >> bring
> > > >>>> them
> > > >>>>>> up.
> > > >>>>>>>>>
> > > >>>>>>>>>> Does a record qualify as a poison pill if it targets a
> > > >> topic
> > > >>>> that
> > > >>>>>>>>> doesn't exist? Or if it targets a topic that the producer
> > > >>>> principal
> > > >>>>>>> lacks
> > > >>>>>>>>> ACLs for? What if it fails broker-side validation (e.g., has
> > > >> a
> > > >>>> null
> > > >>>>>> key
> > > >>>>>>>> for
> > > >>>>>>>>> a compacted topic)?
> > > >>>>>>>>>
> > > >>>>>>>>> I think there was some parallel work with addressing the
> > > >>>>>>>>> UnknownTopicOrPartitionError in another way. As for the other
> > > >>>>> checks,
> > > >>>>>>>> acls,
> > > >>>>>>>>> validation etc. I am not aware of that being in Alieh's
> > > >> scope,
> > > >>>> but
> > > >>>>> we
> > > >>>>>>>>> should be clear about exactly what we are doing.
> > > >>>>>>>>>
> > > >>>>>>>>> All errors that fall into ApiException seems too broad to me.
> > > >>>>>>>>>
> > > >>>>>>>>> Justine
> > > >>>>>>>>>
> > > >>>>>>>>> On Tue, Jul 2, 2024 at 10:51 AM Alieh Saeedi
> > > >>>>>>>> <asae...@confluent.io.invalid
> > > >>>>>>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hey Chris,
> > > >>>>>>>>>> thanks for sharing your concerns.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1) About the language of KIP (or maybe later in Javadocs):
> > > >> Is
> > > >>>>> that
> > > >>>>>>>>> alright
> > > >>>>>>>>>> if I write all errors that fall into the `ApiException`
> > > >>>> category
> > > >>>>>>> thrown
> > > >>>>>>>>>> (actually returned) by Producer?
> > > >>>>>>>>>> 2) About future expansion: do you have any better
> > > >> suggestions
> > > >>>> for
> > > >>>>>>> enum
> > > >>>>>>>>>> names? Do you think `IGNORE_API_EXEPTIONS` or something
> > > >> like
> > > >>>> that
> > > >>>>>> is
> > > >>>>>>> a
> > > >>>>>>>>>> "better/more accurate" one?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Bests,
> > > >>>>>>>>>> Alieh
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Tue, Jul 2, 2024 at 7:29 PM Chris Egerton
> > > >>>>>> <chr...@aiven.io.invalid
> > > >>>>>>>>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Alieh and Justine,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I'm concerned that we're settling on a definition of
> > > >>> "poison
> > > >>>>>> pill"
> > > >>>>>>>>> that's
> > > >>>>>>>>>>> easiest to tackle right now but may lead to shortcomings
> > > >>> down
> > > >>>>> the
> > > >>>>>>>>> road. I
> > > >>>>>>>>>>> understand the relationship between this KIP and
> > > >>> KAFKA-9279,
> > > >>>>> and
> > > >>>>>> I
> > > >>>>>>>> can
> > > >>>>>>>>>>> totally get behind the desire to keep things small,
> > > >>> focused,
> > > >>>>> and
> > > >>>>>>>> simple
> > > >>>>>>>>>> in
> > > >>>>>>>>>>> the name of avoiding bugs. However, what I don't think is
> > > >>>> clear
> > > >>>>>> at
> > > >>>>>>>> all
> > > >>>>>>>>> is
> > > >>>>>>>>>>> what the "specific circumstances" are that Justine
> > > >>>> mentioned. I
> > > >>>>>>> had a
> > > >>>>>>>>>>> drastically different idea of what the intended
> > > >> behavioral
> > > >>>>> change
> > > >>>>>>>> would
> > > >>>>>>>>>> be
> > > >>>>>>>>>>> before looking at the draft PR.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I would like 1) for us to be clearer about the categories
> > > >>> of
> > > >>>>>> errors
> > > >>>>>>>>> that
> > > >>>>>>>>>> we
> > > >>>>>>>>>>> want to cover with this new API (especially since we'll
> > > >>> have
> > > >>>> to
> > > >>>>>>> find
> > > >>>>>>>> a
> > > >>>>>>>>>>> clear, succinct way to document this for users), and 2)
> > > >> to
> > > >>>> make
> > > >>>>>>> sure
> > > >>>>>>>>> that
> > > >>>>>>>>>>> if we do try to expand this API in the future, that we
> > > >>> won't
> > > >>>> be
> > > >>>>>>>> painted
> > > >>>>>>>>>>> into a corner.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> For item 1, hopefully we can agree that the language in
> > > >> the
> > > >>>> KIP
> > > >>>>>>>>>>> for IGNORE_SEND_ERRORS ("The records causing
> > > >> irrecoverable
> > > >>>>> errors
> > > >>>>>>> are
> > > >>>>>>>>>>> excluded from the batch and the transaction is committed
> > > >>>>>>>>> successfully.")
> > > >>>>>>>>>> is
> > > >>>>>>>>>>> pretty vague. If we start using the phrase "poison pill
> > > >>>> record"
> > > >>>>>>> that
> > > >>>>>>>>>> could
> > > >>>>>>>>>>> help, but IMO more detail would still be needed. We know
> > > >>> that
> > > >>>>> we
> > > >>>>>>> want
> > > >>>>>>>>> to
> > > >>>>>>>>>>> include records that are so large that they can be
> > > >>>> immediately
> > > >>>>>>>> rejected
> > > >>>>>>>>>> by
> > > >>>>>>>>>>> the producer. But there are other cases that users might
> > > >>>> expect
> > > >>>>>> to
> > > >>>>>>> be
> > > >>>>>>>>>>> handled. Does a record qualify as a poison pill if it
> > > >>>> targets a
> > > >>>>>>> topic
> > > >>>>>>>>>> that
> > > >>>>>>>>>>> doesn't exist? Or if it targets a topic that the producer
> > > >>>>>> principal
> > > >>>>>>>>> lacks
> > > >>>>>>>>>>> ACLs for? What if it fails broker-side validation (e.g.,
> > > >>> has
> > > >>>> a
> > > >>>>>> null
> > > >>>>>>>> key
> > > >>>>>>>>>> for
> > > >>>>>>>>>>> a compacted topic)?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> For item 2, this really depends on how narrow the scope
> > > >> of
> > > >>>> what
> > > >>>>>>> we're
> > > >>>>>>>>>> doing
> > > >>>>>>>>>>> right now is. If we only handle a subset of the examples
> > > >> I
> > > >>>> laid
> > > >>>>>> out
> > > >>>>>>>>> above
> > > >>>>>>>>>>> that could possibly be considered poison pills with this
> > > >>> KIP,
> > > >>>>> do
> > > >>>>>> we
> > > >>>>>>>>> want
> > > >>>>>>>>>> to
> > > >>>>>>>>>>> lock ourselves in to never addressing more in the future,
> > > >>> or
> > > >>>>> can
> > > >>>>>> we
> > > >>>>>>>>>> choose
> > > >>>>>>>>>>> an API (probably just enum names would be the only
> > > >>> important
> > > >>>>>>> decision
> > > >>>>>>>>>> here)
> > > >>>>>>>>>>> that leaves room for more later?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Best,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Chris
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Tue, Jul 2, 2024 at 12:28 PM Justine Olshan
> > > >>>>>>>>>>> <jols...@confluent.io.invalid>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Chris and Alieh,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> My understanding is that this KIP is really only trying
> > > >>> to
> > > >>>>>> solve
> > > >>>>>>> an
> > > >>>>>>>>>> issue
> > > >>>>>>>>>>>> of a "poison pill" record that fails send().
> > > >>>>>>>>>>>> We've talked a lot about having a generic framework for
> > > >>> all
> > > >>>>>>> errors,
> > > >>>>>>>>>> but I
> > > >>>>>>>>>>>> don't think that is what this KIP is trying to do.
> > > >>>>> Essentially
> > > >>>>>>> the
> > > >>>>>>>>>>> request
> > > >>>>>>>>>>>> is to undo the change from KAFKA-9279
> > > >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-9279> but
> > > >>>> under
> > > >>>>>>>>> specific
> > > >>>>>>>>>>>> circumstances that are controlled. I really am
> > > >> concerned
> > > >>>>> about
> > > >>>>>>>>> opening
> > > >>>>>>>>>>> new
> > > >>>>>>>>>>>> avenues for bugs with EOS and hesitate to handle any
> > > >>> other
> > > >>>>>> types
> > > >>>>>>> of
> > > >>>>>>>>>>> errors.
> > > >>>>>>>>>>>> I think if we all agree on the problem that we are
> > > >> trying
> > > >>>> to
> > > >>>>>>> solve,
> > > >>>>>>>>> it
> > > >>>>>>>>>> is
> > > >>>>>>>>>>>> easier to agree on solutions.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Justine
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Mon, Jul 1, 2024 at 2:20 AM Alieh Saeedi
> > > >>>>>>>>>> <asae...@confluent.io.invalid
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi Matthias,
> > > >>>>>>>>>>>>> Thanks for the valid points you mentioned. I updated
> > > >>> the
> > > >>>>> KIP
> > > >>>>>>> and
> > > >>>>>>>>> the
> > > >>>>>>>>>> PR
> > > >>>>>>>>>>>>> with:
> > > >>>>>>>>>>>>> 1) mentioning that the new overloaded `send` throws
> > > >>>>>>>>>>>> `IllegalStateException`
> > > >>>>>>>>>>>>> if the user tries to ignore `send()` errors outside
> > > >> of
> > > >>> a
> > > >>>>>>>>> transaction.
> > > >>>>>>>>>>>>> 2) the default implementation in `Producer` interface
> > > >>>>> throws
> > > >>>>>> an
> > > >>>>>>>>>>>>> `UnsupportedOperationException`
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi Chris,
> > > >>>>>>>>>>>>> Thanks for the feedback. I tried to clarify the
> > > >> points
> > > >>>> you
> > > >>>>>>>> listed:
> > > >>>>>>>>>>>>> -------> we've narrowed the scope from any error that
> > > >>>> might
> > > >>>>>>> take
> > > >>>>>>>>>> place
> > > >>>>>>>>>>>> with
> > > >>>>>>>>>>>>> producing a record to Kafka, to only the ones that
> > > >> are
> > > >>>>> thrown
> > > >>>>>>>>>> directly
> > > >>>>>>>>>>>> from
> > > >>>>>>>>>>>>> Producer::send;
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> From the very beginning and even since KIP-1038, the
> > > >>> main
> > > >>>>>>> purpose
> > > >>>>>>>>> was
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> have "more flexibility," or, in other words, "giving
> > > >>> the
> > > >>>>> user
> > > >>>>>>> the
> > > >>>>>>>>>>>>> authority" to handle some specific exceptions thrown
> > > >>> from
> > > >>>>> the
> > > >>>>>>>>>>> `Producer`.
> > > >>>>>>>>>>>>> Due to the specific cases we had in mind, KIP-1038
> > > >> was
> > > >>>>>>> discarded
> > > >>>>>>>>> and
> > > >>>>>>>>>> we
> > > >>>>>>>>>>>>> decided to not define a `CustomExceptionHandler` for
> > > >>>>>> `Producer`
> > > >>>>>>>> and
> > > >>>>>>>>>>>> instead
> > > >>>>>>>>>>>>> treat the `send` failures in a different way. The
> > > >> main
> > > >>>>> issue
> > > >>>>>> is
> > > >>>>>>>>> that
> > > >>>>>>>>>>>> `send`
> > > >>>>>>>>>>>>> makes a transition to error state, which is undoable.
> > > >>> In
> > > >>>>>> fact,
> > > >>>>>>>> one
> > > >>>>>>>>>>> single
> > > >>>>>>>>>>>>> poison pill record makes the whole batch fail. The
> > > >>> former
> > > >>>>>>>>> suggestions
> > > >>>>>>>>>>>> that
> > > >>>>>>>>>>>>> you agreed with have been all about un-doing this
> > > >>>>> transition
> > > >>>>>> in
> > > >>>>>>>>>> `flush`
> > > >>>>>>>>>>>> or
> > > >>>>>>>>>>>>> `commit`. The new suggestion is to un-do (or better,
> > > >>> NOT
> > > >>>>> do)
> > > >>>>>> in
> > > >>>>>>>>>> `send`
> > > >>>>>>>>>>>> due
> > > >>>>>>>>>>>>> to the reasons listed in the discussions above.
> > > >>>>>>>>>>>>> Moreover, I would say that having such a large scope
> > > >> as
> > > >>>> you
> > > >>>>>>>>> mentioned
> > > >>>>>>>>>>> is
> > > >>>>>>>>>>>>> impossible. In the best case, we may have control
> > > >> over
> > > >>>> the
> > > >>>>>>>>>> `Producer`.
> > > >>>>>>>>>>>> What
> > > >>>>>>>>>>>>> shall we do with the broker? The `any error that
> > > >> might
> > > >>>> take
> > > >>>>>>> place
> > > >>>>>>>>>> with
> > > >>>>>>>>>>>>> producing a record to Kafka` is too much, I think.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> -------> is this all we want to handle, and will it
> > > >>>> prevent
> > > >>>>>> us
> > > >>>>>>>> from
> > > >>>>>>>>>>>>> handling more in the future in an intuitive way?
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I think yes. This is all we want. Other sorts of
> > > >> errors
> > > >>>>> such
> > > >>>>>> as
> > > >>>>>>>>>> having
> > > >>>>>>>>>>>>> problem with partition addition, producer fenced
> > > >>>> exception,
> > > >>>>>> etc
> > > >>>>>>>>> seem
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> be
> > > >>>>>>>>>>>>> more serious issues. The intention was to handle
> > > >>> problems
> > > >>>>>>> created
> > > >>>>>>>>> by
> > > >>>>>>>>>>>>> (maybe) a single poison pill record. BTW, I do not
> > > >> see
> > > >>>> any
> > > >>>>>>>>> obstacles
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>>> future changes.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Bests,
> > > >>>>>>>>>>>>> Alieh
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Sat, Jun 29, 2024 at 3:03 AM Chris Egerton
> > > >>>>>>>>>> <chr...@aiven.io.invalid
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Ah, sorry--spoke too soon. The PR doesn't show that
> > > >>>>> errors
> > > >>>>>>>> thrown
> > > >>>>>>>>>>> from
> > > >>>>>>>>>>>>>> Producer::send are handled, but instead,
> > > >> ApiException
> > > >>>>>>> instances
> > > >>>>>>>>>> that
> > > >>>>>>>>>>>> are
> > > >>>>>>>>>>>>>> caught inside KafkaProducer::doSend and are handled
> > > >>> by
> > > >>>>>>>> returning
> > > >>>>>>>>> an
> > > >>>>>>>>>>>>>> already-failed future are. I think the same
> > > >> question
> > > >>>>> still
> > > >>>>>>>>> applies
> > > >>>>>>>>>>> (is
> > > >>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>> all we want to handle, and will it prevent us from
> > > >>>>> handling
> > > >>>>>>>> more
> > > >>>>>>>>> in
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>> future in an intuitive way), though.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Fri, Jun 28, 2024 at 8:57 PM Chris Egerton <
> > > >>>>>>> chr...@aiven.io
> > > >>>>>>>>>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hi Alieh,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> This KIP has evolved a lot since I last looked at
> > > >>> it,
> > > >>>>> but
> > > >>>>>>> the
> > > >>>>>>>>>>> changes
> > > >>>>>>>>>>>>>> seem
> > > >>>>>>>>>>>>>>> well thought-out both in semantics and API. One
> > > >>>>>> clarifying
> > > >>>>>>>>>>> question I
> > > >>>>>>>>>>>>>> have
> > > >>>>>>>>>>>>>>> is that it looks based on the draft PR that we've
> > > >>>>>> narrowed
> > > >>>>>>>> the
> > > >>>>>>>>>>> scope
> > > >>>>>>>>>>>>> from
> > > >>>>>>>>>>>>>>> any error that might take place with producing a
> > > >>>> record
> > > >>>>>> to
> > > >>>>>>>>> Kafka,
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> only
> > > >>>>>>>>>>>>>>> the ones that are thrown directly from
> > > >>>> Producer::send;
> > > >>>>> is
> > > >>>>>>>> that
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>> intended
> > > >>>>>>>>>>>>>>> behavior here? And if so, do you have thoughts on
> > > >>> how
> > > >>>>> we
> > > >>>>>>>> might
> > > >>>>>>>>>>>> design a
> > > >>>>>>>>>>>>>>> follow-up KIP that would catch all errors
> > > >>> (including
> > > >>>>> ones
> > > >>>>>>>>>> reported
> > > >>>>>>>>>>>>>>> asynchronously instead of synchronously)? I'd
> > > >> like
> > > >>> it
> > > >>>>> if
> > > >>>>>> we
> > > >>>>>>>>> could
> > > >>>>>>>>>>>> leave
> > > >>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> door open for that without painting ourselves
> > > >> into
> > > >>>> too
> > > >>>>>> much
> > > >>>>>>>> of
> > > >>>>>>>>> a
> > > >>>>>>>>>>>> corner
> > > >>>>>>>>>>>>>>> with the API design for this KIP.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Chris
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Fri, Jun 28, 2024 at 6:31 PM Matthias J. Sax <
> > > >>>>>>>>>> mj...@apache.org>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks Alieh,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> it seems this KIP can just pick between a couple
> > > >>> of
> > > >>>>>>>> tradeoffs.
> > > >>>>>>>>>>>> Adding
> > > >>>>>>>>>>>>> an
> > > >>>>>>>>>>>>>>>> overloaded `send()` as the KIP propose makes
> > > >> sense
> > > >>>> to
> > > >>>>> me
> > > >>>>>>> and
> > > >>>>>>>>>> seems
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>> provides the cleanest solution compare to there
> > > >>>>> options
> > > >>>>>> we
> > > >>>>>>>>>>>> discussed.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Given the explicit name of the passed-in option
> > > >>> that
> > > >>>>>>>>> highlights
> > > >>>>>>>>>>> that
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> option is for TX only make is pretty clear and
> > > >>>> avoids
> > > >>>>>> the
> > > >>>>>>>>> issue
> > > >>>>>>>>>> of
> > > >>>>>>>>>>>>>>>> `flush()` ambiguity.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Nit: We should make clear on the KIP though,
> > > >> that
> > > >>>> the
> > > >>>>>> new
> > > >>>>>>>>>> `send()`
> > > >>>>>>>>>>>>>>>> overload would throw an `IllegalStateException`
> > > >> if
> > > >>>> TX
> > > >>>>>> are
> > > >>>>>>>> not
> > > >>>>>>>>>> used
> > > >>>>>>>>>>>>>>>> (similar to other TX methods like initTx(), etc)
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> About the `Producer` interface, I am not sure
> > > >> how
> > > >>>> this
> > > >>>>>> was
> > > >>>>>>>>> done
> > > >>>>>>>>>> in
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> past (eg, KIP-266 added
> > > >> `Consumer.poll(Duration)`
> > > >>>> w/o
> > > >>>>> a
> > > >>>>>>>>> default
> > > >>>>>>>>>>>>>>>> implementation), if we need a default
> > > >>> implementation
> > > >>>>> for
> > > >>>>>>>>>> backward
> > > >>>>>>>>>>>>>>>> compatibility or not? If we do want to add one,
> > > >> I
> > > >>>>> think
> > > >>>>>> it
> > > >>>>>>>>> would
> > > >>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>> appropriate to throw an
> > > >>>>> `UnsupportedOperationException`
> > > >>>>>> by
> > > >>>>>>>>>>> default,
> > > >>>>>>>>>>>>>>>> instead of just keeping the default impl empty?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> My points are rather minor, and should not block
> > > >>>> this
> > > >>>>>> KIP
> > > >>>>>>>>>> though.
> > > >>>>>>>>>>>>>>>> Overall LGTM.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> -Matthias
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On 6/27/24 1:28 PM, Alieh Saeedi wrote:
> > > >>>>>>>>>>>>>>>>> Hi Justine,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Thanks for the suggestion.
> > > >>>>>>>>>>>>>>>>> Making applications to validate every single
> > > >>>> record
> > > >>>>> is
> > > >>>>>>> not
> > > >>>>>>>>> the
> > > >>>>>>>>>>>> best
> > > >>>>>>>>>>>>>> way,
> > > >>>>>>>>>>>>>>>>> from an efficiency point of view.
> > > >>>>>>>>>>>>>>>>> Moreover, between changing the behavior of the
> > > >>>>>> Producer
> > > >>>>>>> in
> > > >>>>>>>>>>> `send`
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>> `commitTnx`, the former seems more reasonable
> > > >>> and
> > > >>>>>> clean.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Bests,
> > > >>>>>>>>>>>>>>>>> Alieh
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Thu, Jun 27, 2024 at 8:14 PM Justine Olshan
> > > >>>>>>>>>>>>>>>> <jols...@confluent.io.invalid>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hey Alieh,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I see there are two options now. So folks
> > > >> will
> > > >>> be
> > > >>>>>>>>> discussing
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> approaches
> > > >>>>>>>>>>>>>>>>>> and deciding the best way forward before we
> > > >>> vote?
> > > >>>>>>>>>>>>>>>>>> I do think there could be a problem with the
> > > >>>>> approach
> > > >>>>>>> on
> > > >>>>>>>>>> commit
> > > >>>>>>>>>>>> if
> > > >>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>> get
> > > >>>>>>>>>>>>>>>>>> stuck on an earlier error and have more
> > > >> records
> > > >>>>>>>>> (potentially
> > > >>>>>>>>>> on
> > > >>>>>>>>>>>> new
> > > >>>>>>>>>>>>>>>>>> partitions) to commit as the current PR is
> > > >>>>>> implemented.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I guess this takes us back to the question of
> > > >>>>> whether
> > > >>>>>>> the
> > > >>>>>>>>>> error
> > > >>>>>>>>>>>>>> should
> > > >>>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>> cleared on send.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> (And I guess at the back of my mind, I'm
> > > >>>> wondering
> > > >>>>> if
> > > >>>>>>>> there
> > > >>>>>>>>>> is
> > > >>>>>>>>>>> a
> > > >>>>>>>>>>>>> way
> > > >>>>>>>>>>>>>>>> we can
> > > >>>>>>>>>>>>>>>>>> validate the "posion pill" records
> > > >> application
> > > >>>> side
> > > >>>>>>>> before
> > > >>>>>>>>> we
> > > >>>>>>>>>>>> even
> > > >>>>>>>>>>>>>> try
> > > >>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>> send them)
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Justine
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi
> > > >>>>>>>>>>>>>>>> <asae...@confluent.io.invalid
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Hi Justine,
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I did not update the KIP with
> > > >> `TxnSendOption`
> > > >>>>> since
> > > >>>>>> I
> > > >>>>>>>>>> thought
> > > >>>>>>>>>>>> it'd
> > > >>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>> better discussed here beforehand.
> > > >>>>>>>>>>>>>>>>>>> right now, there are 2 PRs:
> > > >>>>>>>>>>>>>>>>>>> - the PR that implements the current version
> > > >>> of
> > > >>>>> the
> > > >>>>>>> KIP:
> > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/16332
> > > >>>>>>>>>>>>>>>>>>> - the POC PR that clarifies the
> > > >>> `TxnSendOption`:
> > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/16465
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Bests,
> > > >>>>>>>>>>>>>>>>>>> Alieh
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Thu, Jun 27, 2024 at 12:42 AM Justine
> > > >>> Olshan
> > > >>>>>>>>>>>>>>>>>>> <jols...@confluent.io.invalid> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Hey Alieh,
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> I think I am a little confused. Are the 3
> > > >>>> points
> > > >>>>>>> above
> > > >>>>>>>>>>>> addressed
> > > >>>>>>>>>>>>> by
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> KIP
> > > >>>>>>>>>>>>>>>>>>>> or did something change? The PR seems to
> > > >> not
> > > >>>>>> include
> > > >>>>>>>> this
> > > >>>>>>>>>>>> change
> > > >>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>> still
> > > >>>>>>>>>>>>>>>>>>>> has the CommitOption as well.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>>> Justine
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Wed, Jun 26, 2024 at 2:15 PM Alieh
> > > >> Saeedi
> > > >>>>>>>>>>>>>>>>>>> <asae...@confluent.io.invalid
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Looking at the PR <
> > > >>>>>>>>>>> https://github.com/apache/kafka/pull/16332
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> corresponding to the KIP, there are some
> > > >>>> points
> > > >>>>>>> worthy
> > > >>>>>>>>> of
> > > >>>>>>>>>>>>> mention:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> 1) clearing the error ends up dirty/messy
> > > >>> code
> > > >>>>> in
> > > >>>>>>>>>>>>>>>>>> `TransactionManager`.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> 2) By clearing the error, we are actually
> > > >>>> doing
> > > >>>>> an
> > > >>>>>>>>> illegal
> > > >>>>>>>>>>>>>>>> transition
> > > >>>>>>>>>>>>>>>>>>>> from
> > > >>>>>>>>>>>>>>>>>>>>> `ABORTABLE_ERROR` to `IN_TRANSACTION`
> > > >> which
> > > >>> is
> > > >>>>>>>>>> conceptually
> > > >>>>>>>>>>>> not
> > > >>>>>>>>>>>>>>>>>>>> acceptable.
> > > >>>>>>>>>>>>>>>>>>>>> This can be the root cause of some issues,
> > > >>>> with
> > > >>>>>>>> perhaps
> > > >>>>>>>>>>>> further
> > > >>>>>>>>>>>>>>>>>> future
> > > >>>>>>>>>>>>>>>>>>>>> changes by others.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> 3) If the poison pill record `r1` causes a
> > > >>>>>>> transition
> > > >>>>>>>> to
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>> error
> > > >>>>>>>>>>>>>>>>>>> state
> > > >>>>>>>>>>>>>>>>>>>>> and then the next record `r2` requires
> > > >>> adding
> > > >>>> a
> > > >>>>>>>>> partition
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>> transaction, the action fails due to being
> > > >>> in
> > > >>>>> the
> > > >>>>>>>> error
> > > >>>>>>>>>>> state.
> > > >>>>>>>>>>>>> In
> > > >>>>>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>>>>> case, clearing errors during
> > > >>>>>>>>> `commitTnx(CLEAR_SEND_ERROR)`
> > > >>>>>>>>>>> is
> > > >>>>>>>>>>>>> too
> > > >>>>>>>>>>>>>>>>>> late.
> > > >>>>>>>>>>>>>>>>>>>>> However, this case can NOT be the main
> > > >>> concern
> > > >>>>> as
> > > >>>>>>> soon
> > > >>>>>>>>> as
> > > >>>>>>>>>>>>> KIP-890
> > > >>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>>> fully
> > > >>>>>>>>>>>>>>>>>>>>> implemented.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> My suggestion is to solve the problem
> > > >> where
> > > >>> it
> > > >>>>>>> arises.
> > > >>>>>>>>> If
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> transition
> > > >>>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>> the error state does not happen during
> > > >>>> `send()`,
> > > >>>>>> we
> > > >>>>>>> do
> > > >>>>>>>>> not
> > > >>>>>>>>>>>> need
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>> clear
> > > >>>>>>>>>>>>>>>>>>>>> the error later. Therefore, instead of
> > > >>>>>>> `CommitOption`,
> > > >>>>>>>>> we
> > > >>>>>>>>>>> can
> > > >>>>>>>>>>>>>> define
> > > >>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>> `TxnSendOption` and prevent the `send()`
> > > >>>> method
> > > >>>>>> from
> > > >>>>>>>>> going
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> error
> > > >>>>>>>>>>>>>>>>>>>>> state in case 1) we're in a transaction
> > > >> and
> > > >>> 2)
> > > >>>>> the
> > > >>>>>>>> user
> > > >>>>>>>>>>> asked
> > > >>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>>>> IGONRE_SEND_ERRORS. For more clarity, you
> > > >>> can
> > > >>>>>> take a
> > > >>>>>>>>> look
> > > >>>>>>>>>> at
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> POC
> > > >>>>>>>>>>>>>>>>>> PR
> > > >>>>>>>>>>>>>>>>>>>>> <
> > > >> https://github.com/apache/kafka/pull/16465
> > > >>>> .
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>>>>> Alieh
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Reply via email to