Hi Justine,

I agree that enumerating a list of errors that should be covered by the KIP
is difficult; I was thinking it might be easier if we list the errors that
should _not_ be covered by the KIP, and only if we can't define a
reasonable heuristic that would cover them without having to explicitly
list them. Could it be enough to say "all irrecoverable transactional
errors will still be fatal", or even just "all transactional errors (as
opposed to errors related to this specific record) will still be fatal"?

Cheers,

Chris

On Wed, Jul 3, 2024 at 11:56 AM Justine Olshan <jols...@confluent.io.invalid>
wrote:

> Hey Chris,
>
> I think what you say makes sense. I agree that defining the behavior based
> on code that can possibly change is not a good idea, and I was trying to
> get a clearer definition from the KIP's author :)
>
> I think it can always be hard to ensure that only specific errors are
> handled unless they are explicitly enumerated in code as the code can
> change and can be changed by folks who are not aware of this KIP or
> conversation.
> I personally don't have the bandwidth to do this definition/enumeration of
> errors, so hopefully Alieh can expand upon this.
>
> Justine
>
> On Wed, Jul 3, 2024 at 8:28 AM Chris Egerton <chr...@aiven.io.invalid>
> wrote:
>
> > Hi Alieh,
> >
> > I don't love defining the changes for this KIP in terms of a catch clause
> > in the KafkaProducer class, for two reasons. First, the set of errors
> that
> > are handled by that clause may shift over time as the code base is
> > modified, and second, it would be fairly opaque to users who want to
> > understand whether an error would be affected by using this API or not.
> >
> > It also seems strange that we'd handle some types of
> > RecordTooLargeException (i.e., ones reported client-side) with this API,
> > but not others (i.e., ones reported by a broker).
> >
> > I think this kind of API would be most powerful, most intuitive to users,
> > and easiest to document if we expanded the scope to all
> record-send-related
> > errors, except anything indicating issues with exactly-once semantics.
> That
> > would include records that are too large (when caught both client- and
> > server-side), records that can't be sent due to authorization failures,
> > records sent to nonexistent topics/topic partitions, and keyless records
> > sent to compacted topics. It would not include
> > ProducerFencedException, InvalidProducerEpochException,
> > UnsupportedVersionException,
> > and possibly others.
> >
> > @Justine -- do you think it would be possible to develop either a better
> > definition for the kinds of "excluded" errors that should not be covered
> by
> > this API, or, barring that, a comprehensive list of exact error types?
> And
> > do you think this would be acceptable in terms of risk and complexity?
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Jul 2, 2024 at 5:05 PM Alieh Saeedi <asae...@confluent.io.invalid
> >
> > wrote:
> >
> > > Hey Justine,
> > >
> > > About the consequences: the consequences will be like when we did not
> > have
> > > the fix made in `KAFKA-9279`: silent loss of data! Obviously, when the
> > user
> > > intentionally chose to ignore errors, that would not be silent any
> more.
> > > Right?
> > > Of course, considering all types of `ApiException`s would be too broad.
> > But
> > > are the exceptions caught in `catch(ApiException e)` of the `doSend()`
> > > method also too broad?
> > >
> > > -Alieh
> > >
> > > On Tue, Jul 2, 2024 at 9:45 PM Justine Olshan
> > <jols...@confluent.io.invalid
> > > >
> > > wrote:
> > >
> > > > Hey Alieh,
> > > >
> > > > If we want to allow any error to be ignored we should probably run
> > > through
> > > > all the errors to make sure they make sense.
> > > > I just want to feel confident that we aren't just making a decision
> > > without
> > > > considering the consequences carefully.
> > > >
> > > > Justine
> > > >
> > > > On Tue, Jul 2, 2024 at 12:30 PM Alieh Saeedi
> > > <asae...@confluent.io.invalid
> > > > >
> > > > wrote:
> > > >
> > > > > Hey Justine,
> > > > >
> > > > > yes we talked about `RecordTooLargeException` as an example, but
> did
> > we
> > > > > ever limit ourselves to only this specific exception? I think
> neither
> > > in
> > > > > the KIP nor in the PR.  As Chris mentioned, this KIP is going to
> undo
> > > > what
> > > > > we have done in `KAFKA-9279` in case 1) the user is in a
> transaction
> > > and
> > > > 2)
> > > > > he decides to ignore the errors in which the record was not even
> > added
> > > to
> > > > > the batch. Yes, and we suggested some methods for undoing or, in
> > fact,
> > > > > moving back the transaction from the error state in `flush` or in
> > > > > `commitTnx` and we finally came to the idea of not even doing the
> > > changes
> > > > > (better than undoing) in `send`.
> > > > >
> > > > > Bests,
> > > > > Alieh
> > > > >
> > > > > On Tue, Jul 2, 2024 at 8:03 PM Justine Olshan
> > > > <jols...@confluent.io.invalid
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hey folks,
> > > > > >
> > > > > > I understand where you are coming from by asking for specific use
> > > > cases.
> > > > > My
> > > > > > understanding based on previous conversations was that there
> were a
> > > few
> > > > > > different errors that have been seen.
> > > > > > One example I heard some information about was when the record
> was
> > > too
> > > > > > large and it fails the batch. Besides that, I'm not really sure
> if
> > > > there
> > > > > > are cases in mind, though it is fair to ask on those and bring
> them
> > > up.
> > > > > >
> > > > > > > Does a record qualify as a poison pill if it targets a topic
> that
> > > > > > doesn't exist? Or if it targets a topic that the producer
> principal
> > > > lacks
> > > > > > ACLs for? What if it fails broker-side validation (e.g., has a
> null
> > > key
> > > > > for
> > > > > > a compacted topic)?
> > > > > >
> > > > > > I think there was some parallel work with addressing the
> > > > > > UnknownTopicOrPartitionError in another way. As for the other
> > checks,
> > > > > acls,
> > > > > > validation etc. I am not aware of that being in Alieh's scope,
> but
> > we
> > > > > > should be clear about exactly what we are doing.
> > > > > >
> > > > > > All errors that fall into ApiException seems too broad to me.
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > > On Tue, Jul 2, 2024 at 10:51 AM Alieh Saeedi
> > > > > <asae...@confluent.io.invalid
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Chris,
> > > > > > > thanks for sharing your concerns.
> > > > > > >
> > > > > > > 1) About the language of KIP (or maybe later in Javadocs): Is
> > that
> > > > > > alright
> > > > > > > if I write all errors that fall into the `ApiException`
> category
> > > > thrown
> > > > > > > (actually returned) by Producer?
> > > > > > > 2) About future expansion: do you have any better suggestions
> for
> > > > enum
> > > > > > > names? Do you think `IGNORE_API_EXEPTIONS` or something like
> that
> > > is
> > > > a
> > > > > > > "better/more accurate" one?
> > > > > > >
> > > > > > > Bests,
> > > > > > > Alieh
> > > > > > >
> > > > > > > On Tue, Jul 2, 2024 at 7:29 PM Chris Egerton
> > > <chr...@aiven.io.invalid
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Alieh and Justine,
> > > > > > > >
> > > > > > > > I'm concerned that we're settling on a definition of "poison
> > > pill"
> > > > > > that's
> > > > > > > > easiest to tackle right now but may lead to shortcomings down
> > the
> > > > > > road. I
> > > > > > > > understand the relationship between this KIP and KAFKA-9279,
> > and
> > > I
> > > > > can
> > > > > > > > totally get behind the desire to keep things small, focused,
> > and
> > > > > simple
> > > > > > > in
> > > > > > > > the name of avoiding bugs. However, what I don't think is
> clear
> > > at
> > > > > all
> > > > > > is
> > > > > > > > what the "specific circumstances" are that Justine
> mentioned. I
> > > > had a
> > > > > > > > drastically different idea of what the intended behavioral
> > change
> > > > > would
> > > > > > > be
> > > > > > > > before looking at the draft PR.
> > > > > > > >
> > > > > > > > I would like 1) for us to be clearer about the categories of
> > > errors
> > > > > > that
> > > > > > > we
> > > > > > > > want to cover with this new API (especially since we'll have
> to
> > > > find
> > > > > a
> > > > > > > > clear, succinct way to document this for users), and 2) to
> make
> > > > sure
> > > > > > that
> > > > > > > > if we do try to expand this API in the future, that we won't
> be
> > > > > painted
> > > > > > > > into a corner.
> > > > > > > >
> > > > > > > > For item 1, hopefully we can agree that the language in the
> KIP
> > > > > > > > for IGNORE_SEND_ERRORS ("The records causing irrecoverable
> > errors
> > > > are
> > > > > > > > excluded from the batch and the transaction is committed
> > > > > > successfully.")
> > > > > > > is
> > > > > > > > pretty vague. If we start using the phrase "poison pill
> record"
> > > > that
> > > > > > > could
> > > > > > > > help, but IMO more detail would still be needed. We know that
> > we
> > > > want
> > > > > > to
> > > > > > > > include records that are so large that they can be
> immediately
> > > > > rejected
> > > > > > > by
> > > > > > > > the producer. But there are other cases that users might
> expect
> > > to
> > > > be
> > > > > > > > handled. Does a record qualify as a poison pill if it
> targets a
> > > > topic
> > > > > > > that
> > > > > > > > doesn't exist? Or if it targets a topic that the producer
> > > principal
> > > > > > lacks
> > > > > > > > ACLs for? What if it fails broker-side validation (e.g., has
> a
> > > null
> > > > > key
> > > > > > > for
> > > > > > > > a compacted topic)?
> > > > > > > >
> > > > > > > > For item 2, this really depends on how narrow the scope of
> what
> > > > we're
> > > > > > > doing
> > > > > > > > right now is. If we only handle a subset of the examples I
> laid
> > > out
> > > > > > above
> > > > > > > > that could possibly be considered poison pills with this KIP,
> > do
> > > we
> > > > > > want
> > > > > > > to
> > > > > > > > lock ourselves in to never addressing more in the future, or
> > can
> > > we
> > > > > > > choose
> > > > > > > > an API (probably just enum names would be the only important
> > > > decision
> > > > > > > here)
> > > > > > > > that leaves room for more later?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Chris
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jul 2, 2024 at 12:28 PM Justine Olshan
> > > > > > > > <jols...@confluent.io.invalid>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Chris and Alieh,
> > > > > > > > >
> > > > > > > > > My understanding is that this KIP is really only trying to
> > > solve
> > > > an
> > > > > > > issue
> > > > > > > > > of a "poison pill" record that fails send().
> > > > > > > > > We've talked a lot about having a generic framework for all
> > > > errors,
> > > > > > > but I
> > > > > > > > > don't think that is what this KIP is trying to do.
> > Essentially
> > > > the
> > > > > > > > request
> > > > > > > > > is to undo the change from KAFKA-9279
> > > > > > > > > <https://issues.apache.org/jira/browse/KAFKA-9279> but
> under
> > > > > > specific
> > > > > > > > > circumstances that are controlled. I really am concerned
> > about
> > > > > > opening
> > > > > > > > new
> > > > > > > > > avenues for bugs with EOS and hesitate to handle any other
> > > types
> > > > of
> > > > > > > > errors.
> > > > > > > > > I think if we all agree on the problem that we are trying
> to
> > > > solve,
> > > > > > it
> > > > > > > is
> > > > > > > > > easier to agree on solutions.
> > > > > > > > >
> > > > > > > > > Justine
> > > > > > > > >
> > > > > > > > > On Mon, Jul 1, 2024 at 2:20 AM Alieh Saeedi
> > > > > > > <asae...@confluent.io.invalid
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Matthias,
> > > > > > > > > > Thanks for the valid points you mentioned. I updated the
> > KIP
> > > > and
> > > > > > the
> > > > > > > PR
> > > > > > > > > > with:
> > > > > > > > > > 1) mentioning that the new overloaded `send` throws
> > > > > > > > > `IllegalStateException`
> > > > > > > > > > if the user tries to ignore `send()` errors outside of a
> > > > > > transaction.
> > > > > > > > > > 2) the default implementation in `Producer` interface
> > throws
> > > an
> > > > > > > > > > `UnsupportedOperationException`
> > > > > > > > > >
> > > > > > > > > > Hi Chris,
> > > > > > > > > > Thanks for the feedback. I tried to clarify the points
> you
> > > > > listed:
> > > > > > > > > > -------> we've narrowed the scope from any error that
> might
> > > > take
> > > > > > > place
> > > > > > > > > with
> > > > > > > > > > producing a record to Kafka, to only the ones that are
> > thrown
> > > > > > > directly
> > > > > > > > > from
> > > > > > > > > > Producer::send;
> > > > > > > > > >
> > > > > > > > > > From the very beginning and even since KIP-1038, the main
> > > > purpose
> > > > > > was
> > > > > > > > to
> > > > > > > > > > have "more flexibility," or, in other words, "giving the
> > user
> > > > the
> > > > > > > > > > authority" to handle some specific exceptions thrown from
> > the
> > > > > > > > `Producer`.
> > > > > > > > > > Due to the specific cases we had in mind, KIP-1038 was
> > > > discarded
> > > > > > and
> > > > > > > we
> > > > > > > > > > decided to not define a `CustomExceptionHandler` for
> > > `Producer`
> > > > > and
> > > > > > > > > instead
> > > > > > > > > > treat the `send` failures in a different way. The main
> > issue
> > > is
> > > > > > that
> > > > > > > > > `send`
> > > > > > > > > > makes a transition to error state, which is undoable. In
> > > fact,
> > > > > one
> > > > > > > > single
> > > > > > > > > > poison pill record makes the whole batch fail. The former
> > > > > > suggestions
> > > > > > > > > that
> > > > > > > > > > you agreed with have been all about un-doing this
> > transition
> > > in
> > > > > > > `flush`
> > > > > > > > > or
> > > > > > > > > > `commit`. The new suggestion is to un-do (or better, NOT
> > do)
> > > in
> > > > > > > `send`
> > > > > > > > > due
> > > > > > > > > > to the reasons listed in the discussions above.
> > > > > > > > > > Moreover, I would say that having such a large scope as
> you
> > > > > > mentioned
> > > > > > > > is
> > > > > > > > > > impossible. In the best case, we may have control over
> the
> > > > > > > `Producer`.
> > > > > > > > > What
> > > > > > > > > > shall we do with the broker? The `any error that might
> take
> > > > place
> > > > > > > with
> > > > > > > > > > producing a record to Kafka` is too much, I think.
> > > > > > > > > >
> > > > > > > > > > -------> is this all we want to handle, and will it
> prevent
> > > us
> > > > > from
> > > > > > > > > > handling more in the future in an intuitive way?
> > > > > > > > > >
> > > > > > > > > > I think yes. This is all we want. Other sorts of errors
> > such
> > > as
> > > > > > > having
> > > > > > > > > > problem with partition addition, producer fenced
> exception,
> > > etc
> > > > > > seem
> > > > > > > to
> > > > > > > > > be
> > > > > > > > > > more serious issues. The intention was to handle problems
> > > > created
> > > > > > by
> > > > > > > > > > (maybe) a single poison pill record. BTW, I do not see
> any
> > > > > > obstacles
> > > > > > > to
> > > > > > > > > > future changes.
> > > > > > > > > >
> > > > > > > > > > Bests,
> > > > > > > > > > Alieh
> > > > > > > > > >
> > > > > > > > > > On Sat, Jun 29, 2024 at 3:03 AM Chris Egerton
> > > > > > > <chr...@aiven.io.invalid
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Ah, sorry--spoke too soon. The PR doesn't show that
> > errors
> > > > > thrown
> > > > > > > > from
> > > > > > > > > > > Producer::send are handled, but instead, ApiException
> > > > instances
> > > > > > > that
> > > > > > > > > are
> > > > > > > > > > > caught inside KafkaProducer::doSend and are handled by
> > > > > returning
> > > > > > an
> > > > > > > > > > > already-failed future are. I think the same question
> > still
> > > > > > applies
> > > > > > > > (is
> > > > > > > > > > this
> > > > > > > > > > > all we want to handle, and will it prevent us from
> > handling
> > > > > more
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > future in an intuitive way), though.
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 28, 2024 at 8:57 PM Chris Egerton <
> > > > chr...@aiven.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Alieh,
> > > > > > > > > > > >
> > > > > > > > > > > > This KIP has evolved a lot since I last looked at it,
> > but
> > > > the
> > > > > > > > changes
> > > > > > > > > > > seem
> > > > > > > > > > > > well thought-out both in semantics and API. One
> > > clarifying
> > > > > > > > question I
> > > > > > > > > > > have
> > > > > > > > > > > > is that it looks based on the draft PR that we've
> > > narrowed
> > > > > the
> > > > > > > > scope
> > > > > > > > > > from
> > > > > > > > > > > > any error that might take place with producing a
> record
> > > to
> > > > > > Kafka,
> > > > > > > > to
> > > > > > > > > > only
> > > > > > > > > > > > the ones that are thrown directly from
> Producer::send;
> > is
> > > > > that
> > > > > > > the
> > > > > > > > > > > intended
> > > > > > > > > > > > behavior here? And if so, do you have thoughts on how
> > we
> > > > > might
> > > > > > > > > design a
> > > > > > > > > > > > follow-up KIP that would catch all errors (including
> > ones
> > > > > > > reported
> > > > > > > > > > > > asynchronously instead of synchronously)? I'd like it
> > if
> > > we
> > > > > > could
> > > > > > > > > leave
> > > > > > > > > > > the
> > > > > > > > > > > > door open for that without painting ourselves into
> too
> > > much
> > > > > of
> > > > > > a
> > > > > > > > > corner
> > > > > > > > > > > > with the API design for this KIP.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > >
> > > > > > > > > > > > Chris
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 28, 2024 at 6:31 PM Matthias J. Sax <
> > > > > > > mj...@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Thanks Alieh,
> > > > > > > > > > > >>
> > > > > > > > > > > >> it seems this KIP can just pick between a couple of
> > > > > tradeoffs.
> > > > > > > > > Adding
> > > > > > > > > > an
> > > > > > > > > > > >> overloaded `send()` as the KIP propose makes sense
> to
> > me
> > > > and
> > > > > > > seems
> > > > > > > > > to
> > > > > > > > > > > >> provides the cleanest solution compare to there
> > options
> > > we
> > > > > > > > > discussed.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Given the explicit name of the passed-in option that
> > > > > > highlights
> > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > > >> option is for TX only make is pretty clear and
> avoids
> > > the
> > > > > > issue
> > > > > > > of
> > > > > > > > > > > >> `flush()` ambiguity.
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> Nit: We should make clear on the KIP though, that
> the
> > > new
> > > > > > > `send()`
> > > > > > > > > > > >> overload would throw an `IllegalStateException` if
> TX
> > > are
> > > > > not
> > > > > > > used
> > > > > > > > > > > >> (similar to other TX methods like initTx(), etc)
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> About the `Producer` interface, I am not sure how
> this
> > > was
> > > > > > done
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > >> past (eg, KIP-266 added `Consumer.poll(Duration)`
> w/o
> > a
> > > > > > default
> > > > > > > > > > > >> implementation), if we need a default implementation
> > for
> > > > > > > backward
> > > > > > > > > > > >> compatibility or not? If we do want to add one, I
> > think
> > > it
> > > > > > would
> > > > > > > > be
> > > > > > > > > > > >> appropriate to throw an
> > `UnsupportedOperationException`
> > > by
> > > > > > > > default,
> > > > > > > > > > > >> instead of just keeping the default impl empty?
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> My points are rather minor, and should not block
> this
> > > KIP
> > > > > > > though.
> > > > > > > > > > > >> Overall LGTM.
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> -Matthias
> > > > > > > > > > > >>
> > > > > > > > > > > >> On 6/27/24 1:28 PM, Alieh Saeedi wrote:
> > > > > > > > > > > >> > Hi Justine,
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Thanks for the suggestion.
> > > > > > > > > > > >> > Making applications to validate every single
> record
> > is
> > > > not
> > > > > > the
> > > > > > > > > best
> > > > > > > > > > > way,
> > > > > > > > > > > >> > from an efficiency point of view.
> > > > > > > > > > > >> > Moreover, between changing the behavior of the
> > > Producer
> > > > in
> > > > > > > > `send`
> > > > > > > > > > and
> > > > > > > > > > > >> > `commitTnx`, the former seems more reasonable and
> > > clean.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Bests,
> > > > > > > > > > > >> > Alieh
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > On Thu, Jun 27, 2024 at 8:14 PM Justine Olshan
> > > > > > > > > > > >> <jols...@confluent.io.invalid>
> > > > > > > > > > > >> > wrote:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> >> Hey Alieh,
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> I see there are two options now. So folks will be
> > > > > > discussing
> > > > > > > > the
> > > > > > > > > > > >> approaches
> > > > > > > > > > > >> >> and deciding the best way forward before we vote?
> > > > > > > > > > > >> >> I do think there could be a problem with the
> > approach
> > > > on
> > > > > > > commit
> > > > > > > > > if
> > > > > > > > > > we
> > > > > > > > > > > >> get
> > > > > > > > > > > >> >> stuck on an earlier error and have more records
> > > > > > (potentially
> > > > > > > on
> > > > > > > > > new
> > > > > > > > > > > >> >> partitions) to commit as the current PR is
> > > implemented.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> I guess this takes us back to the question of
> > whether
> > > > the
> > > > > > > error
> > > > > > > > > > > should
> > > > > > > > > > > >> be
> > > > > > > > > > > >> >> cleared on send.
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> (And I guess at the back of my mind, I'm
> wondering
> > if
> > > > > there
> > > > > > > is
> > > > > > > > a
> > > > > > > > > > way
> > > > > > > > > > > >> we can
> > > > > > > > > > > >> >> validate the "posion pill" records application
> side
> > > > > before
> > > > > > we
> > > > > > > > > even
> > > > > > > > > > > try
> > > > > > > > > > > >> to
> > > > > > > > > > > >> >> send them)
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> Justine
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >> On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi
> > > > > > > > > > > >> <asae...@confluent.io.invalid
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >> wrote:
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >>> Hi Justine,
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>> I did not update the KIP with `TxnSendOption`
> > since
> > > I
> > > > > > > thought
> > > > > > > > > it'd
> > > > > > > > > > > be
> > > > > > > > > > > >> >>> better discussed here beforehand.
> > > > > > > > > > > >> >>> right now, there are 2 PRs:
> > > > > > > > > > > >> >>> - the PR that implements the current version of
> > the
> > > > KIP:
> > > > > > > > > > > >> >>> https://github.com/apache/kafka/pull/16332
> > > > > > > > > > > >> >>> - the POC PR that clarifies the `TxnSendOption`:
> > > > > > > > > > > >> >>> https://github.com/apache/kafka/pull/16465
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>> Bests,
> > > > > > > > > > > >> >>> Alieh
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>> On Thu, Jun 27, 2024 at 12:42 AM Justine Olshan
> > > > > > > > > > > >> >>> <jols...@confluent.io.invalid> wrote:
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>>> Hey Alieh,
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>> I think I am a little confused. Are the 3
> points
> > > > above
> > > > > > > > > addressed
> > > > > > > > > > by
> > > > > > > > > > > >> the
> > > > > > > > > > > >> >>> KIP
> > > > > > > > > > > >> >>>> or did something change? The PR seems to not
> > > include
> > > > > this
> > > > > > > > > change
> > > > > > > > > > > and
> > > > > > > > > > > >> >>> still
> > > > > > > > > > > >> >>>> has the CommitOption as well.
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>> Thanks,
> > > > > > > > > > > >> >>>> Justine
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>> On Wed, Jun 26, 2024 at 2:15 PM Alieh Saeedi
> > > > > > > > > > > >> >>> <asae...@confluent.io.invalid
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>> wrote:
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>>> Hi all,
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> Looking at the PR <
> > > > > > > > https://github.com/apache/kafka/pull/16332
> > > > > > > > > >
> > > > > > > > > > > >> >>>>> corresponding to the KIP, there are some
> points
> > > > worthy
> > > > > > of
> > > > > > > > > > mention:
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> 1) clearing the error ends up dirty/messy code
> > in
> > > > > > > > > > > >> >> `TransactionManager`.
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> 2) By clearing the error, we are actually
> doing
> > an
> > > > > > illegal
> > > > > > > > > > > >> transition
> > > > > > > > > > > >> >>>> from
> > > > > > > > > > > >> >>>>> `ABORTABLE_ERROR` to `IN_TRANSACTION` which is
> > > > > > > conceptually
> > > > > > > > > not
> > > > > > > > > > > >> >>>> acceptable.
> > > > > > > > > > > >> >>>>> This can be the root cause of some issues,
> with
> > > > > perhaps
> > > > > > > > > further
> > > > > > > > > > > >> >> future
> > > > > > > > > > > >> >>>>> changes by others.
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> 3) If the poison pill record `r1` causes a
> > > > transition
> > > > > to
> > > > > > > the
> > > > > > > > > > error
> > > > > > > > > > > >> >>> state
> > > > > > > > > > > >> >>>>> and then the next record `r2` requires adding
> a
> > > > > > partition
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > >> >>>>> transaction, the action fails due to being in
> > the
> > > > > error
> > > > > > > > state.
> > > > > > > > > > In
> > > > > > > > > > > >> >> this
> > > > > > > > > > > >> >>>>> case, clearing errors during
> > > > > > `commitTnx(CLEAR_SEND_ERROR)`
> > > > > > > > is
> > > > > > > > > > too
> > > > > > > > > > > >> >> late.
> > > > > > > > > > > >> >>>>> However, this case can NOT be the main concern
> > as
> > > > soon
> > > > > > as
> > > > > > > > > > KIP-890
> > > > > > > > > > > is
> > > > > > > > > > > >> >>>> fully
> > > > > > > > > > > >> >>>>> implemented.
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> My suggestion is to solve the problem where it
> > > > arises.
> > > > > > If
> > > > > > > > the
> > > > > > > > > > > >> >>> transition
> > > > > > > > > > > >> >>>> to
> > > > > > > > > > > >> >>>>> the error state does not happen during
> `send()`,
> > > we
> > > > do
> > > > > > not
> > > > > > > > > need
> > > > > > > > > > to
> > > > > > > > > > > >> >>> clear
> > > > > > > > > > > >> >>>>> the error later. Therefore, instead of
> > > > `CommitOption`,
> > > > > > we
> > > > > > > > can
> > > > > > > > > > > define
> > > > > > > > > > > >> >> a
> > > > > > > > > > > >> >>>>> `TxnSendOption` and prevent the `send()`
> method
> > > from
> > > > > > going
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > >> >> error
> > > > > > > > > > > >> >>>>> state in case 1) we're in a transaction and 2)
> > the
> > > > > user
> > > > > > > > asked
> > > > > > > > > > for
> > > > > > > > > > > >> >>>>> IGONRE_SEND_ERRORS. For more clarity, you can
> > > take a
> > > > > > look
> > > > > > > at
> > > > > > > > > the
> > > > > > > > > > > POC
> > > > > > > > > > > >> >> PR
> > > > > > > > > > > >> >>>>> <https://github.com/apache/kafka/pull/16465>.
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>> Cheers,
> > > > > > > > > > > >> >>>>> Alieh
> > > > > > > > > > > >> >>>>>
> > > > > > > > > > > >> >>>>
> > > > > > > > > > > >> >>>
> > > > > > > > > > > >> >>
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to