Hey Boyang,

I think maybe there's something cracking here :) I'm just asking for
clarifications that as of today, which non-fatal exceptions the newly
introduced CommitFailedException would cover, and it seems to be only 1)
unknown pid, 2) invalid pid mapping, and 3) concurrent transactions. Is
that correct?

On Thu, Jan 28, 2021 at 5:06 PM Boyang Chen <reluctanthero...@gmail.com>
wrote:

> Hey Guozhang, I think TimeoutException would not be covered here as it
> potentially has a risk of hitting an illegal state on the broker side when
> the previous commit was actually successful. Users should try to increase
> their max.block.ms to avoid hitting the timeout as a base suggestion,
> which
> is discussed in the KIP.
>
> As for the umbrella exception, I agree it has values to some extent, but in
> terms of adoption difficulty, additional exception types usually make the
> error handling more complex than simplified, and we are doing our best to
> avoid compatibility issues caused by wrapping previously thrown raw
> exceptions that invalidates user error handling. And I'm also not in favor
> of our exception hierarchy today where all exceptions are subclasses of
> KafkaException, and we already mixed its handling with other specific
> exception types that could be caught as superclass KafkaException. That
> could be addressed in a separate proposal if we see necessary.
>
> Boyang
>
> On Thu, Jan 28, 2021 at 3:28 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Thanks Jason for the suggestion, that looks good to me too.
> >
> > Regarding the non-fatal exceptions wrapped as CommitFailed, I would like
> to
> > clarify if we would cover all the following cases: 1) timeout, 2) unknown
> > pid, 3) invalid pid mapping, 4) concurrent transactions?
> >
> > BTW I think it still makes sense to use an umbrella exception in case in
> > the future we add more non-fatal cases even though today we only have a
> > few.
> >
> >
> > Guozhang
> >
> > On Thu, Jan 28, 2021 at 1:08 PM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > Thanks Jason, I agree with the proposed solution here, will update the
> > KIP.
> > >
> > > On Thu, Jan 28, 2021 at 10:52 AM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hi Boyang,
> > > >
> > > > It seems like a reasonable suggestion. I wonder if a flag is
> sufficient
> > > > though. The current `Callback` documentation treats "fatal" errors
> from
> > > the
> > > > perspective of the individual message that was sent.
> > > >
> > > > ```
> > > >     *                  Non-Retriable exceptions (fatal, the message
> > will
> > > > never be sent):
> > > > ```
> > > >
> > > > However, we also have fatal errors from the perspective of the
> > > transaction
> > > > (e.g. when the producer gets fenced). Perhaps that suggests we need
> > > > something richer than a boolean flag:
> > > >
> > > > At a high level, I think the following cases are possible:
> > > >
> > > > - message rejected (e.g. message too large, invalid topic)
> > > > - delivery failed after retries/delivery timeout (e.g. timeout, crc
> > > error,
> > > > not enough replicas)
> > > > - transaction failed (e.g. producer fenced, invalid transaction
> state)
> > > >
> > > > Perhaps instead we can have a type like the following:
> > > >
> > > > class SendFailure {
> > > >   FailureType failureType;
> > > >   Exception cause;
> > > > }
> > > >
> > > > enum FailureType {
> > > >   MESSSAGE_REJECTED, DELIVERY_FAILED, TRANSACTION_FAILED
> > > > }
> > > >
> > > > (Not married to any of these names, just a starting point.)
> > > >
> > > > Then we add a new `onCompletion` as you've suggested:
> > > >
> > > > default void onCompletion(RecordMetadata metadata, SendFailure
> > failure) {
> > > >   this.onCompletion(metadata, failure.cause());
> > > > }
> > > >
> > > > This would give streams and other applications enough information to
> > know
> > > > whether the message can be retried and whether the transaction can be
> > > > aborted.
> > > >
> > > > What do you think?
> > > >
> > > > -Jason
> > > >
> > > >
> > > > On Wed, Jan 27, 2021 at 9:51 PM Boyang Chen <
> > reluctanthero...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks Jason for the thoughts.
> > > > >
> > > > > On Wed, Jan 27, 2021 at 11:52 AM Jason Gustafson <
> ja...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Boyang,
> > > > > >
> > > > > > Thanks for the iterations here. I think this is something we
> should
> > > > have
> > > > > > done a long time ago. It sounds like there are two API changes
> > here:
> > > > > >
> > > > > > 1. We are introducing the `CommitFailedException` to wrap
> abortable
> > > > > > errors that are raised from `commitTransaction`. This sounds fine
> > to
> > > > me.
> > > > > As
> > > > > > far as I know, the only case we might need this is when we add
> > > support
> > > > to
> > > > > > let producers recover from coordinator timeouts. Are there any
> > > others?
> > > > > >
> > > > > > I think the purpose here is to ensure non-fatal exceptions are
> > > unified
> > > > > under the same
> > > > > exception umbrella, to make the proceeding to abort any ongoing
> > > > transaction
> > > > > much easier.
> > > > > I don't think `coordinator timeouts` is the only case to recover
> > here,
> > > > > since we have other
> > > > > non-fatal exceptions such as UnknownPid.
> > > > >
> > > > > 2. We are wrapping non-fatal errors raised from `send` as
> > > > `KafkaException`.
> > > > > > The motivation for this is less clear to me and it doesn't look
> > like
> > > > the
> > > > > > example from the KIP depends on it. My concern here is
> > compatibility.
> > > > > > Currently we have the following documentation for the `Callback`
> > API:
> > > > > >
> > > > > > ```
> > > > > >      *                  Non-Retriable exceptions (fatal, the
> > message
> > > > will
> > > > > > never be sent):
> > > > > >      *
> > > > > >      *                  InvalidTopicException
> > > > > >      *                  OffsetMetadataTooLargeException
> > > > > >      *                  RecordBatchTooLargeException
> > > > > >      *                  RecordTooLargeException
> > > > > >      *                  UnknownServerException
> > > > > >      *                  UnknownProducerIdException
> > > > > >      *                  InvalidProducerEpochException
> > > > > >      *
> > > > > >      *                  Retriable exceptions (transient, may be
> > > covered
> > > > > by
> > > > > > increasing #.retries):
> > > > > >      *
> > > > > >      *                  CorruptRecordException
> > > > > >      *                  InvalidMetadataException
> > > > > >      *                  NotEnoughReplicasAfterAppendException
> > > > > >      *                  NotEnoughReplicasException
> > > > > >      *                  OffsetOutOfRangeException
> > > > > >      *                  TimeoutException
> > > > > >      *                  UnknownTopicOrPartitionException
> > > > > > ```
> > > > > >
> > > > > > If we wrap all the retriable exceptions documented here as
> > > > > > `KafkaException`, wouldn't that break any error handling that
> users
> > > > might
> > > > > > already have? it's gonna introduce a compatibility issue.
> > > > > >
> > > > > > The original intention was to simplify `send` callback error
> > handling
> > > > by
> > > > > doing exception wrapping, as on Streams level
> > > > > we have to prepare an exhausting list of exceptions to catch as
> > fatal,
> > > > and
> > > > > the same lengthy list to catch as
> > > > > non-fatal. It would be much easier if we got `hints` from the
> > callback.
> > > > > However,
> > > > > I agree there is a compatibility concern, what about deprecating
> the
> > > > > existing:
> > > > >
> > > > > void onCompletion(RecordMetadata metadata, Exception exception)
> > > > >
> > > > > and replace it with:
> > > > >
> > > > > default void onCompletion(RecordMetadata metadata, Exception
> > exception,
> > > > > boolean isFatal) {
> > > > >   this.onCompletion(metadata, exception);
> > > > > }
> > > > >
> > > > > to make sure new users get the benefit of understanding the
> fatality
> > > > based
> > > > > on the info presented by the producer?
> > > > >
> > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > >
> > > > > > On Sat, Jan 23, 2021 at 3:31 AM Hiringuru <i...@hiringuru.com>
> > > wrote:
> > > > > >
> > > > > > > Why  we are receiving all emails kindly remove us from
> > > > > > > dev@kafka.apache.org we don't want to receive emails anymore.
> > > > > > >
> > > > > > > Thanks
> > > > > > > > On 01/23/2021 4:14 AM Guozhang Wang <wangg...@gmail.com>
> > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks Boyang, yes I think I was confused about the different
> > > > > handling
> > > > > > of
> > > > > > > > two abortTxn calls, and now I get it was not intentional. I
> > > think I
> > > > > do
> > > > > > > not
> > > > > > > > have more concerns.
> > > > > > > >
> > > > > > > > On Fri, Jan 22, 2021 at 1:12 PM Boyang Chen <
> > > > > > reluctanthero...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the clarification Guozhang, I got your point
> that
> > we
> > > > > want
> > > > > > to
> > > > > > > > > have a consistent handling of fatal exceptions being thrown
> > > from
> > > > > the
> > > > > > > > > abortTxn. I modified the current template to move the fatal
> > > > > exception
> > > > > > > > > try-catch outside of the processing loop to make sure we
> > could
> > > > get
> > > > > a
> > > > > > > chance
> > > > > > > > > to close consumer/producer modules. Let me know what you
> > think.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Boyang
> > > > > > > > >
> > > > > > > > > On Fri, Jan 22, 2021 at 11:05 AM Boyang Chen <
> > > > > > > reluctanthero...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > My understanding is that abortTransaction would only
> throw
> > > when
> > > > > the
> > > > > > > > > > producer is in fatal state. For CommitFailed, the
> producer
> > > > should
> > > > > > > still
> > > > > > > > > be
> > > > > > > > > > in the abortable error state, so that abortTransaction
> call
> > > > would
> > > > > > not
> > > > > > > > > throw.
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 22, 2021 at 11:02 AM Guozhang Wang <
> > > > > wangg...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Or are you going to maintain some internal state such
> > that,
> > > > the
> > > > > > > > > >> `abortTransaction` in the catch block would never throw
> > > again?
> > > > > > > > > >>
> > > > > > > > > >> On Fri, Jan 22, 2021 at 11:01 AM Guozhang Wang <
> > > > > > wangg...@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Boyang/Jason,
> > > > > > > > > >> >
> > > > > > > > > >> > I've also thought about this (i.e. using CommitFailed
> > for
> > > > all
> > > > > > > > > >> non-fatal),
> > > > > > > > > >> > but what I'm pondering is that, in the catch
> > > (CommitFailed)
> > > > > > block,
> > > > > > > > > what
> > > > > > > > > >> > would happen if the `producer.abortTransaction();`
> > throws
> > > > > again?
> > > > > > > > > should
> > > > > > > > > >> > that be captured as a fatal and cause the client to
> > close
> > > > > again.
> > > > > > > > > >> >
> > > > > > > > > >> > If yes, then naively the pattern would be:
> > > > > > > > > >> >
> > > > > > > > > >> > ...
> > > > > > > > > >> > catch (CommitFailedException e) {
> > > > > > > > > >> >         // Transaction commit failed with abortable
> > error,
> > > > > user
> > > > > > > could
> > > > > > > > > >> reset
> > > > > > > > > >> >         // the application state and resume with a new
> > > > > > > transaction.
> > > > > > > > > The
> > > > > > > > > >> > root
> > > > > > > > > >> >         // cause was wrapped in the thrown exception.
> > > > > > > > > >> >         resetToLastCommittedPositions(consumer);
> > > > > > > > > >> >         try {
> > > > > > > > > >> >             producer.abortTransaction();
> > > > > > > > > >> >         } catch (KafkaException e) {
> > > > > > > > > >> >             producer.close();
> > > > > > > > > >> >             consumer.close();
> > > > > > > > > >> >             throw e;
> > > > > > > > > >> >         }
> > > > > > > > > >> >     } catch (KafkaException e) {
> > > > > > > > > >> >         producer.close();
> > > > > > > > > >> >         consumer.close();
> > > > > > > > > >> >         throw e;
> > > > > > > > > >> >     }
> > > > > > > > > >> > ...
> > > > > > > > > >> >
> > > > > > > > > >> > Guozhang
> > > > > > > > > >> >
> > > > > > > > > >> > On Fri, Jan 22, 2021 at 10:47 AM Boyang Chen <
> > > > > > > > > >> reluctanthero...@gmail.com>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> >> Hey Guozhang,
> > > > > > > > > >> >>
> > > > > > > > > >> >> Jason and I were discussing the new API offline and
> > > decided
> > > > > to
> > > > > > > take
> > > > > > > > > >> >> another
> > > > > > > > > >> >> approach. Firstly, the reason not to invent a new API
> > > with
> > > > > > > returned
> > > > > > > > > >> >> boolean
> > > > > > > > > >> >> flag is for compatibility consideration, since old
> EOS
> > > code
> > > > > > > would not
> > > > > > > > > >> know
> > > > > > > > > >> >> that a given transaction commit was failed internally
> > as
> > > > they
> > > > > > > don't
> > > > > > > > > >> listen
> > > > > > > > > >> >> to the output flag. Our proposed alternative solution
> > is
> > > to
> > > > > let
> > > > > > > > > >> >> *commitTransaction
> > > > > > > > > >> >> throw CommitFailedException whenever the commit
> failed
> > > with
> > > > > > > non-fatal
> > > > > > > > > >> >> exception*, so that on the caller side the handling
> > logic
> > > > > > > becomes:
> > > > > > > > > >> >>
> > > > > > > > > >> >> try {
> > > > > > > > > >> >>         if (shouldCommit) {
> > > > > > > > > >> >>             producer.commitTransaction();
> > > > > > > > > >> >>         } else {
> > > > > > > > > >> >>             resetToLastCommittedPositions(consumer);
> > > > > > > > > >> >>             producer.abortTransaction();
> > > > > > > > > >> >>         }
> > > > > > > > > >> >>     } catch (CommitFailedException e) {
> > > > > > > > > >> >>         // Transaction commit failed with abortable
> > > error,
> > > > > user
> > > > > > > could
> > > > > > > > > >> >> reset
> > > > > > > > > >> >>         // the application state and resume with a
> new
> > > > > > > transaction.
> > > > > > > > > The
> > > > > > > > > >> >> root
> > > > > > > > > >> >>         // cause was wrapped in the thrown exception.
> > > > > > > > > >> >>         resetToLastCommittedPositions(consumer);
> > > > > > > > > >> >>         producer.abortTransaction();
> > > > > > > > > >> >>     } catch (KafkaException e) {
> > > > > > > > > >> >>         producer.close();
> > > > > > > > > >> >>         consumer.close();
> > > > > > > > > >> >>         throw e;
> > > > > > > > > >> >>     }
> > > > > > > > > >> >>
> > > > > > > > > >> >> This approach looks cleaner as all exception types
> > other
> > > > than
> > > > > > > > > >> CommitFailed
> > > > > > > > > >> >> will doom to be fatal, which is very easy to adopt
> for
> > > > users.
> > > > > > In
> > > > > > > the
> > > > > > > > > >> >> meantime, we still maintain the commitTxn behavior to
> > > throw
> > > > > > > instead
> > > > > > > > > of
> > > > > > > > > >> >> silently failing.
> > > > > > > > > >> >>
> > > > > > > > > >> >> In addition, we decided to drop the recommendation to
> > > > handle
> > > > > > > > > >> >> TimeoutException and leave it to the users to make
> the
> > > > call.
> > > > > > The
> > > > > > > > > >> downside
> > > > > > > > > >> >> for blindly calling abortTxn upon timeout is that we
> > > could
> > > > > > > result in
> > > > > > > > > an
> > > > > > > > > >> >> illegal state when the commit was already successful
> on
> > > the
> > > > > > > broker
> > > > > > > > > >> >> side. Without a good guarantee on the outcome,
> > > > > overcomplicating
> > > > > > > the
> > > > > > > > > >> >> template should not be encouraged IMHO.
> > > > > > > > > >> >>
> > > > > > > > > >> >> Let me know your thoughts on the new approach here,
> > thank
> > > > > you!
> > > > > > > > > >> >>
> > > > > > > > > >> >> Best,
> > > > > > > > > >> >> Boyang
> > > > > > > > > >> >>
> > > > > > > > > >> >> On Tue, Jan 19, 2021 at 11:11 AM Guozhang Wang <
> > > > > > > wangg...@gmail.com>
> > > > > > > > > >> >> wrote:
> > > > > > > > > >> >>
> > > > > > > > > >> >> > Thanks for your clarification on 2)/3), that makes
> > > sense.
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > On Tue, Jan 19, 2021 at 10:16 AM Boyang Chen <
> > > > > > > > > >> >> reluctanthero...@gmail.com>
> > > > > > > > > >> >> > wrote:
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > > Thanks for the input Guozhang, replied inline.
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > On Mon, Jan 18, 2021 at 8:57 PM Guozhang Wang <
> > > > > > > > > wangg...@gmail.com>
> > > > > > > > > >> >> > wrote:
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > > Hello Boyang,
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > Thanks for the updated KIP. I read it again and
> > > have
> > > > > the
> > > > > > > > > >> following
> > > > > > > > > >> >> > > > thoughts:
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > 0. I'm a bit concerned that if commitTxn does
> not
> > > > throw
> > > > > > any
> > > > > > > > > >> >> non-fatal
> > > > > > > > > >> >> > > > exception, and instead we rely on the
> subsequent
> > > > > beginTxn
> > > > > > > call
> > > > > > > > > to
> > > > > > > > > >> >> > throw,
> > > > > > > > > >> >> > > it
> > > > > > > > > >> >> > > > may violate some callers with a pattern that
> > > relying
> > > > on
> > > > > > > > > >> commitTxn to
> > > > > > > > > >> >> > > > succeed to make some non-rollback operations.
> For
> > > > > > example:
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > beginTxn()
> > > > > > > > > >> >> > > > // do some read-write on my local DB
> > > > > > > > > >> >> > > > commitTxn()
> > > > > > > > > >> >> > > > // if commitTxn succeeds, then commit the DB
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > -------------
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > The issue is that, committing DB is a
> > non-rollback
> > > > > > > operation,
> > > > > > > > > and
> > > > > > > > > >> >> users
> > > > > > > > > >> >> > > may
> > > > > > > > > >> >> > > > just rely on commitTxn to return without error
> to
> > > > make
> > > > > > this
> > > > > > > > > >> >> > non-rollback
> > > > > > > > > >> >> > > > call. Of course we can just claim this pattern
> is
> > > not
> > > > > > > > > legitimate
> > > > > > > > > >> >> and is
> > > > > > > > > >> >> > > not
> > > > > > > > > >> >> > > > the right way of doing things, but many users
> may
> > > > > > naturally
> > > > > > > > > adopt
> > > > > > > > > >> >> this
> > > > > > > > > >> >> > > > pattern.
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > So maybe we should still let commitTxn also
> throw
> > > > > > non-fatal
> > > > > > > > > >> >> exceptions,
> > > > > > > > > >> >> > > in
> > > > > > > > > >> >> > > > which case we would then call abortTxn again.
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > But if we do this, the pattern becomes:
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > try {
> > > > > > > > > >> >> > > >    beginTxn()
> > > > > > > > > >> >> > > >    // do something
> > > > > > > > > >> >> > > > } catch (Exception) {
> > > > > > > > > >> >> > > >    shouldCommit = false;
> > > > > > > > > >> >> > > > }
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > if (shouldCommit) {
> > > > > > > > > >> >> > > >     try {
> > > > > > > > > >> >> > > >         commitTxn()
> > > > > > > > > >> >> > > >     } catch (...) {        // enumerate all
> fatal
> > > > > > > exceptions
> > > > > > > > > >> >> > > >         shutdown()
> > > > > > > > > >> >> > > >     } catch (KafkaException) {
> > > > > > > > > >> >> > > >         // non-fatal
> > > > > > > > > >> >> > > >         shouldCommit = false;
> > > > > > > > > >> >> > > >     }
> > > > > > > > > >> >> > > > }
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > if (!shouldCommit && running()) {
> > > > > > > > > >> >> > > > try {
> > > > > > > > > >> >> > > >         abortTxn()
> > > > > > > > > >> >> > > >     } catch (KafkaException) {
> > > > > > > > > >> >> > > >         // only throw fatal
> > > > > > > > > >> >> > > >         shutdown()
> > > > > > > > > >> >> > > >     }
> > > > > > > > > >> >> > > > }
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > ---------------------
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > Which is much more complicated.
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > Thank makes me think, the alternative we have
> > > > discussed
> > > > > > > offline
> > > > > > > > > >> may
> > > > > > > > > >> >> be
> > > > > > > > > >> >> > > > better: let commitTxn() to return a boolean
> flag.
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > * If it returns true, it means the commit
> > > succeeded.
> > > > > > Users
> > > > > > > can
> > > > > > > > > >> >> > > comfortably
> > > > > > > > > >> >> > > > continue and do any external non-rollback
> > > operations
> > > > if
> > > > > > > they
> > > > > > > > > >> like.
> > > > > > > > > >> >> > > > * If it returns false, it means the commit
> failed
> > > > with
> > > > > > > > > non-fatal
> > > > > > > > > >> >> error
> > > > > > > > > >> >> > > *AND
> > > > > > > > > >> >> > > > the txn has been aborted*. Users do not need to
> > > call
> > > > > > > abortTxn
> > > > > > > > > >> again.
> > > > > > > > > >> >> > > > * If it throws, then it means fatal errors.
> Users
> > > > > should
> > > > > > > shut
> > > > > > > > > >> down
> > > > > > > > > >> >> the
> > > > > > > > > >> >> > > > client.
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > In this case, the pattern becomes:
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > try {
> > > > > > > > > >> >> > > >    beginTxn()
> > > > > > > > > >> >> > > >    // do something
> > > > > > > > > >> >> > > > } catch (Exception) {
> > > > > > > > > >> >> > > >    shouldCommit = false;
> > > > > > > > > >> >> > > > }
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > try {
> > > > > > > > > >> >> > > >     if (shouldCommit) {
> > > > > > > > > >> >> > > >         commitSucceeded = commitTxn()
> > > > > > > > > >> >> > > >     } else {
> > > > > > > > > >> >> > > >         // reset offsets, rollback operations,
> > etc
> > > > > > > > > >> >> > > >         abortTxn()
> > > > > > > > > >> >> > > >     }
> > > > > > > > > >> >> > > > } catch (KafkaException) {
> > > > > > > > > >> >> > > >     // only throw fatal
> > > > > > > > > >> >> > > >     shutdown()
> > > > > > > > > >> >> > > > }
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > if (commitSucceeded)
> > > > > > > > > >> >> > > >    // do other non-rollbackable things
> > > > > > > > > >> >> > > > else
> > > > > > > > > >> >> > > >    // reset offsets, rollback operations, etc
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > -------------------------
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > Of course, if we want to have better visibility
> > > into
> > > > > what
> > > > > > > > > caused
> > > > > > > > > >> the
> > > > > > > > > >> >> > > commit
> > > > > > > > > >> >> > > > to fail and txn to abort. We can let the return
> > > type
> > > > be
> > > > > > an
> > > > > > > enum
> > > > > > > > > >> >> instead
> > > > > > > > > >> >> > > of
> > > > > > > > > >> >> > > > a flag. But my main idea is to still let the
> > > > commitTxn
> > > > > be
> > > > > > > the
> > > > > > > > > >> final
> > > > > > > > > >> >> > point
> > > > > > > > > >> >> > > > users can tell whether this txn succeeded or
> not,
> > > > > instead
> > > > > > > of
> > > > > > > > > >> >> relying on
> > > > > > > > > >> >> > > the
> > > > > > > > > >> >> > > > next beginTxn() call.
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > I agree that adding a boolean flag is indeed
> > useful
> > > > in
> > > > > > this
> > > > > > > > > case.
> > > > > > > > > >> >> Will
> > > > > > > > > >> >> > > update the KIP.
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > 1. Re: "while maintaining the behavior to throw
> > fatal
> > > > > > > exception
> > > > > > > > > in
> > > > > > > > > >> >> raw"
> > > > > > > > > >> >> > not
> > > > > > > > > >> >> > > > sure what do you mean by "throw" here. Are you
> > > > > proposing
> > > > > > > the
> > > > > > > > > >> >> callback
> > > > > > > > > >> >> > > would
> > > > > > > > > >> >> > > > *pass* (not throw) in any fatal exceptions when
> > > > > triggered
> > > > > > > > > without
> > > > > > > > > >> >> > > wrapping?
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > Yes, I want to say *pass*, the benefit is to
> make
> > > the
> > > > > end
> > > > > > > > > user's
> > > > > > > > > >> >> > > expectation consistent
> > > > > > > > > >> >> > > regarding exception handling.
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > > 2. I'm not sure I understand the claim
> regarding
> > > the
> > > > > > > callback
> > > > > > > > > >> that
> > > > > > > > > >> >> "In
> > > > > > > > > >> >> > > EOS
> > > > > > > > > >> >> > > > setup, it is not required to handle the
> > exception".
> > > > Are
> > > > > > you
> > > > > > > > > >> >> proposing
> > > > > > > > > >> >> > > that,
> > > > > > > > > >> >> > > > e.g. in Streams, we do not try to handle any
> > > > exceptions
> > > > > > if
> > > > > > > EOS
> > > > > > > > > is
> > > > > > > > > >> >> > enabled
> > > > > > > > > >> >> > > > in the callback anymore, but just let
> commitTxn()
> > > > > itself
> > > > > > to
> > > > > > > > > fail
> > > > > > > > > >> to
> > > > > > > > > >> >> be
> > > > > > > > > >> >> > > > notified about the problem? Personally I think
> in
> > > > > Streams
> > > > > > > we
> > > > > > > > > >> should
> > > > > > > > > >> >> > just
> > > > > > > > > >> >> > > > make the handling logic of the callback to be
> > > > > consistent
> > > > > > > > > >> regardless
> > > > > > > > > >> >> of
> > > > > > > > > >> >> > > the
> > > > > > > > > >> >> > > > EOS settings (today we have different logic
> > > depending
> > > > > on
> > > > > > > this
> > > > > > > > > >> logic,
> > > > > > > > > >> >> > > which
> > > > > > > > > >> >> > > > I think could be unified as well).
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > My idea originates from the claim on send API:
> > > > > > > > > >> >> > > "When used as part of a transaction, it is not
> > > > necessary
> > > > > to
> > > > > > > > > define
> > > > > > > > > >> a
> > > > > > > > > >> >> > > callback or check the result of the future  in
> > order
> > > to
> > > > > > > detect
> > > > > > > > > >> errors
> > > > > > > > > >> >> > from
> > > > > > > > > >> >> > > <code>send</code>. "
> > > > > > > > > >> >> > > My understanding is that for EOS, the exception
> > will
> > > be
> > > > > > > detected
> > > > > > > > > by
> > > > > > > > > >> >> > calling
> > > > > > > > > >> >> > > transactional APIs either way, so it is a
> duplicate
> > > > > > handling
> > > > > > > to
> > > > > > > > > >> track
> > > > > > > > > >> >> > > all the sendExceptions in RecordCollector.
> > However, I
> > > > > > looked
> > > > > > > up
> > > > > > > > > >> >> > > sendException is being used today as follow:
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > 1. Pass to "ProductionExceptionHandler" for any
> > > default
> > > > > or
> > > > > > > > > >> customized
> > > > > > > > > >> >> > > exception handler to handle
> > > > > > > > > >> >> > > 2. Stop collecting offset info or new exceptions
> > > > > > > > > >> >> > > 3. Check and rethrow exceptions in close(),
> flush()
> > > or
> > > > > new
> > > > > > > send()
> > > > > > > > > >> >> calls
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > To my understanding, we should still honor the
> > > > commitment
> > > > > > to
> > > > > > > #1
> > > > > > > > > for
> > > > > > > > > >> >> any
> > > > > > > > > >> >> > > user customized implementation. The #2 does not
> > > really
> > > > > > > affect our
> > > > > > > > > >> >> > decision
> > > > > > > > > >> >> > > upon EOS. The #3 here is still valuable as it
> could
> > > > help
> > > > > us
> > > > > > > fail
> > > > > > > > > >> fast
> > > > > > > > > >> >> in
> > > > > > > > > >> >> > > new send() instead of waiting to later stage of
> > > > > processing.
> > > > > > > In
> > > > > > > > > that
> > > > > > > > > >> >> > sense,
> > > > > > > > > >> >> > > I agree we should continue to record send
> > exceptions
> > > > even
> > > > > > > under
> > > > > > > > > EOS
> > > > > > > > > >> >> case
> > > > > > > > > >> >> > to
> > > > > > > > > >> >> > > ensure the strength of stream side Producer
> logic.
> > On
> > > > the
> > > > > > > safer
> > > > > > > > > >> side,
> > > > > > > > > >> >> we
> > > > > > > > > >> >> > no
> > > > > > > > > >> >> > > longer need to wrap certain fatal exceptions like
> > > > > > > ProducerFenced
> > > > > > > > > as
> > > > > > > > > >> >> > > TaskMigrated, since they should not crash the
> > stream
> > > > > thread
> > > > > > > if
> > > > > > > > > >> thrown
> > > > > > > > > >> >> in
> > > > > > > > > >> >> > > raw format, once we adopt the new processing
> model
> > in
> > > > the
> > > > > > > send
> > > > > > > > > >> phase.
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > Guozhang
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > On Thu, Dec 17, 2020 at 8:42 PM Boyang Chen <
> > > > > > > > > >> >> > reluctanthero...@gmail.com>
> > > > > > > > > >> >> > > > wrote:
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > > Thanks for everyone's feedback so far. I have
> > > > > polished
> > > > > > > the
> > > > > > > > > KIP
> > > > > > > > > >> >> after
> > > > > > > > > >> >> > > > > offline discussion with some folks working on
> > EOS
> > > > to
> > > > > > > make the
> > > > > > > > > >> >> > exception
> > > > > > > > > >> >> > > > > handling more lightweight. The essential
> change
> > > is
> > > > > that
> > > > > > > we
> > > > > > > > > are
> > > > > > > > > >> not
> > > > > > > > > >> >> > > > > inventing a new intermediate exception type,
> > but
> > > > > > instead
> > > > > > > > > >> >> separating a
> > > > > > > > > >> >> > > > full
> > > > > > > > > >> >> > > > > transaction into two phases:
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > 1. The data transmission phase
> > > > > > > > > >> >> > > > > 2. The commit phase
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > This way for any exception thrown from phase
> 1,
> > > > will
> > > > > be
> > > > > > > an
> > > > > > > > > >> >> indicator
> > > > > > > > > >> >> > in
> > > > > > > > > >> >> > > > > phase 2 whether we should do commit or abort,
> > and
> > > > > from
> > > > > > > now on
> > > > > > > > > >> >> > > > > `commitTransaction` should only throw fatal
> > > > > exceptions,
> > > > > > > > > >> similar to
> > > > > > > > > >> >> > > > > `abortTransaction`, so that any
> KafkaException
> > > > caught
> > > > > > in
> > > > > > > the
> > > > > > > > > >> >> commit
> > > > > > > > > >> >> > > phase
> > > > > > > > > >> >> > > > > will be definitely fatal to crash the app.
> For
> > > more
> > > > > > > advanced
> > > > > > > > > >> users
> > > > > > > > > >> >> > such
> > > > > > > > > >> >> > > > as
> > > > > > > > > >> >> > > > > Streams, we have the ability to further wrap
> > > > selected
> > > > > > > types
> > > > > > > > > of
> > > > > > > > > >> >> fatal
> > > > > > > > > >> >> > > > > exceptions to trigger task migration if
> > > necessary.
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > More details in the KIP, feel free to take
> > > another
> > > > > > look,
> > > > > > > > > >> thanks!
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > On Thu, Dec 17, 2020 at 8:36 PM Boyang Chen <
> > > > > > > > > >> >> > > reluctanthero...@gmail.com>
> > > > > > > > > >> >> > > > > wrote:
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > > Thanks Bruno for the feedback.
> > > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > > On Mon, Dec 7, 2020 at 5:26 AM Bruno
> Cadonna
> > <
> > > > > > > > > >> >> br...@confluent.io>
> > > > > > > > > >> >> > > > wrote:
> > > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > >> Thanks Boyang for the KIP!
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> Like Matthias, I do also not know the
> > producer
> > > > > > > internal
> > > > > > > > > well
> > > > > > > > > >> >> > enough
> > > > > > > > > >> >> > > to
> > > > > > > > > >> >> > > > > >> comment on the categorization. However, I
> > > think
> > > > > > > having a
> > > > > > > > > >> super
> > > > > > > > > >> >> > > > exception
> > > > > > > > > >> >> > > > > >> (e.g. RetriableException) that  encodes if
> > an
> > > > > > > exception is
> > > > > > > > > >> >> fatal
> > > > > > > > > >> >> > or
> > > > > > > > > >> >> > > > not
> > > > > > > > > >> >> > > > > >> is cleaner, better understandable and less
> > > > > > > error-prone,
> > > > > > > > > >> because
> > > > > > > > > >> >> > > > ideally
> > > > > > > > > >> >> > > > > >> when you add a new non-fatal exception in
> > > future
> > > > > you
> > > > > > > just
> > > > > > > > > >> need
> > > > > > > > > >> >> to
> > > > > > > > > >> >> > > > think
> > > > > > > > > >> >> > > > > >> about letting it inherit from the super
> > > > exception
> > > > > > and
> > > > > > > all
> > > > > > > > > >> the
> > > > > > > > > >> >> rest
> > > > > > > > > >> >> > > of
> > > > > > > > > >> >> > > > > >> the code will just behave correctly
> without
> > > the
> > > > > need
> > > > > > > to
> > > > > > > > > wrap
> > > > > > > > > >> >> the
> > > > > > > > > >> >> > new
> > > > > > > > > >> >> > > > > >> exception into another exception each time
> > it
> > > is
> > > > > > > thrown
> > > > > > > > > >> (maybe
> > > > > > > > > >> >> it
> > > > > > > > > >> >> > is
> > > > > > > > > >> >> > > > > >> thrown at different location in the code).
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> As far as I understand the following
> > statement
> > > > > from
> > > > > > > your
> > > > > > > > > >> >> previous
> > > > > > > > > >> >> > > > e-mail
> > > > > > > > > >> >> > > > > >> is the reason that currently such a super
> > > > > exception
> > > > > > > is not
> > > > > > > > > >> >> > possible:
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> "Right now we have RetriableException
> type,
> > if
> > > > we
> > > > > > are
> > > > > > > > > going
> > > > > > > > > >> to
> > > > > > > > > >> >> > add a
> > > > > > > > > >> >> > > > > >> `ProducerRetriableException` type, we have
> > to
> > > > put
> > > > > > > this new
> > > > > > > > > >> >> > interface
> > > > > > > > > >> >> > > > as
> > > > > > > > > >> >> > > > > >> the parent of the RetriableException,
> > because
> > > > not
> > > > > > all
> > > > > > > > > thrown
> > > > > > > > > >> >> > > non-fatal
> > > > > > > > > >> >> > > > > >> exceptions are `retriable` in general for
> > > > > producer"
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> In the list of exceptions in your KIP, I
> > found
> > > > > > > non-fatal
> > > > > > > > > >> >> > exceptions
> > > > > > > > > >> >> > > > that
> > > > > > > > > >> >> > > > > >> do not inherit from RetriableException. I
> > > guess
> > > > > > those
> > > > > > > are
> > > > > > > > > >> the
> > > > > > > > > >> >> ones
> > > > > > > > > >> >> > > you
> > > > > > > > > >> >> > > > > >> are referring to in your statement:
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> InvalidProducerEpochException
> > > > > > > > > >> >> > > > > >> InvalidPidMappingException
> > > > > > > > > >> >> > > > > >> TransactionAbortedException
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> All of those exceptions are non-fatal and
> do
> > > not
> > > > > > > inherit
> > > > > > > > > >> from
> > > > > > > > > >> >> > > > > >> RetriableException. Is there a reason for
> > > that?
> > > > If
> > > > > > > they
> > > > > > > > > >> >> depended
> > > > > > > > > >> >> > > from
> > > > > > > > > >> >> > > > > >> RetriableException we would be a bit
> closer
> > > to a
> > > > > > super
> > > > > > > > > >> >> exception I
> > > > > > > > > >> >> > > > > >> mention above.
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> The reason is that sender may catch those
> > > > > exceptions
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > >> >> > > > > > ProduceResponse, and it currently does
> > infinite
> > > > > > > > > >> >> > > > > > retries on RetriableException. To make sure
> > we
> > > > > could
> > > > > > > > > trigger
> > > > > > > > > >> the
> > > > > > > > > >> >> > > > > > abortTransaction() by catching non-fatal
> > thrown
> > > > > > > > > >> >> > > > > > exceptions and reinitialize the txn state,
> we
> > > > chose
> > > > > > > not to
> > > > > > > > > >> let
> > > > > > > > > >> >> > those
> > > > > > > > > >> >> > > > > > exceptions inherit RetriableException so
> that
> > > > > > > > > >> >> > > > > > they won't cause infinite retry on sender.
> > > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > >> With OutOfOrderSequenceException and
> > > > > > > > > >> >> UnknownProducerIdException, I
> > > > > > > > > >> >> > > > think
> > > > > > > > > >> >> > > > > >> to understand that their fatality depends
> on
> > > the
> > > > > > type
> > > > > > > > > (i.e.
> > > > > > > > > >> >> > > > > >> configuration) of the producer. That makes
> > it
> > > > > > > difficult to
> > > > > > > > > >> >> have a
> > > > > > > > > >> >> > > > super
> > > > > > > > > >> >> > > > > >> exception that encodes the retriability as
> > > > > mentioned
> > > > > > > > > above.
> > > > > > > > > >> >> Would
> > > > > > > > > >> >> > it
> > > > > > > > > >> >> > > > be
> > > > > > > > > >> >> > > > > >> possible to introduce exceptions that
> > inherit
> > > > from
> > > > > > > > > >> >> > > RetriableException
> > > > > > > > > >> >> > > > > >> and that are thrown when those exceptions
> > are
> > > > > caught
> > > > > > > from
> > > > > > > > > >> the
> > > > > > > > > >> >> > > brokers
> > > > > > > > > >> >> > > > > >> and the type of the producer is such that
> > the
> > > > > > > exceptions
> > > > > > > > > are
> > > > > > > > > >> >> > > > retriable?
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> Yea, I think in general the exception type
> > > > mixing
> > > > > is
> > > > > > > > > >> difficult
> > > > > > > > > >> >> to
> > > > > > > > > >> >> > > get
> > > > > > > > > >> >> > > > > > them all right. I have already proposed
> > another
> > > > > > > solution
> > > > > > > > > >> based
> > > > > > > > > >> >> on
> > > > > > > > > >> >> > my
> > > > > > > > > >> >> > > > > > offline discussion with some folks working
> on
> > > EOS
> > > > > > > > > >> >> > > > > > to make the handling more straightforward
> for
> > > end
> > > > > > users
> > > > > > > > > >> without
> > > > > > > > > >> >> the
> > > > > > > > > >> >> > > > need
> > > > > > > > > >> >> > > > > > to distinguish exception fatality.
> > > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > >> Best,
> > > > > > > > > >> >> > > > > >> Bruno
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >> On 04.12.20 19:34, Guozhang Wang wrote:
> > > > > > > > > >> >> > > > > >> > Thanks Boyang for the proposal! I made a
> > > pass
> > > > > over
> > > > > > > the
> > > > > > > > > >> list
> > > > > > > > > >> >> and
> > > > > > > > > >> >> > > here
> > > > > > > > > >> >> > > > > are
> > > > > > > > > >> >> > > > > >> > some thoughts:
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > 0) Although this is not part of the
> public
> > > > API,
> > > > > I
> > > > > > > think
> > > > > > > > > we
> > > > > > > > > >> >> > should
> > > > > > > > > >> >> > > > make
> > > > > > > > > >> >> > > > > >> sure
> > > > > > > > > >> >> > > > > >> > that the suggested pattern (i.e. user
> can
> > > > always
> > > > > > > call
> > > > > > > > > >> >> abortTxn()
> > > > > > > > > >> >> > > > when
> > > > > > > > > >> >> > > > > >> > handling non-fatal errors) are indeed
> > > > supported.
> > > > > > > E.g. if
> > > > > > > > > >> the
> > > > > > > > > >> >> txn
> > > > > > > > > >> >> > > is
> > > > > > > > > >> >> > > > > >> already
> > > > > > > > > >> >> > > > > >> > aborted by the broker side, then users
> can
> > > > still
> > > > > > > call
> > > > > > > > > >> >> abortTxn
> > > > > > > > > >> >> > > which
> > > > > > > > > >> >> > > > > >> would
> > > > > > > > > >> >> > > > > >> > not throw another exception but just be
> > > > treated
> > > > > > as a
> > > > > > > > > >> no-op.
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > 1) *ConcurrentTransactionsException*: I
> > > think
> > > > > this
> > > > > > > error
> > > > > > > > > >> can
> > > > > > > > > >> >> > also
> > > > > > > > > >> >> > > be
> > > > > > > > > >> >> > > > > >> > returned but not documented yet. This
> > should
> > > > be
> > > > > a
> > > > > > > > > >> non-fatal
> > > > > > > > > >> >> > error.
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > 2) *InvalidTxnStateException*: this
> error
> > is
> > > > > > > returned
> > > > > > > > > from
> > > > > > > > > >> >> > broker
> > > > > > > > > >> >> > > > when
> > > > > > > > > >> >> > > > > >> txn
> > > > > > > > > >> >> > > > > >> > state transition failed (e.g. it is
> trying
> > > to
> > > > > > > transit to
> > > > > > > > > >> >> > > > > complete-commit
> > > > > > > > > >> >> > > > > >> > while the current state is not
> > > > prepare-commit).
> > > > > > This
> > > > > > > > > error
> > > > > > > > > >> >> could
> > > > > > > > > >> >> > > > > >> indicates
> > > > > > > > > >> >> > > > > >> > a bug on the client internal code or the
> > > > broker
> > > > > > > code,
> > > > > > > > > OR a
> > > > > > > > > >> >> user
> > > > > > > > > >> >> > > > error
> > > > > > > > > >> >> > > > > >> --- a
> > > > > > > > > >> >> > > > > >> > similar error is
> > > > > ConcurrentTransactionsException,
> > > > > > > i.e.
> > > > > > > > > if
> > > > > > > > > >> >> Kafka
> > > > > > > > > >> >> > is
> > > > > > > > > >> >> > > > > >> bug-free
> > > > > > > > > >> >> > > > > >> > these exceptions would only be returned
> if
> > > > users
> > > > > > > try to
> > > > > > > > > do
> > > > > > > > > >> >> > > something
> > > > > > > > > >> >> > > > > >> wrong,
> > > > > > > > > >> >> > > > > >> > e.g. calling abortTxn right after a
> > > commitTxn,
> > > > > > etc.
> > > > > > > So
> > > > > > > > > I'm
> > > > > > > > > >> >> > > thinking
> > > > > > > > > >> >> > > > it
> > > > > > > > > >> >> > > > > >> > should be a non-fatal error instead of a
> > > fatal
> > > > > > > error,
> > > > > > > > > >> wdyt?
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > 3) *KafkaException*: case i "indicates
> > fatal
> > > > > > > > > transactional
> > > > > > > > > >> >> > > sequence
> > > > > > > > > >> >> > > > > >> > (Fatal)", this is a bit conflicting with
> > the
> > > > > > > > > >> >> > > > *OutOfSequenceException*
> > > > > > > > > >> >> > > > > >> that
> > > > > > > > > >> >> > > > > >> > is treated as non-fatal. I guess your
> > > proposal
> > > > > is
> > > > > > > that
> > > > > > > > > >> >> > > > > >> > OutOfOrderSequenceException would be
> > treated
> > > > > > either
> > > > > > > as
> > > > > > > > > >> fatal
> > > > > > > > > >> >> > with
> > > > > > > > > >> >> > > > > >> > transactional producer, or non-fatal
> with
> > > > > > idempotent
> > > > > > > > > >> >> producer,
> > > > > > > > > >> >> > is
> > > > > > > > > >> >> > > > that
> > > > > > > > > >> >> > > > > >> > right? If the producer is only
> configured
> > > with
> > > > > > > > > idempotency
> > > > > > > > > >> >> but
> > > > > > > > > >> >> > not
> > > > > > > > > >> >> > > > > >> > transaction, then throwing a
> > > > > > > > > >> >> TransactionStateCorruptedException
> > > > > > > > > >> >> > > for
> > > > > > > > > >> >> > > > > >> > non-fatal errors would be confusing
> since
> > > > users
> > > > > > are
> > > > > > > not
> > > > > > > > > >> using
> > > > > > > > > >> >> > > > > >> transactions
> > > > > > > > > >> >> > > > > >> > at all.. So I suggest we always throw
> > > > > > > > > >> OutOfSequenceException
> > > > > > > > > >> >> > as-is
> > > > > > > > > >> >> > > > > (i.e.
> > > > > > > > > >> >> > > > > >> > not wrapped) no matter how the producer
> is
> > > > > > > configured,
> > > > > > > > > and
> > > > > > > > > >> >> let
> > > > > > > > > >> >> > the
> > > > > > > > > >> >> > > > > >> caller
> > > > > > > > > >> >> > > > > >> > decide how to handle it based on whether
> > it
> > > is
> > > > > > only
> > > > > > > > > >> >> idempotent
> > > > > > > > > >> >> > or
> > > > > > > > > >> >> > > > > >> > transactional itself.
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > 4) Besides all the txn APIs, the
> `send()`
> > > > > > callback /
> > > > > > > > > >> future
> > > > > > > > > >> >> can
> > > > > > > > > >> >> > > also
> > > > > > > > > >> >> > > > > >> throw
> > > > > > > > > >> >> > > > > >> > txn-related exceptions, I think this KIP
> > > > should
> > > > > > also
> > > > > > > > > cover
> > > > > > > > > >> >> this
> > > > > > > > > >> >> > > API
> > > > > > > > > >> >> > > > as
> > > > > > > > > >> >> > > > > >> well?
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > 5) This is related to 1/2) above:
> > sometimes
> > > > > those
> > > > > > > > > >> non-fatal
> > > > > > > > > >> >> > errors
> > > > > > > > > >> >> > > > > like
> > > > > > > > > >> >> > > > > >> > ConcurrentTxn or InvalidTxnState are not
> > due
> > > > to
> > > > > > the
> > > > > > > > > state
> > > > > > > > > >> >> being
> > > > > > > > > >> >> > > > > >> corrupted
> > > > > > > > > >> >> > > > > >> > at the broker side, but maybe users are
> > > doing
> > > > > > > something
> > > > > > > > > >> >> wrong.
> > > > > > > > > >> >> > So
> > > > > > > > > >> >> > > > I'm
> > > > > > > > > >> >> > > > > >> > wondering if we should further
> distinguish
> > > > those
> > > > > > > > > non-fatal
> > > > > > > > > >> >> > errors
> > > > > > > > > >> >> > > > > >> between
> > > > > > > > > >> >> > > > > >> > a) those that are caused by Kafka
> itself,
> > > > e.g. a
> > > > > > > broker
> > > > > > > > > >> timed
> > > > > > > > > >> >> > out
> > > > > > > > > >> >> > > > and
> > > > > > > > > >> >> > > > > >> > aborted a txn and later an endTxn
> request
> > is
> > > > > > > received,
> > > > > > > > > >> and b)
> > > > > > > > > >> >> > the
> > > > > > > > > >> >> > > > > user's
> > > > > > > > > >> >> > > > > >> > API call pattern is incorrect, causing
> the
> > > > > request
> > > > > > > to be
> > > > > > > > > >> >> > rejected
> > > > > > > > > >> >> > > > with
> > > > > > > > > >> >> > > > > >> an
> > > > > > > > > >> >> > > > > >> > error code from the broker.
> > > > > > > > > >> >> *TransactionStateCorruptedException*
> > > > > > > > > >> >> > > > feels
> > > > > > > > > >> >> > > > > >> to
> > > > > > > > > >> >> > > > > >> > me more like for case a), but not case
> b).
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > Guozhang
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> > On Wed, Dec 2, 2020 at 4:50 PM Boyang
> > Chen <
> > > > > > > > > >> >> > > > > reluctanthero...@gmail.com>
> > > > > > > > > >> >> > > > > >> > wrote:
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> >> Thanks Matthias, I think your proposal
> > > makes
> > > > > > sense
> > > > > > > as
> > > > > > > > > >> well,
> > > > > > > > > >> >> on
> > > > > > > > > >> >> > > the
> > > > > > > > > >> >> > > > > pro
> > > > > > > > > >> >> > > > > >> side
> > > > > > > > > >> >> > > > > >> >> we could have a universally agreed
> > > exception
> > > > > type
> > > > > > > to be
> > > > > > > > > >> >> caught
> > > > > > > > > >> >> > by
> > > > > > > > > >> >> > > > the
> > > > > > > > > >> >> > > > > >> user,
> > > > > > > > > >> >> > > > > >> >> without having an extra layer on top of
> > the
> > > > > > actual
> > > > > > > > > >> >> exceptions.
> > > > > > > > > >> >> > I
> > > > > > > > > >> >> > > > > could
> > > > > > > > > >> >> > > > > >> see
> > > > > > > > > >> >> > > > > >> >> some issue on downsides:
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >> >> 1. The exception hierarchy will be more
> > > > > complex.
> > > > > > > Right
> > > > > > > > > >> now
> > > > > > > > > >> >> we
> > > > > > > > > >> >> > > have
> > > > > > > > > >> >> > > > > >> >> RetriableException type, if we are
> going
> > to
> > > > > add a
> > > > > > > > > >> >> > > > > >> >> `ProducerRetriableException` type, we
> > have
> > > to
> > > > > put
> > > > > > > this
> > > > > > > > > >> new
> > > > > > > > > >> >> > > > interface
> > > > > > > > > >> >> > > > > >> as the
> > > > > > > > > >> >> > > > > >> >> parent of the RetriableException,
> because
> > > not
> > > > > all
> > > > > > > > > thrown
> > > > > > > > > >> >> > > non-fatal
> > > > > > > > > >> >> > > > > >> >> exceptions are `retriable` in general
> for
> > > > > > > producer, for
> > > > > > > > > >> >> example
> > > > > > > > > >> >> > > > > >> >> <
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> >
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/e275742f850af4a1b79b0d1bd1ac9a1d2e89c64e/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L745
> > > > > > > > > >> >> > > > > >> >>> .
> > > > > > > > > >> >> > > > > >> >> We could have an empty interface
> > > > > > > > > >> >> `ProducerRetriableException`
> > > > > > > > > >> >> > to
> > > > > > > > > >> >> > > > let
> > > > > > > > > >> >> > > > > >> all
> > > > > > > > > >> >> > > > > >> >> the thrown exceptions implement for
> sure,
> > > but
> > > > > > it's
> > > > > > > a
> > > > > > > > > bit
> > > > > > > > > >> >> > > unorthodox
> > > > > > > > > >> >> > > > > in
> > > > > > > > > >> >> > > > > >> the
> > > > > > > > > >> >> > > > > >> >> way we deal with exceptions in general.
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >> >> 2. There are cases where we throw a
> > > > > > KafkaException
> > > > > > > > > >> wrapping
> > > > > > > > > >> >> > > another
> > > > > > > > > >> >> > > > > >> >> KafkaException as either fatal or
> > > non-fatal.
> > > > If
> > > > > > we
> > > > > > > use
> > > > > > > > > an
> > > > > > > > > >> >> > > interface
> > > > > > > > > >> >> > > > > to
> > > > > > > > > >> >> > > > > >> >> solve #1, it is also required to
> > implement
> > > > > > another
> > > > > > > > > >> bloated
> > > > > > > > > >> >> > > > exception
> > > > > > > > > >> >> > > > > >> class
> > > > > > > > > >> >> > > > > >> >> which could replace KafkaException
> type,
> > as
> > > > we
> > > > > > > couldn't
> > > > > > > > > >> mark
> > > > > > > > > >> >> > > > > >> KafkaException
> > > > > > > > > >> >> > > > > >> >> as retriable for sure.
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >> >> 3. In terms of the encapsulation,
> > wrapping
> > > > > means
> > > > > > we
> > > > > > > > > could
> > > > > > > > > >> >> limit
> > > > > > > > > >> >> > > the
> > > > > > > > > >> >> > > > > >> scope
> > > > > > > > > >> >> > > > > >> >> of affection to the producer only,
> which
> > is
> > > > > > > important
> > > > > > > > > >> since
> > > > > > > > > >> >> we
> > > > > > > > > >> >> > > > don't
> > > > > > > > > >> >> > > > > >> want
> > > > > > > > > >> >> > > > > >> >> shared exception types to implement a
> > > > > > > producer-related
> > > > > > > > > >> >> > interface,
> > > > > > > > > >> >> > > > > such
> > > > > > > > > >> >> > > > > >> >> as UnknownTopicOrPartitionException.
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >> >> Best,
> > > > > > > > > >> >> > > > > >> >> Boyang
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >> >> On Wed, Dec 2, 2020 at 3:38 PM Matthias
> > J.
> > > > Sax
> > > > > <
> > > > > > > > > >> >> > mj...@apache.org
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > > >> wrote:
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >> >>> Thanks for the KIP Boyang!
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>> Overall, categorizing exceptions
> makes a
> > > lot
> > > > > of
> > > > > > > sense.
> > > > > > > > > >> As I
> > > > > > > > > >> >> > > don't
> > > > > > > > > >> >> > > > > know
> > > > > > > > > >> >> > > > > >> >>> the producer internals well enough, I
> > > cannot
> > > > > > > comment
> > > > > > > > > on
> > > > > > > > > >> the
> > > > > > > > > >> >> > > > > >> >>> categorization in detail though.
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>> What I am wondering is, if we should
> > > > introduce
> > > > > > an
> > > > > > > > > >> exception
> > > > > > > > > >> >> > > > > interface
> > > > > > > > > >> >> > > > > >> >>> that non-fatal exception would
> implement
> > > > > instead
> > > > > > > of
> > > > > > > > > >> >> creating a
> > > > > > > > > >> >> > > new
> > > > > > > > > >> >> > > > > >> class
> > > > > > > > > >> >> > > > > >> >>> that will wrap non-fatal exceptions?
> > What
> > > > > would
> > > > > > > be the
> > > > > > > > > >> >> > pros/cons
> > > > > > > > > >> >> > > > for
> > > > > > > > > >> >> > > > > >> >>> both designs?
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>> -Matthias
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>> On 12/2/20 11:35 AM, Boyang Chen
> wrote:
> > > > > > > > > >> >> > > > > >> >>>> Hey there,
> > > > > > > > > >> >> > > > > >> >>>>
> > > > > > > > > >> >> > > > > >> >>>> I would like to start a discussion
> > thread
> > > > for
> > > > > > > > > KIP-691:
> > > > > > > > > >> >> > > > > >> >>>>
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> >
> > > > > > > > > >> >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-691%3A+Enhance+Transactional+Producer+Exception+Handling
> > > > > > > > > >> >> > > > > >> >>>>
> > > > > > > > > >> >> > > > > >> >>>> The KIP is aiming to simplify the
> > > exception
> > > > > > > handling
> > > > > > > > > >> logic
> > > > > > > > > >> >> > for
> > > > > > > > > >> >> > > > > >> >>>> transactional Producer users by
> > > classifying
> > > > > > > fatal and
> > > > > > > > > >> >> > non-fatal
> > > > > > > > > >> >> > > > > >> >>> exceptions
> > > > > > > > > >> >> > > > > >> >>>> and throw them correspondingly for
> > easier
> > > > > catch
> > > > > > > and
> > > > > > > > > >> retry.
> > > > > > > > > >> >> > Let
> > > > > > > > > >> >> > > me
> > > > > > > > > >> >> > > > > >> know
> > > > > > > > > >> >> > > > > >> >>> what
> > > > > > > > > >> >> > > > > >> >>>> you think.
> > > > > > > > > >> >> > > > > >> >>>>
> > > > > > > > > >> >> > > > > >> >>>> Best,
> > > > > > > > > >> >> > > > > >> >>>> Boyang
> > > > > > > > > >> >> > > > > >> >>>>
> > > > > > > > > >> >> > > > > >> >>>
> > > > > > > > > >> >> > > > > >> >>
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >> >
> > > > > > > > > >> >> > > > > >>
> > > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > --
> > > > > > > > > >> >> > > > -- Guozhang
> > > > > > > > > >> >> > > >
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> >
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > --
> > > > > > > > > >> >> > -- Guozhang
> > > > > > > > > >> >> >
> > > > > > > > > >> >>
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > --
> > > > > > > > > >> > -- Guozhang
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >> -- Guozhang
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Reply via email to