Thanks Guozhang! Will wait and see if there are more comments.

On Fri, Apr 10, 2020 at 5:17 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks Boyang, the newly added example looks good to me.
>
> On Thu, Apr 9, 2020 at 2:47 PM Boyang Chen <reluctanthero...@gmail.com>
> wrote:
>
> > Hey Guozhang,
> >
> > I have added an example of the producer API usage under new improvements.
> > Let me know if this looks good to you.
> >
> > Boyang
> >
> > On Wed, Apr 8, 2020 at 1:38 PM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > That's a good suggestion Jason. Adding a dedicated PRODUCER_FENCED
> error
> > > should help distinguish exceptions and could safely mark
> > > INVALID_PRODUCER_EPOCH exception as non-fatal in the new code. Updated
> > the
> > > KIP.
> > >
> > > Boyang
> > >
> > > On Wed, Apr 8, 2020 at 12:18 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > >> Hey Boyang,
> > >>
> > >> Thanks for the KIP. I think the main problem we've identified here is
> > that
> > >> the current errors conflate transaction timeouts with producer
> fencing.
> > >> The
> > >> first of these ought to be recoverable, but we cannot distinguish it.
> > The
> > >> suggestion to add a new error code makes sense to me, but it leaves
> this
> > >> bit of awkwardness:
> > >>
> > >> > One extra issue that needs to be addressed is how to handle
> > >> `ProducerFenced` from Produce requests.
> > >>
> > >> In fact, the underlying error code here is INVALID_PRODUCER_EPOCH.
> It's
> > >> just that the code treats this as equivalent to `ProducerFenced`. One
> > >> thought I had is maybe PRODUCER_FENCED needs to be a separate error
> code
> > >> as
> > >> well. After all, only the transaction coordinator knows whether a
> > producer
> > >> has been fenced or not. So maybe the handling could be something like
> > the
> > >> following:
> > >>
> > >> 1. Produce requests may return INVALID_PRODUCER_EPOCH. The producer
> > >> recovers by following KIP-360 logic to see whether the epoch can be
> > >> bumped.
> > >> If it cannot because the broker version is too old, we fail.
> > >> 2. Transactional APIs may return either TRANSACTION_TIMEOUT or
> > >> PRODUCER_FENCED. In the first case, we do the same as above. We try to
> > >> recover by bumping the epoch. If the error is PRODUCER_FENCED, it is
> > >> fatal.
> > >> 3. Older brokers may return INVALID_PRODUCER_EPOCH as well from
> > >> transactional APIs. We treat this the same as 1.
> > >>
> > >> What do you think?
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Apr 6, 2020 at 3:41 PM Boyang Chen <
> reluctanthero...@gmail.com>
> > >> wrote:
> > >>
> > >> > Yep, updated the KIP, thanks!
> > >> >
> > >> > On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang <wangg...@gmail.com>
> > >> wrote:
> > >> >
> > >> > > Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly
> > >> handled
> > >> > > today in produce / add-partitions-to-txn / add-offsets-to-txn /
> > >> end-txn
> > >> > > responses, so that should be well covered.
> > >> > >
> > >> > > Could you reflect this in the wiki page that the broker should be
> > >> > > responsible for using different error codes given client request
> > >> versions
> > >> > > as well?
> > >> > >
> > >> > >
> > >> > >
> > >> > > Guozhang
> > >> > >
> > >> > > On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen <
> > >> reluctanthero...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Thanks Guozhang for the review!
> > >> > > >
> > >> > > > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang <
> wangg...@gmail.com>
> > >> > wrote:
> > >> > > >
> > >> > > > > Hello Boyang,
> > >> > > > >
> > >> > > > > Thank you for the proposed KIP. Just some minor comments
> below:
> > >> > > > >
> > >> > > > > 1. Could you also describe which producer APIs could
> potentially
> > >> > throw
> > >> > > > the
> > >> > > > > new TransactionTimedOutException, and also how should callers
> > >> handle
> > >> > > them
> > >> > > > > differently (i.e. just to make your description more concrete
> as
> > >> > > > javadocs).
> > >> > > > >
> > >> > > > > Good point, I will add example java doc changes.
> > >> > > >
> > >> > > >
> > >> > > > > 2. It's straight-forward if client is on newer version while
> > >> broker's
> > >> > > on
> > >> > > > > older version; however If the client is on older version while
> > >> > broker's
> > >> > > > on
> > >> > > > > newer version, today would the internal module of producers
> > treat
> > >> it
> > >> > > as a
> > >> > > > > general fatal error or not? If not, should the broker set a
> > >> different
> > >> > > > error
> > >> > > > > code upon detecting older request versions?
> > >> > > > >
> > >> > > > > That's a good suggestion, my understanding is that the
> > >> prerequisite
> > >> > of
> > >> > > > this change is the new KIP-360 API which is going out with 2.5,
> > >> > > > so we could just return UNKNOWN_PRODUCER_ID instead of
> > >> PRODUCER_FENCED
> > >> > as
> > >> > > > it could be interpreted as abortable error
> > >> > > > in 2.5 producer and retry. Producers older than 2.5 will not be
> > >> > covered.
> > >> > > > WDYT?
> > >> > > >
> > >> > > > >
> > >> > > > > Guozhang
> > >> > > > >
> > >> > > > > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen <
> > >> > reluctanthero...@gmail.com
> > >> > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hey there,
> > >> > > > > >
> > >> > > > > > I would like to start discussion for KIP-588:
> > >> > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> > >> > > > > >
> > >> > > > > > which aims to improve Producer resilience to transaction
> > timeout
> > >> > due
> > >> > > to
> > >> > > > > > transient system gaps.
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Boyang
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > -- Guozhang
> > >> > > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > -- Guozhang
> > >> > >
> > >> >
> > >>
> > >
> >
>
>
> --
> -- Guozhang
>

Reply via email to