Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-13 Thread Boyang Chen
Thanks Guozhang! Will wait and see if there are more comments.

On Fri, Apr 10, 2020 at 5:17 PM Guozhang Wang  wrote:

> Thanks Boyang, the newly added example looks good to me.
>
> On Thu, Apr 9, 2020 at 2:47 PM Boyang Chen 
> wrote:
>
> > Hey Guozhang,
> >
> > I have added an example of the producer API usage under new improvements.
> > Let me know if this looks good to you.
> >
> > Boyang
> >
> > On Wed, Apr 8, 2020 at 1:38 PM Boyang Chen 
> > wrote:
> >
> > > That's a good suggestion Jason. Adding a dedicated PRODUCER_FENCED
> error
> > > should help distinguish exceptions and could safely mark
> > > INVALID_PRODUCER_EPOCH exception as non-fatal in the new code. Updated
> > the
> > > KIP.
> > >
> > > Boyang
> > >
> > > On Wed, Apr 8, 2020 at 12:18 PM Jason Gustafson 
> > > wrote:
> > >
> > >> Hey Boyang,
> > >>
> > >> Thanks for the KIP. I think the main problem we've identified here is
> > that
> > >> the current errors conflate transaction timeouts with producer
> fencing.
> > >> The
> > >> first of these ought to be recoverable, but we cannot distinguish it.
> > The
> > >> suggestion to add a new error code makes sense to me, but it leaves
> this
> > >> bit of awkwardness:
> > >>
> > >> > One extra issue that needs to be addressed is how to handle
> > >> `ProducerFenced` from Produce requests.
> > >>
> > >> In fact, the underlying error code here is INVALID_PRODUCER_EPOCH.
> It's
> > >> just that the code treats this as equivalent to `ProducerFenced`. One
> > >> thought I had is maybe PRODUCER_FENCED needs to be a separate error
> code
> > >> as
> > >> well. After all, only the transaction coordinator knows whether a
> > producer
> > >> has been fenced or not. So maybe the handling could be something like
> > the
> > >> following:
> > >>
> > >> 1. Produce requests may return INVALID_PRODUCER_EPOCH. The producer
> > >> recovers by following KIP-360 logic to see whether the epoch can be
> > >> bumped.
> > >> If it cannot because the broker version is too old, we fail.
> > >> 2. Transactional APIs may return either TRANSACTION_TIMEOUT or
> > >> PRODUCER_FENCED. In the first case, we do the same as above. We try to
> > >> recover by bumping the epoch. If the error is PRODUCER_FENCED, it is
> > >> fatal.
> > >> 3. Older brokers may return INVALID_PRODUCER_EPOCH as well from
> > >> transactional APIs. We treat this the same as 1.
> > >>
> > >> What do you think?
> > >>
> > >> Thanks,
> > >> Jason
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Apr 6, 2020 at 3:41 PM Boyang Chen <
> reluctanthero...@gmail.com>
> > >> wrote:
> > >>
> > >> > Yep, updated the KIP, thanks!
> > >> >
> > >> > On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang 
> > >> wrote:
> > >> >
> > >> > > Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly
> > >> handled
> > >> > > today in produce / add-partitions-to-txn / add-offsets-to-txn /
> > >> end-txn
> > >> > > responses, so that should be well covered.
> > >> > >
> > >> > > Could you reflect this in the wiki page that the broker should be
> > >> > > responsible for using different error codes given client request
> > >> versions
> > >> > > as well?
> > >> > >
> > >> > >
> > >> > >
> > >> > > Guozhang
> > >> > >
> > >> > > On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen <
> > >> reluctanthero...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Thanks Guozhang for the review!
> > >> > > >
> > >> > > > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang <
> wangg...@gmail.com>
> > >> > wrote:
> > >> > > >
> > >> > > > > Hello Boyang,
> > >> > > > >
> > >> > > > > Thank you for the proposed KIP. Just some minor comments
> below:
> > >> > > > >
> > >> > > > > 1. Could you also describe which producer APIs could
> potentially
> > >> > throw
> > >> > > > the
> > >> > > > > new TransactionTimedOutException, and also how should callers
> > >> handle
> > >> > > them
> > >> > > > > differently (i.e. just to make your description more concrete
> as
> > >> > > > javadocs).
> > >> > > > >
> > >> > > > > Good point, I will add example java doc changes.
> > >> > > >
> > >> > > >
> > >> > > > > 2. It's straight-forward if client is on newer version while
> > >> broker's
> > >> > > on
> > >> > > > > older version; however If the client is on older version while
> > >> > broker's
> > >> > > > on
> > >> > > > > newer version, today would the internal module of producers
> > treat
> > >> it
> > >> > > as a
> > >> > > > > general fatal error or not? If not, should the broker set a
> > >> different
> > >> > > > error
> > >> > > > > code upon detecting older request versions?
> > >> > > > >
> > >> > > > > That's a good suggestion, my understanding is that the
> > >> prerequisite
> > >> > of
> > >> > > > this change is the new KIP-360 API which is going out with 2.5,
> > >> > > > so we could just return UNKNOWN_PRODUCER_ID instead of
> > >> PRODUCER_FENCED
> > >> > as
> > >> > > > it could be interpreted as abortable error
> > >> > > > in 2.5 producer and retry. 

Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-10 Thread Guozhang Wang
Thanks Boyang, the newly added example looks good to me.

On Thu, Apr 9, 2020 at 2:47 PM Boyang Chen 
wrote:

> Hey Guozhang,
>
> I have added an example of the producer API usage under new improvements.
> Let me know if this looks good to you.
>
> Boyang
>
> On Wed, Apr 8, 2020 at 1:38 PM Boyang Chen 
> wrote:
>
> > That's a good suggestion Jason. Adding a dedicated PRODUCER_FENCED error
> > should help distinguish exceptions and could safely mark
> > INVALID_PRODUCER_EPOCH exception as non-fatal in the new code. Updated
> the
> > KIP.
> >
> > Boyang
> >
> > On Wed, Apr 8, 2020 at 12:18 PM Jason Gustafson 
> > wrote:
> >
> >> Hey Boyang,
> >>
> >> Thanks for the KIP. I think the main problem we've identified here is
> that
> >> the current errors conflate transaction timeouts with producer fencing.
> >> The
> >> first of these ought to be recoverable, but we cannot distinguish it.
> The
> >> suggestion to add a new error code makes sense to me, but it leaves this
> >> bit of awkwardness:
> >>
> >> > One extra issue that needs to be addressed is how to handle
> >> `ProducerFenced` from Produce requests.
> >>
> >> In fact, the underlying error code here is INVALID_PRODUCER_EPOCH. It's
> >> just that the code treats this as equivalent to `ProducerFenced`. One
> >> thought I had is maybe PRODUCER_FENCED needs to be a separate error code
> >> as
> >> well. After all, only the transaction coordinator knows whether a
> producer
> >> has been fenced or not. So maybe the handling could be something like
> the
> >> following:
> >>
> >> 1. Produce requests may return INVALID_PRODUCER_EPOCH. The producer
> >> recovers by following KIP-360 logic to see whether the epoch can be
> >> bumped.
> >> If it cannot because the broker version is too old, we fail.
> >> 2. Transactional APIs may return either TRANSACTION_TIMEOUT or
> >> PRODUCER_FENCED. In the first case, we do the same as above. We try to
> >> recover by bumping the epoch. If the error is PRODUCER_FENCED, it is
> >> fatal.
> >> 3. Older brokers may return INVALID_PRODUCER_EPOCH as well from
> >> transactional APIs. We treat this the same as 1.
> >>
> >> What do you think?
> >>
> >> Thanks,
> >> Jason
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Apr 6, 2020 at 3:41 PM Boyang Chen 
> >> wrote:
> >>
> >> > Yep, updated the KIP, thanks!
> >> >
> >> > On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang 
> >> wrote:
> >> >
> >> > > Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly
> >> handled
> >> > > today in produce / add-partitions-to-txn / add-offsets-to-txn /
> >> end-txn
> >> > > responses, so that should be well covered.
> >> > >
> >> > > Could you reflect this in the wiki page that the broker should be
> >> > > responsible for using different error codes given client request
> >> versions
> >> > > as well?
> >> > >
> >> > >
> >> > >
> >> > > Guozhang
> >> > >
> >> > > On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen <
> >> reluctanthero...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Thanks Guozhang for the review!
> >> > > >
> >> > > > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang 
> >> > wrote:
> >> > > >
> >> > > > > Hello Boyang,
> >> > > > >
> >> > > > > Thank you for the proposed KIP. Just some minor comments below:
> >> > > > >
> >> > > > > 1. Could you also describe which producer APIs could potentially
> >> > throw
> >> > > > the
> >> > > > > new TransactionTimedOutException, and also how should callers
> >> handle
> >> > > them
> >> > > > > differently (i.e. just to make your description more concrete as
> >> > > > javadocs).
> >> > > > >
> >> > > > > Good point, I will add example java doc changes.
> >> > > >
> >> > > >
> >> > > > > 2. It's straight-forward if client is on newer version while
> >> broker's
> >> > > on
> >> > > > > older version; however If the client is on older version while
> >> > broker's
> >> > > > on
> >> > > > > newer version, today would the internal module of producers
> treat
> >> it
> >> > > as a
> >> > > > > general fatal error or not? If not, should the broker set a
> >> different
> >> > > > error
> >> > > > > code upon detecting older request versions?
> >> > > > >
> >> > > > > That's a good suggestion, my understanding is that the
> >> prerequisite
> >> > of
> >> > > > this change is the new KIP-360 API which is going out with 2.5,
> >> > > > so we could just return UNKNOWN_PRODUCER_ID instead of
> >> PRODUCER_FENCED
> >> > as
> >> > > > it could be interpreted as abortable error
> >> > > > in 2.5 producer and retry. Producers older than 2.5 will not be
> >> > covered.
> >> > > > WDYT?
> >> > > >
> >> > > > >
> >> > > > > Guozhang
> >> > > > >
> >> > > > > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen <
> >> > reluctanthero...@gmail.com
> >> > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hey there,
> >> > > > > >
> >> > > > > > I would like to start discussion for KIP-588:
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> 

Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-09 Thread Boyang Chen
Hey Guozhang,

I have added an example of the producer API usage under new improvements.
Let me know if this looks good to you.

Boyang

On Wed, Apr 8, 2020 at 1:38 PM Boyang Chen 
wrote:

> That's a good suggestion Jason. Adding a dedicated PRODUCER_FENCED error
> should help distinguish exceptions and could safely mark
> INVALID_PRODUCER_EPOCH exception as non-fatal in the new code. Updated the
> KIP.
>
> Boyang
>
> On Wed, Apr 8, 2020 at 12:18 PM Jason Gustafson 
> wrote:
>
>> Hey Boyang,
>>
>> Thanks for the KIP. I think the main problem we've identified here is that
>> the current errors conflate transaction timeouts with producer fencing.
>> The
>> first of these ought to be recoverable, but we cannot distinguish it. The
>> suggestion to add a new error code makes sense to me, but it leaves this
>> bit of awkwardness:
>>
>> > One extra issue that needs to be addressed is how to handle
>> `ProducerFenced` from Produce requests.
>>
>> In fact, the underlying error code here is INVALID_PRODUCER_EPOCH. It's
>> just that the code treats this as equivalent to `ProducerFenced`. One
>> thought I had is maybe PRODUCER_FENCED needs to be a separate error code
>> as
>> well. After all, only the transaction coordinator knows whether a producer
>> has been fenced or not. So maybe the handling could be something like the
>> following:
>>
>> 1. Produce requests may return INVALID_PRODUCER_EPOCH. The producer
>> recovers by following KIP-360 logic to see whether the epoch can be
>> bumped.
>> If it cannot because the broker version is too old, we fail.
>> 2. Transactional APIs may return either TRANSACTION_TIMEOUT or
>> PRODUCER_FENCED. In the first case, we do the same as above. We try to
>> recover by bumping the epoch. If the error is PRODUCER_FENCED, it is
>> fatal.
>> 3. Older brokers may return INVALID_PRODUCER_EPOCH as well from
>> transactional APIs. We treat this the same as 1.
>>
>> What do you think?
>>
>> Thanks,
>> Jason
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Apr 6, 2020 at 3:41 PM Boyang Chen 
>> wrote:
>>
>> > Yep, updated the KIP, thanks!
>> >
>> > On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang 
>> wrote:
>> >
>> > > Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly
>> handled
>> > > today in produce / add-partitions-to-txn / add-offsets-to-txn /
>> end-txn
>> > > responses, so that should be well covered.
>> > >
>> > > Could you reflect this in the wiki page that the broker should be
>> > > responsible for using different error codes given client request
>> versions
>> > > as well?
>> > >
>> > >
>> > >
>> > > Guozhang
>> > >
>> > > On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen <
>> reluctanthero...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks Guozhang for the review!
>> > > >
>> > > > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang 
>> > wrote:
>> > > >
>> > > > > Hello Boyang,
>> > > > >
>> > > > > Thank you for the proposed KIP. Just some minor comments below:
>> > > > >
>> > > > > 1. Could you also describe which producer APIs could potentially
>> > throw
>> > > > the
>> > > > > new TransactionTimedOutException, and also how should callers
>> handle
>> > > them
>> > > > > differently (i.e. just to make your description more concrete as
>> > > > javadocs).
>> > > > >
>> > > > > Good point, I will add example java doc changes.
>> > > >
>> > > >
>> > > > > 2. It's straight-forward if client is on newer version while
>> broker's
>> > > on
>> > > > > older version; however If the client is on older version while
>> > broker's
>> > > > on
>> > > > > newer version, today would the internal module of producers treat
>> it
>> > > as a
>> > > > > general fatal error or not? If not, should the broker set a
>> different
>> > > > error
>> > > > > code upon detecting older request versions?
>> > > > >
>> > > > > That's a good suggestion, my understanding is that the
>> prerequisite
>> > of
>> > > > this change is the new KIP-360 API which is going out with 2.5,
>> > > > so we could just return UNKNOWN_PRODUCER_ID instead of
>> PRODUCER_FENCED
>> > as
>> > > > it could be interpreted as abortable error
>> > > > in 2.5 producer and retry. Producers older than 2.5 will not be
>> > covered.
>> > > > WDYT?
>> > > >
>> > > > >
>> > > > > Guozhang
>> > > > >
>> > > > > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen <
>> > reluctanthero...@gmail.com
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Hey there,
>> > > > > >
>> > > > > > I would like to start discussion for KIP-588:
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
>> > > > > >
>> > > > > > which aims to improve Producer resilience to transaction timeout
>> > due
>> > > to
>> > > > > > transient system gaps.
>> > > > > >
>> > > > > > Best,
>> > > > > > Boyang
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > -- Guozhang
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > -- Guozhang
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-08 Thread Boyang Chen
That's a good suggestion Jason. Adding a dedicated PRODUCER_FENCED error
should help distinguish exceptions and could safely mark
INVALID_PRODUCER_EPOCH exception as non-fatal in the new code. Updated the
KIP.

Boyang

On Wed, Apr 8, 2020 at 12:18 PM Jason Gustafson  wrote:

> Hey Boyang,
>
> Thanks for the KIP. I think the main problem we've identified here is that
> the current errors conflate transaction timeouts with producer fencing. The
> first of these ought to be recoverable, but we cannot distinguish it. The
> suggestion to add a new error code makes sense to me, but it leaves this
> bit of awkwardness:
>
> > One extra issue that needs to be addressed is how to handle
> `ProducerFenced` from Produce requests.
>
> In fact, the underlying error code here is INVALID_PRODUCER_EPOCH. It's
> just that the code treats this as equivalent to `ProducerFenced`. One
> thought I had is maybe PRODUCER_FENCED needs to be a separate error code as
> well. After all, only the transaction coordinator knows whether a producer
> has been fenced or not. So maybe the handling could be something like the
> following:
>
> 1. Produce requests may return INVALID_PRODUCER_EPOCH. The producer
> recovers by following KIP-360 logic to see whether the epoch can be bumped.
> If it cannot because the broker version is too old, we fail.
> 2. Transactional APIs may return either TRANSACTION_TIMEOUT or
> PRODUCER_FENCED. In the first case, we do the same as above. We try to
> recover by bumping the epoch. If the error is PRODUCER_FENCED, it is fatal.
> 3. Older brokers may return INVALID_PRODUCER_EPOCH as well from
> transactional APIs. We treat this the same as 1.
>
> What do you think?
>
> Thanks,
> Jason
>
>
>
>
>
>
>
>
>
>
> On Mon, Apr 6, 2020 at 3:41 PM Boyang Chen 
> wrote:
>
> > Yep, updated the KIP, thanks!
> >
> > On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang  wrote:
> >
> > > Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly
> handled
> > > today in produce / add-partitions-to-txn / add-offsets-to-txn / end-txn
> > > responses, so that should be well covered.
> > >
> > > Could you reflect this in the wiki page that the broker should be
> > > responsible for using different error codes given client request
> versions
> > > as well?
> > >
> > >
> > >
> > > Guozhang
> > >
> > > On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen  >
> > > wrote:
> > >
> > > > Thanks Guozhang for the review!
> > > >
> > > > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang 
> > wrote:
> > > >
> > > > > Hello Boyang,
> > > > >
> > > > > Thank you for the proposed KIP. Just some minor comments below:
> > > > >
> > > > > 1. Could you also describe which producer APIs could potentially
> > throw
> > > > the
> > > > > new TransactionTimedOutException, and also how should callers
> handle
> > > them
> > > > > differently (i.e. just to make your description more concrete as
> > > > javadocs).
> > > > >
> > > > > Good point, I will add example java doc changes.
> > > >
> > > >
> > > > > 2. It's straight-forward if client is on newer version while
> broker's
> > > on
> > > > > older version; however If the client is on older version while
> > broker's
> > > > on
> > > > > newer version, today would the internal module of producers treat
> it
> > > as a
> > > > > general fatal error or not? If not, should the broker set a
> different
> > > > error
> > > > > code upon detecting older request versions?
> > > > >
> > > > > That's a good suggestion, my understanding is that the prerequisite
> > of
> > > > this change is the new KIP-360 API which is going out with 2.5,
> > > > so we could just return UNKNOWN_PRODUCER_ID instead of
> PRODUCER_FENCED
> > as
> > > > it could be interpreted as abortable error
> > > > in 2.5 producer and retry. Producers older than 2.5 will not be
> > covered.
> > > > WDYT?
> > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen <
> > reluctanthero...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hey there,
> > > > > >
> > > > > > I would like to start discussion for KIP-588:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> > > > > >
> > > > > > which aims to improve Producer resilience to transaction timeout
> > due
> > > to
> > > > > > transient system gaps.
> > > > > >
> > > > > > Best,
> > > > > > Boyang
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-08 Thread Jason Gustafson
Hey Boyang,

Thanks for the KIP. I think the main problem we've identified here is that
the current errors conflate transaction timeouts with producer fencing. The
first of these ought to be recoverable, but we cannot distinguish it. The
suggestion to add a new error code makes sense to me, but it leaves this
bit of awkwardness:

> One extra issue that needs to be addressed is how to handle
`ProducerFenced` from Produce requests.

In fact, the underlying error code here is INVALID_PRODUCER_EPOCH. It's
just that the code treats this as equivalent to `ProducerFenced`. One
thought I had is maybe PRODUCER_FENCED needs to be a separate error code as
well. After all, only the transaction coordinator knows whether a producer
has been fenced or not. So maybe the handling could be something like the
following:

1. Produce requests may return INVALID_PRODUCER_EPOCH. The producer
recovers by following KIP-360 logic to see whether the epoch can be bumped.
If it cannot because the broker version is too old, we fail.
2. Transactional APIs may return either TRANSACTION_TIMEOUT or
PRODUCER_FENCED. In the first case, we do the same as above. We try to
recover by bumping the epoch. If the error is PRODUCER_FENCED, it is fatal.
3. Older brokers may return INVALID_PRODUCER_EPOCH as well from
transactional APIs. We treat this the same as 1.

What do you think?

Thanks,
Jason










On Mon, Apr 6, 2020 at 3:41 PM Boyang Chen 
wrote:

> Yep, updated the KIP, thanks!
>
> On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang  wrote:
>
> > Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly handled
> > today in produce / add-partitions-to-txn / add-offsets-to-txn / end-txn
> > responses, so that should be well covered.
> >
> > Could you reflect this in the wiki page that the broker should be
> > responsible for using different error codes given client request versions
> > as well?
> >
> >
> >
> > Guozhang
> >
> > On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen 
> > wrote:
> >
> > > Thanks Guozhang for the review!
> > >
> > > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang 
> wrote:
> > >
> > > > Hello Boyang,
> > > >
> > > > Thank you for the proposed KIP. Just some minor comments below:
> > > >
> > > > 1. Could you also describe which producer APIs could potentially
> throw
> > > the
> > > > new TransactionTimedOutException, and also how should callers handle
> > them
> > > > differently (i.e. just to make your description more concrete as
> > > javadocs).
> > > >
> > > > Good point, I will add example java doc changes.
> > >
> > >
> > > > 2. It's straight-forward if client is on newer version while broker's
> > on
> > > > older version; however If the client is on older version while
> broker's
> > > on
> > > > newer version, today would the internal module of producers treat it
> > as a
> > > > general fatal error or not? If not, should the broker set a different
> > > error
> > > > code upon detecting older request versions?
> > > >
> > > > That's a good suggestion, my understanding is that the prerequisite
> of
> > > this change is the new KIP-360 API which is going out with 2.5,
> > > so we could just return UNKNOWN_PRODUCER_ID instead of PRODUCER_FENCED
> as
> > > it could be interpreted as abortable error
> > > in 2.5 producer and retry. Producers older than 2.5 will not be
> covered.
> > > WDYT?
> > >
> > > >
> > > > Guozhang
> > > >
> > > > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen <
> reluctanthero...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hey there,
> > > > >
> > > > > I would like to start discussion for KIP-588:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> > > > >
> > > > > which aims to improve Producer resilience to transaction timeout
> due
> > to
> > > > > transient system gaps.
> > > > >
> > > > > Best,
> > > > > Boyang
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-06 Thread Boyang Chen
Yep, updated the KIP, thanks!

On Mon, Apr 6, 2020 at 3:11 PM Guozhang Wang  wrote:

> Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly handled
> today in produce / add-partitions-to-txn / add-offsets-to-txn / end-txn
> responses, so that should be well covered.
>
> Could you reflect this in the wiki page that the broker should be
> responsible for using different error codes given client request versions
> as well?
>
>
>
> Guozhang
>
> On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen 
> wrote:
>
> > Thanks Guozhang for the review!
> >
> > On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang  wrote:
> >
> > > Hello Boyang,
> > >
> > > Thank you for the proposed KIP. Just some minor comments below:
> > >
> > > 1. Could you also describe which producer APIs could potentially throw
> > the
> > > new TransactionTimedOutException, and also how should callers handle
> them
> > > differently (i.e. just to make your description more concrete as
> > javadocs).
> > >
> > > Good point, I will add example java doc changes.
> >
> >
> > > 2. It's straight-forward if client is on newer version while broker's
> on
> > > older version; however If the client is on older version while broker's
> > on
> > > newer version, today would the internal module of producers treat it
> as a
> > > general fatal error or not? If not, should the broker set a different
> > error
> > > code upon detecting older request versions?
> > >
> > > That's a good suggestion, my understanding is that the prerequisite of
> > this change is the new KIP-360 API which is going out with 2.5,
> > so we could just return UNKNOWN_PRODUCER_ID instead of PRODUCER_FENCED as
> > it could be interpreted as abortable error
> > in 2.5 producer and retry. Producers older than 2.5 will not be covered.
> > WDYT?
> >
> > >
> > > Guozhang
> > >
> > > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen  >
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > I would like to start discussion for KIP-588:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> > > >
> > > > which aims to improve Producer resilience to transaction timeout due
> to
> > > > transient system gaps.
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-06 Thread Guozhang Wang
Regarding 2), sounds good, I saw UNKNOWN_PRODUCER_ID is properly handled
today in produce / add-partitions-to-txn / add-offsets-to-txn / end-txn
responses, so that should be well covered.

Could you reflect this in the wiki page that the broker should be
responsible for using different error codes given client request versions
as well?



Guozhang

On Mon, Apr 6, 2020 at 9:20 AM Boyang Chen 
wrote:

> Thanks Guozhang for the review!
>
> On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang  wrote:
>
> > Hello Boyang,
> >
> > Thank you for the proposed KIP. Just some minor comments below:
> >
> > 1. Could you also describe which producer APIs could potentially throw
> the
> > new TransactionTimedOutException, and also how should callers handle them
> > differently (i.e. just to make your description more concrete as
> javadocs).
> >
> > Good point, I will add example java doc changes.
>
>
> > 2. It's straight-forward if client is on newer version while broker's on
> > older version; however If the client is on older version while broker's
> on
> > newer version, today would the internal module of producers treat it as a
> > general fatal error or not? If not, should the broker set a different
> error
> > code upon detecting older request versions?
> >
> > That's a good suggestion, my understanding is that the prerequisite of
> this change is the new KIP-360 API which is going out with 2.5,
> so we could just return UNKNOWN_PRODUCER_ID instead of PRODUCER_FENCED as
> it could be interpreted as abortable error
> in 2.5 producer and retry. Producers older than 2.5 will not be covered.
> WDYT?
>
> >
> > Guozhang
> >
> > On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen 
> > wrote:
> >
> > > Hey there,
> > >
> > > I would like to start discussion for KIP-588:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> > >
> > > which aims to improve Producer resilience to transaction timeout due to
> > > transient system gaps.
> > >
> > > Best,
> > > Boyang
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-06 Thread Boyang Chen
Thanks Guozhang for the review!

On Sun, Apr 5, 2020 at 5:47 PM Guozhang Wang  wrote:

> Hello Boyang,
>
> Thank you for the proposed KIP. Just some minor comments below:
>
> 1. Could you also describe which producer APIs could potentially throw the
> new TransactionTimedOutException, and also how should callers handle them
> differently (i.e. just to make your description more concrete as javadocs).
>
> Good point, I will add example java doc changes.


> 2. It's straight-forward if client is on newer version while broker's on
> older version; however If the client is on older version while broker's on
> newer version, today would the internal module of producers treat it as a
> general fatal error or not? If not, should the broker set a different error
> code upon detecting older request versions?
>
> That's a good suggestion, my understanding is that the prerequisite of
this change is the new KIP-360 API which is going out with 2.5,
so we could just return UNKNOWN_PRODUCER_ID instead of PRODUCER_FENCED as
it could be interpreted as abortable error
in 2.5 producer and retry. Producers older than 2.5 will not be covered.
WDYT?

>
> Guozhang
>
> On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen 
> wrote:
>
> > Hey there,
> >
> > I would like to start discussion for KIP-588:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> >
> > which aims to improve Producer resilience to transaction timeout due to
> > transient system gaps.
> >
> > Best,
> > Boyang
> >
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-05 Thread Guozhang Wang
Hello Boyang,

Thank you for the proposed KIP. Just some minor comments below:

1. Could you also describe which producer APIs could potentially throw the
new TransactionTimedOutException, and also how should callers handle them
differently (i.e. just to make your description more concrete as javadocs).

2. It's straight-forward if client is on newer version while broker's on
older version; however If the client is on older version while broker's on
newer version, today would the internal module of producers treat it as a
general fatal error or not? If not, should the broker set a different error
code upon detecting older request versions?


Guozhang

On Thu, Apr 2, 2020 at 1:40 PM Boyang Chen 
wrote:

> Hey there,
>
> I would like to start discussion for KIP-588:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
>
> which aims to improve Producer resilience to transaction timeout due to
> transient system gaps.
>
> Best,
> Boyang
>


-- 
-- Guozhang


[DISCUSS] KIP-588: Allow producers to recover gracefully from transaction timeouts

2020-04-02 Thread Boyang Chen
Hey there,

I would like to start discussion for KIP-588:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts

which aims to improve Producer resilience to transaction timeout due to
transient system gaps.

Best,
Boyang