Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

Alexandre Dupriez Wed, 08 Feb 2023 06:01:46 -0800

Hi, Calvin,

Thanks for the KIP and fast follow-up. A few questions.

100. The scenario illustrated in the KIP involves a controller
movement. Is this really required? Cannot the scenario occur with a
similar stale AlterPartition request and the same controller
throughout?

101. In the case where card(ISR) = 1 and the last replica leaves, it
will be re-elected as the leader upon reconnection. If the replica is
empty, all data for the partition will be lost. Is this a correct
understanding of the scenario?

102. I understand that ZK is going to be unsupported soon. However for
as long as it is available to end users, is there any reason not to
support the fix in ZK mode? Arguably, the AlterPartition handling
logic is duplicated across the two controller types, and it may be
more work than it is worth if ZK is fully decommissioned in the next
few months. (Alternatively, is there a plan to backport the fix to
older minor versions?)

103. The KIP mentions system tests to be used to simulate the race
condition. Would it be possible to provide more details about them? Do
we think it is worth having this scenario exercised in the functional
tests of the core/server module?
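
To make the race in 100 and 101 concrete, here is a minimal toy sketch in
Python. It is not Kafka code: ToyController, register, alter_partition and
run_scenario are hypothetical names, and the model assumes that a
re-registering broker gets a fresh epoch and is dropped from the ISR. It
keeps the same controller throughout, lets a follower reboot (possibly with
an empty log) while the leader's AlterPartition request is in flight, and
then delivers the stale request with and without the epoch check.

```python
from dataclasses import dataclass, field


@dataclass
class ToyController:
    """Toy stand-in for a controller; an assumption, not Kafka's implementation."""
    broker_epochs: dict = field(default_factory=dict)  # broker id -> current epoch
    isr: set = field(default_factory=set)
    next_epoch: int = 1

    def register(self, broker_id):
        """A (re)starting broker gets a fresh epoch and leaves the ISR."""
        epoch = self.next_epoch
        self.next_epoch += 1
        self.broker_epochs[broker_id] = epoch
        self.isr.discard(broker_id)
        return epoch

    def alter_partition(self, new_isr, check_epochs=True):
        """new_isr maps broker id -> the epoch the leader saw for that replica."""
        if check_epochs:
            for broker_id, epoch in new_isr.items():
                # Reject if any replica has re-registered since the leader saw it.
                if self.broker_epochs.get(broker_id) != epoch:
                    return False
        self.isr = set(new_isr)
        return True


def run_scenario(check_epochs):
    """Same controller throughout; returns the ISR after the stale request."""
    controller = ToyController()
    leader_epoch = controller.register(1)
    follower_epoch = controller.register(2)
    controller.isr = {1}

    # The leader sees broker 2 caught up and builds an AlterPartition request.
    in_flight = {1: leader_epoch, 2: follower_epoch}

    # Before the request arrives, broker 2 reboots (possibly with an empty
    # log) and re-registers, which bumps its epoch.
    controller.register(2)

    # The now-stale request reaches the controller.
    controller.alter_partition(in_flight, check_epochs=check_epochs)
    return controller.isr
```

In this toy model, run_scenario(False) ends with broker 2 back in the ISR
even though it may be empty, while run_scenario(True) leaves the ISR at {1}.
A functional test in core/server could perhaps follow the same three steps.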

Thanks,
Alexandre

On Wed, Feb 8, 2023 at 03:31, Artem Livshits
<alivsh...@confluent.io.invalid> wrote:
>
> Hi Calvin,
>
> Thank you for the KIP.  I have a similar question -- we need to support
> rolling upgrades (when we have some old brokers and some new brokers), so
> there could be combinations of old leader - new follower, new leader - old
> follower, new leader - old controller, old leader - new controller.  Could
> you elaborate on the behavior during rolls in the Compatibility section?
>
> Also for compatibility it's probably going to be easier to just add a new
> array of epochs in addition to the existing array of broker ids, instead of
> removing one field and adding another.
>
> The KIP mentions that we would explicitly do something special in ZK mode
> in order to not implement new functionality.  I think it may be easier to
> implement the functionality for both ZK and KRaft modes than to add code to
> disable it in ZK mode.
>
> -Artem
>
> On Tue, Feb 7, 2023 at 4:58 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, Calvin,
> >
> > Thanks for the KIP. Looks good to me overall.
> >
> > Since this KIP changes the inter-broker protocol, should we bump up the
> > metadata version (the equivalent of IBP) during upgrade?
> >
> > Jun
> >
> >
> > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu <ca...@confluent.io.invalid>
> > wrote:
> >
> > > Hi everyone,
> > > I'd like to discuss the fix for the broker reboot data loss KAFKA-14139
> > > <https://issues.apache.org/jira/browse/KAFKA-14139>.
> > > It changes the Fetch and AlterPartition requests to include the broker
> > > epochs. Then the controller can use these epochs to help reject the stale
> > > AlterPartition request.
> > > Please take a look. Thanks!
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
> > >
> >
