Hi Calvin,

Thanks for the KIP! The overall approach looks reasonable to me. I have a
few questions/comments:

01. I wonder if the leader should also verify the broker epochs against
its metadata cache before sending the AlterPartition request to the
controller. Imagine a follower that is not in the ISR and keeps fetching
with a stale broker epoch but a valid leader epoch. In this case, the
leader would try to add it back to the ISR when the follower catches up,
but the controller would reject the request because the broker epoch is
stale. Until this condition resolves itself, the leader can't add any
other followers back to the ISR either, because the ISR validation is all
or nothing at the moment. Letting the leader check before sending the
AlterPartition request would mitigate this. This condition should be rare,
but it can happen. We did this in KIP-841.
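
To illustrate, here is a minimal sketch of the leader-side check, in the
spirit of what KIP-841 did (all the types and the getAliveBrokerEpoch
accessor are simplified stand-ins for illustration, not the actual broker
API):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Optional;

    // Simplified stand-ins for the real broker classes.
    record BrokerState(int brokerId, long brokerEpoch) {}

    interface MetadataCache {
        // Assumed accessor: the broker epoch the leader currently knows
        // for this broker id, or empty if the broker is fenced/unknown.
        Optional<Long> getAliveBrokerEpoch(int brokerId);
    }

    class IsrExpansionValidator {
        // Before building the AlterPartition request, drop any replica
        // whose broker epoch (taken from its Fetch requests) no longer
        // matches the leader's metadata cache. Since the controller's ISR
        // validation is all or nothing, one stale entry would otherwise
        // block every other follower from being added back.
        static List<BrokerState> validForAlterPartition(List<BrokerState> proposed,
                                                        MetadataCache cache) {
            List<BrokerState> valid = new ArrayList<>();
            for (BrokerState replica : proposed) {
                Optional<Long> cached = cache.getAliveBrokerEpoch(replica.brokerId());
                if (cached.isPresent() && cached.get() == replica.brokerEpoch())
                    valid.add(replica);
            }
            return valid;
        }
    }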

02. In AlterPartitionRequest/AlterPartitionResponse, you need to update the
`validVersions` to `0-3`.

03. Personally, I like the `NewIsrWithEpochs` field in
the AlterPartitionRequest. We can handle the backward compatibility in the
request builder. Basically, we would always populate `NewIsrWithEpochs`,
and the request builder can translate it to `NewIsr` when an earlier
version is used. Should the field be ignorable?
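
For example, the builder could do something along these lines (a rough
sketch with simplified stand-ins for the generated protocol classes; only
the field names come from the KIP, everything else is assumed):

    import java.util.Collections;
    import java.util.List;
    import java.util.stream.Collectors;

    // Stand-in for the KIP's proposed BrokerState entries in NewIsrWithEpochs.
    record BrokerState(int brokerId, long brokerEpoch) {}

    // Stand-in for the generated AlterPartitionRequestData.PartitionData.
    class PartitionData {
        List<BrokerState> newIsrWithEpochs = List.of();
        List<Integer> newIsr = List.of();
    }

    class AlterPartitionRequestBuilderSketch {
        // Callers always fill in NewIsrWithEpochs; when the negotiated
        // version predates the new field, the builder projects it down to
        // the flat NewIsr list of broker ids that old controllers expect.
        static void maybeDownConvert(PartitionData partition, short version) {
            if (version < 3) {
                partition.newIsr = partition.newIsrWithEpochs.stream()
                    .map(BrokerState::brokerId)
                    .collect(Collectors.toList());
                partition.newIsrWithEpochs = Collections.emptyList();
            }
        }
    }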

04. Should `NewIsrWithEpochs.BrokerEpoch` have -1 as its default?

05. In FetchRequest/FetchResponse, you need to update the `validVersions`
as well.

06. It is a little weird to have both ReplicaId and BrokerEpoch in the
FetchRequest. I wonder if we should rename ReplicaId to BrokerId, because
it actually is the broker id (even the documentation of the field says
so).

07. On the followers, the fetch request version is derived from the
metadata version/IBP. As we add a new fetch version, we need to add a new
metadata version/IBP as well.
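
The mapping could look roughly like this (a sketch; the new metadata
version name and the exact fetch version numbers are placeholders that
would be fixed during implementation):

    // Sketch of deriving the follower's Fetch version from the metadata
    // version, in the spirit of MetadataVersion#fetchRequestVersion.
    enum MetadataVersionSketch {
        IBP_3_0_IV1, IBP_3_1_IV0, IBP_3_5_IV0;  // placeholder subset

        boolean isAtLeast(MetadataVersionSketch other) {
            return this.compareTo(other) >= 0;
        }

        short fetchRequestVersion() {
            if (isAtLeast(IBP_3_5_IV0)) return 15;   // new version with broker epochs
            else if (isAtLeast(IBP_3_1_IV0)) return 13;
            else return 12;
        }
    }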

08. Regarding KRaft vs ZK, I don't have a strong opinion. ZK is on the way
out, so not doing it there seems fine. If we do support it, we could
basically just ignore the broker epoch in ZK mode and it would keep
working as it does today.

Best,
David

On Wed, Feb 8, 2023 at 3:01 PM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi, Calvin,
>
> Thanks for the KIP and fast follow-up. A few questions.
>
> 100. The scenario illustrated in the KIP involves a controller
> movement. Is this really required? Couldn't the scenario occur with a
> similarly stale AlterPartition request and the same controller
> throughout?
>
> 101. In the case where card(ISR) = 1 and the last replica leaves, it
> will be re-elected as the leader upon reconnection. If the replica is
> empty, all data for the partition will be lost. Is this a correct
> understanding of the scenario?
>
> 102. I understand that ZK is going to be unsupported soon. However,
> for as long as it is available to end users, is there any reason not to
> support the fix in ZK mode? Arguably, the AlterPartition logic is
> duplicated for both controller types, and the fix may be more work than
> it is worth if ZK is fully decommissioned in the next few months.
> (Alternatively, is there a plan to backport the fix to older minor
> versions?)
>
> 103. The KIP mentions that system tests will be used to simulate the
> race condition. Would it be possible to provide more details about
> them? Do we think it is worth having this scenario exercised in the
> functional tests of the core/server module?
>
> Thanks,
> Alexandre
>
> On Wed, Feb 8, 2023 at 3:31 AM Artem Livshits
> <alivsh...@confluent.io.invalid> wrote:
> >
> > Hi Calvin,
> >
> > Thank you for the KIP. I have a similar question -- we need to support
> > rolling upgrades (when we have some old brokers and some new brokers),
> > so there could be combinations of old leader - new follower, new leader
> > - old follower, new leader - old controller, old leader - new
> > controller. Could you elaborate on the behavior during rolls in the
> > Compatibility section?
> >
> > Also, for compatibility it's probably going to be easier to just add a
> > new array of epochs in addition to the existing array of broker ids,
> > instead of removing one field and adding another.
> >
> > The KIP mentions that we would explicitly do something special in ZK
> > mode in order to not implement the new functionality. I think it may be
> > easier to implement the functionality for both ZK and KRaft mode than
> > to add code to disable it in ZK mode.
> >
> > -Artem
> >
> > On Tue, Feb 7, 2023 at 4:58 PM Jun Rao <j...@confluent.io.invalid> wrote:
> >
> > > Hi, Calvin,
> > >
> > > Thanks for the KIP. Looks good to me overall.
> > >
> > > Since this KIP changes the inter-broker protocol, should we bump up the
> > > metadata version (the equivalent of IBP) during upgrade?
> > >
> > > Jun
> > >
> > >
> > > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu
> > > <ca...@confluent.io.invalid> wrote:
> > >
> > > > Hi everyone,
> > > > I'd like to discuss the fix for the broker reboot data loss
> > > > KAFKA-14139 <https://issues.apache.org/jira/browse/KAFKA-14139>.
> > > > It changes the Fetch and AlterPartition requests to include the
> > > > broker epochs. Then the controller can use these epochs to help
> > > > reject the stale AlterPartition request.
> > > > Please take a look. Thanks!
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
