Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-13 Thread Calvin Liu
Hi David.
Thanks for the feedback. I have updated the KIP as you suggested.

On Mon, Feb 13, 2023 at 7:34 AM David Jacot 
wrote:

> Hi Calvin,
>
> 01. Should we update the KIP to mention this?
>
> 09. The upgrade part is still not very clear in my opinion. It would be
> good to clearly mention how we are going to handle the upgrade of the
> AlterPartition API and the Fetch API. For the former, the broker uses the
> ApiVersions API to discover the supported versions of the controller so it
> will use the appropriate version based on this. For the later, we use the
> IBP/metadata version to gate the version of the fetch request used.
> Therefore, the version 14 will be used only when the IBP/metadata version
> is upgraded.
>
> Besides those two points, the KIP looks good to me.
>
> Thanks,
> David
>
> On Fri, Feb 10, 2023 at 7:47 PM Calvin Liu 
> wrote:
>
> > Hi David,
> > 01. It is a good idea to fence some delayed Fetch requests from the
> reboot
> > followers.
> > 06. Ack
> > 09. Rephrased the sentence to let the ZK controller ignore the broker
> > epoch.
> >
> > Thanks,
> > Calvin.
> >
> > On Fri, Feb 10, 2023 at 6:11 AM David Jacot  >
> > wrote:
> >
> > > Hi Calvin,
> > >
> > > 01. Yeah, this should generally not happen. I was wondering if it could
> > > happen in the case of a partial failure of a broker for instance.
> > > Generally, I think that being defensive on the leader side does not
> hurt
> > > here. I am perhaps too extreme here.
> > >
> > > 03. NewIsr does not need to be ignorable because we will only set it
> when
> > > an old version is used.
> > >
> > > 06. It is a bit weird to have ReplicaId and BrokerId in the final
> > schema. I
> > > would remove ReplicaId. The comment is enough to explain the renaming.
> > >
> > > 09. Could you elaborate on "As the AlterPartition and Fetch requests
> are
> > > shared between ZK and Kraft mode, the related field will keep empty in
> ZK
> > > mode and will not be used."? I would do it the other way around, I
> think
> > > that the followers and the leader should always populare all the fields
> > but
> > > the ZK controller should not take it into account. It will be easier
> like
> > > this vs having to put gates everywhere.
> > >
> > > Best,
> > > David
> > >
> > > On Thu, Feb 9, 2023 at 7:27 PM Calvin Liu 
> > > wrote:
> > >
> > > > Hi Alexandre,
> > > > 104: I am not aware of any proposals around the up-to-date replica
> > > > election. It can be a good place to dig deeper.
> > > > 105: Yes, it almost puts nothing on the performance.
> > > >
> > > > Thanks,
> > > > Calvin
> > > >
> > > > On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <
> > > > alexandre.dupr...@gmail.com> wrote:
> > > >
> > > > > Hi, Calvin,
> > > > >
> > > > > Many thanks for the replies.
> > > > >
> > > > > 103. Thanks for confirming the testing strategy. Exercising the
> right
> > > > > sequence of requests is a good idea.
> > > > >
> > > > > 104. Out of scope here, but the KIP reminds that currently, the
> > > > > controller does not attempt to detect data loss on the latest
> replica
> > > > > in the ISR and to choose the most up-to-date replica as a leader.
> Is
> > > > > this an area where the community wants to invest in building a
> > > > > potential solution?
> > > > >
> > > > > 105. I assume any performance impact of the change in Fetch
> requests
> > > > > is marginal if even noticeable at all.
> > > > >
> > > > >  Thanks,
> > > > > Alexandre
> > > > >
> > > > > Le mer. 8 févr. 2023 à 23:57, Calvin Liu
>  > >
> > > a
> > > > > écrit :
> > > > > >
> > > > > > For Jun:
> > > > > > --- Since this KIP changes the inter-broker protocol, should we
> > bump
> > > up
> > > > > the
> > > > > > metadata version (the equivalent of IBP) during the upgrade?
> > > > > > Yes, we can.
> > > > > >
> > > > > > For Artem:
> > > > > > --- Could you elaborate on the behavior during rolls in the
> > > > Compatibility
> > > > > > section?
> > > > > > We can update the metadata version(IBP) to make sure the brokers
> > can
> > > > have
> > > > > > the latest code.
> > > > > > Also, during the IBP upgrade, the leader can just skip setting
> > broker
> > > > > epoch
> > > > > > or set -1 for the brokers that have not started using the new
> APIs.
> > > The
> > > > > > controller can bypass the check for such brokers so that the
> > cluster
> > > > can
> > > > > > still change ISR during the upgrade.
> > > > > >
> > > > > > --- Also for compatibility it's probably going to be easier to
> just
> > > > add a
> > > > > > new
> > > > > > array of epochs in addition to the existing array of broker ids,
> > > > instead
> > > > > of
> > > > > > removing one field and adding another.
> > > > > > We can do it both ways. Just prefer to replace it with a new
> field
> > > for
> > > > > > better readability.
> > > > > >
> > > > > > --- The KIP mentions that we would explicitly do something
> special
> > in
> > > > ZK
> > > > > > mode
> > > > > > in order to not implement new functionality.
> > > > > > It c

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-13 Thread David Jacot
Hi Calvin,

01. Should we update the KIP to mention this?

09. The upgrade part is still not very clear in my opinion. It would be
good to clearly mention how we are going to handle the upgrade of the
AlterPartition API and the Fetch API. For the former, the broker uses the
ApiVersions API to discover the supported versions of the controller so it
will use the appropriate version based on this. For the later, we use the
IBP/metadata version to gate the version of the fetch request used.
Therefore, the version 14 will be used only when the IBP/metadata version
is upgraded.

Besides those two points, the KIP looks good to me.

Thanks,
David

On Fri, Feb 10, 2023 at 7:47 PM Calvin Liu 
wrote:

> Hi David,
> 01. It is a good idea to fence some delayed Fetch requests from the reboot
> followers.
> 06. Ack
> 09. Rephrased the sentence to let the ZK controller ignore the broker
> epoch.
>
> Thanks,
> Calvin.
>
> On Fri, Feb 10, 2023 at 6:11 AM David Jacot 
> wrote:
>
> > Hi Calvin,
> >
> > 01. Yeah, this should generally not happen. I was wondering if it could
> > happen in the case of a partial failure of a broker for instance.
> > Generally, I think that being defensive on the leader side does not hurt
> > here. I am perhaps too extreme here.
> >
> > 03. NewIsr does not need to be ignorable because we will only set it when
> > an old version is used.
> >
> > 06. It is a bit weird to have ReplicaId and BrokerId in the final
> schema. I
> > would remove ReplicaId. The comment is enough to explain the renaming.
> >
> > 09. Could you elaborate on "As the AlterPartition and Fetch requests are
> > shared between ZK and Kraft mode, the related field will keep empty in ZK
> > mode and will not be used."? I would do it the other way around, I think
> > that the followers and the leader should always populare all the fields
> but
> > the ZK controller should not take it into account. It will be easier like
> > this vs having to put gates everywhere.
> >
> > Best,
> > David
> >
> > On Thu, Feb 9, 2023 at 7:27 PM Calvin Liu 
> > wrote:
> >
> > > Hi Alexandre,
> > > 104: I am not aware of any proposals around the up-to-date replica
> > > election. It can be a good place to dig deeper.
> > > 105: Yes, it almost puts nothing on the performance.
> > >
> > > Thanks,
> > > Calvin
> > >
> > > On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <
> > > alexandre.dupr...@gmail.com> wrote:
> > >
> > > > Hi, Calvin,
> > > >
> > > > Many thanks for the replies.
> > > >
> > > > 103. Thanks for confirming the testing strategy. Exercising the right
> > > > sequence of requests is a good idea.
> > > >
> > > > 104. Out of scope here, but the KIP reminds that currently, the
> > > > controller does not attempt to detect data loss on the latest replica
> > > > in the ISR and to choose the most up-to-date replica as a leader. Is
> > > > this an area where the community wants to invest in building a
> > > > potential solution?
> > > >
> > > > 105. I assume any performance impact of the change in Fetch requests
> > > > is marginal if even noticeable at all.
> > > >
> > > >  Thanks,
> > > > Alexandre
> > > >
> > > > Le mer. 8 févr. 2023 à 23:57, Calvin Liu  >
> > a
> > > > écrit :
> > > > >
> > > > > For Jun:
> > > > > --- Since this KIP changes the inter-broker protocol, should we
> bump
> > up
> > > > the
> > > > > metadata version (the equivalent of IBP) during the upgrade?
> > > > > Yes, we can.
> > > > >
> > > > > For Artem:
> > > > > --- Could you elaborate on the behavior during rolls in the
> > > Compatibility
> > > > > section?
> > > > > We can update the metadata version(IBP) to make sure the brokers
> can
> > > have
> > > > > the latest code.
> > > > > Also, during the IBP upgrade, the leader can just skip setting
> broker
> > > > epoch
> > > > > or set -1 for the brokers that have not started using the new APIs.
> > The
> > > > > controller can bypass the check for such brokers so that the
> cluster
> > > can
> > > > > still change ISR during the upgrade.
> > > > >
> > > > > --- Also for compatibility it's probably going to be easier to just
> > > add a
> > > > > new
> > > > > array of epochs in addition to the existing array of broker ids,
> > > instead
> > > > of
> > > > > removing one field and adding another.
> > > > > We can do it both ways. Just prefer to replace it with a new field
> > for
> > > > > better readability.
> > > > >
> > > > > --- The KIP mentions that we would explicitly do something special
> in
> > > ZK
> > > > > mode
> > > > > in order to not implement new functionality.
> > > > > It can be as simple as skipping setting the broker epoch if it is
> in
> > ZK
> > > > > mode.
> > > > >
> > > > > For Alexandre:
> > > > > 100: Yes, you are right, it does not require a controller
> movement. I
> > > > > removed the controller movement part when quoting the scenario from
> > the
> > > > > KAFKA-14139.
> > > > > 101: In the current code, you can't remove the last replica in the
> > ISR.
> > > > > But, any damage to this re

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-10 Thread Calvin Liu
Hi David,
01. It is a good idea to fence some delayed Fetch requests from the reboot
followers.
06. Ack
09. Rephrased the sentence to let the ZK controller ignore the broker epoch.

Thanks,
Calvin.

On Fri, Feb 10, 2023 at 6:11 AM David Jacot 
wrote:

> Hi Calvin,
>
> 01. Yeah, this should generally not happen. I was wondering if it could
> happen in the case of a partial failure of a broker for instance.
> Generally, I think that being defensive on the leader side does not hurt
> here. I am perhaps too extreme here.
>
> 03. NewIsr does not need to be ignorable because we will only set it when
> an old version is used.
>
> 06. It is a bit weird to have ReplicaId and BrokerId in the final schema. I
> would remove ReplicaId. The comment is enough to explain the renaming.
>
> 09. Could you elaborate on "As the AlterPartition and Fetch requests are
> shared between ZK and Kraft mode, the related field will keep empty in ZK
> mode and will not be used."? I would do it the other way around, I think
> that the followers and the leader should always populare all the fields but
> the ZK controller should not take it into account. It will be easier like
> this vs having to put gates everywhere.
>
> Best,
> David
>
> On Thu, Feb 9, 2023 at 7:27 PM Calvin Liu 
> wrote:
>
> > Hi Alexandre,
> > 104: I am not aware of any proposals around the up-to-date replica
> > election. It can be a good place to dig deeper.
> > 105: Yes, it almost puts nothing on the performance.
> >
> > Thanks,
> > Calvin
> >
> > On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <
> > alexandre.dupr...@gmail.com> wrote:
> >
> > > Hi, Calvin,
> > >
> > > Many thanks for the replies.
> > >
> > > 103. Thanks for confirming the testing strategy. Exercising the right
> > > sequence of requests is a good idea.
> > >
> > > 104. Out of scope here, but the KIP reminds that currently, the
> > > controller does not attempt to detect data loss on the latest replica
> > > in the ISR and to choose the most up-to-date replica as a leader. Is
> > > this an area where the community wants to invest in building a
> > > potential solution?
> > >
> > > 105. I assume any performance impact of the change in Fetch requests
> > > is marginal if even noticeable at all.
> > >
> > >  Thanks,
> > > Alexandre
> > >
> > > Le mer. 8 févr. 2023 à 23:57, Calvin Liu 
> a
> > > écrit :
> > > >
> > > > For Jun:
> > > > --- Since this KIP changes the inter-broker protocol, should we bump
> up
> > > the
> > > > metadata version (the equivalent of IBP) during the upgrade?
> > > > Yes, we can.
> > > >
> > > > For Artem:
> > > > --- Could you elaborate on the behavior during rolls in the
> > Compatibility
> > > > section?
> > > > We can update the metadata version(IBP) to make sure the brokers can
> > have
> > > > the latest code.
> > > > Also, during the IBP upgrade, the leader can just skip setting broker
> > > epoch
> > > > or set -1 for the brokers that have not started using the new APIs.
> The
> > > > controller can bypass the check for such brokers so that the cluster
> > can
> > > > still change ISR during the upgrade.
> > > >
> > > > --- Also for compatibility it's probably going to be easier to just
> > add a
> > > > new
> > > > array of epochs in addition to the existing array of broker ids,
> > instead
> > > of
> > > > removing one field and adding another.
> > > > We can do it both ways. Just prefer to replace it with a new field
> for
> > > > better readability.
> > > >
> > > > --- The KIP mentions that we would explicitly do something special in
> > ZK
> > > > mode
> > > > in order to not implement new functionality.
> > > > It can be as simple as skipping setting the broker epoch if it is in
> ZK
> > > > mode.
> > > >
> > > > For Alexandre:
> > > > 100: Yes, you are right, it does not require a controller movement. I
> > > > removed the controller movement part when quoting the scenario from
> the
> > > > KAFKA-14139.
> > > > 101: In the current code, you can't remove the last replica in the
> ISR.
> > > > But, any damage to this replica will result in data loss.
> > > > 102: Yeah, I understand your concern. It is definitely great to have
> a
> > > fix
> > > > in ZK, but considering the KAFKA-14139 is a rare case, it has higher
> > ROE
> > > > for applying to just Kraft mode.
> > > > 103: I found that it is much easier to repro the bug in Kraft mode by
> > > > feeding the controller with a given sequence of events/requests. So
> we
> > > may
> > > > just need a unit test to cover the case.
> > > >
> > > > For David:
> > > > 01:
> > > > Can you help explain why the follower can have a stale broker epoch?
> Is
> > > it
> > > > because the registration request has any delay? But the broker will
> not
> > > > start fetching before the registration success.
> > > > On the other hand, if the follower fetches with the stale broker
> epoch,
> > > is
> > > > it good enough to ask the leader to hold to include this follower
> until
> > > the
> > > > fetch is consistent with the metada

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-10 Thread David Jacot
Hi Calvin,

01. Yeah, this should generally not happen. I was wondering if it could
happen in the case of a partial failure of a broker for instance.
Generally, I think that being defensive on the leader side does not hurt
here. I am perhaps too extreme here.

03. NewIsr does not need to be ignorable because we will only set it when
an old version is used.

06. It is a bit weird to have ReplicaId and BrokerId in the final schema. I
would remove ReplicaId. The comment is enough to explain the renaming.

09. Could you elaborate on "As the AlterPartition and Fetch requests are
shared between ZK and Kraft mode, the related field will keep empty in ZK
mode and will not be used."? I would do it the other way around, I think
that the followers and the leader should always populare all the fields but
the ZK controller should not take it into account. It will be easier like
this vs having to put gates everywhere.

Best,
David

On Thu, Feb 9, 2023 at 7:27 PM Calvin Liu 
wrote:

> Hi Alexandre,
> 104: I am not aware of any proposals around the up-to-date replica
> election. It can be a good place to dig deeper.
> 105: Yes, it almost puts nothing on the performance.
>
> Thanks,
> Calvin
>
> On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
> > Hi, Calvin,
> >
> > Many thanks for the replies.
> >
> > 103. Thanks for confirming the testing strategy. Exercising the right
> > sequence of requests is a good idea.
> >
> > 104. Out of scope here, but the KIP reminds that currently, the
> > controller does not attempt to detect data loss on the latest replica
> > in the ISR and to choose the most up-to-date replica as a leader. Is
> > this an area where the community wants to invest in building a
> > potential solution?
> >
> > 105. I assume any performance impact of the change in Fetch requests
> > is marginal if even noticeable at all.
> >
> >  Thanks,
> > Alexandre
> >
> > Le mer. 8 févr. 2023 à 23:57, Calvin Liu  a
> > écrit :
> > >
> > > For Jun:
> > > --- Since this KIP changes the inter-broker protocol, should we bump up
> > the
> > > metadata version (the equivalent of IBP) during the upgrade?
> > > Yes, we can.
> > >
> > > For Artem:
> > > --- Could you elaborate on the behavior during rolls in the
> Compatibility
> > > section?
> > > We can update the metadata version(IBP) to make sure the brokers can
> have
> > > the latest code.
> > > Also, during the IBP upgrade, the leader can just skip setting broker
> > epoch
> > > or set -1 for the brokers that have not started using the new APIs. The
> > > controller can bypass the check for such brokers so that the cluster
> can
> > > still change ISR during the upgrade.
> > >
> > > --- Also for compatibility it's probably going to be easier to just
> add a
> > > new
> > > array of epochs in addition to the existing array of broker ids,
> instead
> > of
> > > removing one field and adding another.
> > > We can do it both ways. Just prefer to replace it with a new field for
> > > better readability.
> > >
> > > --- The KIP mentions that we would explicitly do something special in
> ZK
> > > mode
> > > in order to not implement new functionality.
> > > It can be as simple as skipping setting the broker epoch if it is in ZK
> > > mode.
> > >
> > > For Alexandre:
> > > 100: Yes, you are right, it does not require a controller movement. I
> > > removed the controller movement part when quoting the scenario from the
> > > KAFKA-14139.
> > > 101: In the current code, you can't remove the last replica in the ISR.
> > > But, any damage to this replica will result in data loss.
> > > 102: Yeah, I understand your concern. It is definitely great to have a
> > fix
> > > in ZK, but considering the KAFKA-14139 is a rare case, it has higher
> ROE
> > > for applying to just Kraft mode.
> > > 103: I found that it is much easier to repro the bug in Kraft mode by
> > > feeding the controller with a given sequence of events/requests. So we
> > may
> > > just need a unit test to cover the case.
> > >
> > > For David:
> > > 01:
> > > Can you help explain why the follower can have a stale broker epoch? Is
> > it
> > > because the registration request has any delay? But the broker will not
> > > start fetching before the registration success.
> > > On the other hand, if the follower fetches with the stale broker epoch,
> > is
> > > it good enough to ask the leader to hold to include this follower until
> > the
> > > fetch is consistent with the metadata cache?
> > > 02: Ack
> > > 03: I think NewIsrWithEpochs is not ignorable. For the deprecated
> > > field NewIsr, what is the right way to make it ignorable? Just mark it
> > > ignorable in this change?
> > > 04: Ack
> > > 05: Ack
> > > 06: Ack
> > > 07: Yes, will add it.
> > >
> > >
> > >
> > > On Wed, Feb 8, 2023 at 6:52 AM David Jacot  >
> > > wrote:
> > >
> > > > Hi Calvin,
> > > >
> > > > Thanks for the KIP! The overall approach looks reasonable to me. I
> > have a
> > > > few questions/comments:
> > > >

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-09 Thread Calvin Liu
Hi Alexandre,
104: I am not aware of any proposals around the up-to-date replica
election. It can be a good place to dig deeper.
105: Yes, it almost puts nothing on the performance.

Thanks,
Calvin

On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi, Calvin,
>
> Many thanks for the replies.
>
> 103. Thanks for confirming the testing strategy. Exercising the right
> sequence of requests is a good idea.
>
> 104. Out of scope here, but the KIP reminds that currently, the
> controller does not attempt to detect data loss on the latest replica
> in the ISR and to choose the most up-to-date replica as a leader. Is
> this an area where the community wants to invest in building a
> potential solution?
>
> 105. I assume any performance impact of the change in Fetch requests
> is marginal if even noticeable at all.
>
>  Thanks,
> Alexandre
>
> Le mer. 8 févr. 2023 à 23:57, Calvin Liu  a
> écrit :
> >
> > For Jun:
> > --- Since this KIP changes the inter-broker protocol, should we bump up
> the
> > metadata version (the equivalent of IBP) during the upgrade?
> > Yes, we can.
> >
> > For Artem:
> > --- Could you elaborate on the behavior during rolls in the Compatibility
> > section?
> > We can update the metadata version(IBP) to make sure the brokers can have
> > the latest code.
> > Also, during the IBP upgrade, the leader can just skip setting broker
> epoch
> > or set -1 for the brokers that have not started using the new APIs. The
> > controller can bypass the check for such brokers so that the cluster can
> > still change ISR during the upgrade.
> >
> > --- Also for compatibility it's probably going to be easier to just add a
> > new
> > array of epochs in addition to the existing array of broker ids, instead
> of
> > removing one field and adding another.
> > We can do it both ways. Just prefer to replace it with a new field for
> > better readability.
> >
> > --- The KIP mentions that we would explicitly do something special in ZK
> > mode
> > in order to not implement new functionality.
> > It can be as simple as skipping setting the broker epoch if it is in ZK
> > mode.
> >
> > For Alexandre:
> > 100: Yes, you are right, it does not require a controller movement. I
> > removed the controller movement part when quoting the scenario from the
> > KAFKA-14139.
> > 101: In the current code, you can't remove the last replica in the ISR.
> > But, any damage to this replica will result in data loss.
> > 102: Yeah, I understand your concern. It is definitely great to have a
> fix
> > in ZK, but considering the KAFKA-14139 is a rare case, it has higher ROE
> > for applying to just Kraft mode.
> > 103: I found that it is much easier to repro the bug in Kraft mode by
> > feeding the controller with a given sequence of events/requests. So we
> may
> > just need a unit test to cover the case.
> >
> > For David:
> > 01:
> > Can you help explain why the follower can have a stale broker epoch? Is
> it
> > because the registration request has any delay? But the broker will not
> > start fetching before the registration success.
> > On the other hand, if the follower fetches with the stale broker epoch,
> is
> > it good enough to ask the leader to hold to include this follower until
> the
> > fetch is consistent with the metadata cache?
> > 02: Ack
> > 03: I think NewIsrWithEpochs is not ignorable. For the deprecated
> > field NewIsr, what is the right way to make it ignorable? Just mark it
> > ignorable in this change?
> > 04: Ack
> > 05: Ack
> > 06: Ack
> > 07: Yes, will add it.
> >
> >
> >
> > On Wed, Feb 8, 2023 at 6:52 AM David Jacot 
> > wrote:
> >
> > > Hi Calvin,
> > >
> > > Thanks for the KIP! The overall approach looks reasonable to me. I
> have a
> > > few questions/comments:
> > >
> > > 01. I wonder if the leader should also verify the broker epochs based
> on
> > > its metadata cache before sending the AlterPartition request to the
> > > controller. Imagine the case where a follower not in the ISR would keep
> > > fetching with a stale broker epoch but with a valid leader epoch. In
> this
> > > case, the leader would try to add it back to the ISR when the follower
> > > catches up but the controller would reject it because its broker epoch
> is
> > > stale. Until this condition resolves itself, the leader can't add any
> other
> > > followers back to the ISR because the ISR validation is all or nothing
> at
> > > the moment. Letting the leader check before sending the AlterPartition
> > > request would mitigate this. This condition must be rare but could
> happen.
> > > We did this in KIP-841.
> > >
> > > 02. In AlterPartitionRequest/AlterPartitionResponse, you need to
> update the
> > > `validVersions` to `0-3`.
> > >
> > > 03. Personally, I like the `NewIsrWithEpochs` field in
> > > the AlterPartitionRequest. We can handle the backward compatibility in
> the
> > > request builder. Basically, we would always populate NewIsrWithEpochs
> and
> > > the request builder can tr

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-09 Thread Alexandre Dupriez
Hi, Calvin,

Many thanks for the replies.

103. Thanks for confirming the testing strategy. Exercising the right
sequence of requests is a good idea.

104. Out of scope here, but the KIP reminds that currently, the
controller does not attempt to detect data loss on the latest replica
in the ISR and to choose the most up-to-date replica as a leader. Is
this an area where the community wants to invest in building a
potential solution?

105. I assume any performance impact of the change in Fetch requests
is marginal if even noticeable at all.

 Thanks,
Alexandre

Le mer. 8 févr. 2023 à 23:57, Calvin Liu  a écrit :
>
> For Jun:
> --- Since this KIP changes the inter-broker protocol, should we bump up the
> metadata version (the equivalent of IBP) during the upgrade?
> Yes, we can.
>
> For Artem:
> --- Could you elaborate on the behavior during rolls in the Compatibility
> section?
> We can update the metadata version(IBP) to make sure the brokers can have
> the latest code.
> Also, during the IBP upgrade, the leader can just skip setting broker epoch
> or set -1 for the brokers that have not started using the new APIs. The
> controller can bypass the check for such brokers so that the cluster can
> still change ISR during the upgrade.
>
> --- Also for compatibility it's probably going to be easier to just add a
> new
> array of epochs in addition to the existing array of broker ids, instead of
> removing one field and adding another.
> We can do it both ways. Just prefer to replace it with a new field for
> better readability.
>
> --- The KIP mentions that we would explicitly do something special in ZK
> mode
> in order to not implement new functionality.
> It can be as simple as skipping setting the broker epoch if it is in ZK
> mode.
>
> For Alexandre:
> 100: Yes, you are right, it does not require a controller movement. I
> removed the controller movement part when quoting the scenario from the
> KAFKA-14139.
> 101: In the current code, you can't remove the last replica in the ISR.
> But, any damage to this replica will result in data loss.
> 102: Yeah, I understand your concern. It is definitely great to have a fix
> in ZK, but considering the KAFKA-14139 is a rare case, it has higher ROE
> for applying to just Kraft mode.
> 103: I found that it is much easier to repro the bug in Kraft mode by
> feeding the controller with a given sequence of events/requests. So we may
> just need a unit test to cover the case.
>
> For David:
> 01:
> Can you help explain why the follower can have a stale broker epoch? Is it
> because the registration request has any delay? But the broker will not
> start fetching before the registration success.
> On the other hand, if the follower fetches with the stale broker epoch, is
> it good enough to ask the leader to hold to include this follower until the
> fetch is consistent with the metadata cache?
> 02: Ack
> 03: I think NewIsrWithEpochs is not ignorable. For the deprecated
> field NewIsr, what is the right way to make it ignorable? Just mark it
> ignorable in this change?
> 04: Ack
> 05: Ack
> 06: Ack
> 07: Yes, will add it.
>
>
>
> On Wed, Feb 8, 2023 at 6:52 AM David Jacot 
> wrote:
>
> > Hi Calvin,
> >
> > Thanks for the KIP! The overall approach looks reasonable to me. I have a
> > few questions/comments:
> >
> > 01. I wonder if the leader should also verify the broker epochs based on
> > its metadata cache before sending the AlterPartition request to the
> > controller. Imagine the case where a follower not in the ISR would keep
> > fetching with a stale broker epoch but with a valid leader epoch. In this
> > case, the leader would try to add it back to the ISR when the follower
> > catches up but the controller would reject it because its broker epoch is
> > stale. Until this condition resolves itself, the leader can't add any other
> > followers back to the ISR because the ISR validation is all or nothing at
> > the moment. Letting the leader check before sending the AlterPartition
> > request would mitigate this. This condition must be rare but could happen.
> > We did this in KIP-841.
> >
> > 02. In AlterPartitionRequest/AlterPartitionResponse, you need to update the
> > `validVersions` to `0-3`.
> >
> > 03. Personally, I like the `NewIsrWithEpochs` field in
> > the AlterPartitionRequest. We can handle the backward compatibility in the
> > request builder. Basically, we would always populate NewIsrWithEpochs and
> > the request builder can translate it to NewIsr if an earlier version is
> > used. Should the field be ignorable?
> >
> > 04. Should NewIsrWithEpochs.BrokerEpoch have -1 as default?
> >
> > 05. In FetchRequest/FetchResponse, you need to update the `validVersions`
> > as well.
> >
> > 06. It is a little weird to have ReplicaId and BrokerEpoch in the
> > FetchRequest. I wonder if we should rename ReplicaId to BrokerId because it
> > is actually the broker id (even the documentation of the field says it).
> >
> > 07. On the followers, the fetch request versio

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-08 Thread Calvin Liu
For Jun:
--- Since this KIP changes the inter-broker protocol, should we bump up the
metadata version (the equivalent of IBP) during the upgrade?
Yes, we can.

For Artem:
--- Could you elaborate on the behavior during rolls in the Compatibility
section?
We can update the metadata version(IBP) to make sure the brokers can have
the latest code.
Also, during the IBP upgrade, the leader can just skip setting broker epoch
or set -1 for the brokers that have not started using the new APIs. The
controller can bypass the check for such brokers so that the cluster can
still change ISR during the upgrade.

--- Also for compatibility it's probably going to be easier to just add a
new
array of epochs in addition to the existing array of broker ids, instead of
removing one field and adding another.
We can do it both ways. Just prefer to replace it with a new field for
better readability.

--- The KIP mentions that we would explicitly do something special in ZK
mode
in order to not implement new functionality.
It can be as simple as skipping setting the broker epoch if it is in ZK
mode.

For Alexandre:
100: Yes, you are right, it does not require a controller movement. I
removed the controller movement part when quoting the scenario from the
KAFKA-14139.
101: In the current code, you can't remove the last replica in the ISR.
But, any damage to this replica will result in data loss.
102: Yeah, I understand your concern. It is definitely great to have a fix
in ZK, but considering the KAFKA-14139 is a rare case, it has higher ROE
for applying to just Kraft mode.
103: I found that it is much easier to repro the bug in Kraft mode by
feeding the controller with a given sequence of events/requests. So we may
just need a unit test to cover the case.

For David:
01:
Can you help explain why the follower can have a stale broker epoch? Is it
because the registration request has any delay? But the broker will not
start fetching before the registration success.
On the other hand, if the follower fetches with the stale broker epoch, is
it good enough to ask the leader to hold to include this follower until the
fetch is consistent with the metadata cache?
02: Ack
03: I think NewIsrWithEpochs is not ignorable. For the deprecated
field NewIsr, what is the right way to make it ignorable? Just mark it
ignorable in this change?
04: Ack
05: Ack
06: Ack
07: Yes, will add it.



On Wed, Feb 8, 2023 at 6:52 AM David Jacot 
wrote:

> Hi Calvin,
>
> Thanks for the KIP! The overall approach looks reasonable to me. I have a
> few questions/comments:
>
> 01. I wonder if the leader should also verify the broker epochs based on
> its metadata cache before sending the AlterPartition request to the
> controller. Imagine the case where a follower not in the ISR would keep
> fetching with a stale broker epoch but with a valid leader epoch. In this
> case, the leader would try to add it back to the ISR when the follower
> catches up but the controller would reject it because its broker epoch is
> stale. Until this condition resolves itself, the leader can't add any other
> followers back to the ISR because the ISR validation is all or nothing at
> the moment. Letting the leader check before sending the AlterPartition
> request would mitigate this. This condition must be rare but could happen.
> We did this in KIP-841.
>
> 02. In AlterPartitionRequest/AlterPartitionResponse, you need to update the
> `validVersions` to `0-3`.
>
> 03. Personally, I like the `NewIsrWithEpochs` field in
> the AlterPartitionRequest. We can handle the backward compatibility in the
> request builder. Basically, we would always populate NewIsrWithEpochs and
> the request builder can translate it to NewIsr if an earlier version is
> used. Should the field be ignorable?
>
> 04. Should NewIsrWithEpochs.BrokerEpoch have -1 as default?
>
> 05. In FetchRequest/FetchResponse, you need to update the `validVersions`
> as well.
>
> 06. It is a little weird to have ReplicaId and BrokerEpoch in the
> FetchRequest. I wonder if we should rename ReplicaId to BrokerId because it
> is actually the broker id (even the documentation of the field says it).
>
> 07. On the followers, the fetch request version is derived from the
> metadata version/IBP. As we add a new fetch version, we need to add a new
> metadata version/IBP as well.
>
> 08. Regarding KRaft vs ZK, I don't have a strong opinion. ZK is on the way
> out so not doing it seems fine. If we do this, we could basically just
> ignore the broker epoch in ZK and it will keep working as today.
>
> Best,
> David
>
> On Wed, Feb 8, 2023 at 3:01 PM Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
> > Hi, Calvin,
> >
> > Thanks for the KIP and fast follow-up. A few questions.
> >
> > 100. The scenario illustrated in the KIP involves a controller
> > movement. Is this really required? Cannot the scenario occur with a
> > similar stale AlterPartition request and the same controller
> > throughout?
> >
> > 101. In the case where card(ISR) = 1

RE: Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-08 Thread Calvin Liu
Hi Jun,
Thanks for taking a look.
Yes, We can update the metadata version.

On 2023/02/08 00:57:54 Jun Rao wrote:
> Hi, Calvin,
>
> Thanks for the KIP. Looks good to me overall.
>
> Since this KIP changes the inter-broker protocol, should we bump up the
> metadata version (the equivalent of IBP) during upgrade?
>
> Jun
>
>
> On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu 
> wrote:
>
> > Hi everyone,
> > I'd like to discuss the fix for the broker reboot data loss KAFKA-14139
> > .
> > It changes the Fetch and AlterPartition requests to include the broker
> > epochs. Then the controller can use these epochs to help reject the
stale
> > AlterPartition request.
> > Please take a look. Thanks!
> >
> >
https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
> >
>


Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-08 Thread David Jacot
Hi Calvin,

Thanks for the KIP! The overall approach looks reasonable to me. I have a
few questions/comments:

01. I wonder if the leader should also verify the broker epochs based on
its metadata cache before sending the AlterPartition request to the
controller. Imagine the case where a follower not in the ISR would keep
fetching with a stale broker epoch but with a valid leader epoch. In this
case, the leader would try to add it back to the ISR when the follower
catches up but the controller would reject it because its broker epoch is
stale. Until this condition resolves itself, the leader can't add any other
followers back to the ISR because the ISR validation is all or nothing at
the moment. Letting the leader check before sending the AlterPartition
request would mitigate this. This condition must be rare but could happen.
We did this in KIP-841.

02. In AlterPartitionRequest/AlterPartitionResponse, you need to update the
`validVersions` to `0-3`.

03. Personally, I like the `NewIsrWithEpochs` field in
the AlterPartitionRequest. We can handle the backward compatibility in the
request builder. Basically, we would always populate NewIsrWithEpochs and
the request builder can translate it to NewIsr if an earlier version is
used. Should the field be ignorable?

04. Should NewIsrWithEpochs.BrokerEpoch have -1 as default?

05. In FetchRequest/FetchResponse, you need to update the `validVersions`
as well.

06. It is a little weird to have ReplicaId and BrokerEpoch in the
FetchRequest. I wonder if we should rename ReplicaId to BrokerId because it
is actually the broker id (even the documentation of the field says it).

07. On the followers, the fetch request version is derived from the
metadata version/IBP. As we add a new fetch version, we need to add a new
metadata version/IBP as well.

08. Regarding KRaft vs ZK, I don't have a strong opinion. ZK is on the way
out so not doing it seems fine. If we do this, we could basically just
ignore the broker epoch in ZK and it will keep working as today.

Best,
David

On Wed, Feb 8, 2023 at 3:01 PM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi, Calvin,
>
> Thanks for the KIP and fast follow-up. A few questions.
>
> 100. The scenario illustrated in the KIP involves a controller
> movement. Is this really required? Cannot the scenario occur with a
> similar stale AlterPartition request and the same controller
> throughout?
>
> 101. In the case where card(ISR) = 1 and the last replica leaves, it
> will be re-elected as the leader upon reconnection. If the replica is
> empty, all data for the partition will be lost. Is this a correct
> understanding of the scenario?
>
> 102. I understand that ZK is going to be unsupported soon. However for
> as long as it is available to end users, is there any reason not to
> support the fix in ZK mode? Arguably, the implementation for the logic
> to AlterPartition is duplicated for both controller types, and it may
> be more work than is worth if ZK is fully decommissioned in the next
> few months. (Alternatively, is there a plan to back port the fix to
> older minor versions?).
>
> 103. The KIP mentions system tests to be used to simulate the race
> condition. Would it be possible to provide more details about it? Do
> we think it worth having this scenario be exercised in the functional
> tests of the core/server module?
>
> Thanks,
> Alexandre
>
> Le mer. 8 févr. 2023 à 03:31, Artem Livshits
>  a écrit :
> >
> > Hi Calvin,
> >
> > Thank you for the KIP.  I have a similar question -- we need to support
> > rolling upgrades (when we have some old brokers and some new brokers), so
> > there could be combinations of old leader - new follower, new leader -
> old
> > follower, new leader - old controller, old leader - new controller.
> Could
> > you elaborate on the behavior during rolls in the Compatibility section?
> >
> > Also for compatibility it's probably going to be easier to just add a new
> > array of epochs in addition to the existing array of broker ids, instead
> of
> > removing one field and adding another.
> >
> > The KIP mentions that we would explicitly do something special in ZK mode
> > in order to not implement new functionality.  I think it may be easier to
> > implement functionality for both ZK and KRraft mode than adding code to
> > disable it in ZK mode.
> >
> > -Artem
> >
> > On Tue, Feb 7, 2023 at 4:58 PM Jun Rao  wrote:
> >
> > > Hi, Calvin,
> > >
> > > Thanks for the KIP. Looks good to me overall.
> > >
> > > Since this KIP changes the inter-broker protocol, should we bump up the
> > > metadata version (the equivalent of IBP) during upgrade?
> > >
> > > Jun
> > >
> > >
> > > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu  >
> > > wrote:
> > >
> > > > Hi everyone,
> > > > I'd like to discuss the fix for the broker reboot data loss
> KAFKA-14139
> > > > .
> > > > It changes the Fetch and AlterPartition requests to include the
> broker
> > > > epochs. The

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-08 Thread Alexandre Dupriez
Hi, Calvin,

Thanks for the KIP and fast follow-up. A few questions.

100. The scenario illustrated in the KIP involves a controller
movement. Is this really required? Cannot the scenario occur with a
similar stale AlterPartition request and the same controller
throughout?

101. In the case where card(ISR) = 1 and the last replica leaves, it
will be re-elected as the leader upon reconnection. If the replica is
empty, all data for the partition will be lost. Is this a correct
understanding of the scenario?

102. I understand that ZK is going to be unsupported soon. However for
as long as it is available to end users, is there any reason not to
support the fix in ZK mode? Arguably, the implementation for the logic
to AlterPartition is duplicated for both controller types, and it may
be more work than is worth if ZK is fully decommissioned in the next
few months. (Alternatively, is there a plan to back port the fix to
older minor versions?).

103. The KIP mentions system tests to be used to simulate the race
condition. Would it be possible to provide more details about it? Do
we think it worth having this scenario be exercised in the functional
tests of the core/server module?

Thanks,
Alexandre

Le mer. 8 févr. 2023 à 03:31, Artem Livshits
 a écrit :
>
> Hi Calvin,
>
> Thank you for the KIP.  I have a similar question -- we need to support
> rolling upgrades (when we have some old brokers and some new brokers), so
> there could be combinations of old leader - new follower, new leader - old
> follower, new leader - old controller, old leader - new controller.  Could
> you elaborate on the behavior during rolls in the Compatibility section?
>
> Also for compatibility it's probably going to be easier to just add a new
> array of epochs in addition to the existing array of broker ids, instead of
> removing one field and adding another.
>
> The KIP mentions that we would explicitly do something special in ZK mode
> in order to not implement new functionality.  I think it may be easier to
> implement functionality for both ZK and KRraft mode than adding code to
> disable it in ZK mode.
>
> -Artem
>
> On Tue, Feb 7, 2023 at 4:58 PM Jun Rao  wrote:
>
> > Hi, Calvin,
> >
> > Thanks for the KIP. Looks good to me overall.
> >
> > Since this KIP changes the inter-broker protocol, should we bump up the
> > metadata version (the equivalent of IBP) during upgrade?
> >
> > Jun
> >
> >
> > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu 
> > wrote:
> >
> > > Hi everyone,
> > > I'd like to discuss the fix for the broker reboot data loss KAFKA-14139
> > > .
> > > It changes the Fetch and AlterPartition requests to include the broker
> > > epochs. Then the controller can use these epochs to help reject the stale
> > > AlterPartition request.
> > > Please take a look. Thanks!
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
> > >
> >


Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-07 Thread Artem Livshits
Hi Calvin,

Thank you for the KIP.  I have a similar question -- we need to support
rolling upgrades (when we have some old brokers and some new brokers), so
there could be combinations of old leader - new follower, new leader - old
follower, new leader - old controller, old leader - new controller.  Could
you elaborate on the behavior during rolls in the Compatibility section?

Also for compatibility it's probably going to be easier to just add a new
array of epochs in addition to the existing array of broker ids, instead of
removing one field and adding another.

The KIP mentions that we would explicitly do something special in ZK mode
in order to not implement new functionality.  I think it may be easier to
implement functionality for both ZK and KRraft mode than adding code to
disable it in ZK mode.

-Artem

On Tue, Feb 7, 2023 at 4:58 PM Jun Rao  wrote:

> Hi, Calvin,
>
> Thanks for the KIP. Looks good to me overall.
>
> Since this KIP changes the inter-broker protocol, should we bump up the
> metadata version (the equivalent of IBP) during upgrade?
>
> Jun
>
>
> On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu 
> wrote:
>
> > Hi everyone,
> > I'd like to discuss the fix for the broker reboot data loss KAFKA-14139
> > .
> > It changes the Fetch and AlterPartition requests to include the broker
> > epochs. Then the controller can use these epochs to help reject the stale
> > AlterPartition request.
> > Please take a look. Thanks!
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
> >
>


Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

2023-02-07 Thread Jun Rao
Hi, Calvin,

Thanks for the KIP. Looks good to me overall.

Since this KIP changes the inter-broker protocol, should we bump up the
metadata version (the equivalent of IBP) during upgrade?

Jun


On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu 
wrote:

> Hi everyone,
> I'd like to discuss the fix for the broker reboot data loss KAFKA-14139
> .
> It changes the Fetch and AlterPartition requests to include the broker
> epochs. Then the controller can use these epochs to help reject the stale
> AlterPartition request.
> Please take a look. Thanks!
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
>