Thanks for the details, Colin. I understand how this can happen. But this
API has been out for a long time. Are we saying that we have seen Cruise
Control cause this kind of problem? If so, it would be good to mention it
in the KIP as evidence that the current approach is brittle.

Ismael

On Wed, Sep 7, 2022 at 2:15 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Ismael,
>
> I think this issue comes up when people write software that automatically
> creates partition reassignments to balance the cluster. Cruise Control is
> one example; Confluent also has some software that does this. If there is
> already a reassignment that is going on for some partition and the software
> tries to create a new reassignment for that partition, the software may
> inadvertently change the replication factor.
>
> In general, I think some people find it surprising that reassignment can
> change the replication factor of a partition. When we outlined the
> reassignment API in KIP-455 we maintained the ability to do this, since the
> old ZK-based API had always been able to do it. But this was a bit
> controversial. Maybe it would have been more intuitive to preserve
> replication factor by default unless the user explicitly stated that they
> wanted to change it. So in a sense, you could view this as a fix for
> KIP-455 :) (in my opinion, at least)
>
> best,
> Colin
>
>
> On Wed, Sep 7, 2022, at 07:07, Ismael Juma wrote:
> > Thanks for the KIP. Can we explain a bit more why this is an important use
> > case to address? For example, do we have concrete examples of people
> > running into this? The way the KIP is written, it sounds like a potential
> > problem but no information is given on whether it's a real problem in
> > practice.
> >
> > Ismael
> >
> > On Thu, Jul 28, 2022 at 2:00 AM Stanislav Kozlovski
> > <stanis...@confluent.io.invalid> wrote:
> >
> >> Hey all,
> >>
> >> I'd like to start a discussion on a proposal to prevent API users from
> >> inadvertently increasing the replication factor of a topic through
> >> the alter partition reassignments API. The KIP describes two fairly
> >> easy-to-hit race conditions in which this can happen.
> >>
> >> The KIP itself is pretty simple, yet has a couple of alternatives that can
> >> help solve the same problem. I would appreciate thoughts from the community
> >> on how you think we should proceed, and whether the proposal makes sense in
> >> the first place.
> >>
> >> Thanks!
> >>
> >> KIP:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-860%3A+Add+client-provided+option+to+guard+against+replication+factor+change+during+partition+reassignments
> >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14121
> >>
> >> --
> >> Best,
> >> Stanislav
> >>
>
