Thanks Ismael,

I added an extra paragraph in the motivation. We have certainly hit this
in our internal Confluent reassignment software, and from a quick skim
of the popular Cruise Control repository, I see that similar problems
have been hit there too. Hopefully the examples in the KIP are sufficient
to make the case.

On Wed, Sep 7, 2022 at 11:21 PM Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks for the details, Colin. I understand how this can happen. But this
> API has been out for a long time. Are we saying that we have seen Cruise
> Control cause this kind of problem? If so, it would be good to mention it
> in the KIP as evidence that the current approach is brittle.
>
> Ismael
>
> On Wed, Sep 7, 2022 at 2:15 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi Ismael,
> >
> > I think this issue comes up when people write software that automatically
> > creates partition reassignments to balance the cluster. Cruise Control is
> > one example; Confluent also has some software that does this. If there is
> > already a reassignment that is going on for some partition and the software
> > tries to create a new reassignment for that partition, the software may
> > inadvertently change the replication factor.
> >
> > In general, I think some people find it surprising that reassignment can
> > change the replication factor of a partition. When we outlined the
> > reassignment API in KIP-455 we maintained the ability to do this, since the
> > old ZK-based API had always been able to do it. But this was a bit
> > controversial. Maybe it would have been more intuitive to preserve
> > replication factor by default unless the user explicitly stated that they
> > wanted to change it. So in a sense, you could view this as a fix for
> > KIP-455 :) (in my opinion, at least)
> >
> > best,
> > Colin
> >
> >
> > On Wed, Sep 7, 2022, at 07:07, Ismael Juma wrote:
> > > Thanks for the KIP. Can we explain a bit more why this is an important use
> > > case to address? For example, do we have concrete examples of people
> > > running into this? The way the KIP is written, it sounds like a potential
> > > problem but no information is given on whether it's a real problem in
> > > practice.
> > >
> > > Ismael
> > >
> > > On Thu, Jul 28, 2022 at 2:00 AM Stanislav Kozlovski
> > > <stanis...@confluent.io.invalid> wrote:
> > >
> > >> Hey all,
> > >>
> > >> I'd like to start a discussion on a proposal to help prevent API users
> > >> from inadvertently increasing the replication factor of a topic through
> > >> the alter partition reassignments API. The KIP describes two fairly
> > >> easy-to-hit race conditions in which this can happen.
> > >>
> > >> The KIP itself is pretty simple, yet has a couple of alternatives that can
> > >> help solve the same problem. I would appreciate thoughts from the community
> > >> on how you think we should proceed, and whether the proposal makes sense in
> > >> the first place.
> > >>
> > >> Thanks!
> > >>
> > >> KIP:
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-860%3A+Add+client-provided+option+to+guard+against+replication+factor+change+during+partition+reassignments
> > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14121
> > >>
> > >> --
> > >> Best,
> > >> Stanislav
> > >>
> >
>


-- 
Best,
Stanislav
