Thanks for the discussion all,

I have updated the KIP to mention throwing an UnsupportedVersionException
if the server is running an old version that would not honor the configured
allowReplicationFactor option.

Please take a look:
- KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-860%3A+Add+client-provided+option+to+guard+against+replication+factor+change+during+partition+reassignments
- changes:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=217392873&selectedPageVersions=4&selectedPageVersions=3

If there aren't extra comments, I plan on starting a vote thread by the end
of this week.

Best,
Stanislav

On Tue, Aug 9, 2022 at 5:06 AM David Jacot <dja...@confluent.io.invalid>
wrote:

> Throwing an UnsupportedVersionException with an appropriate message
> seems to be the best option when the new API is not supported and
> AllowReplicationFactorChange is not set to the default value.
>
> Cheers,
> David
>
> On Mon, Aug 8, 2022 at 6:25 PM Vikas Singh <vi...@confluent.io.invalid>
> wrote:
> >
> > I personally like the UVE option. It provides options for clients to go
> > either way, retry or abort. If we do it in AdminClient, then users have
> to
> > live with what we have chosen.
> >
> > > Note this can happen during an RF change too. e.g [1,2,3] => [4,5,6,7]
> (RF
> > > change, intermediate set is [1,2,3,4,5,6,7]), and we try to do a
> > > reassignment to [9,10,11], the logic will compare [4,5,6,7] to
> [9,10,11].
> > > In such a situation where one wants to cancel the RF increase and
> reassign
> > > again, one first needs to cancel the existing reassignment via the API
> (no
> > > special action required despite RF change)
> >
> > Thanks for the explanation. I did realize this nuance and thus requested
> to
> > put that in KIP as it's not mentioned why the choice was made. I am fine
> if
> > you choose to not do it in the interest of brevity.
> >
> > Vikas
> >
> > On Sun, Aug 7, 2022 at 9:02 AM Stanislav Kozlovski
> > <stanis...@confluent.io.invalid> wrote:
> >
> > > Thank you for the reviews.
> > >
> > > Vikas,
> > > > > In the case of an already-reassigning partition being reassigned
> again,
> > > the validation compares the targetReplicaSet size of the reassignment
> to
> > > the targetReplicaSet size of the new reassignment and throws if those
> > > differ.
> > > > Can you add more detail to this, or clarify what is targetReplicaSet
> (for
> > > e.g. why not sourceReplicaSet?) and how the target replica set will be
> > > calculated?
> > > If a reassignment is ongoing, such that [1,2,3] => [4,5,6] (the
> replica set
> > > in Kafka will be [1,2,3,4,5,6] during the reassignment), and you try to
> > > issue a new reassignment (e.g [7,8,9], Kafka should NOT think that the
> RF
> > > of the partition is 6 just because a reassignment is ongoing. Hence, we
> > > compare [4,5,6]'s length to [7,8,9]
> > > The targetReplicaSet is a term we use in KIP-455
> > > <
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> > > >.
> > > It means the desired replica set that a given reassignment is trying to
> > > achieve. Here we compare said set of the on-going reassignment to the
> new
> > > reassignment.
> > >
> > > Note this can happen during an RF change too. e.g [1,2,3] => [4,5,6,7]
> (RF
> > > change, intermediate set is [1,2,3,4,5,6,7]), and we try to do a
> > > reassignment to [9,10,11], the logic will compare [4,5,6,7] to
> [9,10,11].
> > > In such a situation where one wants to cancel the RF increase and
> reassign
> > > again, one first needs to cancel the existing reassignment via the API
> (no
> > > special action required despite RF change)
> > >
> > >
> > > > And what about the reassign partitions CLI? Do we want to expose the
> > > option there too?
> > > Yes, this is already present in the KIP if I'm not mistaken. We
> describe it
> > > in "Accordingly, the kafka-reassign-partitions.sh tool will be updated
> to
> > > allow supplying the new option:"
> > > I have edited the KIP to contain two clear paragraphs called Admin API
> and
> > > CLI now.
> > >
> > > Colin,
> > >
> > > >  it would be nice for the first paragraph to be a bit more explicit
> about
> > > this goal.
> > > sounds good, updated it with that suggestion.
> > >
> > > > client-side forward compatibility
> > > I was under the assumption that it is not recommended to upgrade
> clients
> > > before brokers, but a quick search cleared it up to me that we're
> pretty
> > > intentional about allowing that
> > > <
> > >
> https://www.confluent.io/blog/upgrading-apache-kafka-clients-just-got-easier/
> > > >
> > > .
> > > Do you happen to know if we have any policy on client-side forward
> > > compatibility with regard to such things -- extending "write" APIs
> (that
> > > mutate the state) with fields that conditionally limit that
> modification?
> > > It seems like a rare use case to me, hence renaming it to something
> like
> > > tryDisableReplicationFactorChange may unnecessary impair the API.
> > >
> > > Would Admin API documentation that says "this is supported only from
> > > brokers on version X and above. If not supported, the default behavior
> (no
> > > replication factor guards) is applied" be sufficient?
> > >
> > > Best,
> > > Stanislav
> > >
> > >
> > > On Fri, Aug 5, 2022 at 8:32 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> > >
> > > > Hi Stanislav,
> > > >
> > > > Thanks for the KIP. I think this is a nice solution to the problem
> of not
> > > > wanting to change the replication factor during reassignments.
> > > >
> > > > Just from a writing point of view, it would be nice for the first
> > > > paragraph to be a bit more explicit about this goal. Maybe lead with
> > > "Many
> > > > times, we don't want to change the replication factor of a partition
> > > during
> > > > reassignment..." As it is, we're talking about metadata races before
> > > we've
> > > > even explained what the goal is that the metadata races are
> thwarting. :)
> > > >
> > > > I like the RPC and command-line format changes; they are well-done
> and
> > > > well-written. One thing we do need to spell out, those, is what the
> > > > behavior is when the server does not support this new option. The
> > > simplest
> > > > thing to do would be for the client to throw
> UnsupportedVersionException
> > > > with an exception message indicating what the problem is. Then the
> caller
> > > > could catch this and re-try the call without the flag (or give up, as
> > > > appropriate?)
> > > >
> > > > The other option is to continue on but not actually protect
> replication
> > > > factor. If we do this, at minimum we'd need to rename the flag
> something
> > > > like "try to protect replication factor" to make it clear that it's
> > > > best-effort.
> > > >
> > > > It's sort of debatable which way is better. In principle the UVE
> sounds
> > > > nicer, but in practice maybe the other behavior is best? I suspect
> most
> > > > systems would turn around and retry without the flag in the event of
> a
> > > > UVE...
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Thu, Aug 4, 2022, at 13:37, Vikas Singh wrote:
> > > > > Thanks Stanislav for the KIP. Seems like a reasonable proposal,
> > > > > preventing users from accidentally altering the replica set under
> > > certain
> > > > > conditions. I have couple of comments:
> > > > >
> > > > >
> > > > >> In the case of an already-reassigning partition being reassigned
> > > again,
> > > > > the validation compares the targetReplicaSet size of the
> reassignment
> > > to
> > > > > the targetReplicaSet size of the new reassignment and throws if
> those
> > > > > differ.
> > > > > Can you add more detail to this, or clarify what is
> targetReplicaSet
> > > (for
> > > > > e.g. why not sourceReplicaSet?) and how the target replica set
> will be
> > > > > calculated?
> > > > >
> > > > > And what about the reassign partitions CLI? Do we want to expose
> the
> > > > option
> > > > > there too?
> > > > >
> > > > > Cheers,
> > > > > Vikas
> > > > >
> > > > > On Thu, Jul 28, 2022 at 1:59 AM Stanislav Kozlovski <
> > > > stanis...@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> Hey all,
> > > > >>
> > > > >> I'd like to start a discussion on a proposal to help API users
> from
> > > > >> inadvertently increasing the replication factor of a topic through
> > > > >> the alter partition reassignments API. The KIP describes two
> fairly
> > > > >> easy-to-hit race conditions in which this can happen.
> > > > >>
> > > > >> The KIP itself is pretty simple, yet has a couple of alternatives
> that
> > > > can
> > > > >> help solve the same problem. I would appreciate thoughts from the
> > > > community
> > > > >> on how you think we should proceed, and whether the proposal makes
> > > > sense in
> > > > >> the first place.
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> KIP:
> > > > >>
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-860%3A+Add+client-provided+option+to+guard+against+replication+factor+change+during+partition+reassignments
> > > > >> JIRA: https://issues.apache.org/jira/browse/KAFKA-14121
> > > > >>
> > > > >> --
> > > > >> Best,
> > > > >> Stanislav
> > > > >>
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
>


-- 
Best,
Stanislav

Reply via email to