Hey folks, I just updated the KIP with details on proposed changes to the
kafka-features.sh tool. It includes four proposed sub-commands which will
provide the Basic and Advanced functions detailed in KIP-584. Please have a
look, thanks!
https://cwiki.apache.org/confluence/display/KAFKA/KIP-778%3A+KRaft+Upgrades#KIP778:KRaftUpgrades-KIP-584Addendum
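
To give a rough feel for the tool, here are a few illustrative invocations
(the sub-command and flag names below are placeholders, not final syntax;
the KIP has the authoritative details):

  # Basic: describe the finalized and supported versions of every feature
  bin/kafka-features.sh --bootstrap-server localhost:9092 describe

  # Advanced: upgrade a single feature flag to a specific level
  bin/kafka-features.sh --bootstrap-server localhost:9092 upgrade --feature metadata.version --version 2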

Aside from this change, if there isn't any more feedback on the KIP I'd
like to start a vote soon.

Cheers,
David

On Thu, Oct 21, 2021 at 3:09 AM Kowshik Prakasam
<kpraka...@confluent.io.invalid> wrote:

> Hi David,
>
> Thanks for the explanations. A few comments below.
>
> 7001. Sounds good.
>
> 7002. Sounds good. The --force-downgrade-all option can be used for the
> basic CLI while the --force-downgrade option can be used for the advanced
> CLI.
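>
> For example, roughly (exact syntax illustrative only):
>
>   # basic CLI: force a lossy downgrade of all features at once
>   kafka-features.sh downgrade ... --force-downgrade-all
>
>   # advanced CLI: force a lossy downgrade of one specific feature
>   kafka-features.sh downgrade ... --force-downgrade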
>
> 7003. I like your suggestion on separate sub-commands, I agree it's more
> convenient to use.
>
> 7004/7005. Your explanation sounds good to me. Regarding the min finalized
> version level, this becomes useful for feature version deprecation as
> explained here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation
> . This is not implemented yet, and the work item is tracked in KAFKA-10622.
>
>
> Cheers,
> Kowshik
>
>
>
> On Fri, Oct 15, 2021 at 11:38 AM David Arthur <mum...@gmail.com> wrote:
>
> > >
> > > How does the active controller know what is a valid `metadata.version`
> > > to persist? Could the active controller learn this from the
> > > ApiVersions response from all of the inactive controllers?
> >
> >
> > The active controller should probably validate whatever value is read from
> > meta.properties against its own range of supported versions (statically
> > defined in code). If the operator sets a version unsupported by the active
> > controller, that sounds like a configuration error and we should shut down.
> > I'm not sure what other validation we could do here without introducing
> > ordering dependencies (e.g., must have quorum before initializing the
> > version).
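> >
> > Concretely, I'm picturing a startup check along these lines (the method and
> > names here are just a sketch, not actual classes in the code):
> >
> >   // Sketch: validate the metadata.version read from meta.properties against
> >   // the range of versions this binary statically supports.
> >   static void validateMetadataVersion(short persisted, short minSupported, short maxSupported) {
> >       if (persisted < minSupported || persisted > maxSupported) {
> >           // An unsupported persisted version is a configuration error; refuse to
> >           // start rather than risk replaying metadata records we can't understand.
> >           throw new IllegalStateException("metadata.version " + persisted +
> >               " is outside the supported range [" + minSupported + ", " + maxSupported + "]");
> >       }
> >   }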
> >
> > > For example, let's say that we have a cluster that only has remote
> > > controllers, what are the valid metadata.version values in that case?
> >
> >
> > I believe it would be the intersection of supported versions across all
> > brokers and controllers. This does raise a concern with upgrading the
> > metadata.version in general. Currently, the active controller only
> > validates the target version based on the brokers' supported versions. We
> > will need to include the controllers' supported versions here as well
> > (using ApiVersions, probably).
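> >
> > As a sketch of what that intersection would look like (types and names here
> > are hypothetical, just to make the idea concrete):
> >
> >   // Compute the finalizable metadata.version range as the intersection of the
> >   // supported [min, max] ranges reported by every broker and controller,
> >   // e.g. gathered via ApiVersions. VersionRange is a hypothetical holder type.
> >   record VersionRange(short min, short max) {}
> >
> >   static VersionRange finalizableRange(java.util.List<VersionRange> supported) {
> >       short lo = Short.MIN_VALUE;
> >       short hi = Short.MAX_VALUE;
> >       for (VersionRange r : supported) {
> >           lo = (short) Math.max(lo, r.min());
> >           hi = (short) Math.min(hi, r.max());
> >       }
> >       if (lo > hi) {
> >           throw new IllegalStateException("No metadata.version is supported by every node");
> >       }
> >       return new VersionRange(lo, hi);
> >   }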
> >
> > On Fri, Oct 15, 2021 at 1:44 PM José Armando García Sancio
> > <jsan...@confluent.io.invalid> wrote:
> >
> > > On Fri, Oct 15, 2021 at 7:24 AM David Arthur <mum...@gmail.com> wrote:
> > > > > Hmm. So I think you are proposing the following flow:
> > > > > 1. Cluster metadata partition replicas establish a quorum using
> > > > > ApiVersions and the KRaft protocol.
> > > > > 2. Inactive controllers send a registration RPC to the active
> > > > > controller.
> > > > > 3. The active controller persists this information to the metadata
> > > > > log.
> > > >
> > > >
> > > > > What happens if the inactive controllers send a metadata.version
> > > > > range that is not compatible with the metadata.version set for the
> > > > > cluster?
> > > >
> > > >
> > > > As we discussed offline, we don't need the explicit registration step.
> > > > Once a controller has joined the quorum, it will learn about the
> > > > finalized "metadata.version" level once it reads that record.
> > >
> > > How does the active controller know what is a valid `metadata.version`
> > > to persist? Could the active controller learn this from the
> > > ApiVersions response from all of the inactive controllers? For
> > > example, let's say that we have a cluster that only has remote
> > > controllers, what are the valid metadata.version values in that case?
> > >
> > > > If it encounters a version it can't support, it should probably shut
> > > > down since it might not be able to process any more records.
> > >
> > > I think that makes sense. If a controller cannot replay the metadata
> > > log, it might as well not be part of the quorum. If the cluster
> > > continues in this state it won't guarantee availability based on the
> > > replication factor.
> > >
> > > Thanks
> > > --
> > > -Jose
> > >
> >
> >
> > --
> > David Arthur
> >
>


-- 
David Arthur
