Hey folks,

I just updated the KIP with details on proposed changes to the kafka-features.sh tool. It includes four proposed sub-commands which will provide the Basic and Advanced functions detailed in KIP-584. Please have a look, thanks!

https://cwiki.apache.org/confluence/display/KAFKA/KIP-778%3A+KRaft+Upgrades#KIP778:KRaftUpgrades-KIP-584Addendum
Aside from this change, if there isn't any more feedback on the KIP I'd like
to start a vote soon.

Cheers,
David

On Thu, Oct 21, 2021 at 3:09 AM Kowshik Prakasam
<kpraka...@confluent.io.invalid> wrote:

> Hi David,
>
> Thanks for the explanations. A few comments below.
>
> 7001. Sounds good.
>
> 7002. Sounds good. The --force-downgrade-all option can be used for the
> basic CLI while the --force-downgrade option can be used for the advanced
> CLI.
>
> 7003. I like your suggestion on separate sub-commands; I agree it's more
> convenient to use.
>
> 7004/7005. Your explanation sounds good to me. Regarding the min finalized
> version level, this becomes useful for feature version deprecation as
> explained here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation
> This is not implemented yet, and the work item is tracked in KAFKA-10622.
>
> Cheers,
> Kowshik
>
> On Fri, Oct 15, 2021 at 11:38 AM David Arthur <mum...@gmail.com> wrote:
>
> > > How does the active controller know what is a valid `metadata.version`
> > > to persist? Could the active controller learn this from the
> > > ApiVersions response from all of the inactive controllers?
> >
> > The active controller should probably validate whatever value is read
> > from meta.properties against its own range of supported versions
> > (statically defined in code). If the operator sets a version unsupported
> > by the active controller, that sounds like a configuration error and we
> > should shut down. I'm not sure what other validation we could do here
> > without introducing ordering dependencies (e.g., must have quorum before
> > initializing the version).
> >
> > > For example, let's say that we have a cluster that only has remote
> > > controllers, what are the valid metadata.version in that case?
> >
> > I believe it would be the intersection of supported versions across all
> > brokers and controllers. This does raise a concern with upgrading the
> > metadata.version in general. Currently, the active controller only
> > validates the target version based on the brokers' supported versions.
> > We will need to include the controllers' supported versions here as well
> > (using ApiVersions, probably).
> >
> > On Fri, Oct 15, 2021 at 1:44 PM José Armando García Sancio
> > <jsan...@confluent.io.invalid> wrote:
> >
> > > On Fri, Oct 15, 2021 at 7:24 AM David Arthur <mum...@gmail.com> wrote:
> > > > > Hmm. So I think you are proposing the following flow:
> > > > > 1. Cluster metadata partition replicas establish a quorum using
> > > > > ApiVersions and the KRaft protocol.
> > > > > 2. Inactive controllers send a registration RPC to the active
> > > > > controller.
> > > > > 3. The active controller persists this information to the metadata
> > > > > log.
> > > > >
> > > > > What happens if the inactive controllers send a metadata.version
> > > > > range that is not compatible with the metadata.version set for the
> > > > > cluster?
> > > >
> > > > As we discussed offline, we don't need the explicit registration
> > > > step. Once a controller has joined the quorum, it will learn about
> > > > the finalized "metadata.version" level once it reads that record.
> > >
> > > How does the active controller know what is a valid `metadata.version`
> > > to persist? Could the active controller learn this from the
> > > ApiVersions response from all of the inactive controllers? For
> > > example, let's say that we have a cluster that only has remote
> > > controllers, what are the valid metadata.version in that case?
> > >
> > > > If it encounters a version it can't support it should probably shut
> > > > down since it might not be able to process any more records.
> > >
> > > I think that makes sense. If a controller cannot replay the metadata
> > > log, it might as well not be part of the quorum. If the cluster
> > > continues in this state it won't guarantee availability based on the
> > > replication factor.
> > >
> > > Thanks
> > > --
> > > -Jose
> >
> > --
> > David Arthur

--
David Arthur
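[Editor's note] The two validation steps discussed in this thread — the active controller checking a bootstrapped metadata.version against its own statically defined range, and a finalized metadata.version being valid only if it falls in the intersection of the ranges advertised (via ApiVersions) by all brokers and controllers — can be sketched roughly as follows. This is an illustrative sketch, not Kafka's actual implementation; all names here are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VersionRange:
    """An inclusive range of supported metadata.version levels."""
    min_version: int
    max_version: int

    def contains(self, version: int) -> bool:
        return self.min_version <= version <= self.max_version


def validate_bootstrap_version(configured: int,
                               local_supported: VersionRange) -> None:
    """Step 1 (sketch): the active controller validates the version read
    from meta.properties against its own statically defined supported
    range. An unsupported value is a configuration error, and the thread
    suggests the process should shut down in that case."""
    if not local_supported.contains(configured):
        raise ValueError(
            f"metadata.version {configured} is outside the locally "
            f"supported range [{local_supported.min_version}, "
            f"{local_supported.max_version}]")


def valid_finalized_range(ranges: List[VersionRange]) -> Optional[VersionRange]:
    """Step 2 (sketch): a finalized metadata.version is only valid if
    every broker AND every controller supports it, i.e. it lies in the
    intersection of all advertised ranges. Returns None if the ranges
    have no common version."""
    lo = max(r.min_version for r in ranges)
    hi = min(r.max_version for r in ranges)
    return VersionRange(lo, hi) if lo <= hi else None
```

For example, with two brokers supporting [1, 4] and [2, 5] and a remote controller supporting [1, 3], the only valid finalized versions are those in the intersection [2, 3] — which is why the upgrade path needs the controllers' ranges, not just the brokers'.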