Jun, I updated the KIP with the "disable" CLI.

For 16, I think you're asking how we can safely introduce the initial
version of other feature flags in the future. This will probably depend
on the nature of the feature that the new flag is gating. We can probably
take a similar approach and say that version 1 is backwards compatible
down to some point (possibly an IBP or metadata.version?).
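
As a rough sketch of what I mean (hypothetical names, nothing from the
KIP), a new feature flag could declare the compatibility point its
version 1 is anchored to, so the controller can refuse to finalize it on
clusters below that point:

    // Hypothetical sketch only: a new flag records the lowest
    // metadata.version its initial version is compatible with.
    enum ExampleFeature {
        // version 1 of "foo.version", compatible back to metadata.version 1
        FOO_V1("foo.version", (short) 1, (short) 1);

        final String name;
        final short level;
        final short minMetadataVersion;

        ExampleFeature(String name, short level, short minMetadataVersion) {
            this.name = name;
            this.level = level;
            this.minMetadataVersion = minMetadataVersion;
        }
    }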

-David

On Fri, Nov 19, 2021 at 3:00 PM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, David,
>
> Thanks for the reply. The new CLI sounds reasonable to me.
>
> 16.
> For case C, choosing the latest version sounds good to me.
> For case B, we pick version 1 for metadata.version since it just happens
> that version 1 is backward compatible. How do we make this more general
> for other features?
>
> 21. Do you still plan to change "delete" to "disable" in the CLI?
>
> Thanks,
>
> Jun
>
>
>
> On Thu, Nov 18, 2021 at 11:50 AM David Arthur
> <david.art...@confluent.io.invalid> wrote:
>
> > Jun,
> >
> > The KIP has some changes to the CLI for KIP-584. With Jason's suggestion
> > incorporated, these three commands would look like:
> >
> > 1) kafka-features.sh upgrade --release latest
> > upgrades all known features to their defaults in the latest release
> >
> > 2) kafka-features.sh downgrade --release 3.x
> > downgrades all known features to the default versions from 3.x
> >
> > 3) kafka-features.sh describe --release latest
> > describes the known features of the latest release
> >
> > The --upgrade-all and --downgrade-all options are replaced by specifying
> > each feature+version explicitly or by giving the --release argument. One
> > problem with --downgrade-all is that it's not clear what the target
> > versions should be: the previous version before the last upgrade, or the
> > lowest supported version. Since downgrades will be less common, I think
> > it's better to make the operator explicit about the desired downgrade
> > version (either through [--feature X --version Y] or [--release 3.1]).
> > Does that seem reasonable?
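> >
> > For reference, here is a rough sketch of the explicit per-feature form
> > through the KIP-584 Admin API (this is the API as it exists today; this
> > KIP may change its shape):
> >
> >     import java.util.Map;
> >     import java.util.Properties;
> >     import org.apache.kafka.clients.admin.Admin;
> >     import org.apache.kafka.clients.admin.FeatureUpdate;
> >     import org.apache.kafka.clients.admin.UpdateFeaturesOptions;
> >
> >     public class ExplicitDowngrade {
> >         public static void main(String[] args) throws Exception {
> >             Properties props = new Properties();
> >             props.put("bootstrap.servers", "localhost:9092");
> >             // Equivalent of [--feature metadata.version --version 1]:
> >             // the operator names the exact target version.
> >             try (Admin admin = Admin.create(props)) {
> >                 admin.updateFeatures(
> >                     Map.of("metadata.version",
> >                         new FeatureUpdate((short) 1, true /* allowDowngrade */)),
> >                     new UpdateFeaturesOptions()).all().get();
> >             }
> >         }
> >     }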
> >
> > 16. For case C, I think we will want to always use the latest
> > metadata.version for new clusters (like we do for IBP). We can always
> > change what we mean by "default" down the road. Also, this will likely
> > become a bit of information we include in release and upgrade notes with
> > each release.
> >
> > -David
> >
> > On Thu, Nov 18, 2021 at 1:14 PM Jun Rao <j...@confluent.io.invalid>
> wrote:
> >
> > > Hi, Jason, David,
> > >
> > > Just to clarify on the interaction with the end user, the design in
> > KIP-584
> > > allows one to do the following.
> > > (1) kafka-features.sh  --upgrade-all --bootstrap-server
> > > kafka-broker0.prn1:9071 to upgrade all features to the latest version
> > known
> > > by the tool. The user doesn't need to know a specific feature version.
> > > (2) kafka-features.sh  --downgrade-all --bootstrap-server
> > > kafka-broker0.prn1:9071 to downgrade all features to the latest version
> > > known by the tool. The user doesn't need to know a specific feature
> > > version.
> > > (3) kafka-features.sh  --describe --bootstrap-server
> > > kafka-broker0.prn1:9071 to find out the supported version for each
> > feature.
> > > This allows the user to upgrade each feature individually.
> > >
> > > Most users will be doing (1) and occasionally (2), and won't need to
> know
> > > the exact version of each feature.
> > >
> > > 16. For case C, what's the default version? Is it always the latest?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Thu, Nov 18, 2021 at 8:15 AM David Arthur
> > > <david.art...@confluent.io.invalid> wrote:
> > >
> > > > Colin, thanks for the detailed response. I understand what you're
> > saying
> > > > and I agree with your rationale.
> > > >
> > > > > It seems like we could just initialize their metadata.version to 1 when
> > > > > the software is upgraded to 3.2.
> > > > >
> > > >
> > > > Concretely, this means the controller would see that there is no
> > > > FeatureLevelRecord in the log, and so it would bootstrap one. Normally,
> > > > for cluster initialization, this would be read from meta.properties, but
> > > > in the case of preview clusters we would need to special-case it to
> > > > initialize the version to 1.
> > > >
> > > > Once the new FeatureLevelRecord has been committed, any nodes (brokers
> > > > or controllers) that are running the latest software will react to the
> > > > new metadata.version. This means we will need to make this initial
> > > > version of 1 backwards compatible with what we have in 3.0 and 3.1
> > > > (since some brokers will be on the new software and some on
> > > > older/preview versions).
> > > >
> > > > I agree that auto-upgrading preview users from IBP 3.0 to
> > > > metadata.version 1 (equivalent to IBP 3.2) is probably fine.
> > > >
> > > > Back to Jun's case B:
> > > >
> > > > > b. We upgrade from an old version where no metadata.version has been
> > > > > finalized. In this case, it makes sense to leave metadata.version
> > > > > disabled since we don't know if all brokers have been upgraded.
> > > >
> > > >
> > > > Instead of leaving it uninitialized, we would initialize it to 1, which
> > > > would be backwards compatible with 3.0 and 3.1. This would eliminate the
> > > > need to check a "final IBP" or deal with 3.2+ clusters without a
> > > > metadata.version set. The downgrade path from 3.2 back to a preview
> > > > release should be fine since we are saying that metadata.version 1 is
> > > > compatible with those releases. The FeatureLevelRecord would exist, but
> > > > it would be ignored (we might need to make sure this actually works in
> > > > 3.0).
> > > >
> > > > For clarity, here are the workflows for Jun's three cases (a sketch of
> > > > the combined decision follows the lists):
> > > >
> > > > A (KRaft 3.2+ to KRaft 3.2+):
> > > > * User initializes the cluster with an explicit metadata.version X, or
> > > >   the tool selects the default
> > > > * After upgrade, the cluster stays at version X until the operator
> > > >   upgrades to Y
> > > >
> > > > B (KRaft Preview to KRaft 3.2+):
> > > > * User initializes the cluster without metadata.version
> > > > * After the controller leader is upgraded, it initializes
> > > >   metadata.version to 1
> > > > * After the cluster is upgraded, an operator may further upgrade to
> > > >   version Y
> > > >
> > > > C (New KRaft 3.2+):
> > > > * User initializes the cluster with an explicit metadata.version X, or
> > > >   the tool selects the default
> > > >
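> > > > Putting the three cases together, the controller-side decision might
> > > > look roughly like this (hypothetical names and shape, not actual
> > > > controller code):
> > > >
> > > >     import java.util.Optional;
> > > >
> > > >     final class Bootstrap {
> > > >         // finalizedInLog: level from a committed FeatureLevelRecord, if any
> > > >         // fromMetaProperties: level from meta.properties, if present
> > > >         static short metadataVersion(Optional<Short> finalizedInLog,
> > > >                                      Optional<Short> fromMetaProperties) {
> > > >             if (finalizedInLog.isPresent()) {
> > > >                 return finalizedInLog.get();        // case A: keep it
> > > >             }
> > > >             if (fromMetaProperties.isPresent()) {
> > > >                 return fromMetaProperties.get();    // case C: operator's choice
> > > >             }
> > > >             return 1;  // case B: preview-era cluster, bootstrap version 1
> > > >         }
> > > >     }
> > > >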
> > > >
> > > > This has been mentioned in this thread, but we should explicitly call
> > > > out that the absence of metadata.version in meta.properties will be used
> > > > to identify that we are in case B. After 3.2, we will require that
> > > > metadata.version is present in meta.properties for new clusters. If
> > > > users omit the --metadata-version flag from kafka-storage.sh, we should
> > > > fill in a default.
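> > > >
> > > > For illustration, a meta.properties formatted by 3.2+ tools might then
> > > > look something like this (the values are made up, and the exact key
> > > > name for the new field is illustrative):
> > > >
> > > >     version=1
> > > >     cluster.id=AbCdEfGhIjKlMnOpQrStUw
> > > >     node.id=1
> > > >     metadata.version=1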
> > > >
> > > > So how about the following changes/clarifications to the KIP:
> > > >
> > > > * Starting with 3.2, metadata.version is required in KRaft, and the IBP
> > > >   is ignored
> > > > * If meta.properties does not include metadata.version, it indicates
> > > >   this cluster was initialized with a preview release
> > > > * If upgrading from a preview release to 3.2+, the controller will
> > > >   initialize metadata.version=1
> > > > * metadata.version=1 is backwards compatible with preview releases
> > > >
> > > > -David
> > > >
> > > >
> > > > On Thu, Nov 18, 2021 at 10:02 AM David Arthur <
> > david.art...@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > Jason,
> > > > >
> > > > > 1/2. You've got it right. The intention is that metadata.version will
> > > > > gate both RPCs (like the IBP does) as well as metadata records. So,
> > > > > when a broker sees that metadata.version has changed, it may start
> > > > > advertising new RPCs, and it will need to refresh the ApiVersions it
> > > > > has for other brokers. I can try to make this more explicit in the KIP.
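> > > > >
> > > > > To illustrate the broker side (hypothetical names, not actual broker
> > > > > internals), the reaction boils down to:
> > > > >
> > > > >     interface Peer { void sendApiVersionsRequest(); }
> > > > >
> > > > >     final class MetadataVersionListener {
> > > > >         private final java.util.List<Peer> peers;
> > > > >
> > > > >         MetadataVersionListener(java.util.List<Peer> peers) {
> > > > >             this.peers = peers;
> > > > >         }
> > > > >
> > > > >         // Called when a FeatureLevelRecord changes metadata.version:
> > > > >         // this broker may now advertise new RPCs, and peers may too,
> > > > >         // so refresh the ApiVersions we hold for each of them.
> > > > >         void onMetadataVersionChange(short newVersion) {
> > > > >             peers.forEach(Peer::sendApiVersionsRequest);
> > > > >         }
> > > > >     }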
> > > > >
> > > > > 3. Yes, that's the intention of the --dry-run flag. The current
> > > > > implementation in kafka-features.sh does a dry run on the client side,
> > > > > but this KIP pushes the validation down to the controller. This will
> > > > > allow us to have the context needed to do proper validation.
> > > > >
> > > > > Re: version number complexity -- yes, this has come up in offline
> > > > > discussions. With just one feature flag to manage, it's not so bad, but
> > > > > explicitly managing even a few would be a burden. I like your
> > > > > suggestion of the "--release" flag. That could act as a sort of
> > > > > manifest of versions (i.e., the defaults from that release). We could
> > > > > also support something like "kafka-features upgrade --release latest"
> > > > > to bring everything to the highest supported version.
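> > > > >
> > > > > Something like the following static manifest is what I have in mind
> > > > > (the versions shown are made up; the KIP would define the real
> > > > > mapping):
> > > > >
> > > > >     import java.util.Map;
> > > > >
> > > > >     final class ReleaseManifest {
> > > > >         // release version -> default feature versions at that release
> > > > >         static final Map<String, Map<String, Short>> DEFAULTS = Map.of(
> > > > >             "3.2", Map.of("metadata.version", (short) 1),
> > > > >             "3.3", Map.of("metadata.version", (short) 2));
> > > > >     }
> > > > >
> > > > > Then "kafka-features.sh upgrade --release 3.2" would just resolve to
> > > > > the entries under "3.2".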
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Nov 17, 2021 at 9:36 PM Jason Gustafson
> > > > <ja...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > >> Hi Colin/David,
> > > > >>
> > > > >> > Like David said, basically it boils down to creating a feature
> > flag
> > > > for
> > > > >> the new proposed __consumer_offsets version, or using a new
> > > > >> IBP/metadata.version for it. Both approaches have pros and cons.
> > Using
> > > > an
> > > > >> IBP/metadata.version bump reduces the size of the testing matrix.
> > But
> > > > >> using
> > > > >> a feature flag allows people to avoid any bugs or pain associated
> > with
> > > > the
> > > > >> change if they don't care about enabling it. This is basically the
> > > > classic
> > > > >> "should I use a feature flag or not?" discussion and we need to
> have
> > > it
> > > > on
> > > > >> a case-by-case basis.
> > > > >>
> > > > >> I think most users are not going to care to manage versions for a
> > > > >> bunch of different features. The IBP today has many shortcomings, but
> > > > >> at least it's tied to a version that users understand (i.e., the
> > > > >> release version). How would users know after upgrading to Kafka 3.1,
> > > > >> for example, that they need to upgrade the metadata.version to 3 and
> > > > >> offsets.version to 4 (or whatever)? It's a lot of overhead trying to
> > > > >> understand all of the potential features and what each upgrade
> > > > >> actually means to them. I am wondering if we could give them something
> > > > >> more convenient which is tied to the release version. For example,
> > > > >> maybe we could use a command like `kafka-features upgrade --release
> > > > >> 3.1`, which the broker would then translate to an upgrade to the
> > > > >> latest versions of the individual features at the time of the 3.1
> > > > >> release. Basically it's just a static map from release version to
> > > > >> feature versions to make the upgrade process more convenient.
> > > > >>
> > > > >> Thanks,
> > > > >> Jason
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Wed, Nov 17, 2021 at 6:20 PM Jason Gustafson <
> ja...@confluent.io
> > >
> > > > >> wrote:
> > > > >>
> > > > >> > A few additional questions:
> > > > >> >
> > > > >> > 1. Currently the IBP tells us what version of individual
> > > inter-broker
> > > > >> RPCs
> > > > >> > will be used. I think the plan in this KIP is to use ApiVersions
> > > > request
> > > > >> > instead to find the highest compatible version (just like
> > clients).
> > > > Do I
> > > > >> > have that right?
> > > > >> >
> > > > >> > 2. The following wasn't very clear to me:
> > > > >> >
> > > > >> > > Brokers will be able to observe changes to metadata.version by
> > > > >> observing
> > > > >> > the metadata log, and could then submit a new ApiVersionsRequest
> > to
> > > > the
> > > > >> > other Kafka nodes.
> > > > >> >
> > > > >> > Is the purpose of submitting new ApiVersions requests to update
> > the
> > > > >> > features or the RPC versions? Does metadata.version also
> influence
> > > the
> > > > >> > versions that a broker advertises? It would help to have more
> > detail
> > > > >> about
> > > > >> > this.
> > > > >> >
> > > > >> > 3. I imagine users will want to know before performing an
> upgrade
> > > > >> whether
> > > > >> > downgrading will be safe. Would the --dry-run flag tell them
> this?
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Jason
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Nov 17, 2021 at 3:55 PM Colin McCabe <
> cmcc...@apache.org>
> > > > >> wrote:
> > > > >> >
> > > > >> >> On Wed, Nov 17, 2021, at 11:28, Jason Gustafson wrote:
> > > > >> >> > Hi David,
> > > > >> >> >
> > > > >> >> > Forgive me if this ground has been covered already. Today, we
> > > have
> > > > a
> > > > >> few
> > > > >> >> > other things that we have latched onto the IBP, such as
> > upgrades
> > > to
> > > > >> the
> > > > >> >> > format of records in __consumer_offsets. I've been assuming
> > that
> > > > >> >> > metadata.version is not covering this. Is that right or is
> > there
> > > > some
> > > > >> >> other
> > > > >> >> > plan to take care of cases like this?
> > > > >> >> >
> > > > >> >>
> > > > >> >> I think metadata.version could cover changes to things like
> > > > >> >> __consumer_offsets, if people want it to. Or to put it another
> > way,
> > > > >> that is
> > > > >> >> out of scope for this KIP.
> > > > >> >>
> > > > >> >> Like David said, basically it boils down to creating a feature
> > flag
> > > > for
> > > > >> >> the new proposed __consumer_offsets version, or using a new
> > > > >> >> IBP/metadata.version for it. Both approaches have pros and
> cons.
> > > > Using
> > > > >> an
> > > > >> >> IBP/metadata.version bump reduces the size of the testing
> matrix.
> > > But
> > > > >> using
> > > > >> >> a feature flag allows people to avoid any bugs or pain
> associated
> > > > with
> > > > >> the
> > > > >> >> change if they don't care about enabling it. This is basically
> > the
> > > > >> classic
> > > > >> >> "should I use a feature flag or not?" discussion and we need to
> > > have
> > > > >> it on
> > > > >> >> a case-by-case basis.
> > > > >> >>
> > > > >> >> I think it's worth calling out that having a 1:1 mapping
> between
> > > IBP
> > > > >> >> versions and metadata.versions will result in some
> > > metadata.versions
> > > > >> that
> > > > >> >> "don't do anything" (aka they do the same thing as the previous
> > > > >> >> metadata.version). For example, if we change StopReplicaRequest
> > > > again,
> > > > >> that
> > > > >> >> will not affect KRaft mode, but probably would require an IBP
> > bump
> > > > and
> > > > >> >> hence a metadata.version bump. I think that's OK. It's not that
> > > > >> different
> > > > >> >> from updating your IBP and getting support for ZStandard, when
> > your
> > > > >> >> deployment doesn't use ZStandard compression.
> > > > >> >>
> > > > >> >> best,
> > > > >> >> Colin
> > > > >> >>
> > > > >> >> > Thanks,
> > > > >> >> > Jason
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On Wed, Nov 17, 2021 at 10:17 AM Jun Rao
> > > <j...@confluent.io.invalid
> > > > >
> > > > >> >> wrote:
> > > > >> >> >
> > > > >> >> >> Hi, Colin,
> > > > >> >> >>
> > > > >> >> >> Thanks for the reply.
> > > > >> >> >>
> > > > >> >> >> For case b, I am not sure that I understand your suggestion.
> > > Does
> > > > >> "each
> > > > >> >> >> subsequent level for metadata.version corresponds to an IBP
> > > > version"
> > > > >> >> mean
> > > > >> >> >> that we need to keep IBP forever? Could you describe the
> > upgrade
> > > > >> >> process in
> > > > >> >> >> this case?
> > > > >> >> >>
> > > > >> >> >> Jun
> > > > >> >> >>
> > > > >> >> >> On Tue, Nov 16, 2021 at 3:45 PM Colin McCabe <
> > > cmcc...@apache.org>
> > > > >> >> wrote:
> > > > >> >> >>
> > > > >> >> >> > On Tue, Nov 16, 2021, at 15:13, Jun Rao wrote:
> > > > >> >> >> > > Hi, David, Colin,
> > > > >> >> >> > >
> > > > >> >> >> > > Thanks for the reply.
> > > > >> >> >> > >
> > > > >> >> >> > > 16. Discussed with David offline a bit. We have 3 cases.
> > > > >> >> >> > > a. We upgrade from an old version where the
> > metadata.version
> > > > has
> > > > >> >> >> already
> > > > >> >> >> > > been finalized. In this case it makes sense to stay with
> > > that
> > > > >> >> feature
> > > > >> >> >> > > version after the upgrade.
> > > > >> >> >> >
> > > > >> >> >> > +1
> > > > >> >> >> >
> > > > >> >> >> > > b. We upgrade from an old version where no
> > metadata.version
> > > > has
> > > > >> >> been
> > > > >> >> >> > > finalized. In this case, it makes sense to leave
> > > > >> metadata.version
> > > > >> >> >> > disabled
> > > > >> >> >> > > since we don't know if all brokers have been upgraded.
> > > > >> >> >> >
> > > > >> >> >> > This is the scenario I was hoping to avoid by saying that
> > ALL
> > > > >> KRaft
> > > > >> >> >> > clusters have metadata.version of at least 1, and each
> > > > subsequent
> > > > >> >> level
> > > > >> >> >> for
> > > > >> >> >> > metadata.version corresponds to an IBP version. The
> existing
> > > > KRaft
> > > > >> >> >> clusters
> > > > >> >> >> > in 3.0 and earlier are preview (not for production) so I
> > think
> > > > >> this
> > > > >> >> >> change
> > > > >> >> >> > is OK for 3.x (given that it affects only KRaft). Then IBP
> > is
> > > > >> >> irrelevant
> > > > >> >> >> > for KRaft clusters (the config is ignored, possibly with a
> > > WARN
> > > > or
> > > > >> >> ERROR
> > > > >> >> >> > message generated if it is set).
> > > > >> >> >> >
> > > > >> >> >> > > c. We are starting from a brand new cluster and of
> course
> > no
> > > > >> >> >> > > metadata.version has been finalized. In this case, the
> KIP
> > > > says
> > > > >> it
> > > > >> >> will
> > > > >> >> >> > > pick the metadata.version in meta.properties. In the
> > common
> > > > >> case,
> > > > >> >> >> people
> > > > >> >> >> > > probably won't set the metadata.version in the
> > > meta.properties
> > > > >> file
> > > > >> >> >> > > explicitly. So, it will be useful to put a default
> > (stable)
> > > > >> version
> > > > >> >> >> there
> > > > >> >> >> > > when the meta.properties file is generated.
> > > > >> >> >> >
> > > > >> >> >> > Hmm. I was assuming that clusters where the admin didn't
> > > specify
> > > > >> any
> > > > >> >> >> > metadata.version during formatting would get the latest
> > > > >> >> metadata.version.
> > > > >> >> >> > Partly, because this is what we do for IBP today. It would
> > be
> > > > >> good to
> > > > >> >> >> > clarify this...
> > > > >> >> >> >
> > > > >> >> >> > >
> > > > >> >> >> > > Also, it would be useful to clarify that if a
> > > > FeatureLevelRecord
> > > > >> >> exists
> > > > >> >> >> > for
> > > > >> >> >> > > metadata.version, the metadata.version in
> meta.properties
> > > will
> > > > >> be
> > > > >> >> >> > ignored.
> > > > >> >> >> > >
> > > > >> >> >> >
> > > > >> >> >> > Yeah, I agree.
> > > > >> >> >> >
> > > > >> >> >> > best,
> > > > >> >> >> > Colin
> > > > >> >> >> >
> > > > >> >> >> > > Thanks,
> > > > >> >> >> > >
> > > > >> >> >> > > Jun
> > > > >> >> >> > >
> > > > >> >> >> > >
> > > > >> >> >> > > On Tue, Nov 16, 2021 at 12:39 PM Colin McCabe <
> > > > >> cmcc...@apache.org>
> > > > >> >> >> > wrote:
> > > > >> >> >> > >
> > > > >> >> >> > >> On Fri, Nov 5, 2021, at 15:18, Jun Rao wrote:
> > > > >> >> >> > >> > Hi, David,
> > > > >> >> >> > >> >
> > > > >> >> >> > >> > Thanks for the reply.
> > > > >> >> >> > >> >
> > > > >> >> >> > >> > 16. My first concern is that the KIP picks up
> > > metadata.version
> > > > >> >> >> > inconsistently
> > > > >> >> >> > >> > during the deployment. If a new cluster is started,
> we
> > > pick
> > > > >> up
> > > > >> >> the
> > > > >> >> >> > >> highest
> > > > >> >> >> > >> > version. If we upgrade, we leave the feature version
> > > > >> unchanged.
> > > > >> >> >> > >>
> > > > >> >> >> > >> Hi Jun,
> > > > >> >> >> > >>
> > > > >> >> >> > >> Thanks again for taking a look.
> > > > >> >> >> > >>
> > > > >> >> >> > >> The proposed behavior in KIP-778 is consistent with how
> > it
> > > > >> works
> > > > >> >> >> today.
> > > > >> >> >> > >> Upgrading the software is distinct from upgrading the
> > IBP.
> > > > >> >> >> > >>
> > > > >> >> >> > >> I think it is important to keep these two operations
> > > > >> ("upgrading
> > > > >> >> >> > >> IBP/metadata version" and "upgrading software version")
> > > > >> separate.
> > > > >> >> If
> > > > >> >> >> > they
> > > > >> >> >> > >> are coupled it will create a situation where software
> > > > upgrades
> > > > >> are
> > > > >> >> >> > >> difficult and dangerous.
> > > > >> >> >> > >>
> > > > >> >> >> > >> Consider a situation where you find some bug in your
> > > current
> > > > >> >> software,
> > > > >> >> >> > and
> > > > >> >> >> > >> you want to upgrade to new software that fixes the bug.
> > If
> > > > >> >> upgrades
> > > > >> >> >> and
> > > > >> >> >> > IBP
> > > > >> >> >> > >> bumps are coupled, you can't do this without also
> bumping
> > > the
> > > > >> IBP,
> > > > >> >> >> > which is
> > > > >> >> >> > >> usually considered a high-risk change. That means that
> > > either
> > > > >> you
> > > > >> >> have
> > > > >> >> >> > to
> > > > >> >> >> > >> make a special build that includes only the fix
> > > > (time-consuming
> > > > >> >> and
> > > > >> >> >> > >> error-prone), live with the bug for longer, or be very
> > > > >> >> conservative
> > > > >> >> >> > about
> > > > >> >> >> > >> ever introducing new IBP/metadata versions. None of
> those
> > > are
> > > > >> >> really
> > > > >> >> >> > good
> > > > >> >> >> > >> choices.
> > > > >> >> >> > >>
> > > > >> >> >> > >> > Intuitively, it seems that independent of how a
> cluster
> > > is
> > > > >> >> deployed,
> > > > >> >> >> > we
> > > > >> >> >> > >> > should always pick the same feature version.
> > > > >> >> >> > >>
> > > > >> >> >> > >> I think it makes sense to draw a distinction between
> > > > upgrading
> > > > >> an
> > > > >> >> >> > existing
> > > > >> >> >> > >> cluster and deploying a new one. What most people want
> > out
> > > of
> > > > >> >> upgrades
> > > > >> >> >> > is
> > > > >> >> >> > >> that things should keep working, but with bug fixes. If
> > we
> > > > >> change
> > > > >> >> >> that,
> > > > >> >> >> > it
> > > > >> >> >> > >> just makes people more reluctant to upgrade (which is
> > > always
> > > > a
> > > > >> >> >> > problem...)
> > > > >> >> >> > >>
> > > > >> >> >> > >> > I think we need to think this through in this KIP. My
> > > > second
> > > > >> >> concern
> > > > >> >> >> > is
> > > > >> >> >> > >> > that as a particular version matures, it's
> inconvenient
> > > > for a
> > > > >> >> user
> > > > >> >> >> to
> > > > >> >> >> > >> manually
> > > > >> >> >> > >> > upgrade every feature version. As long as we have a
> > path
> > > to
> > > > >> >> achieve
> > > > >> >> >> > that
> > > > >> >> >> > >> in
> > > > >> >> >> > >> > the future, we don't need to address that in this
> KIP.
> > > > >> >> >> > >>
> > > > >> >> >> > >> If people are managing a large number of Kafka
> clusters,
> > > they
> > > > >> will
> > > > >> >> >> want
> > > > >> >> >> > to
> > > > >> >> >> > >> do some sort of A/B testing with IBP/metadata versions.
> > So
> > > if
> > > > >> you
> > > > >> >> have
> > > > >> >> >> > 1000
> > > > >> >> >> > >> Kafka clusters, you roll out the new IBP version to 10
> of
> > > > them
> > > > >> >> and see
> > > > >> >> >> > how
> > > > >> >> >> > >> it goes. If that goes well, you roll it out to more,
> etc.
> > > > >> >> >> > >>
> > > > >> >> >> > >> So, the automation needs to be at the cluster
> management
> > > > layer,
> > > > >> >> not at
> > > > >> >> >> > the
> > > > >> >> >> > >> Kafka layer. Each Kafka cluster doesn't know how well
> > > things
> > > > >> went
> > > > >> >> in
> > > > >> >> >> the
> > > > >> >> >> > >> other 999 clusters. Automatically upgrading is a bad
> > thing
> > > > for
> > > > >> the
> > > > >> >> >> same
> > > > >> >> >> > >> reason Kafka automatically upgrading its own software
> > > version
> > > > >> >> would
> > > > >> >> >> be a
> > > > >> >> >> > >> bad thing -- it could lead to a disruption to a
> sensitive
> > > > >> cluster
> > > > >> >> at
> > > > >> >> >> the
> > > > >> >> >> > >> wrong time.
> > > > >> >> >> > >>
> > > > >> >> >> > >> For people who are just managing one or two Kafka
> > clusters,
> > > > >> >> >> > automatically
> > > > >> >> >> > >> upgrading feature versions isn't a big burden and can
> be
> > > done
> > > > >> >> >> manually.
> > > > >> >> >> > >> This is all consistent with how IBP works today.
> > > > >> >> >> > >>
> > > > >> >> >> > >> Also, we already have a command-line option to the
> > feature
> > > > tool
> > > > >> >> which
> > > > >> >> >> > >> upgrades all features to the latest available, if that
> is
> > > > what
> > > > >> the
> > > > >> >> >> > >> administrator desires. Perhaps we could add
> documentation
> > > > >> saying
> > > > >> >> that
> > > > >> >> >> > this
> > > > >> >> >> > >> should be done as the last step of the upgrade.
> > > > >> >> >> > >>
> > > > >> >> >> > >> best,
> > > > >> >> >> > >> Colin
> > > > >> >> >> > >>
> > > > >> >> >> > >> >
> > > > >> >> >> > >> > 21. "./kafka-features.sh delete": Deleting a feature
> > > seems
> > > > a
> > > > >> bit
> > > > >> >> >> weird
> > > > >> >> >> > >> > since the logic is always there. Would it be better
> to
> > > use
> > > > >> >> disable?
> > > > >> >> >> > >> >
> > > > >> >> >> > >> > Jun
> > > > >> >> >> > >> >
> > > > >> >> >> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur
> > > > >> >> >> > >> > <david.art...@confluent.io.invalid> wrote:
> > > > >> >> >> > >> >
> > > > >> >> >> > >> >> Colin and Jun, thanks for the additional comments!
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> Colin:
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> > We've been talking about having an automated RPC
> > > > >> >> compatibility
> > > > >> >> >> > checker
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> Do we have a way to mark fields in schemas as
> > > deprecated?
> > > > It
> > > > >> >> can
> > > > >> >> >> > stay in
> > > > >> >> >> > >> >> the RPC, it just complicates the logic a bit.
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> > It would be nice if the active controller could
> > > validate
> > > > >> >> that a
> > > > >> >> >> > >> majority
> > > > >> >> >> > >> >> of the quorum could use the proposed
> metadata.version.
> > > The
> > > > >> >> active
> > > > >> >> >> > >> >> controller should have this information, right? If
> we
> > > > don't
> > > > >> >> have
> > > > >> >> >> > recent
> > > > >> >> >> > >> >> information from a quorum of voters, we wouldn't be
> > > > active.
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> I believe we should have this information from the
> > > > >> >> >> > ApiVersionsResponse.
> > > > >> >> >> > >> It
> > > > >> >> >> > >> >> would be good to do this validation to avoid a
> > situation
> > > > >> where
> > > > >> >> a
> > > > >> >> >> > >> >> quorum leader can't be elected due to unprocessable
> > > > records.
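> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> As a sketch (hypothetical names), the validation could be a
> > > > >> >> >> > >> >> simple majority check over the voters' reported ranges:
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >>     // Only finalize the proposed level if a majority of
> > > > >> >> >> > >> >>     // quorum voters report (via ApiVersionsResponse) that
> > > > >> >> >> > >> >>     // they support it.
> > > > >> >> >> > >> >>     static boolean majoritySupports(short proposed,
> > > > >> >> >> > >> >>             java.util.List<Short> voterMaxLevels) {
> > > > >> >> >> > >> >>         long ok = voterMaxLevels.stream()
> > > > >> >> >> > >> >>             .filter(max -> max >= proposed).count();
> > > > >> >> >> > >> >>         return ok > voterMaxLevels.size() / 2;
> > > > >> >> >> > >> >>     }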
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> > Do we need delete as a command separate from
> > > downgrade?
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> I think from an operator's perspective, it is nice
> to
> > > > >> >> distinguish
> > > > >> >> >> > >> between
> > > > >> >> >> > >> >> changing a feature flag and unsetting it. It might
> be
> > > > >> >> surprising to
> > > > >> >> >> > an
> > > > >> >> >> > >> >> operator to see the flag's version set to nothing
> when
> > > > they
> > > > >> >> >> requested
> > > > >> >> >> > >> the
> > > > >> >> >> > >> >> downgrade to version 0 (or less).
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> > it seems like we should spell out that
> > > metadata.version
> > > > >> >> begins at
> > > > >> >> >> > 1 in
> > > > >> >> >> > >> >> KRaft clusters
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> I added this text:
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> Introduce an IBP version to indicate the lowest
> > software
> > > > >> >> version
> > > > >> >> >> that
> > > > >> >> >> > >> >> > supports *metadata.version*. Below this IBP, the
> > > > >> >> >> > *metadata.version* is
> > > > >> >> >> > >> >> > undefined and will not be examined. At or above
> this
> > > > IBP,
> > > > >> the
> > > > >> >> >> > >> >> > *metadata.version* must be *0* for ZooKeeper
> > clusters
> > > > and
> > > > >> >> will be
> > > > >> >> >> > >> >> > initialized as *1* for KRaft clusters.
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> > We probably also want an RPC implemented by both
> > > brokers
> > > > >> and
> > > > >> >> >> > >> controllers
> > > > >> >> >> > >> >> that will reveal the min and max supported versions
> > for
> > > > each
> > > > >> >> >> feature
> > > > >> >> >> > >> level
> > > > >> >> >> > >> >> supported by the server
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> This is available in ApiVersionsResponse (we include
> > the
> > > > >> >> server's
> > > > >> >> >> > >> supported
> > > > >> >> >> > >> >> features as well as the cluster's finalized
> features)
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> --------
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> Jun:
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> 12. I've updated the KIP with AdminClient changes
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> 14. You're right, it looks like I missed a few
> > sections
> > > > >> >> regarding
> > > > >> >> >> > >> snapshot
> > > > >> >> >> > >> >> generation. I've corrected it
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> 16. This feels more like an enhancement to KIP-584.
> I
> > > > agree
> > > > >> it
> > > > >> >> >> could
> > > > >> >> >> > be
> > > > >> >> >> > >> >> useful, but perhaps we could address it separately
> > from
> > > > >> KRaft
> > > > >> >> >> > upgrades?
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> 20. Indeed snapshots are not strictly necessary
> during
> > > an
> > > > >> >> upgrade,
> > > > >> >> >> > I've
> > > > >> >> >> > >> >> reworded this
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> Thanks!
> > > > >> >> >> > >> >> David
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> On Thu, Nov 4, 2021 at 6:51 PM Jun Rao
> > > > >> >> <j...@confluent.io.invalid>
> > > > >> >> >> > >> wrote:
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> > Hi, David, Jose and Colin,
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > Thanks for the reply. A few more comments.
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > 12. It seems that we haven't updated the
> AdminClient
> > > > >> >> accordingly?
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > 14. "Metadata snapshot is generated and sent to
> the
> > > > other
> > > > >> >> >> inactive
> > > > >> >> >> > >> >> > controllers and to brokers". I thought we wanted
> > each
> > > > >> broker
> > > > >> >> to
> > > > >> >> >> > >> generate
> > > > >> >> >> > >> >> > its own snapshot independently? If only the
> > controller
> > > > >> >> generates
> > > > >> >> >> > the
> > > > >> >> >> > >> >> > snapshot, how do we force other brokers to pick it
> > up?
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > 16. If a feature version is new, one may not want
> to
> > > > >> enable
> > > > >> >> it
> > > > >> >> >> > >> >> immediately
> > > > >> >> >> > >> >> > after the cluster is upgraded. However, if a
> feature
> > > > >> version
> > > > >> >> has
> > > > >> >> >> > been
> > > > >> >> >> > >> >> > stable, requiring every user to run a command to
> > > upgrade
> > > > >> to
> > > > >> >> that
> > > > >> >> >> > >> version
> > > > >> >> >> > >> >> > seems inconvenient. One way to improve this is for
> > > each
> > > > >> >> feature
> > > > >> >> >> to
> > > > >> >> >> > >> define
> > > > >> >> >> > >> >> > one version as the default. Then, when we upgrade
> a
> > > > >> cluster,
> > > > >> >> we
> > > > >> >> >> > will
> > > > >> >> >> > >> >> > automatically upgrade the feature to the default
> > > > version.
> > > > >> An
> > > > >> >> >> admin
> > > > >> >> >> > >> could
> > > > >> >> >> > >> >> > use the tool to upgrade to a version higher than
> the
> > > > >> default.
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > 20. "The quorum controller can assist with this
> > > process
> > > > by
> > > > >> >> >> > generating
> > > > >> >> >> > >> a
> > > > >> >> >> > >> >> > metadata snapshot after a metadata.version
> increase
> > > has
> > > > >> been
> > > > >> >> >> > >> committed to
> > > > >> >> >> > >> >> > the metadata log. This snapshot will be a
> convenient
> > > way
> > > > >> to
> > > > >> >> let
> > > > >> >> >> > broker
> > > > >> >> >> > >> >> and
> > > > >> >> >> > >> >> > controller components rebuild their entire
> in-memory
> > > > state
> > > > >> >> >> > following
> > > > >> >> >> > >> an
> > > > >> >> >> > >> >> > upgrade." The new version of the software could
> read
> > > > both
> > > > >> >> the new
> > > > >> >> >> > and
> > > > >> >> >> > >> the
> > > > >> >> >> > >> >> > old version. Is generating a new snapshot during
> > > upgrade
> > > > >> >> needed?
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > Jun
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <
> > > > >> >> cmcc...@apache.org>
> > > > >> >> >> > >> wrote:
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> > > > >> >> >> > >> >> > > > Hi, David,
> > > > >> >> >> > >> >> > > >
> > > > >> >> >> > >> >> > > > One more comment.
> > > > >> >> >> > >> >> > > >
> > > > >> >> >> > >> >> > > > 16. The main reason why KIP-584 requires
> > > finalizing
> > > > a
> > > > >> >> feature
> > > > >> >> >> > >> >> manually
> > > > >> >> >> > >> >> > is
> > > > >> >> >> > >> >> > > > that in the ZK world, the controller doesn't
> > know
> > > > all
> > > > >> >> brokers
> > > > >> >> >> > in a
> > > > >> >> >> > >> >> > > cluster.
> > > > >> >> >> > >> >> > > > A broker temporarily down is not registered in
> > ZK.
> > > > in
> > > > >> the
> > > > >> >> >> KRaft
> > > > >> >> >> > >> >> world,
> > > > >> >> >> > >> >> > > the
> > > > >> >> >> > >> >> > > > controller keeps track of all brokers,
> including
> > > > those
> > > > >> >> that
> > > > >> >> >> are
> > > > >> >> >> > >> >> > > temporarily
> > > > >> >> >> > >> >> > > > down. This makes it possible for the
> controller
> > to
> > > > >> >> >> > automatically
> > > > >> >> >> > >> >> > > finalize a
> > > > >> >> >> > >> >> > > > feature---it's safe to do so when all brokers
> > > > support
> > > > >> >> that
> > > > >> >> >> > >> feature.
> > > > >> >> >> > >> >> > This
> > > > >> >> >> > >> >> > > > will make the upgrade process much simpler
> since
> > > no
> > > > >> >> manual
> > > > >> >> >> > >> command is
> > > > >> >> >> > >> >> > > > required to turn on a new feature. Have we
> > > > considered
> > > > >> >> this?
> > > > >> >> >> > >> >> > > >
> > > > >> >> >> > >> >> > > > Thanks,
> > > > >> >> >> > >> >> > > >
> > > > >> >> >> > >> >> > > > Jun
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> > > Hi Jun,
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> > > I guess David commented on this point already,
> but
> > > > I'll
> > > > >> >> comment
> > > > >> >> >> > as
> > > > >> >> >> > >> >> well.
> > > > >> >> >> > >> >> > I
> > > > >> >> >> > >> >> > > always had the perception that users viewed
> rolls
> > as
> > > > >> >> >> potentially
> > > > >> >> >> > >> risky
> > > > >> >> >> > >> >> > and
> > > > >> >> >> > >> >> > > were looking for ways to reduce the risk. Not
> > > enabling
> > > > >> >> features
> > > > >> >> >> > >> right
> > > > >> >> >> > >> >> > away
> > > > >> >> >> > >> >> > > after installing new software seems like one way
> > to
> > > do
> > > > >> >> that. If
> > > > >> >> >> > we
> > > > >> >> >> > >> had
> > > > >> >> >> > >> >> a
> > > > >> >> >> > >> >> > > feature to automatically upgrade during a roll,
> > I'm
> > > > not
> > > > >> >> sure
> > > > >> >> >> > that I
> > > > >> >> >> > >> >> would
> > > > >> >> >> > >> >> > > recommend that people use it, because if
> something
> > > > >> fails,
> > > > >> >> it
> > > > >> >> >> > makes
> > > > >> >> >> > >> it
> > > > >> >> >> > >> >> > > harder to tell if the new feature is at fault,
> or
> > > > >> something
> > > > >> >> >> else
> > > > >> >> >> > in
> > > > >> >> >> > >> the
> > > > >> >> >> > >> >> > new
> > > > >> >> >> > >> >> > > software.
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> > > We already tell users to do a "double roll" when
> > > going
> > > > >> to
> > > > >> >> a new
> > > > >> >> >> > IBP.
> > > > >> >> >> > >> >> > (Just
> > > > >> >> >> > >> >> > > to give background to people who haven't heard
> > that
> > > > >> >> phrase, the
> > > > >> >> >> > >> first
> > > > >> >> >> > >> >> > roll
> > > > >> >> >> > >> >> > > installs the new software, and the second roll
> > > updates
> > > > >> the
> > > > >> >> >> IBP).
> > > > >> >> >> > So
> > > > >> >> >> > >> >> this
> > > > >> >> >> > >> >> > > KIP-778 mechanism is basically very similar to
> > that,
> > > > >> >> except the
> > > > >> >> >> > >> second
> > > > >> >> >> > >> >> > > thing isn't a roll, but just an upgrade command.
> > So
> > > I
> > > > >> think
> > > > >> >> >> this
> > > > >> >> >> > is
> > > > >> >> >> > >> >> > > consistent with what we currently do.
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> > > Also, just like David said, we can always add
> > > > >> auto-upgrade
> > > > >> >> >> later
> > > > >> >> >> > if
> > > > >> >> >> > >> >> there
> > > > >> >> >> > >> >> > > is demand...
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> > > best,
> > > > >> >> >> > >> >> > > Colin
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> > > >
> > > > >> >> >> > >> >> > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <
> > > > >> j...@confluent.io
> > > > >> >> >
> > > > >> >> >> > wrote:
> > > > >> >> >> > >> >> > > >
> > > > >> >> >> > >> >> > > >> Hi, David,
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> Thanks for the KIP. A few comments below.
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> 10. It would be useful to describe how the
> > > > controller
> > > > >> >> node
> > > > >> >> >> > >> >> determines
> > > > >> >> >> > >> >> > > the
> > > > >> >> >> > >> >> > > >> RPC version used to communicate to other
> > > controller
> > > > >> >> nodes.
> > > > >> >> >> > There
> > > > >> >> >> > >> >> seems
> > > > >> >> >> > >> >> > > to
> > > > >> >> >> > >> >> > > >> be a bootstrap problem. A controller node
> can't
> > > > read
> > > > >> >> the log
> > > > >> >> >> > and
> > > > >> >> >> > >> >> > > >> therefore the feature level until a quorum
> > leader
> > > > is
> > > > >> >> >> elected.
> > > > >> >> >> > But
> > > > >> >> >> > >> >> > leader
> > > > >> >> >> > >> >> > > >> election requires an RPC.
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> 11. For downgrades, it would be useful to
> > > describe
> > > > >> how
> > > > >> >> to
> > > > >> >> >> > >> determine
> > > > >> >> >> > >> >> > the
> > > > >> >> >> > >> >> > > >> downgrade process (generating new snapshot,
> > > > >> propagating
> > > > >> >> the
> > > > >> >> >> > >> >> snapshot,
> > > > >> >> >> > >> >> > > etc)
> > > > >> >> >> > >> >> > > >> has completed. We could block the
> UpdateFeature
> > > > >> request
> > > > >> >> >> until
> > > > >> >> >> > the
> > > > >> >> >> > >> >> > > process
> > > > >> >> >> > >> >> > > >> is completed. However, since the process
> could
> > > take
> > > > >> >> time,
> > > > >> >> >> the
> > > > >> >> >> > >> >> request
> > > > >> >> >> > >> >> > > could
> > > > >> >> >> > >> >> > > >> time out. Another way is through
> > DescribeFeature
> > > > and
> > > > >> the
> > > > >> >> >> > server
> > > > >> >> >> > >> only
> > > > >> >> >> > >> >> > > >> reports downgraded versions after the process
> > is
> > > > >> >> completed.
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> 12. Since we are changing
> > UpdateFeaturesRequest,
> > > do
> > > > >> we
> > > > >> >> need
> > > > >> >> >> to
> > > > >> >> >> > >> >> change
> > > > >> >> >> > >> >> > > the
> > > > >> >> >> > >> >> > > >> AdminClient api for updateFeatures too?
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> 13. For the paragraph starting with "In the
> > > absence
> > > > >> of
> > > > >> >> an
> > > > >> >> >> > >> operator
> > > > >> >> >> > >> >> > > >> defined value for metadata.version", in
> > KIP-584,
> > > we
> > > > >> >> >> described
> > > > >> >> >> > >> how to
> > > > >> >> >> > >> >> > > >> finalize features with New cluster bootstrap.
> > In
> > > > that
> > > > >> >> case,
> > > > >> >> >> > it's
> > > > >> >> >> > >> >> > > >> inconvenient for the users to have to run an
> > > admin
> > > > >> tool
> > > > >> >> to
> > > > >> >> >> > >> finalize
> > > > >> >> >> > >> >> > the
> > > > >> >> >> > >> >> > > >> version for each feature. Instead, the system
> > > > detects
> > > > >> >> that
> > > > >> >> >> the
> > > > >> >> >> > >> >> > /features
> > > > >> >> >> > >> >> > > >> path is missing in ZK and thus automatically
> > > > >> finalizes
> > > > >> >> every
> > > > >> >> >> > >> feature
> > > > >> >> >> > >> >> > > with
> > > > >> >> >> > >> >> > > >> the latest supported version. Could we do
> > > something
> > > > >> >> similar
> > > > >> >> >> in
> > > > >> >> >> > >> the
> > > > >> >> >> > >> >> > KRaft
> > > > >> >> >> > >> >> > > >> mode?
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> 14. After the quorum leader generates a new
> > > > snapshot,
> > > > >> >> how do
> > > > >> >> >> > we
> > > > >> >> >> > >> >> force
> > > > >> >> >> > >> >> > > >> other nodes to pick up the new snapshot?
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> 15. I agree with Jose that it will be useful
> to
> > > > >> describe
> > > > >> >> >> when
> > > > >> >> >> > >> >> > > generating a
> > > > >> >> >> > >> >> > > >> new snapshot is needed. To me, it seems the
> new
> > > > >> >> snapshot is
> > > > >> >> >> > only
> > > > >> >> >> > >> >> > needed
> > > > >> >> >> > >> >> > > >> when incompatible changes are made.
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> 7. Jose, what control records were you
> > referring?
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> Thanks,
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> Jun
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <
> > > > >> >> >> > >> davidart...@apache.org
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >> > > >> wrote:
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > > >>> Jose, thanks for the thorough review and
> > > comments!
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> I am out of the office until next week, so I
> > > > >> probably
> > > > >> >> won't
> > > > >> >> >> > be
> > > > >> >> >> > >> able
> > > > >> >> >> > >> >> > to
> > > > >> >> >> > >> >> > > >>> update the KIP until then. Here are some
> > replies
> > > > to
> > > > >> >> your
> > > > >> >> >> > >> questions:
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> 1. Generate snapshot on upgrade
> > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and sent
> to
> > > the
> > > > >> >> other
> > > > >> >> >> > nodes
> > > > >> >> >> > >> >> > > >>> > Why does the Active Controller need to
> > > generate
> > > > a
> > > > >> new
> > > > >> >> >> > snapshot
> > > > >> >> >> > >> >> and
> > > > >> >> >> > >> >> > > >>> > force a snapshot fetch from the replicas
> > > > (inactive
> > > > >> >> >> > controller
> > > > >> >> >> > >> and
> > > > >> >> >> > >> >> > > >>> > brokers) on an upgrade? Isn't writing the
> > > > >> >> >> > FeatureLevelRecord
> > > > >> >> >> > >> good
> > > > >> >> >> > >> >> > > >>> > enough to communicate the upgrade to the
> > > > replicas?
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> You're right, we don't necessarily need to
> > > > >> _transmit_ a
> > > > >> >> >> > >> snapshot,
> > > > >> >> >> > >> >> > since
> > > > >> >> >> > >> >> > > >>> each node can generate its own equivalent
> > > snapshot
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> 2. Generate snapshot on downgrade
> > > > >> >> >> > >> >> > > >>> > > Metadata snapshot is generated and sent
> to
> > > the
> > > > >> >> other
> > > > >> >> >> > >> inactive
> > > > >> >> >> > >> >> > > >>> > controllers and to brokers (this snapshot
> > may
> > > be
> > > > >> >> lossy!)
> > > > >> >> >> > >> >> > > >>> > Why do we need to send this downgraded
> > > snapshot
> > > > to
> > > > >> >> the
> > > > >> >> >> > >> brokers?
> > > > >> >> >> > >> >> The
> > > > >> >> >> > >> >> > > >>> > replicas have seen the FeatureLevelRecord
> > and
> > > > >> >> noticed the
> > > > >> >> >> > >> >> > downgrade.
> > > > >> >> >> > >> >> > > >>> > Can we have the replicas each
> independently
> > > > >> generate
> > > > >> >> a
> > > > >> >> >> > >> downgraded
> > > > >> >> >> > >> >> > > >>> > snapshot at the offset for the downgraded
> > > > >> >> >> > FeatureLevelRecord?
> > > > >> >> >> > >> I
> > > > >> >> >> > >> >> > > assume
> > > > >> >> >> > >> >> > > >>> > that the active controller will guarantee
> > that
> > > > all
> > > > >> >> >> records
> > > > >> >> >> > >> after
> > > > >> >> >> > >> >> > the
> > > > >> >> >> > >> >> > > >>> > FeatureLevelRecord use the downgraded
> > version.
> > > If
> > > > >> so,
> > > > >> >> it
> > > > >> >> >> > would
> > > > >> >> >> > >> be
> > > > >> >> >> > >> >> > good
> > > > >> >> >> > >> >> > > >>> > to mention that explicitly.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> Similar to above, yes a broker that detects
> a
> > > > >> >> downgrade via
> > > > >> >> >> > >> >> > > >>> FeatureLevelRecord could generate its own
> > > > downgrade
> > > > >> >> >> snapshot
> > > > >> >> >> > and
> > > > >> >> >> > >> >> > reload
> > > > >> >> >> > >> >> > > >>> its
> > > > >> >> >> > >> >> > > >>> state from that. This does get a little
> fuzzy
> > > when
> > > > >> we
> > > > >> >> >> > consider
> > > > >> >> >> > >> >> cases
> > > > >> >> >> > >> >> > > where
> > > > >> >> >> > >> >> > > >>> brokers are on different software versions
> and
> > > > >> could be
> > > > >> >> >> > >> generating
> > > > >> >> >> > >> >> a
> > > > >> >> >> > >> >> > > >>> downgrade snapshot for version X, but using
> > > > >> different
> > > > >> >> >> > versions
> > > > >> >> >> > >> of
> > > > >> >> >> > >> >> the
> > > > >> >> >> > >> >> > > >>> code.
> > > > >> >> >> > >> >> > > >>> It might be safer to let the controller
> > generate
> > > > the
> > > > >> >> >> > snapshot so
> > > > >> >> >> > >> >> each
> > > > >> >> >> > >> >> > > >>> broker (regardless of software version) gets
> > the
> > > > >> same
> > > > >> >> >> > records.
> > > > >> >> >> > >> >> > However,
> > > > >> >> >> > >> >> > > >>> for
> > > > >> >> >> > >> >> > > >>> upgrades (or downgrades) we expect the whole
> > > > cluster
> > > > >> >> to be
> > > > >> >> >> > >> running
> > > > >> >> >> > >> >> > the
> > > > >> >> >> > >> >> > > >>> same
> > > > >> >> >> > >> >> > > >>> software version before triggering the
> > > > >> metadata.version
> > > > >> >> >> > change,
> > > > >> >> >> > >> so
> > > > >> >> >> > >> >> > > perhaps
> > > > >> >> >> > >> >> > > >>> this isn't a likely scenario. Thoughts?
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> 3. Max metadata version
> > > > >> >> >> > >> >> > > >>> > >For the first release that supports
> > > > >> >> metadata.version, we
> > > > >> >> >> > can
> > > > >> >> >> > >> >> > simply
> > > > >> >> >> > >> >> > > >>> > initialize metadata.version with the
> current
> > > > (and
> > > > >> >> only)
> > > > >> >> >> > >> version.
> > > > >> >> >> > >> >> > For
> > > > >> >> >> > >> >> > > >>> future
> > > > >> >> >> > >> >> > > >>> > releases, we will need a mechanism to
> > > bootstrap
> > > > a
> > > > >> >> >> > particular
> > > > >> >> >> > >> >> > version.
> > > > >> >> >> > >> >> > > >>> This
> > > > >> >> >> > >> >> > > >>> > could be done using the meta.properties
> file
> > > or
> > > > >> some
> > > > >> >> >> > similar
> > > > >> >> >> > >> >> > > mechanism.
> > > > >> >> >> > >> >> > > >>> The
> > > > >> >> >> > >> >> > > >>> > reason we need to allow for a specific
> > > initial
> > > > >> >> version
> > > > >> >> >> is
> > > > >> >> >> > to
> > > > >> >> >> > >> >> > support
> > > > >> >> >> > >> >> > > >>> the
> > > > >> >> >> > >> >> > > >>> > use case of starting a Kafka cluster at
> > > version
> > > > X
> > > > >> >> with an
> > > > >> >> >> > >> older
> > > > >> >> >> > >> >> > > >>> > metadata.version.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> I assume that the Active Controller will
> learn
> > > the
> > > > >> >> metadata
> > > > >> >> >> > >> version
> > > > >> >> >> > >> >> > of
> > > > >> >> >> > >> >> > > >>> > the broker through the
> > > > BrokerRegistrationRequest.
> > > > >> How
> > > > >> >> >> will
> > > > >> >> >> > the
> > > > >> >> >> > >> >> > Active
> > > > >> >> >> > >> >> > > >>> > Controller learn about the max metadata
> > > version
> > > > of
> > > > >> >> the
> > > > >> >> >> > >> inactive
> > > > >> >> >> > >> >> > > >>> > controller nodes? We currently don't send
> a
> > > > >> >> registration
> > > > >> >> >> > >> request
> > > > >> >> >> > >> >> > from
> > > > >> >> >> > >> >> > > >>> > the inactive controller to the active
> > > > controller.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> This came up during the design, but I
> > neglected
> > > to
> > > > >> add
> > > > >> >> it
> > > > >> >> >> to
> > > > >> >> >> > the
> > > > >> >> >> > >> >> KIP.
> > > > >> >> >> > >> >> > > We
> > > > >> >> >> > >> >> > > >>> will need a mechanism for determining the
> > > > supported
> > > > >> >> >> features
> > > > >> >> >> > of
> > > > >> >> >> > >> >> each
> > > > >> >> >> > >> >> > > >>> controller similar to how brokers use
> > > > >> >> >> > BrokerRegistrationRequest.
> > > > >> >> >> > >> >> > > Perhaps
> > > > >> >> >> > >> >> > > >>> controllers could write a FeatureLevelRecord
> > (or
> > > > >> >> similar)
> > > > >> >> >> to
> > > > >> >> >> > the
> > > > >> >> >> > >> >> > > metadata
> > > > >> >> >> > >> >> > > >>> log indicating their supported version.
> WDYT?
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> Why do you need to bootstrap a particular
> > > version?
> > > > >> >> Isn't
> > > > >> >> >> the
> > > > >> >> >> > >> intent
> > > > >> >> >> > >> >> > > >>> > that the broker will learn the active
> > metadata
> > > > >> >> version by
> > > > >> >> >> > >> reading
> > > > >> >> >> > >> >> > the
> > > > >> >> >> > >> >> > > >>> > metadata before unfencing?
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> This bootstrapping is needed for when a
> KRaft
> > > > >> cluster
> > > > >> >> is
> > > > >> >> >> > first
> > > > >> >> >> > >> >> > > started. If
> > > > >> >> >> > >> >> > > >>> we don't have this mechanism, the cluster
> > can't
> > > > >> really
> > > > >> >> do
> > > > >> >> >> > >> anything
> > > > >> >> >> > >> >> > > until
> > > > >> >> >> > >> >> > > >>> the operator finalizes the metadata.version
> > with
> > > > the
> > > > >> >> tool.
> > > > >> >> >> > The
> > > > >> >> >> > >> >> > > >>> bootstrapping will be done by the controller
> > and
> > > > the
> > > > >> >> >> brokers
> > > > >> >> >> > >> will
> > > > >> >> >> > >> >> see
> > > > >> >> >> > >> >> > > this
> > > > >> >> >> > >> >> > > >>> version as a record (like you say). I'll add
> > > some
> > > > >> text
> > > > >> >> to
> > > > >> >> >> > >> clarify
> > > > >> >> >> > >> >> > this.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> 4. Reject Registration - This is related to
> > the
> > > > >> bullet
> > > > >> >> >> point
> > > > >> >> >> > >> above.
> > > > >> >> >> > >> >> > > >>> > What will be the behavior of the active
> > > > controller
> > > > >> >> if the
> > > > >> >> >> > >> broker
> > > > >> >> >> > >> >> > > sends
> > > > >> >> >> > >> >> > > >>> > a metadata version that is not compatible
> > with
> > > > the
> > > > >> >> >> cluster
> > > > >> >> >> > >> wide
> > > > >> >> >> > >> >> > > >>> > metadata version?
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> If a broker starts up with a lower supported
> > > > version
> > > > >> >> range
> > > > >> >> >> > than
> > > > >> >> >> > >> the
> > > > >> >> >> > >> >> > > >>> current
> > > > >> >> >> > >> >> > > >>> cluster metadata.version, it should log an
> > error
> > > > and
> > > > >> >> >> > shutdown.
> > > > >> >> >> > >> This
> > > > >> >> >> > >> >> > is
> > > > >> >> >> > >> >> > > in
> > > > >> >> >> > >> >> > > >>> line with KIP-584.
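> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> As a sketch (hypothetical names), the broker-side guard
> > > > >> >> >> > >> >> > > >>> is just a range check at startup:
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>     // Exit if this broker cannot speak the cluster's
> > > > >> >> >> > >> >> > > >>>     // finalized metadata.version.
> > > > >> >> >> > >> >> > > >>>     static void checkSupported(short finalized, short min, short max) {
> > > > >> >> >> > >> >> > > >>>         if (finalized < min || finalized > max) {
> > > > >> >> >> > >> >> > > >>>             System.err.println("Unsupported metadata.version " + finalized);
> > > > >> >> >> > >> >> > > >>>             System.exit(1);
> > > > >> >> >> > >> >> > > >>>         }
> > > > >> >> >> > >> >> > > >>>     }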
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> 5. Discover upgrade
> > > > >> >> >> > >> >> > > >>> > > This snapshot will be a convenient way
> to
> > > let
> > > > >> >> broker
> > > > >> >> >> and
> > > > >> >> >> > >> >> > controller
> > > > >> >> >> > >> >> > > >>> > components rebuild their entire in-memory
> > > state
> > > > >> >> following
> > > > >> >> >> > an
> > > > >> >> >> > >> >> > upgrade.
> > > > >> >> >> > >> >> > > >>> > Can we rely on the presence of the
> > > > >> >> FeatureLevelRecord for
> > > > >> >> >> > the
> > > > >> >> >> > >> >> > > metadata
> > > > >> >> >> > >> >> > > >>> > version for this functionality? If so, it
> > > avoids
> > > > >> >> having
> > > > >> >> >> to
> > > > >> >> >> > >> reload
> > > > >> >> >> > >> >> > the
> > > > >> >> >> > >> >> > > >>> > snapshot.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> For upgrades, yes probably since we won't
> need
> > > to
> > > > >> >> "rewrite"
> > > > >> >> >> > any
> > > > >> >> >> > >> >> > > records in
> > > > >> >> >> > >> >> > > >>> this case. For downgrades, we will need to
> > > > generate
> > > > >> the
> > > > >> >> >> > snapshot
> > > > >> >> >> > >> >> and
> > > > >> >> >> > >> >> > > >>> reload
> > > > >> >> >> > >> >> > > >>> everything.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> 6. Metadata version specification
> > > > >> >> >> > >> >> > > >>> > >  V4(version=4,
> > isBackwardsCompatible=false,
> > > > >> >> >> > description="New
> > > > >> >> >> > >> >> > > metadata
> > > > >> >> >> > >> >> > > >>> > record type Bar"),
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> Very cool. Do you have plans to generate
> > Apache
> > > > >> Kafka
> > > > >> >> HTML
> > > > >> >> >> > >> >> > > >>> > documentation for this information? Would
> be
> > > > >> helpful
> > > > >> >> to
> > > > >> >> >> > >> display
> > > > >> >> >> > >> >> > this
> > > > >> >> >> > >> >> > > >>> > information to the user using the
> > > > >> kafka-features.sh
> > > > >> >> and
> > > > >> >> >> > >> feature
> > > > >> >> >> > >> >> > RPC?
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> Hm good idea :) I'll add a brief section on
> > > > >> >> documentation.
> > > > >> >> >> > This
> > > > >> >> >> > >> >> would
> > > > >> >> >> > >> >> > > >>> certainly be very useful
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> 7.Downgrade records
> > > > >> >> >> > >> >> > > >>> > I think we should explicitly mention that
> > the
> > > > >> >> downgrade
> > > > >> >> >> > >> process
> > > > >> >> >> > >> >> > will
> > > > >> >> >> > >> >> > > >>> > downgrade both metadata records and
> > controller
> > > > >> >> records.
> > > > >> >> >> In
> > > > >> >> >> > >> >> KIP-630
> > > > >> >> >> > >> >> > we
> > > > >> >> >> > >> >> > > >>> > introduced two control records for
> > snapshots.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> Yes, good call. Let me re-read that KIP and
> > > > include
> > > > >> >> some
> > > > >> >> >> > >> details.
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> Thanks again for the comments!
> > > > >> >> >> > >> >> > > >>> -David
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando
> > > > García
> > > > >> >> Sancio
> > > > >> >> >> > >> >> > > >>> <jsan...@confluent.io.invalid> wrote:
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>> > One more comment.
> > > > >> >> >> > >> >> > > >>> >
> > > > >> >> >> > >> >> > > >>> > 7.Downgrade records
> > > > >> >> >> > >> >> > > >>> > I think we should explicitly mention that
> > the
> > > > >> >> downgrade
> > > > >> >> >> > >> process
> > > > >> >> >> > >> >> > will
> > > > >> >> >> > >> >> > > >>> > downgrade both metadata records and
> > controller
> > > > >> >> records.
> > > > >> >> >> In
> > > > >> >> >> > >> >> KIP-630
> > > > >> >> >> > >> >> > we
> > > > >> >> >> > >> >> > > >>> > introduced two control records for
> > snapshots.
> > > > >> >> >> > >> >> > > >>> >
> > > > >> >> >> > >> >> > > >>>
> > > > >> >> >> > >> >> > > >>
> > > > >> >> >> > >> >> > >
> > > > >> >> >> > >> >> >
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >>
> > > > >> >> >> > >> >> --
> > > > >> >> >> > >> >> -David
> > > > >> >> >> > >> >>
> > > > >> >> >> > >>
> > > > >> >> >> >
> > > > >> >> >>
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > -David
> > > > >
> > > >
> > > >
> > > > --
> > > > -David
> > > >
> > >
> >
> >
> > --
> > -David
> >
>


-- 
-David
