Re: [DISCUSS] KIP-778 KRaft Upgrades

Colin McCabe Tue, 16 Nov 2021 12:39:07 -0800

On Fri, Nov 5, 2021, at 15:18, Jun Rao wrote:
> Hi, David,
>
> Thanks for the reply.
>
> 16. My first concern is that the KIP picks up meta.version inconsistently
> during the deployment. If a new cluster is started, we pick up the highest
> version. If we upgrade, we leave the feature version unchanged.


Hi Jun,

Thanks again for taking a look.

The proposed behavior in KIP-778 is consistent with how it works today. 
Upgrading the software is distinct from upgrading the IBP.

I think it is important to keep these two operations ("upgrading IBP/metadata 
version" and "upgrading software version") separate. If they are coupled it 
will create a situation where software upgrades are difficult and dangerous.

Consider a situation where you find some bug in your current software, and you 
want to upgrade to new software that fixes the bug. If upgrades and IBP bumps 
are coupled, you can't do this without also bumping the IBP, which is usually 
considered a high-risk change. That means that either you have to make a 
special build that includes only the fix (time-consuming and error-prone), live 
with the bug for longer, or be very conservative about ever introducing new 
IBP/metadata versions. None of those are really good choices.

> Intuitively, it seems that independent of how a cluster is deployed, we
> should always pick the same feature version.

I think it makes sense to draw a distinction between upgrading an existing 
cluster and deploying a new one. What most people want out of upgrades is that 
things should keep working, but with bug fixes. If we change that, it just 
makes people more reluctant to upgrade (which is always a problem...)

> I think we need to think this through in this KIP. My second concern is
> that as a particular version matures, it's inconvenient for a user to manually
> upgrade every feature version. As long as we have a path to achieve that in
> the future, we don't need to address that in this KIP.

If people are managing a large number of Kafka clusters, they will want to do 
some sort of A/B testing with IBP/metadata versions. So if you have 1000 Kafka 
clusters, you roll out the new IBP version to 10 of them and see how it goes. 
If that goes well, you roll it out to more, etc.

So, the automation needs to be at the cluster management layer, not at the 
Kafka layer. Each Kafka cluster doesn't know how well things went in the other 
999 clusters. Automatically upgrading is a bad thing for the same reason Kafka 
automatically upgrading its own software version would be a bad thing -- it 
could lead to a disruption to a sensitive cluster at the wrong time.

For people who are just managing one or two Kafka clusters, automatically 
upgrading feature versions isn't a big burden and can be done manually. This is 
all consistent with how IBP works today.

Also, we already have a command-line option to the feature tool which upgrades 
all features to the latest available, if that is what the administrator 
desires. Perhaps we could add documentation saying that this should be done as 
the last step of the upgrade.

best,
Colin

>
> 21. "./kafka-features.sh delete": Deleting a feature seems a bit weird
> since the logic is always there. Would it be better to use disable?
>
> Jun
>
> On Fri, Nov 5, 2021 at 8:11 AM David Arthur
> <david.art...@confluent.io.invalid> wrote:
>
>> Colin and Jun, thanks for the additional comments!
>>
>> Colin:
>>
>> > We've been talking about having an automated RPC compatibility checker
>>
>> Do we have a way to mark fields in schemas as deprecated? It can stay in
>> the RPC, it just complicates the logic a bit.
>>
>> > It would be nice if the active controller could validate that a majority
>> of the quorum could use the proposed metadata.version. The active
>> controller should have this information, right? If we don't have recent
>> information  from a quorum of voters, we wouldn't be active.
>>
>> I believe we should have this information from the ApiVersionsResponse. It
>> would be good to do this validation to avoid a situation where a
>> quorum leader can't be elected due to unprocessable records.
>>
>> > Do we need delete as a command separate from downgrade?
>>
>> I think from an operator's perspective, it is nice to distinguish between
>> changing a feature flag and unsetting it. It might be surprising to an
>> operator to see the flag's version set to nothing when they requested the
>> downgrade to version 0 (or less).
>>
>> > it seems like we should spell out that metadata.version begins at 1 in
>> KRaft clusters
>>
>> I added this text:
>>
>> Introduce an IBP version to indicate the lowest software version that
>> > supports *metadata.version*. Below this IBP, the *metadata.version* is
>> > undefined and will not be examined. At or above this IBP, the
>> > *metadata.version* must be *0* for ZooKeeper clusters and will be
>> > initialized as *1* for KRaft clusters.
>>
>>
>> > We probably also want an RPC implemented by both brokers and controllers
>> that will reveal the min and max supported versions for each feature level
>> supported by the server
>>
>> This is available in ApiVersionsResponse (we include the server's supported
>> features as well as the cluster's finalized features)
>>
>> --------
>>
>> Jun:
>>
>> 12. I've updated the KIP with AdminClient changes
>>
>> 14. You're right, it looks like I missed a few sections regarding snapshot
>> generation. I've corrected it
>>
>> 16. This feels more like an enhancement to KIP-584. I agree it could be
>> useful, but perhaps we could address it separately from KRaft upgrades?
>>
>> 20. Indeed snapshots are not strictly necessary during an upgrade, I've
>> reworded this
>>
>>
>> Thanks!
>> David
>>
>>
>> On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <j...@confluent.io.invalid> wrote:
>>
>> > Hi, David, Jose and Colin,
>> >
>> > Thanks for the reply. A few more comments.
>> >
>> > 12. It seems that we haven't updated the AdminClient accordingly?
>> >
>> > 14. "Metadata snapshot is generated and sent to the other inactive
>> > controllers and to brokers". I thought we wanted each broker to generate
>> > its own snapshot independently? If only the controller generates the
>> > snapshot, how do we force other brokers to pick it up?
>> >
>> > 16. If a feature version is new, one may not want to enable it
>> immediately
>> > after the cluster is upgraded. However, if a feature version has been
>> > stable, requiring every user to run a command to upgrade to that version
>> > seems inconvenient. One way to improve this is for each feature to define
>> > one version as the default. Then, when we upgrade a cluster, we will
>> > automatically upgrade the feature to the default version. An admin could
>> > use the tool to upgrade to a version higher than the default.
>> >
>> > 20. "The quorum controller can assist with this process by generating a
>> > metadata snapshot after a metadata.version increase has been committed to
>> > the metadata log. This snapshot will be a convenient way to let broker
>> and
>> > controller components rebuild their entire in-memory state following an
>> > upgrade." The new version of the software could read both the new and the
>> > old version. Is generating a new snapshot during upgrade needed?
>> >
>> > Jun
>> >
>> >
>> > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmcc...@apache.org> wrote:
>> >
>> > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
>> > > > Hi, David,
>> > > >
>> > > > One more comment.
>> > > >
>> > > > 16. The main reason why KIP-584 requires finalizing a feature
>> manually
>> > is
>> > > > that in the ZK world, the controller doesn't know all brokers in a
>> > > cluster.
>> > > > A broker temporarily down is not registered in ZK. in the KRaft
>> world,
>> > > the
>> > > > controller keeps track of all brokers, including those that are
>> > > temporarily
>> > > > down. This makes it possible for the controller to automatically
>> > > finalize a
>> > > > feature---it's safe to do so when all brokers support that feature.
>> > This
>> > > > will make the upgrade process much simpler since no manual command is
>> > > > required to turn on a new feature. Have we considered this?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > >
>> > > Hi Jun,
>> > >
>> > > I guess David commented on this point already, but I'll comment as
>> well.
>> > I
>> > > always had the perception that users viewed rolls as potentially risky
>> > and
>> > > were looking for ways to reduce the risk. Not enabling features right
>> > away
>> > > after installing new software seems like one way to do that. If we had
>> a
>> > > feature to automatically upgrade during a roll, I'm not sure that I
>> would
>> > > recommend that people use it, because if something fails, it makes it
>> > > harder to tell if the new feature is at fault, or something else in the
>> > new
>> > > software.
>> > >
>> > > We already tell users to do a "double roll" when going to a new IBP.
>> > (Just
>> > > to give background to people who haven't heard that phrase, the first
>> > roll
>> > > installs the new software, and the second roll updates the IBP). So
>> this
>> > > KIP-778 mechanism is basically very similar to that, except the second
>> > > thing isn't a roll, but just an upgrade command. So I think this is
>> > > consistent with what we currently do.
>> > >
>> > > Also, just like David said, we can always add auto-upgrade later if
>> there
>> > > is demand...
>> > >
>> > > best,
>> > > Colin
>> > >
>> > >
>> > > >
>> > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <j...@confluent.io> wrote:
>> > > >
>> > > >> Hi, David,
>> > > >>
>> > > >> Thanks for the KIP. A few comments below.
>> > > >>
>> > > >> 10. It would be useful to describe how the controller node
>> determines
>> > > the
>> > > >> RPC version used to communicate to other controller nodes. There
>> seems
>> > > to
>> > > >> be a bootstrap problem. A controller node can't read the log and
>> > > >> therefore the feature level until a quorum leader is elected. But
>> > leader
>> > > >> election requires an RPC.
>> > > >>
>> > > >> 11. For downgrades, it would be useful to describe how to determine
>> > the
>> > > >> downgrade process (generating new snapshot, propagating the
>> snapshot,
>> > > etc)
>> > > >> has completed. We could block the UpdateFeature request until the
>> > > process
>> > > >> is completed. However, since the process could take time, the
>> request
>> > > could
>> > > >> time out. Another way is through DescribeFeature and the server only
>> > > >> reports downgraded versions after the process is completed.
>> > > >>
>> > > >> 12. Since we are changing UpdateFeaturesRequest, do we need to
>> change
>> > > the
>> > > >> AdminClient api for updateFeatures too?
>> > > >>
>> > > >> 13. For the paragraph starting with "In the absence of an operator
>> > > >> defined value for metadata.version", in KIP-584, we described how to
>> > > >> finalize features with New cluster bootstrap. In that case, it's
>> > > >> inconvenient for the users to have to run an admin tool to finalize
>> > the
>> > > >> version for each feature. Instead, the system detects that the
>> > /features
>> > > >> path is missing in ZK and thus automatically finalizes every feature
>> > > with
>> > > >> the latest supported version. Could we do something similar in the
>> > KRaft
>> > > >> mode?
>> > > >>
>> > > >> 14. After the quorum leader generates a new snapshot, how do we
>> force
>> > > >> other nodes to pick up the new snapshot?
>> > > >>
>> > > >> 15. I agree with Jose that it will be useful to describe when
>> > > generating a
>> > > >> new snapshot is needed. To me, it seems the new snapshot is only
>> > needed
>> > > >> when incompatible changes are made.
>> > > >>
>> > > >> 7. Jose, what control records were you referring?
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Jun
>> > > >>
>> > > >>
>> > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org
>> >
>> > > >> wrote:
>> > > >>
>> > > >>> Jose, thanks for the thorough review and comments!
>> > > >>>
>> > > >>> I am out of the office until next week, so I probably won't be able
>> > to
>> > > >>> update the KIP until then. Here are some replies to your questions:
>> > > >>>
>> > > >>> 1. Generate snapshot on upgrade
>> > > >>> > > Metadata snapshot is generated and sent to the other nodes
>> > > >>> > Why does the Active Controller need to generate a new snapshot
>> and
>> > > >>> > force a snapshot fetch from the replicas (inactive controller and
>> > > >>> > brokers) on an upgrade? Isn't writing the FeatureLevelRecord good
>> > > >>> > enough to communicate the upgrade to the replicas?
>> > > >>>
>> > > >>>
>> > > >>> You're right, we don't necessarily need to _transmit_ a snapshot,
>> > since
>> > > >>> each node can generate its own equivalent snapshot
>> > > >>>
>> > > >>> 2. Generate snapshot on downgrade
>> > > >>> > > Metadata snapshot is generated and sent to the other inactive
>> > > >>> > controllers and to brokers (this snapshot may be lossy!)
>> > > >>> > Why do we need to send this downgraded snapshot to the brokers?
>> The
>> > > >>> > replicas have seen the FeatureLevelRecord and noticed the
>> > downgrade.
>> > > >>> > Can we have the replicas each independently generate a downgraded
>> > > >>> > snapshot at the offset for the downgraded FeatureLevelRecord? I
>> > > assume
>> > > >>> > that the active controller will guarantee that all records after
>> > the
>> > > >>> > FatureLevelRecord use the downgraded version. If so, it would be
>> > good
>> > > >>> > to mention that explicitly.
>> > > >>>
>> > > >>>
>> > > >>> Similar to above, yes a broker that detects a downgrade via
>> > > >>> FeatureLevelRecord could generate its own downgrade snapshot and
>> > reload
>> > > >>> its
>> > > >>> state from that. This does get a little fuzzy when we consider
>> cases
>> > > where
>> > > >>> brokers are on different software versions and could be generating
>> a
>> > > >>> downgrade snapshot for version X, but using different versions of
>> the
>> > > >>> code.
>> > > >>> It might be safer to let the controller generate the snapshot so
>> each
>> > > >>> broker (regardless of software version) gets the same records.
>> > However,
>> > > >>> for
>> > > >>> upgrades (or downgrades) we expect the whole cluster to be running
>> > the
>> > > >>> same
>> > > >>> software version before triggering the metadata.version change, so
>> > > perhaps
>> > > >>> this isn't a likely scenario. Thoughts?
>> > > >>>
>> > > >>>
>> > > >>> 3. Max metadata version
>> > > >>> > >For the first release that supports metadata.version, we can
>> > simply
>> > > >>> > initialize metadata.version with the current (and only) version.
>> > For
>> > > >>> future
>> > > >>> > releases, we will need a mechanism to bootstrap a particular
>> > version.
>> > > >>> This
>> > > >>> > could be done using the meta.properties file or some similar
>> > > mechanism.
>> > > >>> The
>> > > >>> > reason we need the allow for a specific initial version is to
>> > support
>> > > >>> the
>> > > >>> > use case of starting a Kafka cluster at version X with an older
>> > > >>> > metadata.version.
>> > > >>>
>> > > >>>
>> > > >>> I assume that the Active Controller will learn the metadata version
>> > of
>> > > >>> > the broker through the BrokerRegistrationRequest. How will the
>> > Active
>> > > >>> > Controller learn about the max metadata version of the inactive
>> > > >>> > controller nodes? We currently don't send a registration request
>> > from
>> > > >>> > the inactive controller to the active controller.
>> > > >>>
>> > > >>>
>> > > >>> This came up during the design, but I neglected to add it to the
>> KIP.
>> > > We
>> > > >>> will need a mechanism for determining the supported features of
>> each
>> > > >>> controller similar to how brokers use BrokerRegistrationRequest.
>> > > Perhaps
>> > > >>> controllers could write a FeatureLevelRecord (or similar) to the
>> > > metadata
>> > > >>> log indicating their supported version. WDYT?
>> > > >>>
>> > > >>> Why do you need to bootstrap a particular version? Isn't the intent
>> > > >>> > that the broker will learn the active metadata version by reading
>> > the
>> > > >>> > metadata before unfencing?
>> > > >>>
>> > > >>>
>> > > >>> This bootstrapping is needed for when a KRaft cluster is first
>> > > started. If
>> > > >>> we don't have this mechanism, the cluster can't really do anything
>> > > until
>> > > >>> the operator finalizes the metadata.version with the tool. The
>> > > >>> bootstrapping will be done by the controller and the brokers will
>> see
>> > > this
>> > > >>> version as a record (like you say). I'll add some text to clarify
>> > this.
>> > > >>>
>> > > >>>
>> > > >>> 4. Reject Registration - This is related to the bullet point above.
>> > > >>> > What will be the behavior of the active controller if the broker
>> > > sends
>> > > >>> > a metadata version that is not compatible with the cluster wide
>> > > >>> > metadata version?
>> > > >>>
>> > > >>>
>> > > >>> If a broker starts up with a lower supported version range than the
>> > > >>> current
>> > > >>> cluster metadata.version, it should log an error and shutdown. This
>> > is
>> > > in
>> > > >>> line with KIP-584.
>> > > >>>
>> > > >>> 5. Discover upgrade
>> > > >>> > > This snapshot will be a convenient way to let broker and
>> > controller
>> > > >>> > components rebuild their entire in-memory state following an
>> > upgrade.
>> > > >>> > Can we rely on the presence of the FeatureLevelRecord for the
>> > > metadata
>> > > >>> > version for this functionality? If so, it avoids having to reload
>> > the
>> > > >>> > snapshot.
>> > > >>>
>> > > >>>
>> > > >>> For upgrades, yes probably since we won't need to "rewrite" any
>> > > records in
>> > > >>> this case. For downgrades, we will need to generate the snapshot
>> and
>> > > >>> reload
>> > > >>> everything.
>> > > >>>
>> > > >>> 6. Metadata version specification
>> > > >>> > >  V4(version=4, isBackwardsCompatible=false, description="New
>> > > metadata
>> > > >>> > record type Bar"),
>> > > >>>
>> > > >>>
>> > > >>> Very cool. Do you have plans to generate Apache Kafka HTML
>> > > >>> > documentation for this information? Would be helpful to display
>> > this
>> > > >>> > information to the user using the kafka-features.sh and feature
>> > RPC?
>> > > >>>
>> > > >>>
>> > > >>> Hm good idea :) I'll add a brief section on documentation. This
>> would
>> > > >>> certainly be very useful
>> > > >>>
>> > > >>> 7.Downgrade records
>> > > >>> > I think we should explicitly mention that the downgrade process
>> > will
>> > > >>> > downgrade both metadata records and controller records. In
>> KIP-630
>> > we
>> > > >>> > introduced two control records for snapshots.
>> > > >>>
>> > > >>>
>> > > >>> Yes, good call. Let me re-read that KIP and include some details.
>> > > >>>
>> > > >>> Thanks again for the comments!
>> > > >>> -David
>> > > >>>
>> > > >>>
>> > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio
>> > > >>> <jsan...@confluent.io.invalid> wrote:
>> > > >>>
>> > > >>> > One more comment.
>> > > >>> >
>> > > >>> > 7.Downgrade records
>> > > >>> > I think we should explicitly mention that the downgrade process
>> > will
>> > > >>> > downgrade both metadata records and controller records. In
>> KIP-630
>> > we
>> > > >>> > introduced two control records for snapshots.
>> > > >>> >
>> > > >>>
>> > > >>
>> > >
>> >
>>
>>
>> --
>> -David
>>

Re: [DISCUSS] KIP-778 KRaft Upgrades

Reply via email to