On Fri, Nov 5, 2021, at 08:10, David Arthur wrote:
> Colin and Jun, thanks for the additional comments!
>
> Colin:
>
>> We've been talking about having an automated RPC compatibility checker
>
> Do we have a way to mark fields in schemas as deprecated? It can stay in
> the RPC, it just complicates the logic a bit.
>
Hi David,

The easiest thing to do is probably just to add a comment to the
"description" field. Thinking about it more, I really think we should stick
to the normal RPC versioning rules. Making exceptions just tends to lead to
issues down the road.

>> It would be nice if the active controller could validate that a majority
>> of the quorum could use the proposed metadata.version. The active
>> controller should have this information, right? If we don't have recent
>> information from a quorum of voters, we wouldn't be active.
>
> I believe we should have this information from the ApiVersionsResponse. It
> would be good to do this validation to avoid a situation where a
> quorum leader can't be elected due to unprocessable records.

Sounds good... can you add this to the KIP?

>> Do we need delete as a command separate from downgrade?
>
> I think from an operator's perspective, it is nice to distinguish between
> changing a feature flag and unsetting it. It might be surprising to an
> operator to see the flag's version set to nothing when they requested the
> downgrade to version 0 (or less).

Fair enough. If we want to go this route, we should also specify whether
setting the version to 0 is legal, or whether it's necessary to use
"delete". Jun's proposal of using "disable" instead of "delete" also seems
reasonable here...

>> it seems like we should spell out that metadata.version begins at 1 in
>> KRaft clusters
>
> I added this text:

Thanks.

>> We probably also want an RPC implemented by both brokers and controllers
>> that will reveal the min and max supported versions for each feature level
>> supported by the server
>
> This is available in ApiVersionsResponse (we include the server's supported
> features as well as the cluster's finalized features)

Good point.

> Jun:
>
> 20. Indeed snapshots are not strictly necessary during an upgrade, I've
> reworded this

I thought we agreed that we would do a snapshot before each upgrade, so
that if there was a big problem with the new metadata version, we would at
least have a clean snapshot to fall back on?

best,
Colin

> Thanks!
> David
>
> On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
>> Hi, David, Jose and Colin,
>>
>> Thanks for the reply. A few more comments.
>>
>> 12. It seems that we haven't updated the AdminClient accordingly?
>>
>> 14. "Metadata snapshot is generated and sent to the other inactive
>> controllers and to brokers". I thought we wanted each broker to generate
>> its own snapshot independently? If only the controller generates the
>> snapshot, how do we force other brokers to pick it up?
>>
>> 16. If a feature version is new, one may not want to enable it immediately
>> after the cluster is upgraded. However, if a feature version has been
>> stable, requiring every user to run a command to upgrade to that version
>> seems inconvenient. One way to improve this is for each feature to define
>> one version as the default. Then, when we upgrade a cluster, we will
>> automatically upgrade the feature to the default version. An admin could
>> use the tool to upgrade to a version higher than the default.
>>
>> 20. "The quorum controller can assist with this process by generating a
>> metadata snapshot after a metadata.version increase has been committed to
>> the metadata log. This snapshot will be a convenient way to let broker and
>> controller components rebuild their entire in-memory state following an
>> upgrade." The new version of the software could read both the new and the
>> old version. Is generating a new snapshot during upgrade needed?
>>
>> Jun
>>
>> On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmcc...@apache.org> wrote:
>>
>> > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
>> > > Hi, David,
>> > >
>> > > One more comment.
>> > > 16. The main reason why KIP-584 requires finalizing a feature manually
>> > > is that in the ZK world, the controller doesn't know all brokers in a
>> > > cluster. A broker temporarily down is not registered in ZK. In the
>> > > KRaft world, the controller keeps track of all brokers, including those
>> > > that are temporarily down. This makes it possible for the controller to
>> > > automatically finalize a feature: it's safe to do so when all brokers
>> > > support that feature. This will make the upgrade process much simpler
>> > > since no manual command is required to turn on a new feature. Have we
>> > > considered this?
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> >
>> > Hi Jun,
>> >
>> > I guess David commented on this point already, but I'll comment as well.
>> > I always had the perception that users viewed rolls as potentially risky
>> > and were looking for ways to reduce the risk. Not enabling features right
>> > away after installing new software seems like one way to do that. If we
>> > had a feature to automatically upgrade during a roll, I'm not sure that I
>> > would recommend that people use it, because if something fails, it makes
>> > it harder to tell if the new feature is at fault, or something else in
>> > the new software.
>> >
>> > We already tell users to do a "double roll" when going to a new IBP.
>> > (Just to give background to people who haven't heard that phrase, the
>> > first roll installs the new software, and the second roll updates the
>> > IBP.) So this KIP-778 mechanism is basically very similar to that, except
>> > the second thing isn't a roll, but just an upgrade command. So I think
>> > this is consistent with what we currently do.
>> >
>> > Also, just like David said, we can always add auto-upgrade later if there
>> > is demand...
>> >
>> > best,
>> > Colin
>> >
>> > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <j...@confluent.io> wrote:
>> > >
>> > >> Hi, David,
>> > >>
>> > >> Thanks for the KIP. A few comments below.
>> > >>
>> > >> 10. It would be useful to describe how the controller node determines
>> > >> the RPC version used to communicate to other controller nodes. There
>> > >> seems to be a bootstrap problem. A controller node can't read the log,
>> > >> and therefore the feature level, until a quorum leader is elected. But
>> > >> leader election requires an RPC.
>> > >>
>> > >> 11. For downgrades, it would be useful to describe how to determine
>> > >> that the downgrade process (generating a new snapshot, propagating the
>> > >> snapshot, etc.) has completed. We could block the UpdateFeature request
>> > >> until the process is completed. However, since the process could take
>> > >> time, the request could time out. Another way is through
>> > >> DescribeFeature, where the server only reports downgraded versions
>> > >> after the process is completed.
>> > >>
>> > >> 12. Since we are changing UpdateFeaturesRequest, do we need to change
>> > >> the AdminClient API for updateFeatures too?
>> > >>
>> > >> 13. For the paragraph starting with "In the absence of an operator
>> > >> defined value for metadata.version", in KIP-584, we described how to
>> > >> finalize features with New cluster bootstrap. In that case, it's
>> > >> inconvenient for the users to have to run an admin tool to finalize
>> > >> the version for each feature. Instead, the system detects that the
>> > >> /features path is missing in ZK and thus automatically finalizes every
>> > >> feature with the latest supported version. Could we do something
>> > >> similar in KRaft mode?
>> > >>
>> > >> 14. After the quorum leader generates a new snapshot, how do we force
>> > >> other nodes to pick up the new snapshot?
>> > >>
>> > >> 15. I agree with Jose that it will be useful to describe when
>> > >> generating a new snapshot is needed. To me, it seems the new snapshot
>> > >> is only needed when incompatible changes are made.
>> > >>
>> > >> 7. Jose, which control records were you referring to?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jun
>> > >>
>> > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org>
>> > >> wrote:
>> > >>
>> > >>> Jose, thanks for the thorough review and comments!
>> > >>>
>> > >>> I am out of the office until next week, so I probably won't be able
>> > >>> to update the KIP until then. Here are some replies to your questions:
>> > >>>
>> > >>> 1. Generate snapshot on upgrade
>> > >>> > > Metadata snapshot is generated and sent to the other nodes
>> > >>> > Why does the Active Controller need to generate a new snapshot and
>> > >>> > force a snapshot fetch from the replicas (inactive controller and
>> > >>> > brokers) on an upgrade? Isn't writing the FeatureLevelRecord good
>> > >>> > enough to communicate the upgrade to the replicas?
>> > >>>
>> > >>> You're right, we don't necessarily need to _transmit_ a snapshot,
>> > >>> since each node can generate its own equivalent snapshot.
>> > >>>
>> > >>> 2. Generate snapshot on downgrade
>> > >>> > > Metadata snapshot is generated and sent to the other inactive
>> > >>> > > controllers and to brokers (this snapshot may be lossy!)
>> > >>> > Why do we need to send this downgraded snapshot to the brokers? The
>> > >>> > replicas have seen the FeatureLevelRecord and noticed the downgrade.
>> > >>> > Can we have the replicas each independently generate a downgraded
>> > >>> > snapshot at the offset for the downgraded FeatureLevelRecord? I
>> > >>> > assume that the active controller will guarantee that all records
>> > >>> > after the FeatureLevelRecord use the downgraded version. If so, it
>> > >>> > would be good to mention that explicitly.
>> > >>> Similar to above, yes, a broker that detects a downgrade via
>> > >>> FeatureLevelRecord could generate its own downgrade snapshot and
>> > >>> reload its state from that. This does get a little fuzzy when we
>> > >>> consider cases where brokers are on different software versions and
>> > >>> could be generating a downgrade snapshot for version X, but using
>> > >>> different versions of the code. It might be safer to let the
>> > >>> controller generate the snapshot so each broker (regardless of
>> > >>> software version) gets the same records. However, for upgrades (or
>> > >>> downgrades) we expect the whole cluster to be running the same
>> > >>> software version before triggering the metadata.version change, so
>> > >>> perhaps this isn't a likely scenario. Thoughts?
>> > >>>
>> > >>> 3. Max metadata version
>> > >>> > > For the first release that supports metadata.version, we can
>> > >>> > > simply initialize metadata.version with the current (and only)
>> > >>> > > version. For future releases, we will need a mechanism to
>> > >>> > > bootstrap a particular version. This could be done using the
>> > >>> > > meta.properties file or some similar mechanism. The reason we
>> > >>> > > need to allow for a specific initial version is to support the
>> > >>> > > use case of starting a Kafka cluster at version X with an older
>> > >>> > > metadata.version.
>> > >>> > I assume that the Active Controller will learn the metadata version
>> > >>> > of the broker through the BrokerRegistrationRequest. How will the
>> > >>> > Active Controller learn about the max metadata version of the
>> > >>> > inactive controller nodes? We currently don't send a registration
>> > >>> > request from the inactive controller to the active controller.
>> > >>>
>> > >>> This came up during the design, but I neglected to add it to the KIP.
>> > >>> We will need a mechanism for determining the supported features of
>> > >>> each controller, similar to how brokers use BrokerRegistrationRequest.
>> > >>> Perhaps controllers could write a FeatureLevelRecord (or similar) to
>> > >>> the metadata log indicating their supported version. WDYT?
>> > >>>
>> > >>> > Why do you need to bootstrap a particular version? Isn't the intent
>> > >>> > that the broker will learn the active metadata version by reading
>> > >>> > the metadata before unfencing?
>> > >>>
>> > >>> This bootstrapping is needed for when a KRaft cluster is first
>> > >>> started. If we don't have this mechanism, the cluster can't really do
>> > >>> anything until the operator finalizes the metadata.version with the
>> > >>> tool. The bootstrapping will be done by the controller and the brokers
>> > >>> will see this version as a record (like you say). I'll add some text
>> > >>> to clarify this.
>> > >>>
>> > >>> 4. Reject Registration - This is related to the bullet point above.
>> > >>> > What will be the behavior of the active controller if the broker
>> > >>> > sends a metadata version that is not compatible with the
>> > >>> > cluster-wide metadata version?
>> > >>>
>> > >>> If a broker starts up with a lower supported version range than the
>> > >>> current cluster metadata.version, it should log an error and shut
>> > >>> down. This is in line with KIP-584.
>> > >>>
>> > >>> 5. Discover upgrade
>> > >>> > > This snapshot will be a convenient way to let broker and
>> > >>> > > controller components rebuild their entire in-memory state
>> > >>> > > following an upgrade.
>> > >>> > Can we rely on the presence of the FeatureLevelRecord for the
>> > >>> > metadata version for this functionality? If so, it avoids having to
>> > >>> > reload the snapshot.
>> > >>> For upgrades, yes, probably, since we won't need to "rewrite" any
>> > >>> records in this case. For downgrades, we will need to generate the
>> > >>> snapshot and reload everything.
>> > >>>
>> > >>> 6. Metadata version specification
>> > >>> > > V4(version=4, isBackwardsCompatible=false, description="New
>> > >>> > > metadata record type Bar"),
>> > >>> > Very cool. Do you have plans to generate Apache Kafka HTML
>> > >>> > documentation for this information? It would be helpful to display
>> > >>> > this information to the user using kafka-features.sh and the
>> > >>> > feature RPC.
>> > >>>
>> > >>> Hm, good idea :) I'll add a brief section on documentation. This
>> > >>> would certainly be very useful.
>> > >>>
>> > >>> 7. Downgrade records
>> > >>> > I think we should explicitly mention that the downgrade process
>> > >>> > will downgrade both metadata records and controller records. In
>> > >>> > KIP-630 we introduced two control records for snapshots.
>> > >>>
>> > >>> Yes, good call. Let me re-read that KIP and include some details.
>> > >>>
>> > >>> Thanks again for the comments!
>> > >>> -David
>> > >>>
>> > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio
>> > >>> <jsan...@confluent.io.invalid> wrote:
>> > >>>
>> > >>> > One more comment.
>> > >>> >
>> > >>> > 7. Downgrade records
>> > >>> > I think we should explicitly mention that the downgrade process
>> > >>> > will downgrade both metadata records and controller records. In
>> > >>> > KIP-630 we introduced two control records for snapshots.
>
> --
> -David
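P.S. To make the quorum validation discussed up-thread a bit more concrete: below is a rough sketch, in Python, of the majority check the active controller might perform using the supported-version ranges it learns from each voter's ApiVersionsResponse. The function name and data shapes here are illustrative assumptions for discussion, not the actual KIP-778 implementation.

```python
# Illustrative sketch (not real Kafka code): before committing a
# FeatureLevelRecord for a proposed metadata.version, the active
# controller checks that a majority of quorum voters could use it.
# voter_ranges maps a voter id to the (min, max) supported feature
# range that voter reported, standing in for ApiVersionsResponse data.

def majority_supports(proposed: int, voter_ranges: dict) -> bool:
    """Return True if a strict majority of voters support `proposed`."""
    supporters = sum(
        1 for lo, hi in voter_ranges.values() if lo <= proposed <= hi
    )
    return supporters > len(voter_ranges) // 2

# Three voters; voter 3 lags behind and only supports up to version 3.
ranges = {1: (1, 4), 2: (1, 4), 3: (1, 3)}
print(majority_supports(4, ranges))  # two of three voters support v4
print(majority_supports(5, ranges))  # no voter supports v5
```

Requiring a strict majority (rather than all voters) matches the concern above: it guarantees a quorum leader can always be elected that is able to process the committed records, even if a minority of voters is behind.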