Hi Colin,

If we allow downgrades, which would also be appended to the log as new metadata.version records, then the length of the __cluster_metadata log may not be a safe indicator of a higher version, since a record for an older version could still be appended after a record for a newer version, right?
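To make the concern concrete, here is a hypothetical committed record sequence (the offsets and levels are made up for illustration):

    offset 10: FeatureLevelRecord(metadata.version = 4)   <- upgrade
    offset 11-20: ... other metadata records ...
    offset 21: FeatureLevelRecord(metadata.version = 3)   <- downgrade

A replica whose log ends at offset 25 is "longer" than one that ends at offset 15, yet its finalized metadata.version (3) is lower. So once downgrade records exist, comparing log lengths no longer orders replicas by metadata.version.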
On Tue, Nov 16, 2021 at 1:16 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Tue, Nov 16, 2021, at 06:36, David Arthur wrote:
> > An interesting case here is how to handle a version update if a majority of the quorum supports it, but the leader doesn't. For example, if C1 were the leader and an upgrade to version 4 was requested, maybe this would trigger C1 to resign and inform the client to retry the update later.
> >
>
> Hmm, wouldn't we want to just reject the version update in this case? I don't see what the advantage of allowing it would be.
>
> The reason for requiring a majority rather than all voters is mainly to cover the case where a voter is down, I thought. That clearly doesn't apply if the un-upgraded voter is the leader itself...
>
> >
> > We may eventually want to consider the metadata.version when electing a leader, but as long as we have the majority requirement before committing a new metadata.version, I think we should be safe.
> >
>
> Yeah, this is safe. If we elect a leader at metadata.version X, then that means that a majority of the cluster is at least at version X. Proof by contradiction: assume that this is not the case. Then the newly elected leader must have a shorter __cluster_metadata log than a majority of the voters. But this is incompatible with winning a Raft election.
>
> In the case where the leader is "behind" some of the other voters, those voters will truncate their logs to match the new leader. This will downgrade them. Basically this is the case where the feature upgrade was proposed, but never fully completed.
>
> best,
> Colin
>
> >
> > -David
> >
> > On Mon, Nov 15, 2021 at 12:52 PM Guozhang Wang <wangg...@gmail.com> wrote:
> >
> >> Thanks David,
> >>
> >> 1. Got it. One thing I'm still not very clear on is why it's sufficient to select a metadata.version which is supported by a majority of the quorum, but not the whole quorum (i.e. choosing the lowest version among all the quorum members)? Since the leader election today does not take this value into consideration, we are not guaranteed that newly selected leaders would always be able to recognize and support the initialized metadata.version, right?
> >>
> >> 2. Yeah, I think I agree the behavior-but-not-RPC-change scenario is beyond the scope of this KIP, we can defer it to later discussions.
> >>
> >> On Mon, Nov 15, 2021 at 8:13 AM David Arthur <david.art...@confluent.io.invalid> wrote:
> >>
> >> > Guozhang, thanks for the review!
> >> >
> >> > 1, As we've defined it so far, the initial metadata.version is set by an operator via the "kafka-storage.sh" tool. It would be possible for different values to be selected, but only the quorum leader would commit a FeatureLevelRecord with the version they read locally. See the above reply to Jun's question for a little more detail.
> >> >
> >> > We need to enable the KRaft RPCs regardless of metadata.version (vote, heartbeat, fetch, etc.) so that the quorum can be formed, a leader can be elected, and followers can learn about the selected metadata.version.
> >> > I believe the quorum startup procedure would go something like:
> >> >
> >> > * Controller nodes start, read their logs, begin leader election
> >> > * While the elected node is becoming leader (in QuorumMetaLogListener#handleLeaderChange), initialize metadata.version if necessary
> >> > * Followers replicate the FeatureLevelRecord
> >> >
> >> > This (probably) means the quorum peers must continue to rely on ApiVersions to negotiate compatible RPC versions, and these versions cannot depend on metadata.version.
> >> >
> >> > Does this make sense?
> >> >
> >> > 2, ApiVersionsResponse includes the broker's supported feature flags as well as the cluster-wide finalized feature flags. We probably need to add something like the feature flag "epoch" to this response payload in order to see which broker is most up to date.
> >> >
> >> > If the new feature flag version included RPC changes, we are helped by the fact that a client won't attempt to use the new RPC until it has discovered a broker that supports it via ApiVersions. The problem is more difficult for cases like you described where the feature flag upgrade changes the behavior of the broker, but not its RPCs. This is actually the same problem as upgrading the IBP. During a rolling restart, clients may hit different brokers with different capabilities and not know it.
> >> >
> >> > This probably warrants further investigation, but hopefully you agree it is beyond the scope of this KIP :)
> >> >
> >> > -David
> >> >
> >> > On Mon, Nov 15, 2021 at 10:26 AM David Arthur <david.art...@confluent.io> wrote:
> >> >
> >> > > Jun, thanks for the comments!
> >> > >
> >> > > 16, When a new cluster is deployed, we don't select the highest available metadata.version, but rather the quorum leader picks a bootstrap version defined in meta.properties. As mentioned earlier, we should add validation here to ensure a majority of the followers could support this version before initializing it. This would avoid a situation where a failover results in a new leader who can't support the selected metadata.version.
> >> > >
> >> > > Thinking a bit more on this, we do need to define a state where we're running newer software, but we don't have the feature flag set. This could happen if we were running an older IBP that did not support KIP-778. Following on this, it doesn't seem too difficult to consider a case where the IBP has been upgraded, but we still have not finalized a metadata.version. Here are some possible version combinations (assuming KIP-778 is added to Kafka 3.2):
> >> > >
> >> > > Case  Software  IBP  metadata.version  effective version
> >> > > --------------------------------------------------------
> >> > > A     3.1       3.1  -                 0    software too old for feature flag
> >> > > B     3.2       3.1  -                 0    feature flag supported, but IBP too old
> >> > > C     3.2       3.2  -                 0    feature flag supported, but not initialized
> >> > > D     3.2       3.2  1                 1    feature flag initialized to 1 (via operator or bootstrap process)
> >> > > ...
> >> > > E     3.8       3.1  -                 0    ...IBP too old
> >> > > F     3.8       3.2  -                 0    ...not initialized
> >> > > G     3.8       3.2  4                 4
> >> > >
> >> > > Here I'm defining version 0 as "no metadata.version set".
> >> > > So back to your comment, I think the KIP omits discussion of case C from the above table, which I can amend. Does that cover your concerns, or am I missing something else?
> >> > >
> >> > > > it's inconvenient for a user to manually upgrade every feature version
> >> > >
> >> > > For this, we would probably want to extend the capabilities of KIP-584. I don't think anything we've discussed for KIP-778 will preclude us from adding some kind of auto-upgrade in the future.
> >> > >
> >> > > 21, "disable" sounds good to me. I agree "delete feature-x" sounds a bit weird.
> >> > >
> >> > > On Mon, Nov 8, 2021 at 8:47 PM Guozhang Wang <wangg...@gmail.com> wrote:
> >> > >
> >> > >> Hello David,
> >> > >>
> >> > >> Thanks for the very nice writeup! It helped me a lot to refresh my memory on KIP-630/590/584 :)
> >> > >>
> >> > >> I just had two clarification questions after reading through the KIP:
> >> > >>
> >> > >> 1. For the initialization procedure, do we guarantee that all the quorum nodes (inactive candidates and leaders, a.k.a. controllers) would always initialize with the same metadata.version? If yes, how is that guaranteed? More specifically, when a quorum candidate is starting up, would it avoid handling any controller requests (including APIVersionsRequest) from its peers in the quorum until it completes reading the local log? And even if yes, what would happen if there's no FeatureLevelRecord found, and different nodes read different values from their local meta.properties file or initialize from their binary's hard-coded values?
> >> > >>
> >> > >> 2. This is not only for the metadata.version itself, but for general feature.versions: when a version is upgraded / downgraded, since brokers would read the FeatureLevelRecord at their own pace, there will always be a race window when some brokers have processed the record and completed the upgrade while others have not, and hence may behave differently --- I'm thinking of future features like the specific replica selector to allow fetching from a follower, and even more advanced selectors --- i.e. should we consider letting clients only talk to brokers with the highest metadata log offsets, for example?
> >> > >>
> >> > >> Guozhang
> >> > >>
> >> > >> On Fri, Nov 5, 2021 at 3:18 PM Jun Rao <j...@confluent.io.invalid> wrote:
> >> > >>
> >> > >> > Hi, David,
> >> > >> >
> >> > >> > Thanks for the reply.
> >> > >> >
> >> > >> > 16. My first concern is that the KIP picks up metadata.version inconsistently during the deployment. If a new cluster is started, we pick up the highest version. If we upgrade, we leave the feature version unchanged. Intuitively, it seems that independent of how a cluster is deployed, we should always pick the same feature version. I think we need to think this through in this KIP. My second concern is that as a particular version matures, it's inconvenient for a user to manually upgrade every feature version.
> >> > >> > As long as we have a path to achieve that in the future, we don't need to address that in this KIP.
> >> > >> >
> >> > >> > 21. "./kafka-features.sh delete": Deleting a feature seems a bit weird since the logic is always there. Would it be better to use disable?
> >> > >> >
> >> > >> > Jun
> >> > >> >
> >> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur <david.art...@confluent.io.invalid> wrote:
> >> > >> >
> >> > >> > > Colin and Jun, thanks for the additional comments!
> >> > >> > >
> >> > >> > > Colin:
> >> > >> > >
> >> > >> > > > We've been talking about having an automated RPC compatibility checker
> >> > >> > >
> >> > >> > > Do we have a way to mark fields in schemas as deprecated? It can stay in the RPC, it just complicates the logic a bit.
> >> > >> > >
> >> > >> > > > It would be nice if the active controller could validate that a majority of the quorum could use the proposed metadata.version. The active controller should have this information, right? If we don't have recent information from a quorum of voters, we wouldn't be active.
> >> > >> > >
> >> > >> > > I believe we should have this information from the ApiVersionsResponse. It would be good to do this validation to avoid a situation where a quorum leader can't be elected due to unprocessable records.
> >> > >> > >
> >> > >> > > > Do we need delete as a command separate from downgrade?
> >> > >> > >
> >> > >> > > I think from an operator's perspective, it is nice to distinguish between changing a feature flag and unsetting it. It might be surprising to an operator to see the flag's version set to nothing when they requested the downgrade to version 0 (or less).
> >> > >> > >
> >> > >> > > > it seems like we should spell out that metadata.version begins at 1 in KRaft clusters
> >> > >> > >
> >> > >> > > I added this text:
> >> > >> > >
> >> > >> > > > Introduce an IBP version to indicate the lowest software version that supports *metadata.version*. Below this IBP, the *metadata.version* is undefined and will not be examined. At or above this IBP, the *metadata.version* must be *0* for ZooKeeper clusters and will be initialized as *1* for KRaft clusters.
> >> > >> > >
> >> > >> > > > We probably also want an RPC implemented by both brokers and controllers that will reveal the min and max supported versions for each feature level supported by the server
> >> > >> > >
> >> > >> > > This is available in ApiVersionsResponse (we include the server's supported features as well as the cluster's finalized features)
> >> > >> > >
> >> > >> > > --------
> >> > >> > >
> >> > >> > > Jun:
> >> > >> > >
> >> > >> > > 12. I've updated the KIP with AdminClient changes
> >> > >> > >
> >> > >> > > 14. You're right, it looks like I missed a few sections regarding snapshot generation. I've corrected it
> >> > >> > >
> >> > >> > > 16. This feels more like an enhancement to KIP-584.
> >> > >> > > I agree it could be useful, but perhaps we could address it separately from KRaft upgrades?
> >> > >> > >
> >> > >> > > 20. Indeed, snapshots are not strictly necessary during an upgrade; I've reworded this
> >> > >> > >
> >> > >> > > Thanks!
> >> > >> > > David
> >> > >> > >
> >> > >> > > On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <j...@confluent.io.invalid> wrote:
> >> > >> > >
> >> > >> > > > Hi, David, Jose and Colin,
> >> > >> > > >
> >> > >> > > > Thanks for the reply. A few more comments.
> >> > >> > > >
> >> > >> > > > 12. It seems that we haven't updated the AdminClient accordingly?
> >> > >> > > >
> >> > >> > > > 14. "Metadata snapshot is generated and sent to the other inactive controllers and to brokers". I thought we wanted each broker to generate its own snapshot independently? If only the controller generates the snapshot, how do we force other brokers to pick it up?
> >> > >> > > >
> >> > >> > > > 16. If a feature version is new, one may not want to enable it immediately after the cluster is upgraded. However, if a feature version has been stable, requiring every user to run a command to upgrade to that version seems inconvenient. One way to improve this is for each feature to define one version as the default. Then, when we upgrade a cluster, we will automatically upgrade the feature to the default version. An admin could use the tool to upgrade to a version higher than the default.
> >> > >> > > >
> >> > >> > > > 20. "The quorum controller can assist with this process by generating a metadata snapshot after a metadata.version increase has been committed to the metadata log. This snapshot will be a convenient way to let broker and controller components rebuild their entire in-memory state following an upgrade." The new version of the software could read both the new and the old version. Is generating a new snapshot during upgrade needed?
> >> > >> > > >
> >> > >> > > > Jun
> >> > >> > > >
> >> > >> > > > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmcc...@apache.org> wrote:
> >> > >> > > >
> >> > >> > > > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> >> > >> > > > > > Hi, David,
> >> > >> > > > > >
> >> > >> > > > > > One more comment.
> >> > >> > > > > >
> >> > >> > > > > > 16. The main reason why KIP-584 requires finalizing a feature manually is that in the ZK world, the controller doesn't know all brokers in a cluster. A broker temporarily down is not registered in ZK. In the KRaft world, the controller keeps track of all brokers, including those that are temporarily down. This makes it possible for the controller to automatically finalize a feature---it's safe to do so when all brokers support that feature.
> >> > >> > > > > > This will make the upgrade process much simpler since no manual command is required to turn on a new feature. Have we considered this?
> >> > >> > > > > >
> >> > >> > > > > > Thanks,
> >> > >> > > > > >
> >> > >> > > > > > Jun
> >> > >> > > > >
> >> > >> > > > > Hi Jun,
> >> > >> > > > >
> >> > >> > > > > I guess David commented on this point already, but I'll comment as well. I always had the perception that users viewed rolls as potentially risky and were looking for ways to reduce the risk. Not enabling features right away after installing new software seems like one way to do that. If we had a feature to automatically upgrade during a roll, I'm not sure that I would recommend that people use it, because if something fails, it makes it harder to tell if the new feature is at fault, or something else in the new software.
> >> > >> > > > >
> >> > >> > > > > We already tell users to do a "double roll" when going to a new IBP. (Just to give background to people who haven't heard that phrase, the first roll installs the new software, and the second roll updates the IBP.) So this KIP-778 mechanism is basically very similar to that, except the second thing isn't a roll, but just an upgrade command. So I think this is consistent with what we currently do.
> >> > >> > > > >
> >> > >> > > > > Also, just like David said, we can always add auto-upgrade later if there is demand...
> >> > >> > > > >
> >> > >> > > > > best,
> >> > >> > > > > Colin
> >> > >> > > > >
> >> > >> > > > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <j...@confluent.io> wrote:
> >> > >> > > > > >
> >> > >> > > > > >> Hi, David,
> >> > >> > > > > >>
> >> > >> > > > > >> Thanks for the KIP. A few comments below.
> >> > >> > > > > >>
> >> > >> > > > > >> 10. It would be useful to describe how the controller node determines the RPC version used to communicate to other controller nodes. There seems to be a bootstrap problem. A controller node can't read the log and therefore the feature level until a quorum leader is elected. But leader election requires an RPC.
> >> > >> > > > > >>
> >> > >> > > > > >> 11. For downgrades, it would be useful to describe how to determine that the downgrade process (generating a new snapshot, propagating the snapshot, etc.) has completed. We could block the UpdateFeature request until the process is completed.
> >> > >> > > > > >> However, since the process could take time, the request could time out. Another way is through DescribeFeature, where the server only reports downgraded versions after the process is completed.
> >> > >> > > > > >>
> >> > >> > > > > >> 12. Since we are changing UpdateFeaturesRequest, do we need to change the AdminClient API for updateFeatures too?
> >> > >> > > > > >>
> >> > >> > > > > >> 13. For the paragraph starting with "In the absence of an operator defined value for metadata.version", in KIP-584, we described how to finalize features with New cluster bootstrap. In that case, it's inconvenient for the users to have to run an admin tool to finalize the version for each feature. Instead, the system detects that the /features path is missing in ZK and thus automatically finalizes every feature with the latest supported version. Could we do something similar in the KRaft mode?
> >> > >> > > > > >>
> >> > >> > > > > >> 14. After the quorum leader generates a new snapshot, how do we force other nodes to pick up the new snapshot?
> >> > >> > > > > >>
> >> > >> > > > > >> 15. I agree with Jose that it will be useful to describe when generating a new snapshot is needed. To me, it seems the new snapshot is only needed when incompatible changes are made.
> >> > >> > > > > >>
> >> > >> > > > > >> 7. Jose, what control records were you referring to?
> >> > >> > > > > >>
> >> > >> > > > > >> Thanks,
> >> > >> > > > > >>
> >> > >> > > > > >> Jun
> >> > >> > > > > >>
> >> > >> > > > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org> wrote:
> >> > >> > > > > >>
> >> > >> > > > > >>> Jose, thanks for the thorough review and comments!
> >> > >> > > > > >>>
> >> > >> > > > > >>> I am out of the office until next week, so I probably won't be able to update the KIP until then. Here are some replies to your questions:
> >> > >> > > > > >>>
> >> > >> > > > > >>> 1. Generate snapshot on upgrade
> >> > >> > > > > >>> > > Metadata snapshot is generated and sent to the other nodes
> >> > >> > > > > >>> > Why does the Active Controller need to generate a new snapshot and force a snapshot fetch from the replicas (inactive controller and brokers) on an upgrade? Isn't writing the FeatureLevelRecord good enough to communicate the upgrade to the replicas?
> >> > >> > > > > >>>
> >> > >> > > > > >>> You're right, we don't necessarily need to _transmit_ a snapshot, since each node can generate its own equivalent snapshot
> >> > >> > > > > >>>
> >> > >> > > > > >>> 2. Generate snapshot on downgrade
> >> > >> > > > > >>> > > Metadata snapshot is generated and sent to the other inactive controllers and to brokers (this snapshot may be lossy!)
> >> > >> > > > > >>> > Why do we need to send this downgraded snapshot to the brokers? The replicas have seen the FeatureLevelRecord and noticed the downgrade. Can we have the replicas each independently generate a downgraded snapshot at the offset for the downgraded FeatureLevelRecord? I assume that the active controller will guarantee that all records after the FeatureLevelRecord use the downgraded version. If so, it would be good to mention that explicitly.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Similar to above, yes, a broker that detects a downgrade via FeatureLevelRecord could generate its own downgrade snapshot and reload its state from that. This does get a little fuzzy when we consider cases where brokers are on different software versions and could be generating a downgrade snapshot for version X, but using different versions of the code. It might be safer to let the controller generate the snapshot so each broker (regardless of software version) gets the same records. However, for upgrades (or downgrades) we expect the whole cluster to be running the same software version before triggering the metadata.version change, so perhaps this isn't a likely scenario. Thoughts?
> >> > >> > > > > >>>
> >> > >> > > > > >>> 3. Max metadata version
> >> > >> > > > > >>> > > For the first release that supports metadata.version, we can simply initialize metadata.version with the current (and only) version. For future releases, we will need a mechanism to bootstrap a particular version. This could be done using the meta.properties file or some similar mechanism. The reason we need to allow for a specific initial version is to support the use case of starting a Kafka cluster at version X with an older metadata.version.
> >> > >> > > > > >>>
> >> > >> > > > > >>> > I assume that the Active Controller will learn the metadata version of the broker through the BrokerRegistrationRequest.
> >> > >> > > > > >>> > How will the Active Controller learn about the max metadata version of the inactive controller nodes? We currently don't send a registration request from the inactive controller to the active controller.
> >> > >> > > > > >>>
> >> > >> > > > > >>> This came up during the design, but I neglected to add it to the KIP. We will need a mechanism for determining the supported features of each controller, similar to how brokers use BrokerRegistrationRequest. Perhaps controllers could write a FeatureLevelRecord (or similar) to the metadata log indicating their supported version. WDYT?
> >> > >> > > > > >>>
> >> > >> > > > > >>> > Why do you need to bootstrap a particular version? Isn't the intent that the broker will learn the active metadata version by reading the metadata before unfencing?
> >> > >> > > > > >>>
> >> > >> > > > > >>> This bootstrapping is needed for when a KRaft cluster is first started. If we don't have this mechanism, the cluster can't really do anything until the operator finalizes the metadata.version with the tool. The bootstrapping will be done by the controller and the brokers will see this version as a record (like you say). I'll add some text to clarify this.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 4. Reject Registration - This is related to the bullet point above.
> >> > >> > > > > >>> > What will be the behavior of the active controller if the broker sends a metadata version that is not compatible with the cluster-wide metadata version?
> >> > >> > > > > >>>
> >> > >> > > > > >>> If a broker starts up with a lower supported version range than the current cluster metadata.version, it should log an error and shut down. This is in line with KIP-584.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 5. Discover upgrade
> >> > >> > > > > >>> > > This snapshot will be a convenient way to let broker and controller components rebuild their entire in-memory state following an upgrade.
> >> > >> > > > > >>> > Can we rely on the presence of the FeatureLevelRecord for the metadata version for this functionality? If so, it avoids having to reload the snapshot.
> >> > >> > > > > >>>
> >> > >> > > > > >>> For upgrades, yes, probably, since we won't need to "rewrite" any records in this case.
> >> > >> > > > > >>> For downgrades, we will need to generate the snapshot and reload everything.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 6. Metadata version specification
> >> > >> > > > > >>> > > V4(version=4, isBackwardsCompatible=false, description="New metadata record type Bar"),
> >> > >> > > > > >>> > Very cool. Do you have plans to generate Apache Kafka HTML documentation for this information? It would be helpful to display this information to the user using kafka-features.sh and the feature RPC.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Hm, good idea :) I'll add a brief section on documentation. This would certainly be very useful.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 7. Downgrade records
> >> > >> > > > > >>> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and controller records. In KIP-630 we introduced two control records for snapshots.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Yes, good call. Let me re-read that KIP and include some details.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Thanks again for the comments!
> >> > >> > > > > >>> -David
> >> > >> > > > > >>>
> >> > >> > > > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio <jsan...@confluent.io.invalid> wrote:
> >> > >> > > > > >>>
> >> > >> > > > > >>> > One more comment.
> >> > >> > > > > >>> >
> >> > >> > > > > >>> > 7. Downgrade records
> >> > >> > > > > >>> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and controller records. In KIP-630 we introduced two control records for snapshots.
> >> > >> > > > > >>>
> >> > >> > > --
> >> > >> > > -David
> >> > >>
> >> > >> --
> >> > >> -- Guozhang
> >> > >
> >> > > --
> >> > > -David
> >> >
> >> > --
> >> > -David
> >>
> >> --
> >> -- Guozhang
> >
> > --
> > -David

--
-- Guozhang