Hi Colin,

If we allow downgrades, which would also be appended to the log as new metadata.version records, then the length of the __cluster_metadata log may not be a safe indicator of a higher version, since a record for an older version could still be appended after a record for a newer version, right?
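To make the concern concrete, here is a hypothetical committed record sequence (the offsets and levels are made up for illustration):

    offset 10: FeatureLevelRecord(metadata.version = 4)   <- upgrade
    offset 11-20: ... other metadata records ...
    offset 21: FeatureLevelRecord(metadata.version = 3)   <- downgrade

A replica whose log ends at offset 25 is "longer" than one that ends at offset 15, yet its finalized metadata.version (3) is lower. So once downgrade records exist, comparing log lengths no longer orders replicas by metadata.version.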
On Tue, Nov 16, 2021 at 1:16 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Tue, Nov 16, 2021, at 06:36, David Arthur wrote:
> > An interesting case here is how to handle a version update if a majority of the quorum supports it, but the leader doesn't. For example, if C1 were the leader and an upgrade to version 4 was requested, maybe this would trigger C1 to resign and inform the client to retry the update later.
> >
>
> Hmm, wouldn't we want to just reject the version update in this case? I don't see what the advantage of allowing it would be.
>
> The reason for requiring a majority rather than all voters is mainly to cover the case where a voter is down, I thought. That clearly doesn't apply if the un-upgraded voter is the leader itself...
>
> >
> > We may eventually want to consider the metadata.version when electing a leader, but as long as we have the majority requirement before committing a new metadata.version, I think we should be safe.
> >
>
> Yeah, this is safe. If we elect a leader at metadata.version X, then that means that a majority of the cluster is at least at version X. Proof by contradiction: assume that this is not the case. Then the newly elected leader must have a shorter __cluster_metadata log than a majority of the voters. But this is incompatible with winning a Raft election.
>
> In the case where the leader is "behind" some of the other voters, those voters will truncate their logs to match the new leader. This will downgrade them. Basically this is the case where the feature upgrade was proposed, but never fully completed.
>
> best,
> Colin
>
> >
> > -David
> >
> > On Mon, Nov 15, 2021 at 12:52 PM Guozhang Wang <wangg...@gmail.com> wrote:
> >
> >> Thanks David,
> >>
> >> 1. Got it. One thing I'm still not very clear on is why it's sufficient to select a metadata.version which is supported by a majority of the quorum, but not the whole quorum (i.e. choosing the lowest version among all the quorum members)? Since the leader election today does not take this value into consideration, we are not guaranteed that newly selected leaders would always be able to recognize and support the initialized metadata.version, right?
> >>
> >> 2. Yeah, I think I agree the behavior-but-not-RPC-change scenario is beyond the scope of this KIP, we can defer it to later discussions.
> >>
> >> On Mon, Nov 15, 2021 at 8:13 AM David Arthur <david.art...@confluent.io.invalid> wrote:
> >>
> >> > Guozhang, thanks for the review!
> >> >
> >> > 1, As we've defined it so far, the initial metadata.version is set by an operator via the "kafka-storage.sh" tool. It would be possible for different values to be selected, but only the quorum leader would commit a FeatureLevelRecord with the version they read locally. See the above reply to Jun's question for a little more detail.
> >> >
> >> > We need to enable the KRaft RPCs regardless of metadata.version (vote, heartbeat, fetch, etc.) so that the quorum can be formed, a leader can be elected, and followers can learn about the selected metadata.version.
> >> > I believe the quorum startup procedure would go something like:
> >> >
> >> > * Controller nodes start, read their logs, begin leader election
> >> > * While the elected node is becoming leader (in QuorumMetaLogListener#handleLeaderChange), initialize metadata.version if necessary
> >> > * Followers replicate the FeatureLevelRecord
> >> >
> >> > This (probably) means the quorum peers must continue to rely on ApiVersions to negotiate compatible RPC versions, and these versions cannot depend on metadata.version.
> >> >
> >> > Does this make sense?
> >> >
> >> > 2, ApiVersionsResponse includes the broker's supported feature flags as well as the cluster-wide finalized feature flags. We probably need to add something like the feature flag "epoch" to this response payload in order to see which broker is most up to date.
> >> >
> >> > If the new feature flag version included RPC changes, we are helped by the fact that a client won't attempt to use the new RPC until it has discovered a broker that supports it via ApiVersions. The problem is more difficult for cases like you described where the feature flag upgrade changes the behavior of the broker, but not its RPCs. This is actually the same problem as upgrading the IBP. During a rolling restart, clients may hit different brokers with different capabilities and not know it.
> >> >
> >> > This probably warrants further investigation, but hopefully you agree it is beyond the scope of this KIP :)
> >> >
> >> > -David
> >> >
> >> > On Mon, Nov 15, 2021 at 10:26 AM David Arthur <david.art...@confluent.io> wrote:
> >> >
> >> > > Jun, thanks for the comments!
> >> > >
> >> > > 16, When a new cluster is deployed, we don't select the highest available metadata.version, but rather the quorum leader picks a bootstrap version defined in meta.properties. As mentioned earlier, we should add validation here to ensure a majority of the followers could support this version before initializing it. This would avoid a situation where a failover results in a new leader who can't support the selected metadata.version.
> >> > >
> >> > > Thinking a bit more on this, we do need to define a state where we're running newer software, but we don't have the feature flag set. This could happen if we were running an older IBP that did not support KIP-778. Following on this, it doesn't seem too difficult to consider a case where the IBP has been upgraded, but we still have not finalized a metadata.version. Here are some possible version combinations (assuming KIP-778 is added to Kafka 3.2):
> >> > >
> >> > > Case  Software  IBP  metadata.version  effective version
> >> > > --------------------------------------------------------
> >> > > A     3.1       3.1  -                 0    software too old for feature flag
> >> > > B     3.2       3.1  -                 0    feature flag supported, but IBP too old
> >> > > C     3.2       3.2  -                 0    feature flag supported, but not initialized
> >> > > D     3.2       3.2  1                 1    feature flag initialized to 1 (via operator or bootstrap process)
> >> > > ...
> >> > > E     3.8       3.1  -                 0    ...IBP too old
> >> > > F     3.8       3.2  -                 0    ...not initialized
> >> > > G     3.8       3.2  4                 4
> >> > >
> >> > > Here I'm defining version 0 as "no metadata.version set".
> >> > > So back to your comment, I think the KIP omits discussion of case C from the above table, which I can amend. Does that cover your concerns, or am I missing something else?
> >> > >
> >> > > > it's inconvenient for a user to manually upgrade every feature version
> >> > >
> >> > > For this, we would probably want to extend the capabilities of KIP-584. I don't think anything we've discussed for KIP-778 will preclude us from adding some kind of auto-upgrade in the future.
> >> > >
> >> > > 21, "disable" sounds good to me. I agree "delete feature-x" sounds a bit weird.
> >> > >
> >> > > On Mon, Nov 8, 2021 at 8:47 PM Guozhang Wang <wangg...@gmail.com> wrote:
> >> > >
> >> > >> Hello David,
> >> > >>
> >> > >> Thanks for the very nice writeup! It helped me a lot to refresh my memory on KIP-630/590/584 :)
> >> > >>
> >> > >> I just had two clarification questions after reading through the KIP:
> >> > >>
> >> > >> 1. For the initialization procedure, do we guarantee that all the quorum nodes (inactive candidates and leaders, a.k.a. controllers) would always initialize with the same metadata.version? If yes, how is that guaranteed? More specifically, when a quorum candidate is starting up, would it avoid handling any controller requests (including APIVersionsRequest) from its peers in the quorum until it completes reading the local log? And even if yes, what would happen if there's no FeatureLevelRecord found, and different nodes read different values from their local meta.properties file or initialize from their binary's hard-coded values?
> >> > >>
> >> > >> 2. This is not only for the metadata.version itself, but for general feature.versions: when a version is upgraded / downgraded, since brokers would read the FeatureLevelRecord at their own pace, there will always be a race window when some brokers have processed the record and completed the upgrade while others have not, and hence may behave differently --- I'm thinking of future features like the specific replica selector to allow fetching from a follower, and even more advanced selectors --- i.e. should we consider letting clients only talk to brokers with the highest metadata log offsets, for example?
> >> > >>
> >> > >> Guozhang
> >> > >>
> >> > >> On Fri, Nov 5, 2021 at 3:18 PM Jun Rao <j...@confluent.io.invalid> wrote:
> >> > >>
> >> > >> > Hi, David,
> >> > >> >
> >> > >> > Thanks for the reply.
> >> > >> >
> >> > >> > 16. My first concern is that the KIP picks up metadata.version inconsistently during the deployment. If a new cluster is started, we pick up the highest version. If we upgrade, we leave the feature version unchanged. Intuitively, it seems that independent of how a cluster is deployed, we should always pick the same feature version. I think we need to think this through in this KIP. My second concern is that as a particular version matures, it's inconvenient for a user to manually upgrade every feature version.
> >> > >> > As long as we have a path to achieve that in the future, we don't need to address that in this KIP.
> >> > >> >
> >> > >> > 21. "./kafka-features.sh delete": Deleting a feature seems a bit weird since the logic is always there. Would it be better to use disable?
> >> > >> >
> >> > >> > Jun
> >> > >> >
> >> > >> > On Fri, Nov 5, 2021 at 8:11 AM David Arthur <david.art...@confluent.io.invalid> wrote:
> >> > >> >
> >> > >> > > Colin and Jun, thanks for the additional comments!
> >> > >> > >
> >> > >> > > Colin:
> >> > >> > >
> >> > >> > > > We've been talking about having an automated RPC compatibility checker
> >> > >> > >
> >> > >> > > Do we have a way to mark fields in schemas as deprecated? It can stay in the RPC, it just complicates the logic a bit.
> >> > >> > >
> >> > >> > > > It would be nice if the active controller could validate that a majority of the quorum could use the proposed metadata.version. The active controller should have this information, right? If we don't have recent information from a quorum of voters, we wouldn't be active.
> >> > >> > >
> >> > >> > > I believe we should have this information from the ApiVersionsResponse. It would be good to do this validation to avoid a situation where a quorum leader can't be elected due to unprocessable records.
> >> > >> > >
> >> > >> > > > Do we need delete as a command separate from downgrade?
> >> > >> > >
> >> > >> > > I think from an operator's perspective, it is nice to distinguish between changing a feature flag and unsetting it. It might be surprising to an operator to see the flag's version set to nothing when they requested the downgrade to version 0 (or less).
> >> > >> > >
> >> > >> > > > it seems like we should spell out that metadata.version begins at 1 in KRaft clusters
> >> > >> > >
> >> > >> > > I added this text:
> >> > >> > >
> >> > >> > > > Introduce an IBP version to indicate the lowest software version that supports *metadata.version*. Below this IBP, the *metadata.version* is undefined and will not be examined. At or above this IBP, the *metadata.version* must be *0* for ZooKeeper clusters and will be initialized as *1* for KRaft clusters.
> >> > >> > >
> >> > >> > > > We probably also want an RPC implemented by both brokers and controllers that will reveal the min and max supported versions for each feature level supported by the server
> >> > >> > >
> >> > >> > > This is available in ApiVersionsResponse (we include the server's supported features as well as the cluster's finalized features)
> >> > >> > >
> >> > >> > > --------
> >> > >> > >
> >> > >> > > Jun:
> >> > >> > >
> >> > >> > > 12. I've updated the KIP with AdminClient changes
> >> > >> > >
> >> > >> > > 14. You're right, it looks like I missed a few sections regarding snapshot generation. I've corrected it
> >> > >> > >
> >> > >> > > 16. This feels more like an enhancement to KIP-584.
> >> > >> > > I agree it could be useful, but perhaps we could address it separately from KRaft upgrades?
> >> > >> > >
> >> > >> > > 20. Indeed, snapshots are not strictly necessary during an upgrade; I've reworded this
> >> > >> > >
> >> > >> > > Thanks!
> >> > >> > > David
> >> > >> > >
> >> > >> > > On Thu, Nov 4, 2021 at 6:51 PM Jun Rao <j...@confluent.io.invalid> wrote:
> >> > >> > >
> >> > >> > > > Hi, David, Jose and Colin,
> >> > >> > > >
> >> > >> > > > Thanks for the reply. A few more comments.
> >> > >> > > >
> >> > >> > > > 12. It seems that we haven't updated the AdminClient accordingly?
> >> > >> > > >
> >> > >> > > > 14. "Metadata snapshot is generated and sent to the other inactive controllers and to brokers". I thought we wanted each broker to generate its own snapshot independently? If only the controller generates the snapshot, how do we force other brokers to pick it up?
> >> > >> > > >
> >> > >> > > > 16. If a feature version is new, one may not want to enable it immediately after the cluster is upgraded. However, if a feature version has been stable, requiring every user to run a command to upgrade to that version seems inconvenient. One way to improve this is for each feature to define one version as the default. Then, when we upgrade a cluster, we will automatically upgrade the feature to the default version. An admin could use the tool to upgrade to a version higher than the default.
> >> > >> > > >
> >> > >> > > > 20. "The quorum controller can assist with this process by generating a metadata snapshot after a metadata.version increase has been committed to the metadata log. This snapshot will be a convenient way to let broker and controller components rebuild their entire in-memory state following an upgrade." The new version of the software could read both the new and the old version. Is generating a new snapshot during upgrade needed?
> >> > >> > > >
> >> > >> > > > Jun
> >> > >> > > >
> >> > >> > > > On Wed, Nov 3, 2021 at 5:42 PM Colin McCabe <cmcc...@apache.org> wrote:
> >> > >> > > >
> >> > >> > > > > On Tue, Oct 12, 2021, at 10:34, Jun Rao wrote:
> >> > >> > > > > > Hi, David,
> >> > >> > > > > >
> >> > >> > > > > > One more comment.
> >> > >> > > > > >
> >> > >> > > > > > 16. The main reason why KIP-584 requires finalizing a feature manually is that in the ZK world, the controller doesn't know all brokers in a cluster. A broker temporarily down is not registered in ZK. In the KRaft world, the controller keeps track of all brokers, including those that are temporarily down. This makes it possible for the controller to automatically finalize a feature---it's safe to do so when all brokers support that feature.
> >> > >> > > > > > This will make the upgrade process much simpler since no manual command is required to turn on a new feature. Have we considered this?
> >> > >> > > > > >
> >> > >> > > > > > Thanks,
> >> > >> > > > > >
> >> > >> > > > > > Jun
> >> > >> > > > >
> >> > >> > > > > Hi Jun,
> >> > >> > > > >
> >> > >> > > > > I guess David commented on this point already, but I'll comment as well. I always had the perception that users viewed rolls as potentially risky and were looking for ways to reduce the risk. Not enabling features right away after installing new software seems like one way to do that. If we had a feature to automatically upgrade during a roll, I'm not sure that I would recommend that people use it, because if something fails, it makes it harder to tell if the new feature is at fault, or something else in the new software.
> >> > >> > > > >
> >> > >> > > > > We already tell users to do a "double roll" when going to a new IBP. (Just to give background to people who haven't heard that phrase, the first roll installs the new software, and the second roll updates the IBP.) So this KIP-778 mechanism is basically very similar to that, except the second thing isn't a roll, but just an upgrade command. So I think this is consistent with what we currently do.
> >> > >> > > > >
> >> > >> > > > > Also, just like David said, we can always add auto-upgrade later if there is demand...
> >> > >> > > > >
> >> > >> > > > > best,
> >> > >> > > > > Colin
> >> > >> > > > >
> >> > >> > > > > > On Thu, Oct 7, 2021 at 5:19 PM Jun Rao <j...@confluent.io> wrote:
> >> > >> > > > > >
> >> > >> > > > > >> Hi, David,
> >> > >> > > > > >>
> >> > >> > > > > >> Thanks for the KIP. A few comments below.
> >> > >> > > > > >>
> >> > >> > > > > >> 10. It would be useful to describe how the controller node determines the RPC version used to communicate to other controller nodes. There seems to be a bootstrap problem. A controller node can't read the log and therefore the feature level until a quorum leader is elected. But leader election requires an RPC.
> >> > >> > > > > >>
> >> > >> > > > > >> 11. For downgrades, it would be useful to describe how to determine that the downgrade process (generating a new snapshot, propagating the snapshot, etc.) has completed. We could block the UpdateFeature request until the process is completed.
> >> > >> > > > > >> However, since the process could take time, the request could time out. Another way is through DescribeFeature, where the server only reports downgraded versions after the process is completed.
> >> > >> > > > > >>
> >> > >> > > > > >> 12. Since we are changing UpdateFeaturesRequest, do we need to change the AdminClient API for updateFeatures too?
> >> > >> > > > > >>
> >> > >> > > > > >> 13. For the paragraph starting with "In the absence of an operator defined value for metadata.version", in KIP-584, we described how to finalize features with New cluster bootstrap. In that case, it's inconvenient for the users to have to run an admin tool to finalize the version for each feature. Instead, the system detects that the /features path is missing in ZK and thus automatically finalizes every feature with the latest supported version. Could we do something similar in the KRaft mode?
> >> > >> > > > > >>
> >> > >> > > > > >> 14. After the quorum leader generates a new snapshot, how do we force other nodes to pick up the new snapshot?
> >> > >> > > > > >>
> >> > >> > > > > >> 15. I agree with Jose that it will be useful to describe when generating a new snapshot is needed. To me, it seems the new snapshot is only needed when incompatible changes are made.
> >> > >> > > > > >>
> >> > >> > > > > >> 7. Jose, what control records were you referring to?
> >> > >> > > > > >>
> >> > >> > > > > >> Thanks,
> >> > >> > > > > >>
> >> > >> > > > > >> Jun
> >> > >> > > > > >>
> >> > >> > > > > >> On Tue, Oct 5, 2021 at 8:53 AM David Arthur <davidart...@apache.org> wrote:
> >> > >> > > > > >>
> >> > >> > > > > >>> Jose, thanks for the thorough review and comments!
> >> > >> > > > > >>>
> >> > >> > > > > >>> I am out of the office until next week, so I probably won't be able to update the KIP until then. Here are some replies to your questions:
> >> > >> > > > > >>>
> >> > >> > > > > >>> 1. Generate snapshot on upgrade
> >> > >> > > > > >>> > > Metadata snapshot is generated and sent to the other nodes
> >> > >> > > > > >>> > Why does the Active Controller need to generate a new snapshot and force a snapshot fetch from the replicas (inactive controller and brokers) on an upgrade? Isn't writing the FeatureLevelRecord good enough to communicate the upgrade to the replicas?
> >> > >> > > > > >>>
> >> > >> > > > > >>> You're right, we don't necessarily need to _transmit_ a snapshot, since each node can generate its own equivalent snapshot
> >> > >> > > > > >>>
> >> > >> > > > > >>> 2. Generate snapshot on downgrade
> >> > >> > > > > >>> > > Metadata snapshot is generated and sent to the other inactive controllers and to brokers (this snapshot may be lossy!)
> >> > >> > > > > >>> > Why do we need to send this downgraded snapshot to the brokers? The replicas have seen the FeatureLevelRecord and noticed the downgrade. Can we have the replicas each independently generate a downgraded snapshot at the offset for the downgraded FeatureLevelRecord? I assume that the active controller will guarantee that all records after the FeatureLevelRecord use the downgraded version. If so, it would be good to mention that explicitly.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Similar to above, yes, a broker that detects a downgrade via FeatureLevelRecord could generate its own downgrade snapshot and reload its state from that. This does get a little fuzzy when we consider cases where brokers are on different software versions and could be generating a downgrade snapshot for version X, but using different versions of the code. It might be safer to let the controller generate the snapshot so each broker (regardless of software version) gets the same records. However, for upgrades (or downgrades) we expect the whole cluster to be running the same software version before triggering the metadata.version change, so perhaps this isn't a likely scenario. Thoughts?
> >> > >> > > > > >>>
> >> > >> > > > > >>> 3. Max metadata version
> >> > >> > > > > >>> > > For the first release that supports metadata.version, we can simply initialize metadata.version with the current (and only) version. For future releases, we will need a mechanism to bootstrap a particular version. This could be done using the meta.properties file or some similar mechanism. The reason we need to allow for a specific initial version is to support the use case of starting a Kafka cluster at version X with an older metadata.version.
> >> > >> > > > > >>>
> >> > >> > > > > >>> > I assume that the Active Controller will learn the metadata version of the broker through the BrokerRegistrationRequest.
> >> > >> > > > > >>> > How will the Active Controller learn about the max metadata version of the inactive controller nodes? We currently don't send a registration request from the inactive controller to the active controller.
> >> > >> > > > > >>>
> >> > >> > > > > >>> This came up during the design, but I neglected to add it to the KIP. We will need a mechanism for determining the supported features of each controller, similar to how brokers use BrokerRegistrationRequest. Perhaps controllers could write a FeatureLevelRecord (or similar) to the metadata log indicating their supported version. WDYT?
> >> > >> > > > > >>>
> >> > >> > > > > >>> > Why do you need to bootstrap a particular version? Isn't the intent that the broker will learn the active metadata version by reading the metadata before unfencing?
> >> > >> > > > > >>>
> >> > >> > > > > >>> This bootstrapping is needed for when a KRaft cluster is first started. If we don't have this mechanism, the cluster can't really do anything until the operator finalizes the metadata.version with the tool. The bootstrapping will be done by the controller and the brokers will see this version as a record (like you say). I'll add some text to clarify this.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 4. Reject Registration - This is related to the bullet point above.
> >> > >> > > > > >>> > What will be the behavior of the active controller if the broker sends a metadata version that is not compatible with the cluster-wide metadata version?
> >> > >> > > > > >>>
> >> > >> > > > > >>> If a broker starts up with a lower supported version range than the current cluster metadata.version, it should log an error and shut down. This is in line with KIP-584.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 5. Discover upgrade
> >> > >> > > > > >>> > > This snapshot will be a convenient way to let broker and controller components rebuild their entire in-memory state following an upgrade.
> >> > >> > > > > >>> > Can we rely on the presence of the FeatureLevelRecord for the metadata version for this functionality? If so, it avoids having to reload the snapshot.
> >> > >> > > > > >>>
> >> > >> > > > > >>> For upgrades, yes, probably, since we won't need to "rewrite" any records in this case.
> >> > >> > > > > >>> For downgrades, we will need to generate the snapshot and reload everything.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 6. Metadata version specification
> >> > >> > > > > >>> > > V4(version=4, isBackwardsCompatible=false, description="New metadata record type Bar"),
> >> > >> > > > > >>> > Very cool. Do you have plans to generate Apache Kafka HTML documentation for this information? It would be helpful to display this information to the user using kafka-features.sh and the feature RPC.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Hm, good idea :) I'll add a brief section on documentation. This would certainly be very useful.
> >> > >> > > > > >>>
> >> > >> > > > > >>> 7. Downgrade records
> >> > >> > > > > >>> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and controller records. In KIP-630 we introduced two control records for snapshots.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Yes, good call. Let me re-read that KIP and include some details.
> >> > >> > > > > >>>
> >> > >> > > > > >>> Thanks again for the comments!
> >> > >> > > > > >>> -David
> >> > >> > > > > >>>
> >> > >> > > > > >>> On Wed, Sep 29, 2021 at 5:09 PM José Armando García Sancio <jsan...@confluent.io.invalid> wrote:
> >> > >> > > > > >>>
> >> > >> > > > > >>> > One more comment.
> >> > >> > > > > >>> >
> >> > >> > > > > >>> > 7. Downgrade records
> >> > >> > > > > >>> > I think we should explicitly mention that the downgrade process will downgrade both metadata records and controller records. In KIP-630 we introduced two control records for snapshots.
> >> > >> > > > > >>>
> >> > >> > > --
> >> > >> > > -David
> >> > >>
> >> > >> --
> >> > >> -- Guozhang
> >> > >
> >> > > --
> >> > > -David
> >> >
> >> > --
> >> > -David
> >>
> >> --
> >> -- Guozhang
> >
> > --
> > -David

--
-- Guozhang