Hi Jun,

Thanks for the discussion. Yeah, those are the scenarios for using these tools. I have documented their usage in the KIP.

Best,
Kevin Wu

On Thu, Apr 23, 2026 at 11:51 AM Jun Rao via dev <[email protected]> wrote:

> Hi, Kevin,
>
> Thanks for the reply.
>
> Your suggestion sounds good to me. It would be useful to document the usage of those tools. The scenarios are:
> 1. Remove a voter in a dynamic KRaft quorum
> 2. Unregister an observer controller
> 3. Unregister a voter in a static KRaft quorum when the static voter set is mistakenly configured.
>
> For item 3, could you document how it works? Does one need to stop the misconfigured voter first and then unregister it?
>
> Are there other scenarios?
>
> Jun
>
> On Thu, Apr 23, 2026 at 8:22 AM Kevin Wu <[email protected]> wrote:
>
> > Hi Jun,
> >
> > Thanks for the replies.
> >
> > RE JR3: I would like the design of this feature to avoid introducing more coupling between the KRaft and metadata layers. Observer controllers are supported, but they are a KRaft concept, so the metadata layer should not need to know whether a given controller is a voter or an observer.
> >
> > What do you think about the following documentation and execution pattern for these CLI commands?
> >
> > `kafka-cluster unregister-controller` is a command for users who want to unregister a controller from the cluster. We can document that this is potentially unsafe and should only be done if the operator does not intend to bring that controller back up. `kafka-cluster unregister-controller` works irrespective of the quorum mode.
> >
> > Going forward, running `kafka-metadata-quorum remove-controller` removes a controller as a KRaft voter, and it continues to be supported only in a dynamic quorum cluster. I still think the unregistering behavior should be an additional flag, because having an observer controller that is still registered to the cluster is a valid configuration in Kafka.
> > I think of `kafka-metadata-quorum remove-controller --unregister` as a "built-in" CLI script, since removing a voter and unregistering it from the cluster is probably a very common usage pattern. This command will only send the UnregisterController RPC if the cluster supports dynamic quorum, so the overall command behavior stays consistent with how it works today with respect to the kraft.version level of the cluster. If the cluster does not support dynamic quorum, the CLI can direct the user to run the `kafka-cluster unregister-controller` command instead.
> >
> > Best,
> > Kevin Wu
> >
> > On Tue, Apr 21, 2026 at 5:39 PM Jun Rao via dev <[email protected]> wrote:
> >
> > > Hi, Kevin,
> > >
> > > Thanks for the reply.
> > >
> > > JR2. Good point on auto-join. I think we can introduce the new UnregisterControllerRequest and keep the auto-join behavior as is (i.e., without unregistering the controller when removing the old instance from the voter set). The command "kafka-metadata-quorum remove-controller" will send two separate RPC requests, RemoveRaftVoterRequest and UnregisterControllerRequest, as documented in the KIP.
> > >
> > > JR3. When will a user use the command "kafka-cluster unregister-controller"? Is this only for unregistering an observer controller? If the observer controller is currently supported, we can add that command. It would be useful to document the usage for both commands.
> > >
> > > Jun
> > >
> > >
> > > On Tue, Apr 21, 2026 at 9:25 AM Kevin Wu <[email protected]> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > RE JR1: Yeah, I will update the KIP to cover this static quorum edge case.
> > > >
> > > > RE JR2: That seems reasonable to me, since we would avoid two RPC hops (one for RemoveVoter, one for UnregisterController).
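
The command behavior this thread converges on is set out in Jun's JR2 reply of Apr 21 and Kevin's Apr 23 proposal, both quoted above. The sketch below illustrates it and is not the KIP's implementation. It uses several hypothetical stand-ins for the real admin plumbing: the `ActiveControllerClient` interface, its method names, and the assumption that kraft.version >= 1 signals a dynamic quorum. The RPCs are recorded rather than sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of the active controller; these are not Kafka's real Admin API methods. */
interface ActiveControllerClient {
    int finalizedKraftVersion();

    void removeRaftVoter(int controllerId, String directoryId);

    void unregisterController(int controllerId);
}

final class ControllerCommands {
    private ControllerCommands() {}

    /** Sketch of `kafka-metadata-quorum remove-controller [--unregister]`; returns a message for the operator. */
    static String removeController(ActiveControllerClient client, int controllerId,
                                   String directoryId, boolean unregister) {
        // Assumption: kraft.version >= 1 means a dynamic quorum; RemoveRaftVoter fails at kraft.version=0.
        if (client.finalizedKraftVersion() < 1) {
            return "remove-controller requires a dynamic quorum; "
                + "run `kafka-cluster unregister-controller` instead";
        }
        client.removeRaftVoter(controllerId, directoryId); // RPC 1: KRaft voter set change
        if (unregister) {
            client.unregisterController(controllerId);     // RPC 2: metadata-layer registration removal
        }
        return "ok";
    }

    /**
     * Sketch of `kafka-cluster unregister-controller`: a single RPC in any quorum mode.
     * Potentially unsafe unless the controller will not be brought back up.
     */
    static String unregisterController(ActiveControllerClient client, int controllerId) {
        client.unregisterController(controllerId);
        return "ok";
    }
}

/** Test double that records, in order, the RPCs each command would send. */
final class RecordingClient implements ActiveControllerClient {
    private final int kraftVersion;
    final List<String> sent = new ArrayList<>();

    RecordingClient(int kraftVersion) {
        this.kraftVersion = kraftVersion;
    }

    @Override
    public int finalizedKraftVersion() {
        return kraftVersion;
    }

    @Override
    public void removeRaftVoter(int controllerId, String directoryId) {
        sent.add("RemoveRaftVoter(" + controllerId + ")");
    }

    @Override
    public void unregisterController(int controllerId) {
        sent.add("UnregisterController(" + controllerId + ")");
    }
}
```

Consider a `RecordingClient` at kraft.version 1. Calling `removeController(client, 9990, "EXAMPLE_UUID", true)` records RemoveRaftVoter followed by UnregisterController. At kraft.version 0 the same call records nothing and points the operator to `kafka-cluster unregister-controller`. Keeping the two RPCs separate mirrors Jun's JR2 suggestion, so auto-join can keep sending only RemoveRaftVoter.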
> > > >
> > > > One thing to note is that with KIP-1186 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1186%3A+Update+AddRaftVoterRequest+RPC+to+support+auto-join>, besides operators manually removing controllers, observer controllers themselves can send `RemoveRaftVoter` to remove their old incarnations from the voter set as part of the auto-join feature. With auto-join and this proposed behavior, explicitly removing a controller's old registration alongside its old voter set entry can lead to "unsupported" upgrades in the cluster. An operator doing these steps manually could be said to be misconfiguring the cluster, but the auto-join feature allowing this scenario seems like a bug.
> > > >
> > > > Consider the following example with auto-join enabled: 3 controllers in the voter set (A, B, C), where A supports feature levels X=[0-1], B supports feature levels X=[0-1], but C only supports X=0. Currently, node A is the active controller and all 3 controllers are registered, but upgrading feature X to feature level 1 is not supported because C does not support it. Controller C restarts with a new disk (now represented as C'). The auto-join code runs to first remove C from the voter set, and then remove the registration for C. These records are committed via nodes A and B. Now, from the active controller's perspective, the cluster does support upgrading feature X to level 1. There is a race between C' adding itself back to the KRaft voter set and re-registering itself, and a potential feature level upgrade.
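
The race in this example can be reproduced with a small model. For illustration only, the sketch assumes that the active controller allows a feature level to be finalized only when every currently registered controller supports it, which is the condition the example describes for C. `SupportedRange` and `FeatureLevelModel` are hypothetical names, not Kafka classes. Voter-set changes are not modeled; only registrations are.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical supported [min, max] range for one feature on one controller. */
record SupportedRange(int min, int max) {
    boolean supports(int level) {
        return level >= min && level <= max;
    }
}

/** Hypothetical model of the registered controllers' supported ranges for feature X, keyed by node id. */
final class FeatureLevelModel {
    private final Map<String, SupportedRange> registrations = new LinkedHashMap<>();

    void register(String nodeId, SupportedRange range) {
        registrations.put(nodeId, range); // a new incarnation with the same node id replaces the old entry
    }

    void unregister(String nodeId) {
        registrations.remove(nodeId);
    }

    /** Assumed rule: a level can be finalized only if every registered controller supports it. */
    boolean canFinalize(int level) {
        return registrations.values().stream().allMatch(r -> r.supports(level));
    }

    public static void main(String[] args) {
        FeatureLevelModel cluster = new FeatureLevelModel();
        cluster.register("A", new SupportedRange(0, 1));
        cluster.register("B", new SupportedRange(0, 1));
        cluster.register("C", new SupportedRange(0, 0));
        System.out.println(cluster.canFinalize(1)); // false: C blocks X=1

        // If auto-join also removed C's registration, the upgrade window opens...
        cluster.unregister("C");
        System.out.println(cluster.canFinalize(1)); // true: the race window

        // ...until C' (same node id, new incarnation) re-registers with the same limited support.
        cluster.register("C", new SupportedRange(0, 0));
        System.out.println(cluster.canFinalize(1)); // false again
    }
}
```

In this model, Kafka's current behavior of leaving C's original registration in place means `unregister("C")` is never called. `canFinalize(1)` then stays false throughout, which is the property Jun's JR2 reply preserves by keeping auto-join as is.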
> > > >
> > > > Another interesting thing to note after looking at the code is that controllers can register even if they do not support the finalized features of the cluster, which is different from broker registration. In Kafka's current code, the original registration for C stays in the log after C is removed as a voter by auto-join, which prevents an upgrade of feature X. At some point, the registration for C is updated by C' because C' is a different process incarnation, but a registration that blocks X's upgrade is always in the log.
> > > >
> > > > Therefore, Kafka should not unregister a controller when auto-join removes a controller from the voter set. This means adding a new RPC version for `RemoveRaftVoter` that introduces a boolean field telling the active controller whether to also unregister the controller. This field would be completely ignored by the raft layer and would instead be handled at the ControllerApis level. I think it is fine to unregister a controller whenever the operator runs `kafka-metadata-quorum remove-controller`, for a smooth UX with dynamic quorum. What do you think?
> > > >
> > > > RE JR3: Maybe we can document this better as part of the code changes for this KIP, but in my opinion, the kafka-cluster tool deals with cluster membership (brokers and controllers), which is a metadata layer concept. If you look at the `list-endpoints` command, you can list the registered controller endpoints. By contrast, the kafka-metadata-quorum tool deals with KRaft, which knows about concepts like leader, voter, and observers.
> > > > The `add-controller` and `remove-controller` sub-commands incidentally deal with controllers (since controllers can be voters), but the `describe` sub-command tree also shows information about brokers, which are observers to KRaft. My decision to include the `unregister-controller` command in the `kafka-cluster` tool is mainly motivated by this distinction. Additionally, if we only send `RemoveRaftVoterRequest` in `remove-controller`, it seems hacky to direct users to that command for unregistering any controller, since for observers, the remove-voter logic of that request will always fail in the raft layer. What do you think?
> > > >
> > > > Best,
> > > > Kevin Wu
> > > >
> > > >
> > > > On Tue, Apr 21, 2026 at 8:17 AM Paolo Patierno <[email protected]> wrote:
> > > >
> > > > > Hi Kevin,
> > > > > thanks for the KIP.
> > > > > From reading it, it's not explicit, but I assume you are going to expose a new unregisterController method through the AdminClient API as well. Is my assumption right?
> > > > > I expect it would be used underneath by the tools you are going to modify.
> > > > > Having such support within the AdminClient API is important when the operator is not a human running the tool but a Kubernetes operator (e.g. Strimzi) that needs to unregister a controller.
> > > > >
> > > > > Thanks,
> > > > > Paolo.
> > > > >
> > > > > On Mon, 20 Apr 2026 at 21:57, Kevin Wu <[email protected]> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > RE JR1: I would say the main use case is dynamic quorums, since the concept of the observer controller becomes relevant in that world.
> > > > > > However, there is a static quorum edge case if the operator misconfigures `controller.quorum.voters`. If a new controller voter mistakenly joins the cluster, it will also persist a registration record. In my opinion, there should be a way to remove a controller registration via AdminClient CLI in all quorum modes.
> > > > > >
> > > > > > RE JR2: Yes, the existing command only removes the voter; it does not unregister the controller. I left it as a separate flag for now because they are "separate" operations, in that being a raft voter is a subset of being a controller in dynamic quorums, but I am not opposed to making this command try to do both (remove the voter and unregister the controller) by default. In my opinion, an observer controller is "useless" in that it does not participate in the leader election or replication parts of the KRaft protocol, so I see no issue with always doing both operations. However, an operator may want observer controllers around for other reasons, such as redundancy. Do you (or others) have any insight into how users may be configuring clusters with observer controllers? If not, I think it is okay to remove the flag and make it the default behavior of `kafka-metadata-quorum remove-controller`.
> > > > > >
> > > > > > RE JR3: Not exactly. `kafka-metadata-quorum remove-controller ... --unregister` sends 2 RPCs to the active controller: one to remove the node from the voter set, and another to unregister the node.
> > > > > > The `kafka-cluster unregister-controller` command just sends 1 RPC to the active controller to unregister the node. My motivation for having two separate commands is that `remove-controller` is associated with dynamic quorum, since the `RemoveRaftVoter` RPC will fail if kraft.version=0. What do you think?
> > > > > >
> > > > > > RE JR4: I have updated the sections for the CLI commands in the KIP to add this information.
> > > > > >
> > > > > > RE JR5: This describes the current implementation of the ControllerRegistrationManager, which listens to the metadata log and sends a ControllerRegistrationRequest when the local node id is not registered in the log. It looks like this is slightly different from how we handle broker registration in BrokerLifecycleManager. Currently, this code path never executes because controller registrations cannot be removed.
> > > > > >
> > > > > > Best,
> > > > > > Kevin Wu
> > > > > >
> > > > > > On Fri, Apr 17, 2026 at 2:08 PM Jun Rao via dev <[email protected]> wrote:
> > > > > >
> > > > > > > Hi, Kevin,
> > > > > > >
> > > > > > > Thanks for the KIP. A few comments.
> > > > > > >
> > > > > > > JR1. I guess this is only intended for dynamic KRaft quorums? If so, it would be useful to clarify that.
> > > > > > >
> > > > > > > JR2. kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister
> > > > > > > So, the existing remove-controller logic only changes the voter set, but doesn't unregister the controller? Should we just always do these two together?
> > > > > > > Is there a use case for only removing a controller from the voter set, but not unregistering it?
> > > > > > >
> > > > > > > JR3. Is kafka-cluster unregister-controller equivalent to kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister?
> > > > > > >
> > > > > > > JR4. Could you describe the underlying workflow for each new command (RPCs sent, metadata records generated, actions taken by the controller, etc.)?
> > > > > > >
> > > > > > > JR5. "The registration manager of an unregistered controller already attempts to re-register with the active controller. This is to prevent accidental unregistrations."
> > > > > > > I don't quite understand this. Why will an unregistered controller attempt to re-register?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Fri, Apr 3, 2026 at 11:31 AM Kevin Wu <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I would like to start a discussion on KIP-1312: Support unregistering controllers. Below is the KIP link.
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1312%3A+Support+unregistering+controllers
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Kevin Wu
> > > > >
> > > > > --
> > > > > Paolo Patierno
> > > > >
> > > > > *Senior Principal Software Engineer @ IBM*
> > > > > *CNCF Ambassador*
> > > > >
> > > > > Twitter : @ppatierno <http://twitter.com/ppatierno>
> > > > > Linkedin : paolopatierno <http://it.linkedin.com/in/paolopatierno>
> > > > > GitHub : ppatierno <https://github.com/ppatierno>
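
JR5 in the thread above concerns the ControllerRegistrationManager. Kevin describes it as watching the metadata log and sending a ControllerRegistrationRequest whenever the local node id is not registered. The sketch below models that behavior in simplified form and is not Kafka's class. `RegistrationWatcher` and its methods are hypothetical names, the request is only counted, and retries, backoff, and incarnation ids are omitted. It shows what would happen if a controller were unregistered while still running: its next metadata update triggers a new registration request.

```java
import java.util.Set;

/**
 * Hypothetical, simplified model of the described ControllerRegistrationManager behavior:
 * on each metadata update, request registration if the local node id is missing.
 */
final class RegistrationWatcher {
    private final int localNodeId;
    private boolean requestInFlight = false;
    private int requestsSent = 0;

    RegistrationWatcher(int localNodeId) {
        this.localNodeId = localNodeId;
    }

    /** Called with the controller ids currently registered in the metadata log. */
    void onMetadataUpdate(Set<Integer> registeredControllerIds) {
        if (registeredControllerIds.contains(localNodeId)) {
            requestInFlight = false; // our registration is in the log; nothing to do
        } else if (!requestInFlight) {
            requestInFlight = true;  // stand-in for sending ControllerRegistrationRequest (no retry logic)
            requestsSent++;
        }
    }

    int requestsSent() {
        return requestsSent;
    }
}
```

Jun's JR5 question is about the `else` branch. As Kevin notes, that branch is never reached today because controller registrations cannot be removed.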
