Hi, Kevin,

Thanks for the reply.

Your suggestion sounds good to me. It would be useful to document the usage of those tools. The scenarios are:

1. Remove a voter in a dynamic KRaft quorum
2. Unregister an observer controller
3. Unregister a voter in a static KRaft quorum when the static voter set is mistakenly configured

For item 3, could you document how it works? Does one need to stop the misconfigured voter first and then unregister it? Are there other scenarios?

Jun

On Thu, Apr 23, 2026 at 8:22 AM Kevin Wu <[email protected]> wrote:

> Hi Jun,
>
> Thanks for the replies.
>
> RE JR3: I would like the design of this feature to not introduce more coupling of the KRaft and metadata layers. Observer controllers are supported, but they are a KRaft concept, so it should not be known to the metadata layer whether or not a given controller is a voter or observer.
>
> What do you think about the following documentation and execution pattern regarding these CLI commands?
>
> `kafka-cluster unregister-controller` is a command for users when they want to unregister a controller from the cluster. We can document that this is potentially unsafe and should only be done if the operator does not intend to bring back up that controller. `kafka-cluster unregister-controller` works irrespective of the quorum mode.
>
> Going forward, running `kafka-metadata-quorum remove-controller` removes a controller as a KRaft voter, and continues to only be supported in a dynamic quorum cluster. I still think the unregistering behavior should be an additional flag, because having an observer controller that is still registered to the cluster is a valid configuration in Kafka. I think of `kafka-metadata-quorum remove-controller --unregister` as a "built-in" CLI script, since removing a voter and unregistering it from the cluster is probably a very common usage pattern.
> This command will only send UnregisterController RPC if the cluster supports dynamic quorum, so the overall command behavior is consistent with how it is today with respect to the kraft.version level of the cluster. If the cluster does not support dynamic quorum, the CLI can direct the user to instead run the `kafka-cluster unregister-controller` command.
>
> Best,
> Kevin Wu
>
> On Tue, Apr 21, 2026 at 5:39 PM Jun Rao via dev <[email protected]> wrote:
>
> > Hi, Kevin,
> >
> > Thanks for the reply.
> >
> > JR2. Good point on auto-join. I think we can introduce the new UnregisterControllerRequest and keep the auto-join behavior as is (i.e., without unregistering the controller when removing the old instance from the voter). The command "kafka-metadata-quorum remove-controller" will send two separate RPC requests, RemoveRaftVoterRequest and UnregisterControllerRequest as documented in the KIP.
> >
> > JR3. When will a user use the command "kafka-cluster unregister-controller"? Is this only for unregistering an observer controller? If the observer controller is currently supported, we can add that command. It would be useful to document the usage for both commands.
> >
> > Jun
> >
> > On Tue, Apr 21, 2026 at 9:25 AM Kevin Wu <[email protected]> wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the reply.
> > >
> > > RE JR1: Yeah, I will update KIP to touch on this static quorum edge case.
> > >
> > > RE JR2: That seems reasonable to me, since we would avoid two RPC hops (one for RemoveVoter, one for UnregisterController).
> > > One thing to note is that with KIP-1186 <https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-1186*3A*Update*AddRaftVoterRequest*RPC*to*support*auto-join__;JSsrKysrKw!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtGeJkFHCg$>, besides operators manually removing controllers, observer controllers themselves can send `RemoveRaftVoter` to remove their old incarnations from the voter set as part of the auto-join feature. With auto-join and this proposed behavior, explicitly removing a controller's old registration alongside its old voter set entry can lead to "unsupported" upgrades in the cluster. An operator doing these steps manually can be argued as misconfiguring the cluster, but the auto-join feature allowing for this scenario seems like a bug.
> > >
> > > Consider the below example with auto-join enabled: 3 controllers in the voter set (A,B,C) where A supports feature levels X=[0-1], B supports feature levels X=[0-1], but C only supports X=0. Currently, node A is the active controller, all 3 controllers are registered, but upgrading feature X to feature level 1 is not supported because C does not support it. Controller C restarts with a new disk (now represented as C'). The auto-join code runs to first remove C from the voter set, and then remove the registration for C. These records are committed via nodes A and B. Now, from the active controller's perspective, the cluster does support upgrading feature X to level 1. There is a race between C' adding itself back to the KRaft voter set and re-registering itself, and a potential feature level upgrade.
> > > Another interesting thing to note after looking at the code is that controllers can register even if they do not support the finalized features of the cluster, which is different from broker registration. In Kafka's current code, the original registration for C stays in the log after C is removed as a voter by auto-join, which prevents an upgrade of feature X. At some point, the registration for C is updated by C' because C' is a different process incarnation, but a registration that blocks X's upgrade is always in the log.
> > >
> > > Therefore, Kafka should not unregister a controller when auto-join removes a controller from the voter set. This means including a new RPC version for `RemoveRaftVoter` that introduces a boolean field telling the active controller whether to also unregister the controller. This field would be completely ignored by the raft layer, and instead would be handled at the ControllerApis level. I think it is fine to unregister a controller whenever the operator runs `kafka-metadata-quorum remove-controller` for a smooth UX with dynamic quorum. What do you think?
> > >
> > > RE JR3: Maybe we can document this better as part of the code changes to this KIP, but in my opinion, the kafka-cluster tool deals with cluster membership (brokers and controllers), which is a metadata layer concept. If you look at the `list-endpoints` command, you can list out the registered controller endpoints. Alternatively, the kafka-metadata-quorum tool deals with KRaft, which knows about concepts like leader, voter, and observers. The `add-controller` and `remove-controller` sub-commands inadvertently deal with controllers (since controllers can be voters), but the `describe` sub-command tree also shows information about brokers, which are observers to KRaft.
> > > My decision to include the `unregister-controller` command in the `kafka-cluster` tool is mainly motivated by this distinction. Additionally, if we only send `RemoveVoterRequest` in `remove-controller`, it seems hacky to direct users to use that command for unregistering any controller, since for observers, the remove voter logic of that request will always fail in the raft layer. What do you think?
> > >
> > > Best,
> > > Kevin Wu
> > >
> > > On Tue, Apr 21, 2026 at 8:17 AM Paolo Patierno <[email protected]> wrote:
> > >
> > > > Hi Kevin,
> > > > thanks for the KIP.
> > > > From reading it, it's not clear because not explicit, but I would assume you are going to expose a new unregisterController method through the AdminClient API as well, is my assumption right?
> > > > I expect it would be used underneath by the tools you are going to modify.
> > > > Having such support within the AdminClient API is important when the operator is not a human to run the tool but a Kubernetes operator (i.e. Strimzi) with the need to unregister a controller.
> > > >
> > > > Thanks,
> > > > Paolo.
> > > >
> > > > On Mon, 20 Apr 2026 at 21:57, Kevin Wu <[email protected]> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > RE JR1: I would say the main use case is dynamic quorums, since the concept of the observer controller becomes a thing in that world. However, there is a static quorum edge case if the operator misconfigures `controller.quorum.voters`. If a new controller voter mistakenly joins the cluster, it will also persist a registration record. In my opinion, there should be a way to remove a controller registration via AdminClient CLI in all quorum modes.
> > > > >
> > > > > RE JR2: Yes, the existing command only removes the voter, but does not unregister the controller. I left it as a separate flag for now because they are "separate" operations in that being a raft voter is a subset of being a controller in dynamic quorums, but I am not opposed to making this command try to do both (remove voter and unregister the controller) by default. In my opinion, an observer controller is "useless" in that it does not participate in the leader election or replication parts of the KRaft protocol, so I see no issue with doing both operations always. However, an operator may want observer controllers around for other reasons like redundancy. Do you (or others) have any insight into how users may be configuring clusters with observer controllers? If not, I think it is okay to remove the flag and make it the default behavior of `kafka-metadata-quorum remove-controller`.
> > > > >
> > > > > RE JR3: Not exactly. The `kafka-metadata-quorum remove-controller ... --unregister` sends 2 RPCs to the active controller, one to remove a node from the voter set, and another to unregister the node. The `kafka-cluster unregister-controller` command just sends 1 RPC to the active controller to unregister the node. My motivation for having two separate commands is because `remove-controller` is associated with dynamic quorum, since the `RemoveRaftVoterRPC` will fail if the kraft.version=0. What do you think?
> > > > >
> > > > > RE JR4: I have updated the sections for the CLI commands in the KIP to add this information.
> > > > >
> > > > > RE JR5: This is describing the current implementation of the ControllerRegistrationManager, which will listen to the metadata log and send ControllerRegistrationRequest when the local node id is not registered in the log. It looks like this is slightly different from how we handle broker registration in BrokerLifecycleManager. Currently, this code path never executes because controller registrations cannot be removed.
> > > > >
> > > > > Best,
> > > > > Kevin Wu
> > > > >
> > > > > On Fri, Apr 17, 2026 at 2:08 PM Jun Rao via dev <[email protected]> wrote:
> > > > >
> > > > > > Hi, Kevin,
> > > > > >
> > > > > > Thanks for the KIP. A few comments.
> > > > > >
> > > > > > JR1. I guess this is only intended for dynamic KRaft quorums? If so, it would be useful to clarify that.
> > > > > >
> > > > > > JR2. kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister
> > > > > > So, the existing remove-controller logic only changes the voter set, but doesn't unregister the controller? Should we just always do these two together? Is there a use case for only removing a controller from the voter set, but not unregistering?
> > > > > >
> > > > > > JR3. Is kafka-cluster unregister-controller equivalent to kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister?
> > > > > >
> > > > > > JR4. Could you describe the underlying workflow for each new command (RPCs sent, metadata records generated, actions taken by the controller, etc)?
> > > > > >
> > > > > > JR5. "The registration manager of an unregistered controller already attempts to re-register with the active controller.
> > > > > > This is to prevent accidental unregistrations."
> > > > > > I don't quite understand this. Why will an unregistered controller attempt to re-register?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Fri, Apr 3, 2026 at 11:31 AM Kevin Wu <[email protected]> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I would like to start a discussion on KIP-1312: Support unregistering controllers. Below is the KIP link.
> > > > > > >
> > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-1312*3A*Support*unregistering*controllers__;JSsrKw!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtFeUg-7gg$
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Kevin Wu
> > > >
> > > > --
> > > > Paolo Patierno
> > > >
> > > > *Senior Principal Software Engineer @ IBM*
> > > > *CNCF Ambassador*
> > > >
> > > > Twitter : @ppatierno <https://urldefense.com/v3/__http://twitter.com/ppatierno__;!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtHGG-mS-Q$>
> > > > Linkedin : paolopatierno <https://urldefense.com/v3/__http://it.linkedin.com/in/paolopatierno__;!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtFcWWCD5g$>
> > > > GitHub : ppatierno <https://urldefense.com/v3/__https://github.com/ppatierno__;!!Ayb5sqE7!phwOrPrBZoQb1P44rCfpPBt74v80NjCTOGhgaRQx1XFXCy1x61QR9b9xw3zfvo-aFvVsFYczOxbTVtEK-wncPw$>
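
For reference, the A/B/C feature-level scenario discussed earlier in this thread can be modeled with a few lines of Python. This is only a toy sketch and not Kafka code: the function names `max_finalizable_level` and `can_upgrade` are hypothetical, and it assumes (as a simplification of what the thread describes) that feature X can only be finalized at a level every currently registered controller supports.

```python
# Toy model, not Kafka code. Assumption: the highest level of feature X that
# can be finalized is bounded by the lowest "max supported level" among the
# controllers that currently have a registration in the metadata log.

def max_finalizable_level(registrations):
    """registrations maps controller name -> max supported level of feature X."""
    if not registrations:
        raise ValueError("no registered controllers")
    return min(registrations.values())


def can_upgrade(registrations, target_level):
    """True if every registered controller supports target_level."""
    return target_level <= max_finalizable_level(registrations)


# Initial state from the thread: A, B, C are registered; C only supports X=0.
registered = {"A": 1, "B": 1, "C": 0}
before = can_upgrade(registered, 1)

# Hypothetical behavior: auto-join removes C as a voter AND drops its
# registration. Until C' re-registers, nothing blocks the upgrade.
after_unregister = {name: lvl for name, lvl in registered.items() if name != "C"}
during_race = can_upgrade(after_unregister, 1)

# Behavior described as current: C's registration stays in the log until C'
# overwrites it, so a registration that blocks the upgrade is always present.
registration_kept = dict(registered)
current = can_upgrade(registration_kept, 1)

print(before, during_race, current)
```

The middle value is the race window Kevin describes: with C's registration gone, the active controller would accept an upgrade of X to level 1 before C' has rejoined and re-registered.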
