Hi Jonah,

Thanks for the reply.

RE JH1: Sure, an operator can mistakenly try to unregister a controller that is part of the KRaft voter set. The implementation can have the active controller reject unregistration of KRaft voters. That said, the registration manager will attempt to re-register, so the cluster would "recover" with respect to this state.

Best,
Kevin Wu

On Tue, Apr 21, 2026 at 12:34 PM Jonah Hooper via dev <[email protected]> wrote:

> Thanks for the KIP Kevin!
>
> > This is to prevent accidental unregistrations. The intention for unregistration is for it to occur after the operator decommissions a controller node.
>
> JH1: The KIP discusses the case where a controller is registered but no voter exists. However, the KIP could potentially add an inverse: the ControllerRegistration (for Node A, for example) is removed successfully but Node A remains healthy and is still a Voter. ControllerRegistrationManager listens to the MetadataPublisher pipeline which is derived with some delay from the Raft layer. Since the "controller metadata" layer might not know it is unregistered, it could attempt to reregister. A correctness-property of the UnregisterController workflow is that the controller being unregistered is actually decommissioned. This may be reasonable in this case, but I wonder if there is a way to design this so that it would even work on a controller which is active and part of the quorum.
>
> Best,
> Jonah
>
> On Tue, Apr 21, 2026 at 12:38 PM Kevin Wu <[email protected]> wrote:
>
> > Hi Paolo,
> >
> > I have included a section outlining the AdminClient API changes for this KIP. Thanks for pointing that out.
> >
> > Best,
> > Kevin Wu
> >
> > On Tue, Apr 21, 2026 at 11:24 AM Kevin Wu <[email protected]> wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the reply.
> > >
> > > RE JR1: Yeah, I will update the KIP to touch on this static quorum edge case.
> > >
> > > RE JR2: That seems reasonable to me, since we would avoid two RPC hops (one for RemoveVoter, one for UnregisterController). One thing to note is that with KIP-1186 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1186%3A+Update+AddRaftVoterRequest+RPC+to+support+auto-join>, besides operators manually removing controllers, observer controllers themselves can send `RemoveRaftVoter` to remove their old incarnations from the voter set as part of the auto-join feature. With auto-join and this proposed behavior, explicitly removing a controller's old registration alongside its old voter set entry can lead to "unsupported" upgrades in the cluster. An operator doing these steps manually can be argued as misconfiguring the cluster, but the auto-join feature allowing for this scenario seems like a bug.
> > >
> > > Consider the below example with auto-join enabled: 3 controllers in the voter set (A,B,C) where A supports feature levels X=[0-1], B supports feature levels X=[0-1], but C only supports X=0. Currently, node A is the active controller, all 3 controllers are registered, but upgrading feature X to feature level 1 is not supported because C does not support it. Controller C restarts with a new disk (now represented as C'). The auto-join code runs to first remove C from the voter set, and then remove the registration for C. These records are committed via nodes A and B. Now, from the active controller's perspective, the cluster does support upgrading feature X to level 1. There is a race between C' adding itself back to the KRaft voter set and re-registering itself, and a potential feature level upgrade.
> > > Another interesting thing to note after looking at the code is that controllers can register even if they do not support the finalized features of the cluster, which is different from broker registration. In Kafka's current code, the original registration for C stays in the log after C is removed as a voter by auto-join, which prevents an upgrade of feature X. At some point, the registration for C is updated by C' because C' is a different process incarnation, but a registration that blocks X's upgrade is always in the log.
> > >
> > > Therefore, Kafka should not unregister a controller when auto-join removes a controller from the voter set. This means including a new RPC version for `RemoveRaftVoter` that introduces a boolean field telling the active controller whether to also unregister the controller. This field would be completely ignored by the raft layer, and instead would be handled at the ControllerApis level. I think it is fine to unregister a controller whenever the operator runs `kafka-metadata-quorum remove-controller` for a smooth UX with dynamic quorum. What do you think?
> > >
> > > RE JR3: Maybe we can document this better as part of the code changes to this KIP, but in my opinion, the kafka-cluster tool deals with cluster membership (brokers and controllers), which is a metadata layer concept. If you look at the `list-endpoints` command, you can list out the registered controller endpoints. Alternatively, the kafka-metadata-quorum tool deals with KRaft, which knows about concepts like leader, voter, and observers. The `add-controller` and `remove-controller` sub-commands inadvertently deal with controllers (since controllers can be voters), but the `describe` sub-command tree also shows information about brokers, which are observers to KRaft.
> > > My decision to include the `unregister-controller` command in the `kafka-cluster` tool is mainly motivated by this distinction. Additionally, if we only send `RemoveVoterRequest` in `remove-controller`, it seems hacky to direct users to use that command for unregistering any controller, since for observers, the remove voter logic of that request will always fail in the raft layer. What do you think?
> > >
> > > Best,
> > > Kevin Wu
> > >
> > > On Tue, Apr 21, 2026 at 8:17 AM Paolo Patierno <[email protected]> wrote:
> > >
> > >> Hi Kevin,
> > >> thanks for the KIP.
> > >> From reading it, it's not clear because not explicit, but I would assume you are going to expose a new unregisterController method through the AdminClient API as well, is my assumption right?
> > >> I expect it would be used underneath by the tools you are going to modify.
> > >> Having such support within the AdminClient API is important when the operator is not a human to run the tool but a Kubernetes operator (i.e. Strimzi) with the need to unregister a controller.
> > >>
> > >> Thanks,
> > >> Paolo.
> > >>
> > >> On Mon, 20 Apr 2026 at 21:57, Kevin Wu <[email protected]> wrote:
> > >>
> > >> > Hi Jun,
> > >> >
> > >> > Thanks for the reply.
> > >> >
> > >> > RE JR1: I would say the main use case is dynamic quorums, since the concept of the observer controller becomes a thing in that world. However, there is a static quorum edge case if the operator misconfigures `controller.quorum.voters`. If a new controller voter mistakenly joins the cluster, it will also persist a registration record. In my opinion, there should be a way to remove a controller registration via AdminClient CLI in all quorum modes.
> > >> >
> > >> > RE JR2: Yes, the existing command only removes the voter, but does not unregister the controller. I left it as a separate flag for now because they are "separate" operations in that being a raft voter is a subset of being a controller in dynamic quorums, but I am not opposed to making this command try to do both (remove voter and unregister the controller) by default. In my opinion, an observer controller is "useless" in that it does not participate in the leader election or replication parts of the KRaft protocol, so I see no issue with doing both operations always. However, an operator may want observer controllers around for other reasons like redundancy. Do you (or others) have any insight into how users may be configuring clusters with observer controllers? If not, I think it is okay to remove the flag and make it the default behavior of `kafka-metadata-quorum remove-controller`.
> > >> >
> > >> > RE JR3: Not exactly. The `kafka-metadata-quorum remove-controller ... --unregister` sends 2 RPCs to the active controller, one to remove a node from the voter set, and another to unregister the node. The `kafka-cluster unregister-controller` command just sends 1 RPC to the active controller to unregister the node. My motivation for having two separate commands is because `remove-controller` is associated with dynamic quorum, since the `RemoveRaftVoterRPC` will fail if the kraft.version=0. What do you think?
> > >> >
> > >> > RE JR4: I have updated the sections for the CLI commands in the KIP to add this information.
> > >> >
> > >> > RE JR5: This is describing the current implementation of the ControllerRegistrationManager, which will listen to the metadata log and send ControllerRegistrationRequest when the local node id is not registered in the log. It looks like this is slightly different from how we handle broker registration in BrokerLifecycleManager. Currently, this code path never executes because controller registrations cannot be removed.
> > >> >
> > >> > Best,
> > >> > Kevin Wu
> > >> >
> > >> > On Fri, Apr 17, 2026 at 2:08 PM Jun Rao via dev <[email protected]> wrote:
> > >> >
> > >> > > Hi, Kevin,
> > >> > >
> > >> > > Thanks for the KIP. A few comments.
> > >> > >
> > >> > > JR1. I guess this is only intended for dynamic KRaft quorums? If so, it would be useful to clarify that.
> > >> > >
> > >> > > JR2. kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister
> > >> > > So, the existing remove-controller logic only changes the voter set, but doesn't unregister the controller? Should we just always do these two together? Is there a use case for only removing a controller from the voter set, but not unregistering?
> > >> > >
> > >> > > JR3. Is kafka-cluster unregister-controller equivalent to kafka-metadata-quorum remove-controller --controller-id 9990 --controller-directory-id EXAMPLE_UUID --unregister?
> > >> > >
> > >> > > JR4. Could you describe the underlying workflow for each new command (RPCs sent, metadata records generated, actions taken by the controller, etc)?
> > >> > >
> > >> > > JR5. "The registration manager of an unregistered controller already attempts to re-register with the active controller. This is to prevent accidental unregistrations."
> > >> > > I don't quite understand this. Why will an unregistered controller attempt to re-register?
> > >> > >
> > >> > > Jun
> > >> > >
> > >> > > On Fri, Apr 3, 2026 at 11:31 AM Kevin Wu <[email protected]> wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > > I would like to start a discussion on KIP-1312: Support unregistering controllers. Below is the KIP link.
> > >> > > >
> > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1312%3A+Support+unregistering+controllers
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Kevin Wu
> > >>
> > >> --
> > >> Paolo Patierno
> > >>
> > >> *Senior Principal Software Engineer @ IBM*
> > >> *CNCF Ambassador*
> > >>
> > >> Twitter : @ppatierno <http://twitter.com/ppatierno>
> > >> Linkedin : paolopatierno <http://it.linkedin.com/in/paolopatierno>
> > >> GitHub : ppatierno <https://github.com/ppatierno>
