Hi Jun,

Thanks for the discussion.
Yeah, those are the scenarios for using these tools. I have documented
their usage in the KIP.
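
For illustration only, the three scenarios quoted below map onto the commands
discussed in this thread roughly as follows. This is a minimal Python sketch,
not the KIP's implementation: the `Scenario` enum and `suggested_command`
function are hypothetical names, and the command strings are the ones proposed
in this thread, which may still change.

```python
from enum import Enum


class Scenario(Enum):
    """Hypothetical labels for the three scenarios listed in the thread."""
    REMOVE_DYNAMIC_VOTER = 1      # remove a voter in a dynamic KRaft quorum
    UNREGISTER_OBSERVER = 2       # unregister an observer controller
    UNREGISTER_STATIC_VOTER = 3   # misconfigured voter in a static quorum


def suggested_command(scenario: Scenario, also_unregister: bool = False) -> str:
    """Return the CLI command proposed in this thread for a scenario.

    Assumption: remove-controller is only usable with dynamic quorum, while
    unregister-controller works irrespective of the quorum mode.
    """
    if scenario is Scenario.REMOVE_DYNAMIC_VOTER:
        base = "kafka-metadata-quorum remove-controller"
        # The proposed --unregister flag also drops the registration.
        return (base + " --unregister") if also_unregister else base
    # Observers and static-quorum voters are handled by unregistration only.
    return "kafka-cluster unregister-controller"


if __name__ == "__main__":
    for s in Scenario:
        print(s.name, "->", suggested_command(s))
```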

Best,
Kevin Wu

On Thu, Apr 23, 2026 at 11:51 AM Jun Rao via dev <[email protected]>
wrote:

> Hi, Kevin,
>
> Thanks for the reply.
>
> Your suggestion sounds good to me. It would be useful to document the usage
> of those tools. The scenarios are:
> 1. Remove a voter in dynamic KRaft quorum
> 2. Unregister an observer controller
> 3. Unregister a voter in a static KRaft quorum when the static voter set is
> mistakenly configured.
>
> For item 3, could you document how it works? Does one need to stop the
> misconfigured voter first and then unregister it?
>
> Are there other scenarios?
>
> Jun
>
> On Thu, Apr 23, 2026 at 8:22 AM Kevin Wu <[email protected]> wrote:
>
> > Hi Jun,
> >
> > Thanks for the replies.
> >
> > RE JR3: I would like the design of this feature to not introduce more
> > coupling of the KRaft and metadata layers. Observer controllers are
> > supported, but they are a KRaft concept, so it should not be known to the
> > metadata layer whether or not a given controller is a voter or observer.
> >
> > What do you think about the following documentation and execution pattern
> > regarding these CLI commands?
> >
> > `kafka-cluster unregister-controller` is a command for users when they want
> > to unregister a controller from the cluster. We can document that this is
> > potentially unsafe and should only be done if the operator does not intend
> > to bring back up that controller. `kafka-cluster unregister-controller`
> > works irrespective of the quorum mode.
> >
> > Going forward, running `kafka-metadata-quorum remove-controller` removes a
> > controller as a KRaft voter, and continues to only be supported in a
> > dynamic quorum cluster. I still think the unregistering behavior should be
> > an additional flag, because having an observer controller that is still
> > registered to the cluster is a valid configuration in Kafka. I think of
> > `kafka-metadata-quorum remove-controller --unregister` as a "built-in" CLI
> > script, since removing a voter and unregistering it from the cluster is
> > probably a very common usage pattern. This command will only send
> > UnregisterController RPC if the cluster supports dynamic quorum, so the
> > overall command behavior is consistent with how it is today with respect to
> > the kraft.version level of the cluster. If the cluster does not support
> > dynamic quorum, the CLI can direct the user to instead run the
> > `kafka-cluster unregister-controller` command.
> >
> > Best,
> > Kevin Wu
> >
> > On Tue, Apr 21, 2026 at 5:39 PM Jun Rao via dev <[email protected]>
> > wrote:
> >
> > > Hi, Kevin,
> > >
> > > Thanks for the reply.
> > >
> > > JR2. Good point on auto-join. I think we can introduce the
> > > new UnregisterControllerRequest and keep the auto-join behavior as is
> > > (i.e., without unregistering the controller when removing the old instance
> > > from the voter). The command "kafka-metadata-quorum remove-controller" will
> > > send two separate RPC requests, RemoveRaftVoterRequest and
> > > UnregisterControllerRequest as documented in the KIP.
> > >
> > > JR3. When will a user use the command "kafka-cluster
> > > unregister-controller"? Is this only for unregistering an observer
> > > controller? If the observer controller is currently supported, we can add
> > > that command. It would be useful to document the usage for both commands.
> > >
> > > Jun
> > >
> > >
> > > On Tue, Apr 21, 2026 at 9:25 AM Kevin Wu <[email protected]> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > RE JR1: Yeah, I will update KIP to touch on this static quorum edge
> > > > case.
> > > >
> > > > RE JR2: That seems reasonable to me, since we would avoid two RPC hops
> > > > (one for RemoveVoter, one for UnregisterController). One thing to note
> > > > is that with KIP-1186
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1186%3A+Update+AddRaftVoterRequest+RPC+to+support+auto-join>,
> > > > besides operators manually removing controllers, observer controllers
> > > > themselves can send `RemoveRaftVoter` to remove their old incarnations
> > > > from the voter set as part of the auto-join feature. With auto-join and
> > > > this proposed behavior, explicitly removing a controller's old
> > > > registration alongside its old voter set entry can lead to "unsupported"
> > > > upgrades in the cluster. An operator doing these steps manually can be
> > > > argued as misconfiguring the cluster, but the auto-join feature allowing
> > > > for this scenario seems like a bug.
> > > >
> > > > Consider the below example with auto-join enabled: 3 controllers in the
> > > > voter set (A,B,C) where A supports feature levels X=[0-1], B supports
> > > > feature levels X=[0-1], but C only supports X=0. Currently, node A is the
> > > > active controller, all 3 controllers are registered, but upgrading
> > > > feature X to feature level 1 is not supported because C does not support
> > > > it. Controller C restarts with a new disk (now represented as C'). The
> > > > auto-join code runs to first remove C from the voter set, and then remove
> > > > the registration for C. These records are committed via nodes A and B.
> > > > Now, from the active controller's perspective, the cluster does support
> > > > upgrading feature X to level 1. There is a race between C' adding itself
> > > > back to the KRaft voter set and re-registering itself, and a potential
> > > > feature level upgrade. Another interesting thing to note after looking at
> > > > the code is that controllers can register even if they do not support the
> > > > finalized features of the cluster, which is different from broker
> > > > registration. In Kafka's current code, the original registration for C
> > > > stays in the log after C is removed as a voter by auto-join, which
> > > > prevents an upgrade of feature X. At some point, the registration for C
> > > > is updated by C' because C' is a different process incarnation, but a
> > > > registration that blocks X's upgrade is always in the log.
> > > >
> > > > Therefore, Kafka should not unregister a controller when auto-join
> > > > removes a controller from the voter set. This means including a new RPC
> > > > version for `RemoveRaftVoter` that introduces a boolean field telling the
> > > > active controller whether to also unregister the controller. This field
> > > > would be completely ignored by the raft layer, and instead would be
> > > > handled at the ControllerApis level. I think it is fine to unregister a
> > > > controller whenever the operator runs
> > > > `kafka-metadata-quorum remove-controller` for a smooth UX with dynamic
> > > > quorum. What do you think?
> > > >
> > > > RE JR3: Maybe we can document this better as part of the code changes to
> > > > this KIP, but in my opinion, the kafka-cluster tool deals with cluster
> > > > membership (brokers and controllers), which is a metadata layer concept.
> > > > If you look at the `list-endpoints` command, you can list out the
> > > > registered controller endpoints. Alternatively, the kafka-metadata-quorum
> > > > tool deals with KRaft, which knows about concepts like leader, voter, and
> > > > observers. The `add-controller` and `remove-controller` sub-commands
> > > > inadvertently deal with controllers (since controllers can be voters),
> > > > but the `describe` sub-command tree also shows information about brokers,
> > > > which are observers to KRaft. My decision to include the
> > > > `unregister-controller` command in the `kafka-cluster` tool is mainly
> > > > motivated by this distinction. Additionally, if we only send
> > > > `RemoveVoterRequest` in `remove-controller`, it seems hacky to direct
> > > > users to use that command for unregistering any controller, since for
> > > > observers, the remove voter logic of that request will always fail in
> > > > the raft layer. What do you think?
> > > >
> > > > Best,
> > > > Kevin Wu
> > > >
> > > >
> > > > On Tue, Apr 21, 2026 at 8:17 AM Paolo Patierno <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Kevin,
> > > > > thanks for the KIP.
> > > > > From reading it, it's not clear because not explicit, but I would
> > > > > assume you are going to expose a new unregisterController method
> > > > > through the AdminClient API as well, is my assumption right?
> > > > > I expect it would be used underneath by the tools you are going to
> > > > > modify.
> > > > > Having such support within the AdminClient API is important when the
> > > > > operator is not a human to run the tool but a Kubernetes operator
> > > > > (i.e. Strimzi) with the need to unregister a controller.
> > > > >
> > > > > Thanks,
> > > > > Paolo.
> > > > >
> > > > > On Mon, 20 Apr 2026 at 21:57, Kevin Wu <[email protected]> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > RE JR1: I would say the main use case is dynamic quorums, since the
> > > > > > concept of the observer controller becomes a thing in that world.
> > > > > > However, there is a static quorum edge case if the operator
> > > > > > misconfigures `controller.quorum.voters`. If a new controller voter
> > > > > > mistakenly joins the cluster, it will also persist a registration
> > > > > > record. In my opinion, there should be a way to remove a controller
> > > > > > registration via AdminClient CLI in all quorum modes.
> > > > > >
> > > > > > RE JR2: Yes, the existing command only removes the voter, but does
> > > > > > not unregister the controller. I left it as a separate flag for now
> > > > > > because they are "separate" operations in that being a raft voter is
> > > > > > a subset of being a controller in dynamic quorums, but I am not
> > > > > > opposed to making this command try to do both (remove voter and
> > > > > > unregister the controller) by default. In my opinion, an observer
> > > > > > controller is "useless" in that it does not participate in the leader
> > > > > > election or replication parts of the KRaft protocol, so I see no
> > > > > > issue with doing both operations always. However, an operator may
> > > > > > want observer controllers around for other reasons like redundancy.
> > > > > > Do you (or others) have any insight into how users may be configuring
> > > > > > clusters with observer controllers? If not, I think it is okay to
> > > > > > remove the flag and make it the default behavior of
> > > > > > `kafka-metadata-quorum remove-controller`.
> > > > > >
> > > > > > RE JR3: Not exactly. The `kafka-metadata-quorum remove-controller ...
> > > > > > --unregister` sends 2 RPCs to the active controller, one to remove a
> > > > > > node from the voter set, and another to unregister the node. The
> > > > > > `kafka-cluster unregister-controller` command just sends 1 RPC to the
> > > > > > active controller to unregister the node. My motivation for having
> > > > > > two separate commands is because `remove-controller` is associated
> > > > > > with dynamic quorum, since the `RemoveRaftVoterRPC` will fail if the
> > > > > > kraft.version=0. What do you think?
> > > > > >
> > > > > > RE JR4: I have updated the sections for the CLI commands in the KIP
> > > > > > to add this information.
> > > > > >
> > > > > > RE JR5: This is describing the current implementation of the
> > > > > > ControllerRegistrationManager, which will listen to the metadata log
> > > > > > and send ControllerRegistrationRequest when the local node id is not
> > > > > > registered in the log. It looks like this is slightly different from
> > > > > > how we handle broker registration in BrokerLifecycleManager.
> > > > > > Currently, this code path never executes because controller
> > > > > > registrations cannot be removed.
> > > > > >
> > > > > > Best,
> > > > > > Kevin Wu
> > > > > >
> > > > > > On Fri, Apr 17, 2026 at 2:08 PM Jun Rao via dev <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi, Kevin,
> > > > > > >
> > > > > > > Thanks for the KIP. A few comments.
> > > > > > >
> > > > > > > JR1. I guess this is only intended for dynamic KRaft quorums? If
> > > > > > > so, it would be useful to clarify that.
> > > > > > >
> > > > > > > JR2. kafka-metadata-quorum remove-controller --controller-id 9990
> > > > > > > --controller-directory-id EXAMPLE_UUID --unregister
> > > > > > > So, the existing remove-controller logic only changes the voter
> > > > > > > set, but doesn't unregister the controller? Should we just always
> > > > > > > do these two together? Is there a use case for only removing a
> > > > > > > controller from the voter set, but not unregistering?
> > > > > > >
> > > > > > > JR3. Is kafka-cluster unregister-controller equivalent to
> > > > > > > kafka-metadata-quorum remove-controller --controller-id 9990
> > > > > > > --controller-directory-id EXAMPLE_UUID --unregister?
> > > > > > >
> > > > > > > JR4. Could you describe the underlying workflow for each new
> > > > > > > command (RPCs sent, metadata records generated, actions taken by
> > > > > > > the controller, etc)?
> > > > > > >
> > > > > > > JR5. "The registration manager of an unregistered controller
> > > > > > > already attempts to re-register with the active controller. This is
> > > > > > > to prevent accidental unregistrations."
> > > > > > > I don't quite understand this. Why will an unregistered controller
> > > > > > > attempt to re-register?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Fri, Apr 3, 2026 at 11:31 AM Kevin Wu <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I would like to start a discussion on KIP-1312: Support
> > > > > > > > unregistering controllers. Below is the KIP link.
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1312%3A+Support+unregistering+controllers
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Kevin Wu
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Paolo Patierno
> > > > >
> > > > > *Senior Principal Software Engineer @ IBM*
> > > > > *CNCF Ambassador*
> > > > >
> > > > > Twitter : @ppatierno <http://twitter.com/ppatierno>
> > > > > Linkedin : paolopatierno <http://it.linkedin.com/in/paolopatierno>
> > > > > GitHub : ppatierno <https://github.com/ppatierno>
> > > > >
> > > >
> > >
> >
>
