Hi Kaushik,

I've spent a good part of the last year building and operating automated
KRaft migrations for a large fleet of managed Kafka clusters with my team
at Aiven, so I'll share what I think are the reusable parts of that
experience.

The short answer is that the automation is entirely feasible, and we run it
online with no downtime. But the documented migration sequence (provision
the controller quorum in migration mode, move the brokers through migration
mode onto KRaft, then finalize) is not a sufficient guide for implementing
it. It does tell you what to wait for between steps, but the signals it
points to are operator-facing rather than programmatic: to know the
metadata migration has completed, for example, it directs you to watch for
an INFO log line on the active controller, "Completed migration of metadata
from Zookeeper to KRaft", and to raise log verbosity to TRACE while the
migration runs. What it does not describe is how to coordinate the rolling
restarts and those waits across the nodes in code. Both are reasonable
omissions for a runbook executed by a human operator, but they leave real
gaps when the goal is to automate migrations.

The parts that took the real work were these:

Gating every step on the cluster being healthy first. A migration raises
risk across the board, so before each transition we require the cluster to
be in a boring, stable state, and we refuse to proceed while anything else
is in flight: an in-progress partition reassignment, an unexpected node
count, a controller quorum that has not fully formed. Waiting is always the
safe default, and we alert when a stage takes unexpectedly long so that a
human operator can take a look.
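
As a rough illustration of such a gate (a sketch only, not taken from any
production system), the check can be a pure function over a snapshot of the
cluster. The `ClusterSnapshot` fields, the `gate` function and the
`alert_after_seconds` threshold are all hypothetical names; how the snapshot
is collected is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSnapshot:
    """Hypothetical point-in-time view of the cluster, gathered by the caller."""
    expected_nodes: int
    live_nodes: int
    reassignments_in_flight: int
    expected_voters: int
    caught_up_voters: int


def blocking_reasons(snap: ClusterSnapshot) -> list[str]:
    """Return every reason the cluster is not stable; an empty list means go."""
    reasons = []
    if snap.live_nodes != snap.expected_nodes:
        reasons.append(
            f"node count {snap.live_nodes} != expected {snap.expected_nodes}"
        )
    if snap.reassignments_in_flight > 0:
        reasons.append(
            f"{snap.reassignments_in_flight} partition reassignment(s) in progress"
        )
    if snap.caught_up_voters != snap.expected_voters:
        reasons.append("controller quorum not fully formed")
    return reasons


def gate(snap: ClusterSnapshot, waited_seconds: float, alert_after_seconds: float):
    """Decide whether to proceed or wait, and whether to alert a human.

    Waiting is the default: any blocking reason means "wait". The alert flag
    is raised only once the wait has exceeded the (assumed) threshold.
    """
    reasons = blocking_reasons(snap)
    if not reasons:
        return ("proceed", False, reasons)
    return ("wait", waited_seconds >= alert_after_seconds, reasons)
```

Returning the reasons alongside the decision keeps the alert actionable: the
operator sees why the stage is stuck, not just that it is.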

Coordinating the fan-out and fan-in. A sidecar running alongside each node
carries out its own reconfiguration and then confirms it actually took
effect before recording itself as done. The confirmations come from signals
the Kafka processes already expose. A broker reports whether it is in
migration mode through the zkMigrationReady flag in its ApiVersions
response, which the broker only sets once it has a valid migration
configuration. The controller quorum reports its progress through the
ZkMigrationState metric, which we wait to see settle into the dual-write
migration state. We also confirm the quorum itself is formed and caught up
by reading its voter set and per-voter replication lag from DescribeQuorum.
A single elected node advances the shared migration state only once every
node has confirmed the current step. That state lives in a store all the
nodes agree on, and every advance is a compare-and-swap, so the
coordination stays correct even if two nodes briefly believe they are in
charge.
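
The fan-in step can be sketched as follows. This is an illustration under
stated assumptions, not the implementation described above: `InMemoryStore`
stands in for whatever shared store provides compare-and-swap, and the stage
names in `STAGES` as well as `try_advance` are hypothetical.

```python
import threading

# Hypothetical stage names, for illustration only.
STAGES = [
    "CONTROLLERS_PROVISIONED",
    "BROKERS_IN_MIGRATION_MODE",
    "BROKERS_ON_KRAFT",
    "FINALIZED",
]


class InMemoryStore:
    """Stand-in for a shared KV store that offers compare-and-swap."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to `new` only if it currently equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def try_advance(store, observed_stage, confirmations, all_nodes):
    """Advance one stage if every node has confirmed `observed_stage`.

    `confirmations` maps a stage name to the set of node ids that confirmed
    it. Returns True only if this caller performed the advance; a caller
    acting on a stale view loses the compare-and-swap and changes nothing.
    """
    if not set(all_nodes) <= confirmations.get(observed_stage, set()):
        return False  # fan-in incomplete: some node has not confirmed yet
    idx = STAGES.index(observed_stage)
    if idx + 1 >= len(STAGES):
        return False  # already at the last stage
    return store.compare_and_swap(observed_stage, STAGES[idx + 1])
```

If two nodes both believe they are the elected one and both call
`try_advance` with the same observed stage, only the first swap succeeds;
the second sees a stage it did not expect and does nothing.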

| Stage                         | Inspected metric / response                                          | Expected value                                                        |
|-------------------------------|----------------------------------------------------------------------|-----------------------------------------------------------------------|
| Controller quorum provisioned | DescribeQuorum                                                       | all expected voters present, not lagging                              |
| Brokers in migration mode     | broker `ApiVersions.zkMigrationReady`; controller `ZkMigrationState` | `zkMigrationReady=true` on every broker; `ZkMigrationState=MIGRATION` |
| Brokers migrated to KRaft     | broker `ApiVersions.zkMigrationReady`                                | `zkMigrationReady=false` on every broker                              |
| Migration finalized           | controller `ZkMigrationState`                                        | `ZkMigrationState=POST_MIGRATION`                                     |
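
The table translates fairly directly into per-stage predicates. The sketch
below assumes the signals have already been collected into a plain
structure; the `Observed` container, the stage keys and `stage_complete` are
hypothetical names, and deciding which voters count as "lagging" and mapping
the metric to a state name are left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Hypothetical container for signals already read from the cluster."""
    expected_voters: set      # controller ids we expect in the quorum
    voters: set               # voter ids reported by DescribeQuorum
    lagging_voters: set       # voters the caller considers behind
    zk_migration_ready: dict  # broker id -> zkMigrationReady flag
    zk_migration_state: str   # ZkMigrationState as a name, e.g. "MIGRATION"


def stage_complete(stage: str, obs: Observed) -> bool:
    """Return True when the observed signals match the table row for `stage`."""
    ready = obs.zk_migration_ready
    if stage == "controller_quorum_provisioned":
        return obs.expected_voters <= obs.voters and not obs.lagging_voters
    if stage == "brokers_in_migration_mode":
        # Require at least one broker so an empty observation never passes.
        return (
            bool(ready)
            and all(ready.values())
            and obs.zk_migration_state == "MIGRATION"
        )
    if stage == "brokers_migrated_to_kraft":
        return bool(ready) and not any(ready.values())
    if stage == "migration_finalized":
        return obs.zk_migration_state == "POST_MIGRATION"
    raise ValueError(f"unknown stage: {stage}")
```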

We keep a reversible window. Once every broker is running on KRaft while
the controllers keep writing metadata back to ZooKeeper (the dual-write
phase), the cluster is fully functional yet the migration is still
reversible. We deliberately hold the cluster in this phase for a while
before the irreversible finalize step, which gives a real opportunity to
observe production workload and lets operators roll back a cluster if
needed.
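
A minimal sketch of that hold, assuming a configurable minimum duration: the
`min_hold` parameter, the `rollback_requested` flag and the action names are
hypothetical, and no particular hold length is implied.

```python
from datetime import datetime, timedelta, timezone


def next_action(entered_dual_write_at: datetime, now: datetime,
                min_hold: timedelta, rollback_requested: bool) -> str:
    """Pick what to do while the cluster sits in the dual-write phase.

    A requested rollback always wins, because finalizing is irreversible.
    Otherwise the cluster is held until the minimum window has elapsed.
    """
    if rollback_requested:
        return "rollback"
    if now - entered_dual_write_at < min_hold:
        return "hold"
    return "finalize"


# Example: six hours into a (hypothetical) 24-hour window, keep holding.
start = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
action = next_action(start, start + timedelta(hours=6),
                     timedelta(hours=24), rollback_requested=False)
```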

We treat topology change as a first-class concern. Nodes can be added or
replaced in the middle of a migration, and that interacts with both the
forward sequence and the rollback path in ways that are easy to get wrong.
We treat topology changes as supported at every stage, but one recent case
shows how subtle this is: a node replacement during the migration turns out
to be incompatible with rolling back, because formatting a replacement
node's storage for KRaft produces a meta.properties file that cannot
afterwards be reverted to ZooKeeper mode. This was the trickiest part to
get right overall, and our implementation relies heavily on compare-and-swap
guarantees from a shared KV store for correctness. When a node replacement
happens during the migration sequence, we generally wait for it to complete
in whichever stage the migration is in, and continue only once the cluster
is fully stable again.

For the Kubernetes-operator case you mention, we don't run that model
ourselves, so I'll avoid prescribing specifics, though I believe Strimzi
already supports automated KRaft migration. It is likely worth checking out:
https://strimzi.io/blog/2024/03/22/strimzi-kraft-migration/

One concrete contribution that may be valuable to the community is
extending the documentation to describe the means already available for
programmatically verifying that each stage has completed: the broker and
controller signals above, rather than log lines.

It may also be worth noting that we use KIP-853 membership changes. We are
really happy with that choice and find that it gives our operations great
stability. Where we routinely have issues operating ZooKeeper, we do not
see them with KRaft.

I am glad to see this discussed, and I'm happy to share more about our
approach if it's valuable to the community. I'm also planning a more
detailed write-up of our design and will share it on this list when it's
ready.

BR,
Anton

On Wed, 17 June 2026 at 09:33, Kaushik Srinivas (Nokia) via dev <
[email protected]> wrote:

> Hi Kafka team,
> With the recent changes to Apache Kafka movement towards kraft from zk
> involving migration, if there is a workable solution to automate this
> migration, is the community open to such proposals ?
> What is the community's view of such automation and sharing across the
> kafka community ? specifically for k8s deployments of kafka where the
> brokers are managed via workload k8s controllers.
> -Kaushik.
>
