Hi Andrew,

Thanks for the review.
I think Fede already answered all the questions. But about AS5, you made me think more about the possibility of supporting unclean leader election (though I've already done that many times :)). So, what we can do is:

1. In the destination cluster leader, we mirror the batches from the source cluster and keep the leader epoch in each batch as is. That is, the leader epoch in a batch can be 10 while the local leader epoch is 1. The leader epoch cache also updates when receiving batches from the source cluster leader, instead of when the local cluster leadership changes.

2. Because of (1), this destination cluster leader node can act as a follower in the source cluster to find the diverging log offset when unclean leader election happens in the source cluster, because the "LastFetchedEpoch" in the fetch request can be set to the correct value.

3. To avoid the unclean leader election issue the KIP describes <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring#KIP1279:ClusterMirroring-Uncleanleaderelection(LMOisnotsufficient)>, we need a few extra steps:

3.1. When failing over to the destination cluster, we store the [last mirrored leader epoch] instead of the last mirrored offset.

3.2. Force-bump the leader epoch in the destination cluster to a value greater than the latest batch leader epoch. That means any leader epoch <= the last mirrored leader epoch is already synced up with the source cluster.

3.3. When failing back, we first query the last mirrored leader epoch from the source cluster, then truncate based on it. This is the last leader epoch that matches the source cluster, so every record beyond that leader epoch should be truncated.

3.4. After (3.3), all records have leader epoch <= the last mirrored leader epoch, so we can send fetch requests as usual and let the fetch response handle any remaining truncation.
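To make steps 3.1-3.4 concrete, here is a minimal, self-contained sketch in Python (hypothetical names, not actual Kafka code) of truncating a destination log by the last mirrored leader epoch:

```python
# Hypothetical sketch (not Kafka code) of the failback truncation in
# steps 3.1-3.4. A record is (offset, leader_epoch); source_epoch_end
# maps a source leader epoch to its end offset (exclusive) in the
# source log.

def truncate_for_failback(dest_log, last_mirrored_epoch, source_epoch_end):
    # (3.3) Records with a leader epoch beyond the last mirrored leader
    # epoch were written locally after failover and must be dropped.
    kept = [r for r in dest_log if r[1] <= last_mirrored_epoch]
    # (3.4) The source may end that epoch earlier (e.g. after an unclean
    # election), so align to the source's end offset for that epoch.
    end = source_epoch_end.get(last_mirrored_epoch)
    if end is not None:
        kept = [r for r in kept if r[0] < end]
    return kept

# Destination has epoch 1 at offsets 0-4 and epoch 3 at offsets 5-9,
# but the source's epoch 3 ends at offset 8, so the two divergent
# records (offsets 8 and 9) are dropped.
dest = [(o, 1) for o in range(5)] + [(o, 3) for o in range(5, 10)]
print(truncate_for_failback(dest, 3, {1: 5, 3: 8}))
```

In the real protocol, the epoch end offset would come from querying the source cluster (roughly what the existing OffsetsForLeaderEpoch API provides), not from a local dict.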
For example, if leader epoch 3 in the destination cluster ends at offset 10, but leader epoch 3 in the source cluster ends at offset 8, the follower can detect the divergence and truncate to offset 8 in the destination cluster.

3.5. After (3.4), all the leader epochs and records are in sync with the source cluster. Then we can jump back to step 1, fetch as a normal follower, and detect log divergence even if the source cluster has an unclean leader election.

Does this make sense? In theory, this might work. I need to think more about it, discuss it with my team members, and try to implement it to verify.

Thank you,
Luke

On Wed, Feb 18, 2026 at 10:28 PM Federico Valeri <[email protected]> wrote:
> Hi Andrew, thanks for the review.
>
> Let me try to answer your questions and then other authors can join
> the discussion.
>
> AS1
> ------
>
> Destination topics are created with the same topic IDs using the
> extended CreateTopics API. Then, data is replicated starting from
> offset 0 with byte-for-byte batch copying, so destination offsets
> always match source offsets. When failing over, we record the last
> mirrored offset (LMO) in the destination cluster. When failing back,
> the LMO is used for truncating, and then we start mirroring the
> delta; otherwise we start mirroring from scratch by truncating to
> zero.
>
> Retention: if the mirror leader attempts to fetch an offset that is
> below the current log start offset of the source leader (e.g.
> fetching offset 50 when the log start offset is 100), the source
> broker returns an OffsetOutOfRangeException that the mirror leader
> handles by truncating to the source's current log start offset and
> resuming fetching from that point. Compaction: the mirror leader
> replicates compacted log segments exactly as they exist in the
> source cluster, maintaining the same offset assignments and gaps.
>
> Do you have any specific corner case in mind?
>
> AS2
> ------
>
> Agreed.
> The current AlterShareGroupOffsetsRequest (v0) only includes
> PartitionIndex and StartOffset, with no epoch field. When mirroring
> share group offsets across clusters, the epoch is needed to ensure
> the offset alteration targets the correct leader generation.
>
> AS3
> ------
>
> Right, the enum is now fixed. Yes, we will parse from the right and
> apply the same naming rules used for topic names ;)
>
> AS4
> -------
>
> Agreed. I'll try to improve those paragraphs because they are
> crucial from an operational point of view.
>
> Let me briefly explain how it is supposed to work:
>
> 9091 (source) -----> 9094 (destination)
>
> The single operation that allows an operator to switch all topics at
> once in case of disaster is the following:
>
> bin/kafka-mirrors.sh --bootstrap-server :9094 --remove --topic .* --mirror my-mirror
>
> 9091 (source) --x--> 9094 (destination)
>
> After that, all mirror topics become detached from the source
> cluster and start accepting writes (the two clusters are allowed to
> diverge).
>
> When the source cluster is back, the operator can fail back by
> creating a mirror with the same name on the source cluster (new
> destination):
>
> echo "bootstrap.servers=localhost:9094" > /tmp/my-mirror.properties
> bin/kafka-mirrors.sh --bootstrap-server :9091 --create --mirror my-mirror --mirror-config /tmp/my-mirror.properties
> bin/kafka-mirrors.sh --bootstrap-server :9091 --add --topic .* --mirror my-mirror
>
> 9091 (destination) <----- 9094 (source)
>
> AS5
> -------
>
> This is the core of our design, and we reached it empirically by
> trying out different options. We didn't want to change local
> replication, which is something you would need to do when preserving
> the source leader epoch. The current design is simple and keeps the
> epoch domains entirely separate. The destination cluster is in
> charge of the leader epoch for its own log. The source epoch is only
> used during the fetch protocol to validate responses and detect
> divergence.
>
> The polarity idea of tracking whether an epoch bump originated from
> replication vs. a local leadership change is interesting, but it
> adds significant complexity and coupling between source and
> destination epochs. Could you clarify what specific scenario
> polarity tracking would address that the current separation doesn't
> handle? One case we don't support is unclean leader election
> reconciliation across clusters; is that the gap you're aiming at?
>
> I tried to rewrite the unclean leader election paragraph in the
> rejected alternatives to be easier to digest. Let me know if it
> works.
>
> On Tue, Feb 17, 2026 at 2:57 PM Andrew Schofield
> <[email protected]> wrote:
> >
> > Hi Fede and friends,
> > Thanks for the KIP.
> >
> > It’s a comprehensive design, easy to read, and has clearly taken a
> > lot of work. The principle of integrating mirroring into the
> > brokers makes total sense to me.
> >
> > The main comment I have is that mirroring like this cannot handle
> > situations in which multiple topic-partitions are logically
> > related, such as transactions, with total fidelity. Each
> > topic-partition is being replicated as a separate entity. The KIP
> > calls this out and describes the behaviour thoroughly.
> >
> > A few initial comments.
> >
> > AS1) Is it true that offsets are always preserved by this KIP? I
> > *think* so, but I'm not totally sure that it’s true in all cases.
> > It would certainly be nice.
> >
> > AS2) I think you need to add epoch information to
> > AlterShareGroupOffsetsRequest. It really should already be there
> > in hindsight, but I think this KIP requires it.
> >
> > AS3) The CoordinatorType enum for MIRROR will need to be 3 because
> > 2 is SHARE. I’m sure you’ll parse the keys from the right ;)
> >
> > AS4) The procedure for achieving a failover could be clearer.
> > Let’s say that I am using cluster mirroring to achieve DR
> > replication. My source cluster is utterly lost due to a disaster.
> > What’s the single operation that I perform to switch all of the
> > topics mirrored from the lost source cluster to become the active
> > topics? Similarly for failback.
> >
> > AS5) The only piece that I’m really unsure of is the epoch
> > management. I would have thought that the cluster which currently
> > has the writable topic-partition would be in charge of the leader
> > epoch, and that it would not be necessary to perform all of the
> > gymnastics described in the section on epoch rewriting. I have
> > read the Rejected Alternatives section too, but I don’t fully
> > grasp why it was necessary to reject it.
> >
> > I wonder if we could store the “polarity” of an epoch, essentially
> > whether the epoch bump was observed by replication from a source
> > cluster, or whether it was bumped by a local leadership change
> > when the topic is locally writable. When a topic-partition
> > switches from read-only to writable, we should definitely bump the
> > epoch, and we could record the fact that it was a local epoch.
> > When connectivity is re-established, you might find that both ends
> > have declared a local epoch N, but someone has to win.
> >
> > Thanks,
> > Andrew
> >
> > > On 14 Feb 2026, at 07:17, Federico Valeri <[email protected]> wrote:
> > >
> > > Hi, we would like to start a discussion thread about KIP-1279:
> > > Cluster Mirroring.
> > >
> > > Cluster Mirroring is a new Kafka feature that enables native,
> > > broker-level topic replication across clusters. Unlike
> > > MirrorMaker 2 (which runs as an external Connect-based tool),
> > > Cluster Mirroring is built into the broker itself, allowing
> > > tighter integration with the controller, coordinator, and
> > > partition lifecycle.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring
> > >
> > > There are a few missing bits, but most of the design is there,
> > > so we think it is the right time to involve the community and
> > > get feedback.
> > > Please help validate our approach.
> > >
> > > Thanks
> > > Fede
