Hello all, I want to highlight a couple of new paragraphs:

1. Leader Epoch Invariant: Cluster mirroring enforces the invariant
that the destination leader epoch must always be greater than or equal
to the source leader epoch (DLE>=SLE). Without this, consumers on the
destination cluster can get stuck in an infinite metadata refresh loop
when they encounter committed offsets carrying source epochs higher
than the local epoch. The invariant is maintained through three
mechanisms: reactive bumping (epoch fencing triggered when SLE > DLE
during fetch), proactive bumping (scheduled when SLE approaches DLE
within a threshold), and periodic bumping (checked during coordinator
metadata sync).

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620973#KIP1279:ClusterMirroring-LeaderEpochInvariant
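To make the three bump mechanisms concrete, here is a minimal sketch of the DLE >= SLE decision logic. The function name and the bump_threshold parameter are illustrative assumptions, not the actual KIP-1279 implementation:

```python
def epoch_bump_needed(source_epoch, dest_epoch, bump_threshold=2):
    """Return which bump mechanism applies, if any (hypothetical sketch)."""
    if source_epoch > dest_epoch:
        # Reactive: the invariant is already violated during fetch,
        # so epoch fencing must bump the destination epoch now.
        return "reactive"
    if dest_epoch - source_epoch < bump_threshold:
        # Proactive: SLE is approaching DLE within the threshold,
        # so a bump is scheduled before the invariant can break.
        return "proactive"
    # Periodic checks during coordinator metadata sync run the same
    # logic and find no bump needed in this case.
    return None
```

The same check can serve all three mechanisms; only the trigger (fetch path, scheduler, or metadata sync) differs.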

2. Group Offsets: The coordinator periodically syncs consumer and
share group offsets from the source cluster to the destination for all
mirrored topics. Groups are filtered by configurable include/exclude
patterns, and offsets are only synced for groups that are not
currently active on the destination cluster, preventing overwrites of
local consumer progress. Because source and destination share the same
topic offsets (no offset translation), synced offsets can be used
directly without mapping.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620973#KIP1279:ClusterMirroring-GroupOffsets
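The filtering described above can be sketched as follows. The include/exclude patterns and the active-group check come from the paragraph; the function shape and pattern semantics (regex full match) are illustrative assumptions:

```python
import re

def groups_to_sync(groups, include, exclude, active_on_dest):
    """Select source groups whose offsets can be safely synced (sketch)."""
    selected = []
    for g in groups:
        if not any(re.fullmatch(p, g) for p in include):
            continue
        if any(re.fullmatch(p, g) for p in exclude):
            continue
        if g in active_on_dest:
            # Never overwrite the progress of a group that is
            # currently consuming on the destination cluster.
            continue
        selected.append(g)
    # No offset translation step: source and destination share the
    # same topic offsets, so synced offsets apply directly.
    return selected
```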

These new paragraphs directly address some of your questions, but let
me also answer each of them here:

JR2: Yes, we removed the incorrect phrase and added more details to
the paragraph.

JR4: When a source cluster topic has tiered storage enabled, CM works
by mirroring both the remote and local log into the destination
cluster. When a destination cluster topic has tiered storage enabled,
CM fails in the PREPARING state, because the LME may be in remote
storage, but works fine if the mirror is already in MIRRORING, because
no truncation is needed.
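The tiered-storage behavior described above boils down to a simple state check. This is a hypothetical sketch; the enum and function names are illustrative, not the actual KIP API:

```python
from enum import Enum

class MirrorState(Enum):
    PREPARING = "PREPARING"
    MIRRORING = "MIRRORING"

def can_mirror(dest_tiered_storage, state):
    # Destination tiered storage is only a problem while PREPARING,
    # because the LME may be in remote storage and truncation cannot
    # reach it; once MIRRORING, no truncation is needed.
    if dest_tiered_storage and state is MirrorState.PREPARING:
        return False
    return True
```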

JR11: See "Leader Epoch Invariant" paragraph mentioned above.

JR13: We can't support stateful Streams applications because
asynchronous replication cannot preserve the transactional boundaries
between input offset commits, state store mutations written to
changelog topics, and intermediate records written to repartition
topics. The synchronous extension of this design will be able to
support them. The Existing Features Integration paragraph has been
updated.

JR18: See "Group Offsets" paragraph mentioned above.

IY1: See "Group Offsets" paragraph mentioned above.

Thanks
Fede

On Mon, May 11, 2026 at 6:08 PM Federico Valeri <[email protected]> wrote:
>
> Hi Vaquar,
>
> VK4/VK8: We don't do PID mapping anymore. The KIP was updated some
> time ago with the new approach based on the new PID reset control
> record.
>
> VK11: The transaction index is always built locally during log append,
> never copied.
>
> VK1: The 50,000 * 1MB = 50GB calculation misunderstands the fetch
> model. Fetcher threads don't allocate one buffer per partition. Actual
> peak memory is roughly num_fetcher_threads * response_max_bytes, not
> num_partitions * partition_max_bytes. With 1 fetcher thread and the
> default response max, the memory footprint is modest regardless of
> partition count. We are leveraging the same proven pattern used by the
> internal replication.
>
> VK3: The __mirror_state topic uses hash-based partitioning based on
> mirrorName, topicId and partition number. With the production default
> of 50 partitions, 50,000 partition transitions distribute across ~50
> partition leaders on different brokers, not a single broker. This is
> the same proven pattern as __consumer_offsets, which handles millions
> of commits.
>
> VK12: The scenario described can only occur if offsets are
> force-written to an active group, which the design prevents.
>
> Cheers
> Fede
