Hey all!

I realized that the information above is a bit in-the-weeds, and I think a
re-framing of the situation might be necessary.

Since the release of MM2, offset translation has been limited to
translating offsets ahead of the most recent offset sync. This
limitation appears to have worked for existing use-cases where offset syncs
are infrequent.
For topics which emit offset syncs frequently, the window for offset
translation becomes shorter, and may become unusable. In those unusable
cases, offset translation may stop completely for an otherwise
fully-functional steady-state MM2 instance.
Recently, we have been interested in improving the correctness of offset
translation to address data loss, and those fixes end up causing more
offset syncs to be emitted, making the translation window smaller than
before, and often unusable.
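
To make the window concrete, here is a minimal sketch of the limitation in
Python. It is a simplified model, not the MM2 code: the names `OffsetSync`
and `translate` are hypothetical, and the "at or immediately after the
latest sync" rule is my reading of the conservative strategy described in
the quoted message below.

```python
from typing import NamedTuple, Optional


class OffsetSync(NamedTuple):
    """One offset sync: the same record's offset in each cluster (hypothetical model)."""
    upstream: int    # offset of the record in the source topic
    downstream: int  # offset of the same record in the target topic


def translate(latest_sync: OffsetSync, group_offset: int) -> Optional[int]:
    """Translate an upstream committed offset using only the latest offset sync.

    Returns None when the committed offset is behind the latest sync,
    which is the case where translation is unavailable.
    """
    if group_offset < latest_sync.upstream:
        return None  # behind the latest sync: outside the translation window
    if group_offset == latest_sync.upstream:
        return latest_sync.downstream  # exactly at the sync
    # Ahead of the sync: conservatively report "immediately after the sync"
    # rather than assuming upstream and downstream offsets advance together.
    return latest_sync.downstream + 1


# A consumer group has committed upstream offset 950.
infrequent = OffsetSync(upstream=900, downstream=450)  # last sync is behind the group
frequent = OffsetSync(upstream=1000, downstream=500)   # a newer sync has passed the group

print(translate(infrequent, 950))  # 451
print(translate(frequent, 950))    # None
```

In this model, any consumer group that lags behind the newest sync gets no
translation at all, so the more often syncs are emitted, the more groups
fall outside the window.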

Q1. Would an improvement to allow translation from earlier in the topic be
reasonable to propose in a KIP?
Q2. Is anyone relying on the current translation behavior (poor
correctness, high availability), such that making the availability worse
is a backwards-incompatible regression?
Q3. Should we prioritize correctness, even if it hurts availability?
Q4. Should we address correctness and availability of this feature in a
patch or only minor releases?
Q5. Is there some tactical improvement to availability we can make which
does not count as backwards-incompatible, allowing us to land the
correctness fix without a KIP?
Q6. Do you have any suggestions on how to improve availability of offset
translation?

I'm interested in finding a tactical solution that we can backport, and a
holistic solution for more future use-cases.
I hope that the above is more clear.

Thanks!
Greg

On Fri, Mar 10, 2023 at 12:16 PM Greg Harris <greg.har...@aiven.io> wrote:

> Hi all,
>
> Recently, we've been experimenting with using MM2 to mirror topics that
> were populated by transactional producers. We've noticed that MM2
> replicates records but not transaction markers, causing certain offsets to
> appear in the source topic but not destination topic. These behaviors can
> also be seen when using Filter SMTs, or when replicating topics which have
> undergone compaction, which cause the same concentration of offsets in the
> target topic.
>
> This has the following negative effects with offset translation:
> P1. When starting replication on an existing topic with existing consumer
> groups, offsets are translated beyond the end of the topic, leading to
> "negative lag" for the downstream consumer group
> P2. When in a "negative lag" situation, and a consumer fail-over from
> source to target is triggered, downstream consumption will stall until the
> downstream offsets exceed the "negative lag" offsets.
> P3. When failing over from source to target, certain records may have been
> ahead of the upstream consumer group and behind the downstream consumer
> group, leading to records not being delivered at least once.
>
> We merged a solution to the above by making a change to the translation logic
> in https://issues.apache.org/jira/browse/KAFKA-12468 , and settled on a
> strategy to make offset translation more conservative, effectively making
> it such that the MirrorCheckpointTask only emits offsets at or immediately
> after the latest offset sync. This has the effect that offsets are more
> correct than previously, but that did not come without costs:
>
> P4. More offset syncs must be emitted to the offset syncs topic to enforce
> the `offset.lag.max` config property, once per `offset.lag.max` records
> (regression in the original PR, addressed by
> https://issues.apache.org/jira/browse/KAFKA-14797)
> P5. More recent offset syncs narrow the window in which translation can
> take place, leading to some translated offsets becoming excessively stale.
> This limitation is captured in
> https://issues.apache.org/jira/browse/KAFKA-14666 .
> P6. Even with the above fixes, offset translation won't be able to
> translate ahead of the latest offset sync, and offsets may not converge
> exactly to the end of the topic.
>
> Fixing KAFKA-14797 appears possible without a KIP, but it is unclear
> whether KAFKA-14666 requires a KIP to resolve.
>
> To summarize:
> * Released versions of Kafka have reasonable behavior for normal topics,
> and correctness problems for compacted, filtered, and transactional topics.
> * KAFKA-12468 fixes correctness for compacted, filtered, and transactional
> topics, and regresses availability for all topics
> * KAFKA-14797 makes availability better for normal topics, but still worse
> than release.
> * KAFKA-14666 makes availability better for all topics, but still worse
> than release.
>
> Questions:
> Q1. Does KAFKA-14666 require a KIP to resolve?
> Q2. Is the increased likelihood of KAFKA-14666 caused by KAFKA-14797 a
> regression in behavior?
> Q3. Is the KAFKA-12468 correctness fix worth the general availability loss
> (P6) that is bounded by offset.lag.max?
> Q4. Is some or all of the above eligible for release in a patch release,
> or should these fixes be contained to just a minor release?
> Q5. Can we make a tactical fix for KAFKA-14666 to enable users to
> work around the issue?
> Q6. Do you have any alternative solutions for KAFKA-14666 that we should
> consider?
>
> I want to understand if we need to revert the correctness fix already
> merged, or if we can address correctness now and availability later.
>
> Thanks,
> Greg
>
