Hi dev community,

I'd like to share some findings on how rotation of active segment differ
depending on whether topic retention is time- or size-based.

I was (wrongly) under the assumption that active segments are only rotated
when segment configs (segment.bytes (1GiB) or segment.ms (7d)) or global
log configs (log.roll.ms) force it  -- regardless of the retention
configuration.
This seems to be different depending on how retention is defined:

- If a topic has a retention based on time[1], the condition to rotate the
active segment is based on the latest timestamp. If the difference with
current time is largest than retention time, then segment (including
active) should be deleted. Active segment is rotated, and in next round is
deleted.

- If a topic has retention based on size[2] though, the condition not only
depends on the size of the segment itself but first on the total log size,
forcing to always have at least a single (active) segment: first difference
between total log size and retention is calculated, let's say a single
segment of 5MB and retention is 1MB; then total difference is 4MB, then the
condition to delete validates if the difference of the current segment and
the total difference is higher than zero, then delete. As the segment size
will always be higher than the total difference when there is a single
segment, then there will always be at least 1 segment. In this case the
only case where active segment is rotated it is when a new message arrives.

Added steps to reproduce[3].

Maybe I missing something obvious, but this seems inconsistent to me.
Either both retention configs should rotate active segments, or none of
them should and active segment should be only governed by segment bytes|ms
configs or log.roll config.

I believe it's a useful feature to "force" active segment rotation without
changing segment of global log rotation given that features like Compaction
and Tiered Storage can benefit from this; but would like to clarify this
behavior and make it consistent for both retention options, and/or call it
out explicitly in the documentation.

Looking forward to your feedback!

Jorge.

[1]:
https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1566
[2]:
https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583

[3]: https://gist.github.com/jeqo/d32cf07493ee61f3da58ac5e77b192b2

Reply via email to