Hey Henry,

Thank you for your time and response! I really like your KIP-1248 about
offloading the consumption of remote logs away from the broker. I think
that with that change, topics with tiered storage enabled can also have
longer retention configurations, and would benefit from this KIP too.

> Some suggestions: In your example scenarios, it would also be good to add
> an example of remote log segment deletion triggered by retention policy
> which will trigger generation of tombstone event into metadata topic and
> trigger log compaction/deletion 24 hour later, I think this is the key
> event to cap the metadata topic size.


Regarding this suggestion, I am not sure whether Scenario 4
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618613#KIP1266:BoundingTheNumberOfRemoteLogMetadataMessagesviaCompactedTopic-Scenario4:SegmentDeletion>
already covers it. I can add more rows to the Timeline Table, such as
T5+24hour, to show explicitly that the messages have been deleted by then,
and thus that the number of messages in the topic is capped.
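
As a rough illustration of what such an extra row would convey, here is a
toy Python model of a compacted topic. It is only a sketch under stated
assumptions, not the KIP's implementation: the key "segment-0", the hour
timestamps, and the compact() helper are hypothetical, and the 24-hour
window simply mirrors Kafka's default delete.retention.ms. Real log
compaction runs per cleaner pass and is more nuanced than this.

```python
from typing import Optional

# Assumption: mirrors Kafka's default delete.retention.ms (24 hours).
DELETE_RETENTION_HOURS = 24

# (timestamp in hours, key, value); a value of None models a tombstone.
Record = tuple[int, str, Optional[str]]


def compact(log: list[Record], now: int) -> list[Record]:
    """Keep the latest record per key; drop tombstones past the retention window."""
    latest: dict[str, Record] = {}
    for record in log:
        # Later records overwrite earlier ones that share the same key.
        latest[record[1]] = record
    return [
        r for r in latest.values()
        if r[2] is not None or now - r[0] < DELETE_RETENTION_HOURS
    ]


# Hypothetical timeline for one remote log segment.
log: list[Record] = [
    (0, "segment-0", "COPY_SEGMENT_STARTED"),
    (1, "segment-0", "COPY_SEGMENT_FINISHED"),
    (5, "segment-0", None),  # tombstone written when retention deletes the segment
]

print(len(compact(log, now=5)))       # only the tombstone remains
print(len(compact(log, now=5 + 24)))  # tombstone removed; nothing left for this key
```

In this model, the three messages for the segment collapse to a single
tombstone at T5 and to nothing at T5+24hour, which is the capping behavior
the extra Timeline Table row would make explicit.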

Regarding whether the __remote_log_metadata topic is still necessary, I am
inclined to keep it, at least for debugging purposes, so that we can build
confidence in the compacted topic change. We can always choose to remove
this topic in the future once we all agree that it provides limited value
for users.

Thanks,
Lijun Tong


Henry Haiying Cai via dev <[email protected]> wrote on Mon, Jan 5, 2026 at 16:19:

> Lijun,
>
> Thanks for the proposal and I liked your idea of using a compacted topic
> for tiered storage metadata topic.
>
> In our setup, we have set a shorter retention (3 days) for the tiered
> storage metadata topic to control the size growth.  We can do that since we
> control all topic's retention policy in our clusters and we set a uniform
> retention.policy for all our tiered storage topics.  I can see other
> users/companies will not be able to enforce that retention policy to all
> tiered storage topics.
>
> Some suggestions: In your example scenarios, it would also be good to add
> an example of remote log segment deletion triggered by retention policy
> which will trigger generation of tombstone event into metadata topic and
> trigger log compaction/deletion 24 hour later, I think this is the key
> event to cap the metadata topic size.
>
> For the original unbounded remote_log_metadata topic, I am not sure
> whether we still need it or not.  If it is left only for audit trail
> purpose, people can set up a data ingestion pipeline to ingest the content
> of metadata topic into a separate storage location.  I think we can have a
> flag to have only one metadata topic (the compacted version).
>
>
> On Monday, January 5, 2026 at 01:22:42 PM PST, Lijun Tong <
> [email protected]> wrote:
>
>
>
>
>
> Hello Kafka Community,
>
> I would like to start a discussion on KIP-1266, which proposes to add
> another new compacted remote log metadata topic for the tiered storage, to
> limit the number of messages that need to be iterated to build the remote
> metadata state.
>
> KIP link: KIP-1266 Bounding The Number Of RemoteLogMetadata Messages via
> Compacted RemoteLogMetadata Topic
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1266%3A+Bounding+The+Number+Of+RemoteLogMetadata+Messages+via+Compacted+Topic
> >
>
> Background:
> The current Tiered Storage implementation uses a __remote_log_metadata
> topic with infinite retention and delete-based cleanup policy, causing
> unbounded growth, slow broker bootstrap, no mechanism to clean up expired
> segment metadata, and inefficient re-reading from offset 0 during
> leadership changes.
>
> Proposal:
> A dual-topic approach that introduces a new __remote_log_metadata_compacted
> topic using log compaction with deterministic offset-based keys, while
> preserving the existing topic for audit history; this allows brokers to
> build their metadata cache exclusively from the compacted topic, enables
> cleanup of expired segment metadata through tombstones, and includes a
> migration strategy to populate the new topic during upgrade—delivering
> bounded metadata growth and faster broker startup while maintaining
> backward compatibility.
>
> More details are in the attached KIP link.
> Looking forward to your thoughts.
>
> Thank you for your time!
>
> Best,
> Lijun Tong
>
