Hi Henry and all,

Thank you for this KIP! Compaction is a noticeable feature gap between tiered
and non-tiered topics. Diskless will also need the same feature, implemented
either through unification with tiered storage (i.e. quasi-automatically once
this KIP is accepted) or separately (but still largely reusing the cleaner
code).

I have a couple of comments.

IY1: Instead of buffering remote segments on the broker disk, have you
considered running the whole operation in a streaming manner? The current
RemoteStorageManager interface allows this, since it returns an InputStream
that can be read in a controlled manner in small chunks (i.e. batch by batch).
Admittedly, this effectively means downloading each segment twice (i.e. twice
as many paid GET requests), but no broker disk is needed and the page cache is
disturbed less, so it may be a good trade-off.
Streaming may be a bit more difficult to do for the output segment, so it's
probably still better to create that one on the temp disk.
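
To make the idea concrete, here is a rough, self-contained sketch of the two
streaming passes. Everything in it is hypothetical and only for illustration:
StreamingBatchReader, BatchVisitor, SegmentSource and the toy framing (8-byte
base offset, 4-byte payload length, first payload byte used as the "key") are
my assumptions, not Kafka or KIP code. I am also assuming the double download
comes from one pass to find the latest offset per key and one pass to copy the
retained batches. SegmentSource.open() stands in for a fetchLogSegment call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Kafka or the KIP.
class StreamingBatchReader {

    interface BatchVisitor {
        void visit(long baseOffset, byte[] payload) throws IOException;
    }

    // Stand-in for "open a fresh stream over the remote segment".
    interface SegmentSource {
        InputStream open() throws IOException;
    }

    // Reads one batch at a time, so memory use is bounded by the largest batch.
    // Toy framing (an assumption): 8-byte base offset, 4-byte payload length, payload.
    static int forEachBatch(InputStream in, BatchVisitor visitor) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int count = 0;
        while (true) {
            long baseOffset;
            try {
                baseOffset = data.readLong();
            } catch (EOFException e) {
                // End of stream; this sketch also treats a truncated header as the end.
                return count;
            }
            int length = data.readInt();
            byte[] payload = new byte[length];
            data.readFully(payload);
            visitor.visit(baseOffset, payload);
            count++;
        }
    }

    // Pass 1 finds the latest offset per key, pass 2 re-reads the segment and
    // copies only the latest batch for each key. Assumes non-empty payloads.
    static int compactTwoPass(SegmentSource source, DataOutputStream out) throws IOException {
        Map<Byte, Long> latestOffsetByKey = new HashMap<>();
        try (InputStream in = source.open()) {
            forEachBatch(in, (offset, payload) -> latestOffsetByKey.put(payload[0], offset));
        }
        int[] kept = {0};
        try (InputStream in = source.open()) { // second download of the same segment
            forEachBatch(in, (offset, payload) -> {
                if (latestOffsetByKey.get(payload[0]) == offset) {
                    out.writeLong(offset);
                    out.writeInt(payload.length);
                    out.write(payload);
                    kept[0]++;
                }
            });
        }
        return kept[0];
    }

    // Builds an in-memory "remote segment" in the toy framing above.
    static byte[] fakeSegment(long[] offsets, byte[][] payloads) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int i = 0; i < offsets.length; i++) {
            out.writeLong(offsets[i]);
            out.writeInt(payloads[i].length);
            out.write(payloads[i]);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Keys 1, 2, 1: the batch at offset 0 is superseded by the one at offset 2.
        byte[] segment = fakeSegment(new long[] {0, 1, 2},
                new byte[][] {{1, 100}, {2, 101}, {1, 102}});
        ByteArrayOutputStream cleaned = new ByteArrayOutputStream();
        int kept = compactTwoPass(() -> new ByteArrayInputStream(segment),
                new DataOutputStream(cleaned));
        System.out.println("kept batches: " + kept);
    }
}
```

In this sketch the output goes to a plain DataOutputStream, which in practice
could be backed by a temp file, matching the point above about the output
segment.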

IY2: Regardless of the implementation (streaming or temp disk), I think it may
be a good idea to consider expanding the RemoteStorageManager interface so
that fetchLogSegment knows the log data is requested for cleaning and not
for a consumer. A RemoteStorageManager implementation may use an internal
cache, and without this distinction cleaning reads will constantly evict
consumer data from it.
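
As a rough illustration of what I mean, the toy below shows a fetch call that
carries a purpose and a cache that is bypassed for cleaning reads. All names
(CachingSegmentFetcher, FetchPurpose, fetch, isCached) are hypothetical; this
is not the existing RemoteStorageManager API and not a concrete proposal for
its signature, just a sketch of the behaviour such a hint would enable.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy; none of these names exist in Kafka.
class CachingSegmentFetcher {

    // The extra hint an extended fetch call could carry (an assumption).
    enum FetchPurpose { CONSUMER, CLEANING }

    private final Map<String, byte[]> cache;
    int remoteReads = 0;

    CachingSegmentFetcher(int capacity) {
        // Access-ordered LinkedHashMap acting as a small LRU cache.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    byte[] fetch(String segmentId, FetchPurpose purpose) {
        if (purpose == FetchPurpose.CLEANING) {
            // Cleaning reads go straight to remote storage: they neither
            // populate the cache nor change its eviction order.
            return readRemote(segmentId);
        }
        byte[] hit = cache.get(segmentId);
        if (hit != null) {
            return hit;
        }
        byte[] data = readRemote(segmentId);
        cache.put(segmentId, data);
        return data;
    }

    boolean isCached(String segmentId) {
        // containsKey does not affect the access order.
        return cache.containsKey(segmentId);
    }

    // Stand-in for a paid GET against the object store.
    private byte[] readRemote(String segmentId) {
        remoteReads++;
        return segmentId.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CachingSegmentFetcher fetcher = new CachingSegmentFetcher(1);
        fetcher.fetch("seg-1", FetchPurpose.CONSUMER);
        fetcher.fetch("seg-2", FetchPurpose.CLEANING);
        // seg-1 stays cached even though the cache holds a single entry.
        System.out.println("seg-1 cached: " + fetcher.isCached("seg-1"));
        System.out.println("seg-2 cached: " + fetcher.isCached("seg-2"));
    }
}
```

Whether the hint is an enum argument, a separate method or something else is
an open design question; the point is only that the implementation can tell
the two kinds of reads apart.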

Best,
Ivan

On Mon, Jan 12, 2026, at 07:51, Henry Haiying Cai via dev wrote:
> Hi all,
> 
> I’d like to start a discussion on KIP-1272 which adds support for compacted 
> topic in tiered storage, completes one of the missing gaps in KIP-405.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1272%3A+Support+compacted+topic+in+tiered+storage
> 
> I’d appreciate your feedback and questions on the proposal.
> 
> Thanks,
> 
> Henry Cai, Tom Thornton and Greg Harris
> 
