Hi Henry and all,

Thank you for this KIP! Compaction support is a noticeable feature gap between tiered and non-tiered topics. Besides, Diskless will need the same feature, implemented either through unification with tiered storage (i.e. quasi-automatically once this KIP is accepted) or separately (while still largely reusing the log cleaner code).
I have a couple of comments.

IY1: Instead of buffering remote segments on the broker disk, have you considered running the whole operation in a streaming manner? The current RemoteStorageManager interface allows this, as fetchLogSegment returns an InputStream, which can be read in a controlled manner in small chunks (i.e. batch by batch). Admittedly, this effectively means downloading the data twice (i.e. 2x more paid GET requests), but the broker disk will not be needed and the page cache will be less disturbed, so it may be a good trade-off. This may be a bit more difficult to do for the output segment, so it is probably still better to create that one on the temp disk.

IY2: Regardless of the implementation (streaming or temp disk), I think it may be a good idea to consider extending the RemoteStorageManager interface so that fetchLogSegment knows the log data is requested for cleaning and not for a consumer. A RemoteStorageManager implementation may use an internal cache, and without this distinction, cleaning will constantly evict it.

Best,
Ivan

On Mon, Jan 12, 2026, at 07:51, Henry Haiying Cai via dev wrote:
> Hi all,
>
> I’d like to start a discussion on KIP-1272 which adds support for compacted
> topic in tiered storage, completes one of the missing gaps in KIP-405.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1272%3A+Support+compacted+topic+in+tiered+storage
>
> I’d appreciate your feedback and questions on the proposal.
>
> Thanks,
>
> Henry Cai, Tom Thornton and Greg Harris
>
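To make IY1 and IY2 a bit more concrete, here is a minimal Java sketch under simplifying assumptions; it is not a proposal for the actual API and does not use the real Kafka classes. `SegmentFetcher` is a hypothetical stand-in for `RemoteStorageManager.fetchLogSegment`, `FetchPurpose` is the hypothetical hint from IY2, and records use a made-up framing rather than the Kafka record batch format. Compaction runs as two streaming passes (build a key to latest-offset map, then fetch the segment again and keep only the latest record per key), so only the map is held in memory. Tombstones, transactional markers and memory limits on the offset map are ignored.

```java
import java.io.*;
import java.util.*;

import static java.nio.charset.StandardCharsets.UTF_8;

public class StreamingCompactionSketch {

    // Hypothetical purpose hint (IY2): a plugin could bypass its cache for CLEANING reads.
    enum FetchPurpose { CONSUMER, CLEANING }

    // Hypothetical stand-in for RemoteStorageManager.fetchLogSegment(...) plus the hint.
    interface SegmentFetcher {
        InputStream fetch(FetchPurpose purpose) throws IOException;
    }

    // Made-up framing, NOT the Kafka record batch format:
    // offset (int64), key length (int32), key, value length (int32), value.
    record Rec(long offset, String key, String value) {}

    static Rec readRecord(DataInputStream in) throws IOException {
        long offset;
        try {
            offset = in.readLong();
        } catch (EOFException e) {
            return null; // end of stream (truncation is not detected in this sketch)
        }
        byte[] key = new byte[in.readInt()];
        in.readFully(key);
        byte[] value = new byte[in.readInt()];
        in.readFully(value);
        return new Rec(offset, new String(key, UTF_8), new String(value, UTF_8));
    }

    static void writeRecord(DataOutputStream out, Rec r) throws IOException {
        byte[] key = r.key().getBytes(UTF_8);
        byte[] value = r.value().getBytes(UTF_8);
        out.writeLong(r.offset());
        out.writeInt(key.length);
        out.write(key);
        out.writeInt(value.length);
        out.write(value);
    }

    // Two streaming passes: only the key -> latest offset map is held in memory,
    // never the whole remote segment.
    static int compact(SegmentFetcher fetcher, DataOutputStream out) throws IOException {
        Map<String, Long> latest = new HashMap<>();
        try (DataInputStream in = new DataInputStream(fetcher.fetch(FetchPurpose.CLEANING))) {
            for (Rec r = readRecord(in); r != null; r = readRecord(in)) {
                latest.put(r.key(), r.offset()); // offsets increase, so the last put wins
            }
        }
        int kept = 0;
        // Second download of the same segment: the extra GET requests of this approach.
        try (DataInputStream in = new DataInputStream(fetcher.fetch(FetchPurpose.CLEANING))) {
            for (Rec r = readRecord(in); r != null; r = readRecord(in)) {
                if (latest.get(r.key()).longValue() == r.offset()) {
                    writeRecord(out, r); // on a broker: the output segment on the temp disk
                    kept++;
                }
            }
        }
        out.flush();
        return kept;
    }

    // Compacts a tiny in-memory "segment" and reports retained records and fetch calls.
    static String runDemo() {
        try {
            ByteArrayOutputStream segment = new ByteArrayOutputStream();
            DataOutputStream w = new DataOutputStream(segment);
            writeRecord(w, new Rec(0, "a", "1"));
            writeRecord(w, new Rec(1, "b", "1"));
            writeRecord(w, new Rec(2, "a", "2"));
            w.flush();
            byte[] bytes = segment.toByteArray();

            List<FetchPurpose> calls = new ArrayList<>();
            SegmentFetcher fetcher = purpose -> {
                calls.add(purpose);
                return new ByteArrayInputStream(bytes);
            };
            ByteArrayOutputStream compacted = new ByteArrayOutputStream();
            compact(fetcher, new DataOutputStream(compacted));

            List<String> kept = new ArrayList<>();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(compacted.toByteArray()));
            for (Rec r = readRecord(in); r != null; r = readRecord(in)) {
                kept.add(r.offset() + ":" + r.key() + "=" + r.value());
            }
            return "kept=" + kept + " fetches=" + calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Running `main` prints `kept=[1:b=1, 2:a=2] fetches=[CLEANING, CLEANING]`: the record at offset 0 is superseded by the one at offset 2, the two fetches are the doubled GET cost mentioned in IY1, and the `CLEANING` hint is what a caching plugin could use to avoid polluting its cache.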
