[jira] [Updated] (KAFKA-16511) Leaking tiered segments
[ https://issues.apache.org/jira/browse/KAFKA-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Chen updated KAFKA-16511:
------------------------------
    Fix Version/s: 3.7.1

> Leaking tiered segments
> -----------------------
>
>                 Key: KAFKA-16511
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16511
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.7.0
>            Reporter: Francois Visconte
>            Assignee: Kamal Chandraprakash
>            Priority: Major
>              Labels: tiered-storage
>             Fix For: 3.8.0, 3.7.1
>
> I have some topics that have not been written to for a few days (they have 12h retention) where some data remains in tiered storage (in our case S3) and is never deleted.
>
> Looking at the log history, it appears that we never even tried to delete these segments.
> Looking at one of the non-leaking segments, I see the following interesting messages:
>
> {code:java}
> "2024-04-02T10:30:45.265Z","""kafka""","""10039""","[RemoteLogManager=10039 partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764] Deleted remote log segment RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ} due to leader-epoch-cache truncation. Current earliest-epoch-entry: EpochEntry(epoch=8, startOffset=2980106), segment-end-offset: 2976819 and segment-epochs: [5]"
> "2024-04-02T10:30:45.242Z","""kafka""","""10039""","Deleting log segment data for completed successfully RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013411147, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_FINISHED}"
> "2024-04-02T10:30:45.144Z","""kafka""","""10039""","Deleting log segment data for RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013411147, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_FINISHED}"
> "2024-04-01T23:16:51.157Z","""kafka""","""10029""","[RemoteLogManager=10029 partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764] Copied 02968418.log to remote storage with segment-id: RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}"
> "2024-04-01T23:16:51.147Z","""kafka""","""10029""","Copying log segment data completed successfully, metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013397319, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}"
> "2024-04-01T23:16:37.328Z","""kafka""","""10029""","Copying log segment data, metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013397319, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}"
> {code}
>
> This looks right: we can see logs from both the plugin and the remote log manager indicating that the remote log segment was removed.
> Now, looking at one of the leaked segments, here is what I see:
>
> {code:java}
> "2024-04-02T00:43:33.834Z","""kafka""","""10001""","[RemoteLogManager=10001 partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765] Copied 02971163.log to remote storage with segment-id: RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765, id=8dP13VDYSaiFlubl9SNBTQ}"
> "2024-04-02T00:43:33.822Z","""kafka""","""10001""","Copying log segment data completed successfully, metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765, id=8dP13VDYSaiFlubl9SNBTQ}, startOffset=2971163, endOffset=2978396, brokerId=10001, maxTimestampMs=1712010648756, eventTimestampMs=1712018599981, segmentLeaderEpochs={7=2971163}, segmentSizeInBytes=459778940, customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}"
> "2024-04-02T00:43:20.003Z","""kafka""","""10001""","Copying log segment data, metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765, id=8dP13VDYSaiFlubl9SNBTQ}, startOffset=2971163, endOffset=2978396, brokerId=10001, maxTimestampMs=1712010648756, eventTimestampMs=1712018599981, segmentLeaderEpochs={7=2971163}, segmentSizeInBytes=459778940, customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}"
> {code}
>
> I have no errors whatsoever indicating that the remote log deletion was actually triggered and failed. I tried rolling restarting my cluster to see if refreshing
[ https://issues.apache.org/jira/browse/KAFKA-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash updated KAFKA-16511: Description
[ https://issues.apache.org/jira/browse/KAFKA-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Visconte updated KAFKA-16511: Description
[ https://issues.apache.org/jira/browse/KAFKA-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Visconte updated KAFKA-16511: Labels: tiered-storage (was: )
[ https://issues.apache.org/jira/browse/KAFKA-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Visconte updated KAFKA-16511: Description