[jira] (KAFKA-16511) Leaking tiered segments

2024-04-16 Thread Francois Visconte (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16511 ]


Francois Visconte deleted comment on KAFKA-16511:
---

was (Author: JIRAUSER288982):
>The issue might be due to the overlapping remote log segments after a new 
>leader gets elected during rolling restart. Would you please upload the past 
>10 segments remote-log-segment metadata events for 
>5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765 partition? Thanks!

Here is 


{code:java}
partition: 27, offset: 400504, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=v8QykWvQQryMxNoS1aHMCQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712045518393, brokerId=10015}
partition: 27, offset: 400505, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=v8QykWvQQryMxNoS1aHMCQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712045518494, brokerId=10015}
partition: 27, offset: 400828, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=6K-bSNgxSnOypIrjl9OiGA}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712047438507, brokerId=10015}
partition: 27, offset: 400829, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=6K-bSNgxSnOypIrjl9OiGA}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712047438609, brokerId=10015}
partition: 27, offset: 401145, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=ErPfaCVhRiS6EKx7RnL1iQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712049418775, brokerId=10015}
partition: 27, offset: 401146, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=ErPfaCVhRiS6EKx7RnL1iQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712049418894, brokerId=10015}
partition: 27, offset: 401458, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=2yAT3R6tS3umrvU8iuMVfw}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712051340571, brokerId=10015}
partition: 27, offset: 401459, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=2yAT3R6tS3umrvU8iuMVfw}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712051340719, brokerId=10015}
partition: 27, offset: 401758, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=KLtWI-SWS7eBML87LX63SA}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712053320735, brokerId=10015}
partition: 27, offset: 401759, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=KLtWI-SWS7eBML87LX63SA}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712053320838, brokerId=10015}
partition: 27, offset: 539496, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=8dP13VDYSaiFlubl9SNBTQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712909568625, brokerId=10015}
partition: 27, offset: 539497, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1-765,
 id=8dP13VDYSaiFlubl9SNBTQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712909568846, brokerId=10015}
{code}


> Leaking tiered segments
> ---
>
> Key: KAFKA-16511
> URL: https://issues.apache.org/jira/browse/KAFKA-16511
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.7.0
>Reporter: Francois Visconte
>Assignee: Kamal Chandraprakash
>Priority: Major
>  Labels: tiered-storage
>
> I have some topics there were not written since a few days (having 12h 
> retention) where some data remains on tiered storage (in our case S3) and 
> they are never deleted.
>  
> Looking at the log history, it appears that we never even tried to delete 
> these segments:
> When looking at one of the non-leaking segment, I get the following 
> interesting messages:
>  
> {code:java}
> "2024-04-02T10:30:45.265Z","""kafka""","""10039""","[RemoteLogManager=10039 
> 

[jira] (KAFKA-16511) Leaking tiered segments

2024-04-16 Thread Francois Visconte (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16511 ]


Francois Visconte deleted comment on KAFKA-16511:
---

was (Author: JIRAUSER288982):
Here is for example on one of the partition where I have the issue: 

{code}
_zIcSPWGzlqLjUTFfw}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712036488994, brokerId=10041}
partition: 12, offset: 432066, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=YTIk_zIcSPWGzlqLjUTFfw}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712036489094, brokerId=10041}
partition: 12, offset: 432319, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=8mQ_JJLfTQmkP80MYcN6Ig}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712038114012, brokerId=10041}
partition: 12, offset: 432320, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=8mQ_JJLfTQmkP80MYcN6Ig}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712038114110, brokerId=10041}
partition: 12, offset: 432593, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=oqr_-KDrSIGiGrcLQZhzvA}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712039805803, brokerId=10041}
partition: 12, offset: 432594, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=oqr_-KDrSIGiGrcLQZhzvA}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712039805901, brokerId=10041}
partition: 12, offset: 432914, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=OSou-FH9S4ioH5LHYUxPMg}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712041682856, brokerId=10041}
partition: 12, offset: 432915, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=OSou-FH9S4ioH5LHYUxPMg}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712041683023, brokerId=10041}
partition: 12, offset: 433251, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=uSPj4yS5RIO1LuBHecl7iQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712043678337, brokerId=10041}
partition: 12, offset: 433252, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=uSPj4yS5RIO1LuBHecl7iQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712043678474, brokerId=10041}
partition: 12, offset: 433577, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=pcCt8vc-TUyRhHOGUJBHXg}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712045569214, brokerId=10041}
partition: 12, offset: 433578, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=pcCt8vc-TUyRhHOGUJBHXg}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712045569333, brokerId=10041}
partition: 12, offset: 433904, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=LFqoPyg0SOiHbscnV3Ahtw}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712047491475, brokerId=10041}
partition: 12, offset: 433905, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=LFqoPyg0SOiHbscnV3Ahtw}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712047491576, brokerId=10041}
partition: 12, offset: 434229, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=SSyS_hbXSEKD5rgbu1S8ug}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1712049443915, brokerId=10041}
partition: 12, offset: 434230, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic2-774,
 id=SSyS_hbXSEKD5rgbu1S8ug}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1712049444048, brokerId=10041}
partition: 12, offset: 434584, value: 

[jira] (KAFKA-16511) Leaking tiered segments

2024-04-11 Thread Kamal Chandraprakash (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-16511 ]


Kamal Chandraprakash deleted comment on KAFKA-16511:
--

was (Author: ckamal):
[~fvisconte] 

>From logs, the issue is on partition-765 and segment 
>`02971163.log` . There can be dangling segments in the remote 
>storage if there are retries in uploading the segment. In this case, the 
>uploader thread uploaded the segment but failed to write the 
>COPY_SEGMENT_FINISHED event. If you switch the leader to a different replica, 
>then the dangling segment will be removed.


1. The earliest and latest offset for the partition 765 is same which matches 
the expected value (there were not written since a few days (having 12h 
retention). 
2. Was the same segment re-uploaded and deleted with different 
remote-log-segment-id? Can you check your logs?
3. Did you move the log-start-offset using the {{kafka-delete-records.sh}} 
script in middle of segment upload?


> Leaking tiered segments
> ---
>
> Key: KAFKA-16511
> URL: https://issues.apache.org/jira/browse/KAFKA-16511
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.7.0
>Reporter: Francois Visconte
>Priority: Major
>  Labels: tiered-storage
>
> I have some topics there were not written since a few days (having 12h 
> retention) where some data remains on tiered storage (in our case S3) and 
> they are never deleted.
>  
> Looking at the log history, it appears that we never even tried to delete 
> these segments:
> When looking at one of the non-leaking segment, I get the following 
> interesting messages:
>  
> {code:java}
> "2024-04-02T10:30:45.265Z","""kafka""","""10039""","[RemoteLogManager=10039 
> partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764] Deleted remote log segment 
> RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, 
> id=fqGng3UURCG3-v4lETeLKQ} due to leader-epoch-cache truncation. Current 
> earliest-epoch-entry: EpochEntry(epoch=8, startOffset=2980106), 
> segment-end-offset: 2976819 and segment-epochs: [5]"
> "2024-04-02T10:30:45.242Z","""kafka""","""10039""","Deleting log segment data 
> for completed successfully 
> RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId
> {topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, 
> id=fqGng3UURCG3-v4lETeLKQ}
> , startOffset=2968418, endOffset=2976819, brokerId=10029, 
> maxTimestampMs=1712009754536, eventTimestampMs=1712013411147, 
> segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, 
> customMetadata=Optional.empty, state=COPY_SEGMENT_FINISHED}"
> "2024-04-02T10:30:45.144Z","""kafka""","""10039""","Deleting log segment data 
> for RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId
> {topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, 
> id=fqGng3UURCG3-v4lETeLKQ}
> , startOffset=2968418, endOffset=2976819, brokerId=10029, 
> maxTimestampMs=1712009754536, eventTimestampMs=1712013411147, 
> segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, 
> customMetadata=Optional.empty, state=COPY_SEGMENT_FINISHED}"
> "2024-04-01T23:16:51.157Z","""kafka""","""10029""","[RemoteLogManager=10029 
> partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764] Copied 
> 02968418.log to remote storage with segment-id: 
> RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, 
> id=fqGng3UURCG3-v4lETeLKQ}"
> "2024-04-01T23:16:51.147Z","""kafka""","""10029""","Copying log segment data 
> completed successfully, metadata: 
> RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId
> {topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, 
> id=fqGng3UURCG3-v4lETeLKQ}
> , startOffset=2968418, endOffset=2976819, brokerId=10029, 
> maxTimestampMs=1712009754536, eventTimestampMs=1712013397319, 
> segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, 
> customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}"
> "2024-04-01T23:16:37.328Z","""kafka""","""10029""","Copying log segment data, 
> metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId
> {topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, 
> id=fqGng3UURCG3-v4lETeLKQ}
> , startOffset=2968418, endOffset=2976819, brokerId=10029, 
> maxTimestampMs=1712009754536, eventTimestampMs=1712013397319, 
> segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, 
> customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}"
> {code}
>  
> Which looks right because we can see logs from both the plugin and remote log 
> manager indicating that the remote log segment was removed.
> Now if I look on one of the leaked segment, here is what I see
>  
> {code:java}
> "2024-04-02T00:43:33.834Z","""kafka""","""10001""","[RemoteLogManager=10001 
> partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765] Copied 
> 02971163.log to remote storage with