dd-willgan opened a new pull request, #14073: URL: https://github.com/apache/pinot/pull/14073
Motivation: https://github.com/apache/pinot/issues/10956
- For Pinot clusters that have been up for a while, `Deleted_Segments` may account for a significant portion of bucket usage and cost

Problem:
- `S3PinotFS.listFiles`, which retention relies on, does not return directories, even though the `PinotFS` interface and `SegmentDeletionManager` expect it to
- PR history of relevant code:
  - https://github.com/apache/pinot/pull/9466
  - https://github.com/apache/pinot/pull/6002
  - https://github.com/apache/pinot/pull/5249

Changes:
- Use `CommonPrefixes` in the `ListObjectsV2` response to get directories
- Modify the `visitFiles` signature

Tested:
- Updated existing unit test
- Backwards incompatibilities
  - `recursive = true` behavior unchanged
  - Based on the screenshot below, aside from `SegmentDeletionManager`, which we want to fix, the only other places `listFiles` is called with `recursive = false` are `SegmentGenerationUtils` and `PinotLLCRealtimeSegmentManager`
    - Those code locations won't be affected by this change
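For readers unfamiliar with why `CommonPrefixes` yields directories: when a `ListObjectsV2` request carries a delimiter, keys that contain the delimiter after the prefix are rolled up into one common prefix per sub-"directory" instead of being listed individually. The sketch below imitates that grouping over an in-memory key list. It is not the code in this PR and makes no S3 calls; the class `DelimiterListingSketch`, the `Listing` holder, the `list` method, and the sample keys are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, in-memory imitation of delimiter-based listing. Not Pinot code.
public class DelimiterListingSketch {

  /** Result of one non-recursive listing under a prefix. */
  static final class Listing {
    // Keys that sit directly under the prefix (no further delimiter).
    final List<String> files = new ArrayList<>();
    // One entry per immediate sub-"directory", each ending with the delimiter.
    final Set<String> commonPrefixes = new LinkedHashSet<>();
  }

  static Listing list(List<String> keys, String prefix, String delimiter) {
    Listing result = new Listing();
    for (String key : keys) {
      if (!key.startsWith(prefix)) {
        continue; // outside the listed prefix
      }
      String rest = key.substring(prefix.length());
      if (rest.isEmpty()) {
        continue; // the prefix itself is not a child entry in this sketch
      }
      int idx = rest.indexOf(delimiter);
      if (idx < 0) {
        // No delimiter after the prefix: a direct child file.
        result.files.add(key);
      } else {
        // Delimiter found: roll the key up into its first-level "directory".
        result.commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Sample keys are made up for the example.
    List<String> keys = List.of(
        "Deleted_Segments/tableA/seg_1",
        "Deleted_Segments/tableA/seg_2",
        "Deleted_Segments/tableB/seg_9",
        "Deleted_Segments/README",
        "other/x");
    Listing listing = list(keys, "Deleted_Segments/", "/");
    System.out.println("files: " + listing.files);
    System.out.println("directories: " + listing.commonPrefixes);
  }
}
```

A non-recursive listing that reads only the `files` part would never see the per-table directories, which matches the problem described above; reading the common prefixes as well exposes them.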
