dd-willgan opened a new pull request, #14073:
URL: https://github.com/apache/pinot/pull/14073

   Motivation: https://github.com/apache/pinot/issues/10956
   
   - For Pinot clusters that have been up for a while, `Deleted_Segments` may account for a significant portion of bucket usage and cost
   
   Problem:
   
   - `S3PinotFS.listFiles`, which the retention logic relies on, does not return directories, even though the `PinotFS` interface and `SegmentDeletionManager` expect it to
   - PR history of relevant code:
     - https://github.com/apache/pinot/pull/9466
     - https://github.com/apache/pinot/pull/6002
     - https://github.com/apache/pinot/pull/5249
   
   Changes:
   
   - Use `CommonPrefixes` in the `ListObjectsV2` response to get directories
   - Modify the `visitFiles` signature
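
For readers unfamiliar with delimiter-based listing, here is a minimal sketch of the idea behind the first change. It does not call the AWS SDK and is not the code in this PR. It simulates in plain Java how a `ListObjectsV2` request with a prefix and a delimiter splits keys into direct objects (`Contents`) and rolled-up "directories" (`CommonPrefixes`). The class name `DelimiterListingSketch`, its `Listing` type, and the sample keys are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: simulates prefix + delimiter listing semantics
// over an in-memory list of keys. It is not S3PinotFS and does not use the AWS SDK.
final class DelimiterListingSketch {

    /** Result of one non-recursive listing. */
    static final class Listing {
        // Keys directly under the prefix (no further delimiter after it).
        final List<String> contents = new ArrayList<>();
        // Prefixes ending in the delimiter, one per immediate "subdirectory".
        final Set<String> commonPrefixes = new LinkedHashSet<>();
    }

    static Listing list(List<String> keys, String prefix, String delimiter) {
        Listing result = new Listing();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the listed prefix
            }
            String rest = key.substring(prefix.length());
            int idx = rest.indexOf(delimiter);
            if (idx < 0) {
                // No delimiter after the prefix: a direct child object.
                result.contents.add(key);
            } else {
                // Everything up to and including the first delimiter is rolled
                // up into a single common prefix, however many keys share it.
                result.commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
            }
        }
        return result;
    }
}
```

With keys such as `Deleted_Segments/tableA/seg1` and `Deleted_Segments/tableB/seg3`, listing the prefix `Deleted_Segments/` with delimiter `/` leaves `contents` empty and puts `Deleted_Segments/tableA/` and `Deleted_Segments/tableB/` in `commonPrefixes`. A non-recursive listing that reads only the object entries would therefore report no directories.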
   
   Tested:
   
   - Updated existing unit test
   - Backwards incompatibilities
     - `recursive = true` behavior is unchanged
     - Based on the screenshot below, aside from `SegmentDeletionManager`, which we want to fix, the only other places `listFiles` is called with `recursive = false` are `SegmentGenerationUtils` and `PinotLLCRealtimeSegmentManager`
       - Those code locations won't be affected by this change


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

