jackye1995 commented on pull request #2591:
URL: https://github.com/apache/iceberg/pull/2591#issuecomment-845641397


   > If we now want to check if we can remove Delete File A we only have to
   > read files C and D so we actually made progress.
   
   I think this is the part where I am a bit confused. A' and B' certainly
don't need delete file A, because their sequence numbers are higher. But we
don't read C and D in order to add delete file A to the FileScanTask of C and D.
That is done by reading the statistics of delete file A, and it is determined
by the partition filter. As long as there are data files with a lower sequence
number in that partition, the delete file will be included in the file scan
tasks of those files.
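
   The matching rule described above can be sketched as follows. This is a
simplified model, not Iceberg code: the class `DeleteApplicabilitySketch`, its
nested types, the partition string `p=1`, and the sequence numbers are all
hypothetical, and the actual rules in the Iceberg spec are more detailed (they
differ between position and equality deletes, for example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of these types are Iceberg classes.
final class DeleteApplicabilitySketch {

    static final class DataFile {
        final String name;
        final String partition;
        final long sequenceNumber;

        DataFile(String name, String partition, long sequenceNumber) {
            this.name = name;
            this.partition = partition;
            this.sequenceNumber = sequenceNumber;
        }
    }

    static final class DeleteFile {
        final String name;
        final String partition;
        final long sequenceNumber;

        DeleteFile(String name, String partition, long sequenceNumber) {
            this.name = name;
            this.partition = partition;
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Simplified rule from the comment: same partition, and the data file
    // has a lower sequence number than the delete file.
    static boolean applies(DeleteFile delete, DataFile data) {
        return delete.partition.equals(data.partition)
            && data.sequenceNumber < delete.sequenceNumber;
    }

    // Names of the data files whose scan tasks would carry the delete file.
    static List<String> filesNeeding(DeleteFile delete, List<DataFile> files) {
        List<String> names = new ArrayList<>();
        for (DataFile file : files) {
            if (applies(delete, file)) {
                names.add(file.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed sequence numbers: C and D are older than delete file A,
        // while the rewritten files A' and B' are newer.
        DeleteFile deleteA = new DeleteFile("delete-A", "p=1", 2L);
        List<DataFile> files = Arrays.asList(
            new DataFile("A'", "p=1", 3L),
            new DataFile("B'", "p=1", 3L),
            new DataFile("C", "p=1", 1L),
            new DataFile("D", "p=1", 1L));
        System.out.println(filesNeeding(deleteA, files)); // prints [C, D]
    }
}
```

   Note that the decision uses only metadata (partition and sequence number);
the contents of C and D are never read.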
   
   This means that if we can have a counter for each delete file and expose a 
method `cleanUnreferencedDeleteFiles()` called after `planFileGroups()`, we can 
naturally get all the files compacted just by running bin packing continuously.
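
   A minimal sketch of the counter idea, under stated assumptions: the class
`DeleteFileRefCounter` and the method `recordTask` are hypothetical, delete
files are identified by plain strings, and `cleanUnreferencedDeleteFiles()`
here only reports the candidates instead of removing anything. It is not an
existing Iceberg API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of Iceberg.
final class DeleteFileRefCounter {
    // TreeMap keeps the reported order deterministic.
    private final Map<String, Integer> refCounts = new TreeMap<>();

    // Start every known delete file with a count of zero.
    DeleteFileRefCounter(Collection<String> knownDeleteFiles) {
        for (String deleteFile : knownDeleteFiles) {
            refCounts.put(deleteFile, 0);
        }
    }

    // Called once per planned file scan task with the delete files attached to it.
    void recordTask(Collection<String> deleteFilesOfTask) {
        for (String deleteFile : deleteFilesOfTask) {
            refCounts.merge(deleteFile, 1, Integer::sum);
        }
    }

    // Delete files that no planned task referenced; candidates for removal.
    List<String> cleanUnreferencedDeleteFiles() {
        List<String> unreferenced = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : refCounts.entrySet()) {
            if (entry.getValue() == 0) {
                unreferenced.add(entry.getKey());
            }
        }
        return unreferenced;
    }

    public static void main(String[] args) {
        // Pass 1: C and D still carry delete-A, so nothing can be cleaned.
        DeleteFileRefCounter pass1 =
            new DeleteFileRefCounter(Collections.singletonList("delete-A"));
        pass1.recordTask(Arrays.asList("delete-A")); // task for C
        pass1.recordTask(Arrays.asList("delete-A")); // task for D
        System.out.println(pass1.cleanUnreferencedDeleteFiles()); // prints []

        // Pass 2: C and D were rewritten, so no task references delete-A.
        DeleteFileRefCounter pass2 =
            new DeleteFileRefCounter(Collections.singletonList("delete-A"));
        System.out.println(pass2.cleanUnreferencedDeleteFiles()); // prints [delete-A]
    }
}
```

   In this sketch the counter is rebuilt on each planning pass, so a delete
file shows up as unreferenced once the last data file that needed it has been
rewritten.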




