amogh-jahagirdar commented on PR #10983:
URL: https://github.com/apache/iceberg/pull/10983#issuecomment-2316763362

   @hantangwangd I thought about this a bit more and while I still think 
updating the IncrementalFileCleanup to address this particular case is 
possible, I think the question more so is "Should we?". 
   
   The main problem we want to address is that the ExpireSnapshots 
implementation should ensure that a user doesn't end up in a state where they 
cleanup files that are still referenced. Looking at what's going on in Presto 
we're actually casting to `RemoveSnapshots` implementation and the 
implementation exposes `withIncrementalCleanup`. 
   
   The reason for exposing that in the implementation (and not the interface) 
was when we were doing branching/tagging we wanted to verify in tests that both 
cleanup strategies work as expected so it was easy to parameterize around that 
API. Ideally we wouldn't need to expose such an API and in the implementation 
intelligently choose the right strategy, but the validation from tests is 
really important.
   
   So considering `withIncrementalCleanup` is exposed to someone who casts to 
RemoveSnapshots then we can either choose to complicate the Incremental cleanup 
with more logic to handle this specific case or we can accept that incremental 
cleanup has these limitations and from an API perspective ensure that callers 
don't shoot themselves in the foot and removing files which shouldn't be 
removed by triggering incremental cleanup when there's an arbitrary snapshot 
that's removed (what @rdblue suggested).
   
   The incremental cleanup logic is already quite complex and I'm now thinking 
it's not really worth it to add handling this particular case. There's probably 
more cases and then if we were to keep adding handling for that at some point 
the implementation will look like the reachable cleanup anyways.
   
   @hantangwangd would you be open to changing the approach to just ensure that 
if `expireSnapshotId` is called and incrementalCleanup is set to true we fail 
before triggering the file cleanup? 
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L323
 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to