amogh-jahagirdar commented on PR #10983: URL: https://github.com/apache/iceberg/pull/10983#issuecomment-2316763362
@hantangwangd I thought about this a bit more and while I still think updating the IncrementalFileCleanup to address this particular case is possible, I think the question more so is "Should we?". The main problem we want to address is that the ExpireSnapshots implementation should ensure that a user doesn't end up in a state where they cleanup files that are still referenced. Looking at what's going on in Presto we're actually casting to `RemoveSnapshots` implementation and the implementation exposes `withIncrementalCleanup`. The reason for exposing that in the implementation (and not the interface) was when we were doing branching/tagging we wanted to verify in tests that both cleanup strategies work as expected so it was easy to parameterize around that API. Ideally we wouldn't need to expose such an API and in the implementation intelligently choose the right strategy, but the validation from tests is really important. So considering `withIncrementalCleanup` is exposed to someone who casts to RemoveSnapshots then we can either choose to complicate the Incremental cleanup with more logic to handle this specific case or we can accept that incremental cleanup has these limitations and from an API perspective ensure that callers don't shoot themselves in the foot and removing files which shouldn't be removed by triggering incremental cleanup when there's an arbitrary snapshot that's removed (what @rdblue suggested). The incremental cleanup logic is already quite complex and I'm now thinking it's not really worth it to add handling this particular case. There's probably more cases and then if we were to keep adding handling for that at some point the implementation will look like the reachable cleanup anyways. @hantangwangd would you be open to changing the approach to just ensure that if `expireSnapshotId` is called and incrementalCleanup is set to true we fail before triggering the file cleanup? https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/RemoveSnapshots.java#L323 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org