amogh-jahagirdar opened a new pull request, #10373:
URL: https://github.com/apache/iceberg/pull/10373

   Upstream callers of the transaction and snapshot producer APIs (engines 
such as Spark) currently handle CommitStateUnknownException to avoid cleaning 
up data files when it is unclear whether the commit succeeded. For example, 
see the rewrite data files procedure here: 
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java#L116
   
   Currently, when strict cleanup is enabled (default true), we only clean up 
metadata files if the failure is a cleanable failure. We should extend this 
protection to the data file cleanup that upstream callers may do. To do that, 
we can throw a CommitStateUnknownException in SnapshotProducer/BaseTransaction 
if strict cleanup is enabled and the failure is not a cleanable failure. 
Engines that already handle that exception will then skip cleaning up data 
files. 
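
   A minimal, self-contained sketch of the proposed behavior follows. It does 
not use the real Iceberg classes: StrictCleanupSketch, commit, and 
commitOrCleanUp are hypothetical names, and the exception types are local 
stand-ins named after their Iceberg counterparts. The assumption is that a 
non-cleanable failure under strict cleanup gets wrapped in a commit state 
unknown exception, which the caller already treats as "keep the data files".

```java
import java.util.List;

// Sketch only: local stand-ins, not the org.apache.iceberg classes.
final class StrictCleanupSketch {
  // Stand-in for the marker interface on failures that are safe to clean up after.
  interface CleanableFailure {}

  // Stand-in for a failure where the commit is known not to have happened.
  static class CommitFailedException extends RuntimeException implements CleanableFailure {
    CommitFailedException(String message) {
      super(message);
    }
  }

  // Stand-in for the exception that signals an unknown commit outcome.
  static class CommitStateUnknownException extends RuntimeException {
    CommitStateUnknownException(Throwable cause) {
      super("Cannot determine whether the commit succeeded", cause);
    }
  }

  // Hypothetical producer-side wrapper around the commit attempt.
  static void commit(Runnable commitOperation, boolean strictCleanup, Runnable cleanMetadata) {
    try {
      commitOperation.run();
    } catch (CommitStateUnknownException e) {
      // Already the right signal: never clean up, just propagate.
      throw e;
    } catch (RuntimeException e) {
      if (!strictCleanup || e instanceof CleanableFailure) {
        // Cleanup is allowed: remove metadata files and rethrow the original failure.
        cleanMetadata.run();
        throw e;
      }
      // Proposed change: strict cleanup plus a non-cleanable failure is
      // reported as an unknown commit state so callers keep their files.
      throw new CommitStateUnknownException(e);
    }
  }

  // Hypothetical engine-side caller; `deleted` records which data files were removed.
  static void commitOrCleanUp(
      Runnable commitOperation,
      boolean strictCleanup,
      List<String> newDataFiles,
      List<String> deleted) {
    try {
      commit(commitOperation, strictCleanup, () -> {});
    } catch (CommitStateUnknownException e) {
      // Outcome unknown: leave the data files in place.
      throw e;
    } catch (RuntimeException e) {
      // Commit definitely failed: the new data files are orphaned and can go.
      deleted.addAll(newDataFiles);
      throw e;
    }
  }
}
```

   With strict cleanup enabled, an arbitrary RuntimeException from the commit 
leaves `deleted` empty, while a CommitFailedException still results in the new 
data files being removed. The caller needs no change because it already 
special-cases the unknown-state exception.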


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

