Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19269

> The only contract Spark needs is: data written/committed by tasks should not be visible to data source readers until the job-level commit. But they can be visible to others like other writing tasks, so it's possible for data sources to implement "abort the output of the other writer".

I'm not following what you mean here.

> making DataSourceV2Writer.abort take commit messages is still a "best-effort" to clean up the data

Agreed. We should state something like this in the docs for the job-level abort: "Commit messages passed to abort are the messages for all commits that succeeded and sent a commit message to the driver. It is possible, though unlikely, for an executor to successfully commit data to a data source but fail before sending the commit message to the driver."
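To make that contract concrete, here is a minimal sketch of what a best-effort `abort(messages)` might look like under the signature discussed in this PR. Everything besides the contract itself is hypothetical: `PathCommitMessage`, `ExampleWriter`, and `deleteQuietly` are illustrative stand-ins, not part of the proposed API.

```scala
// Stand-in for Spark's WriterCommitMessage marker interface (sketch only).
trait WriterCommitMessage extends Serializable

// Hypothetical message: each committed task reports the path it wrote to.
case class PathCommitMessage(path: String) extends WriterCommitMessage

class ExampleWriter {
  // `messages` holds entries only for tasks whose commit message reached the
  // driver. A task may have committed successfully but died before reporting,
  // so this cleanup is best-effort: the source must tolerate leftover data
  // from such tasks.
  def abort(messages: Array[WriterCommitMessage]): Unit = {
    messages.foreach {
      case PathCommitMessage(path) => deleteQuietly(path)
      case _ => // null/unknown slot: this task never reported a commit
    }
  }

  // Hypothetical helper: swallow errors so that failing to delete one path
  // does not prevent cleaning up the remaining paths.
  private def deleteQuietly(path: String): Unit =
    try {
      // e.g. fs.delete(new Path(path), true) in a Hadoop-backed source
      println(s"deleting $path")
    } catch {
      case scala.util.control.NonFatal(_) => // best-effort: ignore
    }
}
```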