Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19623#discussion_r148263405 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -75,8 +82,10 @@ /** * Aborts this writing job because some data writers are failed to write the records and aborted, * or the Spark job fails with some unknown reasons, or {@link #commit(WriterCommitMessage[])} - * fails. If this method fails(throw exception), the underlying data source may have garbage that - * need to be cleaned manually, but these garbage should not be visible to data source readers. + * fails. + * + * If an exception was thrown, the underlying data source may have garbage that need to be + * cleaned manually, but these garbage should not be visible to data source readers. --- End diff -- call it, because commit may fail for other reasons, e.g. the committer permits the job to commit if the output is all to empty partitions, but fails if there is data in any of the dest partitions. the abort() call triggers the cleanup. 1. exceptions aborts after job commit failure. must of course be caught/swallowed. Hadoop MR doesn't do that properly, which is something someone should fix. Probably me. 1. ideally the committers should be doing their own cleanup if the job fails, so they can be confident the cleanup always takes place. Again: catch & swallow exceptions. 1. Failure in job commit is very much an "outcome is unknown" state. Couple of references * [s3a committer lifecycle tests](https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java), including OOO workflows (abort before commit, 2x commit task, 2x commit job, commit job -> commit task). * [SPARK-2984](https://issues.apache.org/jira/browse/SPARK-2984), where underlying problem of recurrent issue is people trying to write to same dest with >1 job, and finding all jobs after first one failing as the `__temporary` dir has been deleted in cleanup(). Arguably, the job commit/cleanup is being overaggressive about cleanup, but given how partition merging isn't atomic with the classic fileoutput committers, its really a manifestation of people trying to do something they shouldn't.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org