There's been discussion going on in various PRs about what committers do, what they are expected to do, and how they get coordinated; the general conclusion has been "this should be covered on the developer list".
Here, then, are the three PRs where this has surfaced:

[SPARK-22026][SQL] data source v2 write path
https://github.com/apache/spark/pull/19269

[SPARK-22078][SQL] clarify exception behaviors for all data source v2 interfaces
https://github.com/apache/spark/pull/19623

[SPARK-22162] Executors and the driver should use consistent JobIDs in the RDD commit protocol
https://github.com/apache/spark/pull/19848

Right now, the Hadoop side of things is written up non-normatively in
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md
with some errata in a work-in-progress patch:
https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-15107-correctness/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md

Those docs are incomplete, and I don't know of anything equivalent covering the Spark driver's commit algorithm, so it has mostly been a matter of tracing back through the IDE and using a modified committer set up to do things like fail in task or job commit.

Having spent time integrating Hadoop's forthcoming S3A committers with things, I suspect that there may be some mismatch between what is expected of committers and what they actually deliver, but I'll need to add a bit more fault injection there to be sure.

I'll have a draft of a paper up in a week or so for anyone interested in this area.

-Steve
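For anyone who hasn't traced it through, a rough sketch of the fault-injection idea: model the two-phase commit protocol (setup, per-task commit, job commit) and inject a failure at a chosen phase to see how the driver is expected to react. This is a simplified stand-alone model with hypothetical names, not the real `org.apache.hadoop.mapreduce.OutputCommitter` API or Spark's `HadoopMapReduceCommitProtocol`; it just illustrates the sequencing being probed.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a two-phase commit protocol with fault injection.
// Names (RecordingCommitter, FailAt, runJob) are hypothetical, for illustration only.
public class CommitFaultInjection {

    // Phases at which a fault can be injected.
    enum FailAt { NONE, TASK_COMMIT, JOB_COMMIT }

    // Minimal committer that records the sequence of protocol calls
    // and optionally throws at a chosen phase.
    static class RecordingCommitter {
        final FailAt failAt;
        final List<String> events = new ArrayList<>();

        RecordingCommitter(FailAt failAt) { this.failAt = failAt; }

        void setupJob()       { events.add("setupJob"); }
        void setupTask(int t) { events.add("setupTask-" + t); }
        void commitTask(int t) {
            if (failAt == FailAt.TASK_COMMIT) {
                throw new RuntimeException("injected task-commit failure");
            }
            events.add("commitTask-" + t);
        }
        void abortTask(int t) { events.add("abortTask-" + t); }
        void commitJob() {
            if (failAt == FailAt.JOB_COMMIT) {
                throw new RuntimeException("injected job-commit failure");
            }
            events.add("commitJob");
        }
        void abortJob()       { events.add("abortJob"); }
    }

    // Driver-side loop: commit each task in turn; on a task-commit failure,
    // abort that task and then the whole job; on a job-commit failure, abort the job.
    static List<String> runJob(RecordingCommitter c, int numTasks) {
        c.setupJob();
        try {
            for (int t = 0; t < numTasks; t++) {
                c.setupTask(t);
                try {
                    c.commitTask(t);
                } catch (RuntimeException e) {
                    c.abortTask(t);
                    throw e;
                }
            }
            c.commitJob();
        } catch (RuntimeException e) {
            c.abortJob();
        }
        return c.events;
    }

    public static void main(String[] args) {
        // Healthy run: every task commits, then the job commits.
        System.out.println(runJob(new RecordingCommitter(FailAt.NONE), 2));
        // Injected task-commit failure: the failing task is aborted, then the job.
        System.out.println(runJob(new RecordingCommitter(FailAt.TASK_COMMIT), 2));
    }
}
```

The interesting questions are exactly the ones that don't show up in a toy model like this: whether abortTask/abortJob are actually invoked in every failure path, and whether driver and executors agree on which job/task attempt is being committed.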