Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19848

> I actually feel like this is something hadoop should be documenting ... we are talking about how committers we happen to know work, rather than talking about the general contract of committers. But even if it's not in the hadoop docs, in our jira or mailing list would be better.

I concur. The best there is right now is [the s3a docs](https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md), which are based on stepping through with a debugger while taking notes, which is not something people should have to do. Funnily enough, I kept it in the hadoop-aws site because there wasn't an equivalent bit of source tree for MR, and doing it in the -aws module made it easier to get in. Really that material could be pulled up somewhere more general, and perhaps accompanied by a list of requirements for committers and, for the v1 and v2 algorithms, their filesystem requirements.

At the same time, Spark needs its counterpart: a statement of what it expects of a committer, which is equally tricky to work out. The big risk is a mismatch between Spark's expectations and what committers and their stores actually deliver. In particular, I think Spark assumes that a timeout or failure to respond to a commit-task-granted message can be handled by granting another speculative task attempt the right to commit. However, that requires `commitTask` to be atomic, which only holds for the v1 algorithm.
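To make that last distinction concrete, here is a minimal sketch of why v1 task commit is atomic and v2 is not. This is plain `java.nio.file`, not the real Hadoop `OutputCommitter` API, and all names are illustrative: v1 promotes a whole task-attempt directory in one rename, while v2 promotes each output file individually, so a v2 commit interrupted part-way leaves a mix of committed and uncommitted files.

```java
import java.io.IOException;
import java.nio.file.*;

// Illustrative sketch only (not Hadoop's FileOutputCommitter): contrasts the
// v1 and v2 styles of task commit on a local filesystem.
public class CommitSketch {

    // v1 style: the entire task-attempt directory is renamed into the job's
    // pending area in a single metadata operation. Either the whole task's
    // output is visible there, or none of it is.
    static void commitTaskV1(Path attemptDir, Path jobPendingDir) throws IOException {
        Files.createDirectories(jobPendingDir.getParent());
        Files.move(attemptDir, jobPendingDir, StandardCopyOption.ATOMIC_MOVE);
    }

    // v2 style: each output file is moved directly to the final destination.
    // A failure between iterations leaves partial output in place, so a
    // second speculative attempt cannot safely be granted the commit.
    static void commitTaskV2(Path attemptDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(attemptDir)) {
            for (Path f : files) {
                Files.move(f, destDir.resolve(f.getFileName()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("commit-sketch");

        Path a1 = Files.createDirectories(root.resolve("attempt1"));
        Files.writeString(a1.resolve("part-0"), "a");
        commitTaskV1(a1, root.resolve("pending/task1"));
        System.out.println(Files.exists(root.resolve("pending/task1/part-0")));

        Path a2 = Files.createDirectories(root.resolve("attempt2"));
        Files.writeString(a2.resolve("part-0"), "b");
        commitTaskV2(a2, root.resolve("dest"));
        System.out.println(Files.exists(root.resolve("dest/part-0")));
    }
}
```

On an object store without atomic rename, even the v1 single-rename step loses this guarantee, which is part of why the store's semantics have to appear in any list of committer requirements.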