Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22112

So there are 2 options:
1. Require the RDD closure to be idempotent. I'm not sure if that's OK for MLlib, cc @mengxr @WeichenXu123 @yanboliang
2. Require the output committer to be able to overwrite a committed task. Note that the output committer here is the `FileCommitProtocol` interface in Spark, not the Hadoop output committer. We don't have to make all the Hadoop output committers work.
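
To make the two options concrete, here is a toy Python sketch. It is only an illustration under assumptions: `ToyCommitProtocol`, `commit_task`, `allow_overwrite`, and both task functions are hypothetical names, and none of this reflects the actual `FileCommitProtocol` interface. It models a task that may be re-run with a different attempt seed, and a committer that either keeps the first commit or lets a later attempt overwrite it.

```python
import random


class ToyCommitProtocol:
    """Hypothetical stand-in for a commit protocol; NOT Spark's FileCommitProtocol API."""

    def __init__(self, allow_overwrite):
        self.allow_overwrite = allow_overwrite
        self.committed = {}  # partition id -> rows committed for that partition

    def commit_task(self, partition_id, rows):
        """Try to commit a task attempt; return True if its output was stored."""
        if partition_id in self.committed and not self.allow_overwrite:
            # First commit wins: the output of a re-run attempt is dropped.
            return False
        self.committed[partition_id] = list(rows)
        return True


def non_idempotent_task(rows, attempt_seed):
    # Output depends on the attempt's seed, so a retry may produce different rows.
    rng = random.Random(attempt_seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled[:2]


def idempotent_task(rows, attempt_seed):
    # Ignores the attempt seed: every attempt produces the same rows (option 1).
    return sorted(rows)[:2]


# Option 2: a committer that accepts a re-run attempt replaces the earlier output.
overwriting = ToyCommitProtocol(allow_overwrite=True)
overwriting.commit_task(0, non_idempotent_task([3, 1, 2], attempt_seed=1))
overwriting.commit_task(0, non_idempotent_task([3, 1, 2], attempt_seed=2))
```

With option 1 it does not matter which attempt's output is kept, because all attempts agree. With option 2 the closure may be non-deterministic, but the committed output always comes from the latest attempt.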