Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/19848 @mridulm What I meant by "same RDD" was running the same job twice on the same cluster, but in different Spark contexts. So it is not literally the same RDD, but since each SparkContext starts RDD ids from zero, we may end up with the same RDD ids across executions. The jobTrackerId will be different, but I haven't actually checked whether Hadoop derives a different file path from the jobTrackerId. If it does, there should be no problem; if not, I guess the commit will fail. I think this can only happen when spark.hadoop.validateOutputSpecs is true.
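To illustrate the point about jobTrackerId: Spark builds it from the job submission timestamp (the `yyyyMMddHHmmss` pattern used by `SparkHadoopWriterUtils.createJobTrackerID`), so two runs started at different times get distinct Hadoop JobIDs even when both use job/RDD id 0. A minimal sketch of that derivation (a standalone reimplementation for illustration, not Spark's actual class):

```scala
import java.text.SimpleDateFormat
import java.util.Date

object JobTrackerIdSketch {
  // Mimics SparkHadoopWriterUtils.createJobTrackerID: a timestamp string
  // that seeds the Hadoop JobID for a given write job.
  def createJobTrackerID(time: Date): String =
    new SimpleDateFormat("yyyyMMddHHmmss").format(time)

  def main(args: Array[String]): Unit = {
    // Two runs one minute apart, each with local job id 0.
    val run1 = createJobTrackerID(new Date(0L))
    val run2 = createJobTrackerID(new Date(60000L))
    // The trackerId differs, so "job_<trackerId>_0000" differs too,
    // even though both runs reuse job id 0.
    assert(run1 != run2)
    println(s"job_${run1}_0000 vs job_${run2}_0000")
  }
}
```

Whether the committer's temporary output paths actually incorporate this JobID (and thus avoid collisions between the two runs) is the open question in the comment above.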