Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21577#discussion_r196247811

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
    @@ -109,20 +116,21 @@ private[spark] class OutputCommitCoordinator(conf: SparkConf, isDriver: Boolean)
        * @param maxPartitionId the maximum partition id that could appear in this stage's tasks (i.e.
        *                       the maximum possible value of `context.partitionId`).
        */
    -  private[scheduler] def stageStart(stage: StageId, maxPartitionId: Int): Unit = synchronized {
    +  private[scheduler] def stageStart(stage: Int, maxPartitionId: Int): Unit = synchronized {
         stageStates(stage) = new StageState(maxPartitionId + 1)
    --- End diff --

    I don't think the semantics are changing. It's always been racy, in that either of the concurrent tasks from different stage attempts may succeed first. And I'm almost sure the assumption is that both task attempts are equivalent (i.e. the output is deterministic, or at least should be), so it should be fine for either one to be committed. The problem is that without this change the coordinator would allow both attempts to commit, and that is kinda bad.
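    To make the race concrete, here is a minimal, hypothetical sketch of the behavior described above; it is not Spark's actual OutputCommitCoordinator (whose bookkeeping lives in StageState and the DAGScheduler RPC path), and all names in it are illustrative. The point it shows: if commit permission is keyed by (stage, partition) rather than by stage attempt, whichever task attempt asks first wins, and a concurrent attempt from another stage attempt is denied, so two attempts can never both commit the same partition.

    ```scala
    import scala.collection.mutable

    // Illustrative sketch only: a single-JVM stand-in for the coordinator's
    // "who may commit this partition" decision, keyed by (stage, partition).
    class CommitCoordinatorSketch {
      // The (stageAttempt, taskAttempt) pair that currently holds the commit
      // permission for a given (stage, partition).
      private case class Committer(stageAttempt: Int, taskAttempt: Int)
      private val authorized = mutable.Map.empty[(Int, Int), Committer]

      def canCommit(stage: Int, stageAttempt: Int,
                    partition: Int, taskAttempt: Int): Boolean = synchronized {
        authorized.get((stage, partition)) match {
          case None =>
            // First attempt to ask for this partition wins, regardless of
            // which stage attempt it came from.
            authorized((stage, partition)) = Committer(stageAttempt, taskAttempt)
            true
          case Some(Committer(sa, ta)) =>
            // Only the already-authorized attempt may commit; a concurrent
            // attempt from a different stage attempt is denied, which is the
            // property the change above is meant to preserve.
            sa == stageAttempt && ta == taskAttempt
        }
      }
    }
    ```

    Since both attempts are assumed to produce equivalent (deterministic) output, denying the second attempt loses nothing; it only prevents the double-commit case the comment calls out.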