Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21558

> So the problem here is, when we retry a stage, Spark doesn't kill the tasks of the old stage and just launch tasks for the new stage

I think that's something that should be fixed, but it wouldn't entirely fix the problem unless we were very careful about ordering in the driver: the stage would have to fail, then stop allowing commits, then wait for all of the tasks that were allowed to commit to finish, and account for the coordination messages being in flight. Not an easy problem.

I'd like to see a fix that makes the attempt number unique within a job and partition, i.e., no two tasks should have the same (job id, partition id, attempt number) triple, as Wenchen said.
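The uniqueness property above can be sketched independently of Spark's internals. The following is a minimal, hypothetical illustration (not Spark's actual implementation, and `UniqueAttempts` is an invented name): a thread-safe counter keyed by (job id, partition id) hands out attempt numbers, so even if a retried stage launches a task for a partition whose old task is still running, the two tasks get different attempt numbers and the triples can never collide.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: allocate attempt numbers that are unique per
// (job id, partition id), across stage retries.
public class UniqueAttempts {
    // One counter per (jobId, partitionId), bumped on every task launch,
    // including launches from retried stage attempts.
    private final Map<Long, AtomicInteger> counters = new ConcurrentHashMap<>();

    // Pack the two ids into one map key.
    private static long key(int jobId, int partitionId) {
        return ((long) jobId << 32) | (partitionId & 0xFFFFFFFFL);
    }

    public int nextAttemptNumber(int jobId, int partitionId) {
        return counters
            .computeIfAbsent(key(jobId, partitionId), k -> new AtomicInteger(0))
            .getAndIncrement();
    }

    public static void main(String[] args) {
        UniqueAttempts ua = new UniqueAttempts();
        // Two launches of partition 3 in job 1 (original stage attempt plus a
        // retry) receive distinct attempt numbers: 0, then 1.
        System.out.println(ua.nextAttemptNumber(1, 3)); // prints 0
        System.out.println(ua.nextAttemptNumber(1, 3)); // prints 1
        // A different partition starts its own sequence.
        System.out.println(ua.nextAttemptNumber(1, 4)); // prints 0
    }
}
```

Because the counter is scoped to the job rather than to a stage attempt, the allocated numbers stay unique even when tasks from the old and new stage attempts overlap, which is exactly the scenario the comment describes.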