Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21558
  
    > So the problem here is, when we retry a stage, Spark doesn't kill the
    tasks of the old stage and just launches tasks for the new stage.
    
    I think that's something that should be fixed, but it wouldn't entirely fix
    the problem unless we were very careful about ordering in the driver: the
    stage would have to fail, then the driver would have to stop allowing
    commits, wait for all of the tasks that were already allowed to commit to
    finish, and account for coordination messages still in flight. Not an easy
    problem.
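    As a minimal sketch of that ordering (hypothetical names invented for
    illustration; this is not Spark's actual OutputCommitCoordinator), the
    driver-side handler would have to flip the "accepting commits" flag before
    it drains the commits that were already authorized:

        import scala.collection.mutable

        // Hypothetical sketch of the ordering described above, not Spark code.
        final case class PartitionKey(jobId: Int, partitionId: Int)

        class CommitGate {
          private var acceptingCommits = true
          private val authorized = mutable.Set.empty[PartitionKey]

          // A task asks for permission to commit its output.
          def canCommit(key: PartitionKey): Boolean = synchronized {
            if (acceptingCommits && !authorized.contains(key)) {
              authorized += key
              true
            } else {
              false
            }
          }

          // A task reports that its authorized commit has completed.
          def commitFinished(key: PartitionKey): Unit = synchronized {
            authorized -= key
            notifyAll()
          }

          // On stage failure: stop allowing new commits FIRST, then wait until
          // every commit that was already authorized has finished. In-flight
          // canCommit requests that arrive after this point are simply refused.
          def failStageAndDrain(): Unit = synchronized {
            acceptingCommits = false
            while (authorized.nonEmpty) {
              wait()
            }
          }
        }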
    
    I'd like to see a fix that makes the attempt number unique within a job and
    partition, i.e., no two task attempts should have the same (job id,
    partition id, attempt number) triple, as Wenchen said.
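    One way to get that property (a hypothetical sketch, not the change in this
    PR) is to fold the stage attempt number into the reported task attempt
    number, so a retried stage can never reuse a triple produced by an earlier
    stage attempt. The bound maxTaskAttemptsPerStage below is an assumption made
    for illustration:

        // Hypothetical helper, assuming task retries within one stage attempt
        // stay below maxTaskAttemptsPerStage.
        object AttemptNumbers {
          def uniqueAttemptNumber(
              stageAttemptNumber: Int,
              taskAttemptNumber: Int,
              maxTaskAttemptsPerStage: Int = 10000): Int = {
            require(taskAttemptNumber < maxTaskAttemptsPerStage,
              "task attempt number exceeds the assumed per-stage bound")
            stageAttemptNumber * maxTaskAttemptsPerStage + taskAttemptNumber
          }
        }

        // Example: AttemptNumbers.uniqueAttemptNumber(2, 3) == 20003, which
        // cannot collide with any attempt number reported by stage attempt 0
        // or 1 for the same (job id, partition id).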

