Ryan Blue created SPARK-24684:
---------------------------------

             Summary: DAGScheduler reports the wrong attempt number to the commit coordinator
                 Key: SPARK-24684
                 URL: https://issues.apache.org/jira/browse/SPARK-24684
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 2.1.3, 2.3.2
            Reporter: Ryan Blue


SPARK-24552 changed writers to pass the task ID to the output commit 
coordinator, so that the coordinator tracks each task uniquely; attempt numbers 
alone are not unique because they can be reused across stage attempts. However, 
the DAGScheduler still passes the attempt number when it notifies the 
coordinator that a task has finished. As a result, when a task is authorized to 
commit and then fails due to an OOM or a similar error, the scheduler's 
notification does not remove the commit authorization, because the attempt 
number does not match the task ID that the coordinator recorded. The stale 
authorization means that retries of the task are denied the commit, which 
causes infinite task retries.
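
The mismatch can be illustrated with a small toy model. This is only a sketch 
under stated assumptions, not Spark's code: ToyCommitCoordinator, simulate, the 
task IDs (100, 101, ...) and the retry cap are all hypothetical names and values 
chosen for illustration. The writer asks for commit permission using the task 
ID, while the scheduler reports the failure using either the task ID or the 
attempt number.

```python
class ToyCommitCoordinator:
    """Hypothetical stand-in for an output commit coordinator (not Spark's API)."""

    def __init__(self):
        # partition -> identifier of the attempt currently authorized to commit
        self.authorized = {}

    def can_commit(self, partition, attempt_key):
        # The first requester is authorized; everyone else is denied
        # until that authorization is cleared.
        holder = self.authorized.setdefault(partition, attempt_key)
        return holder == attempt_key

    def task_completed(self, partition, attempt_key, succeeded):
        # Clear the authorization only if the failed attempt is the holder.
        if not succeeded and self.authorized.get(partition) == attempt_key:
            del self.authorized[partition]


def simulate(notify_with_task_id, max_retries=5):
    """Return the retry that gets to commit, or None if every retry is denied."""
    coord = ToyCommitCoordinator()
    partition = 0

    # First attempt: the writer asks for permission with its task ID.
    task_id, attempt_number = 100, 0
    assert coord.can_commit(partition, task_id)

    # The authorized task then fails (for example with an OOM); the scheduler
    # notifies the coordinator with one of the two identifiers.
    key = task_id if notify_with_task_id else attempt_number
    coord.task_completed(partition, key, succeeded=False)

    for retry in range(1, max_retries + 1):
        retry_task_id = 100 + retry  # assumed: task IDs are globally unique
        if coord.can_commit(partition, retry_task_id):
            return retry
        # Denied: the retry fails as well and the scheduler reports it.
        key = retry_task_id if notify_with_task_id else retry
        coord.task_completed(partition, key, succeeded=False)
    return None


print(simulate(notify_with_task_id=True))
print(simulate(notify_with_task_id=False))
```

In this model, notifying with the task ID lets the first retry commit, whereas 
notifying with the attempt number leaves the stale authorization in place, so 
every retry is denied until the artificial cap is reached.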



