[ 
https://issues.apache.org/jira/browse/SPARK-24684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved SPARK-24684.
-------------------------------
    Resolution: Not A Problem

Closing this. In master, the attempt number is still used. It looks like I 
just backported this incorrectly.

> DAGScheduler reports the wrong attempt number to the commit coordinator
> -----------------------------------------------------------------------
>
>                 Key: SPARK-24684
>                 URL: https://issues.apache.org/jira/browse/SPARK-24684
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.3, 2.3.2
>            Reporter: Ryan Blue
>            Priority: Major
>
> SPARK-24552 changes writers to pass the task ID to the output coordinator so 
> that the coordinator tracks each task uniquely because attempt numbers can be 
> reused across stage attempts. However, the DAGScheduler still passes the 
> attempt number when notifying the coordinator that a task has finished. The 
> result is that when a task is authorized and then fails due to OOM or a 
> similar error, the scheduler is notified but doesn't remove the commit 
> authorization because the attempt number doesn't match. This causes infinite 
> task retries.
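
The failure mode described above can be sketched with a toy model. This is an illustration only, not Spark's actual OutputCommitCoordinator or DAGScheduler code: the class ToyCommitCoordinator, the function simulate, and the sample values (attempt number 0, task ID 100) are all hypothetical. It assumes the coordinator stores whatever identifier the writer passed when it asked to commit, and releases the authorization only when a failure is reported with the same identifier.

```python
class ToyCommitCoordinator:
    """Toy stand-in for a commit coordinator (hypothetical, not Spark's class)."""

    def __init__(self):
        # (stage, partition) -> identifier of the attempt authorized to commit
        self._authorized = {}

    def can_commit(self, stage, partition, attempt_key):
        # The first requester wins; later requesters are denied until release.
        holder = self._authorized.setdefault((stage, partition), attempt_key)
        return holder == attempt_key

    def task_failed(self, stage, partition, attempt_key):
        # Release the authorization only if the reported key matches the holder.
        if self._authorized.get((stage, partition)) == attempt_key:
            del self._authorized[(stage, partition)]
            return True
        return False


def simulate(report_task_id, max_retries=5):
    """Return the retry that obtains commit authorization, or None if none does."""
    coordinator = ToyCommitCoordinator()
    attempt_number, task_id = 0, 100  # hypothetical values for the first attempt

    # The writer asks for authorization using the task ID.
    assert coordinator.can_commit(0, 0, task_id)

    # The attempt then fails (for example, OOM). The scheduler reports the
    # failure using either the task ID or the attempt number.
    reported = task_id if report_task_id else attempt_number
    coordinator.task_failed(0, 0, reported)

    # Each retry gets a fresh task ID and asks for authorization again.
    for retry in range(1, max_retries + 1):
        if coordinator.can_commit(0, 0, task_id + retry):
            return retry
    return None
```

In this sketch, simulate(report_task_id=True) lets the first retry commit, while simulate(report_task_id=False) returns None because the stale authorization is never cleared. The real scheduler has no retry cap of this kind in the model's sense, which is why the report describes the retries as infinite.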



