Ryan Blue created SPARK-24684:
---------------------------------

             Summary: DAGScheduler reports the wrong attempt number to the commit coordinator
                 Key: SPARK-24684
                 URL: https://issues.apache.org/jira/browse/SPARK-24684
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 2.1.3, 2.3.2
            Reporter: Ryan Blue

SPARK-24552 changed writers to pass the task ID to the output commit coordinator, so that the coordinator tracks each task uniquely, because attempt numbers can be reused across stage attempts. However, the DAGScheduler still passes the attempt number when notifying the coordinator that a task has finished.

As a result, when a task is authorized to commit and then fails due to OOM or a similar error, the coordinator is notified of the failure but doesn't remove the commit authorization, because the attempt number it receives doesn't match the task ID it recorded. The stale authorization keeps blocking later attempts from committing, which causes infinite task retries.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
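
The mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual code: `ToyCommitCoordinator`, `can_commit`, `task_completed`, and `run` are hypothetical names, and the stage, partition, and ID values are made up for illustration. It assumes the coordinator records whichever identifier the writer presents and clears an authorization only when a failure is reported with that same identifier.

```python
class ToyCommitCoordinator:
    """Toy model (hypothetical): one authorized committer per (stage, partition)."""

    def __init__(self):
        # (stage, partition) -> identifier of the authorized committer
        self.authorized = {}

    def can_commit(self, stage, partition, committer_id):
        # The first requester is authorized; anyone else is denied
        # while that authorization is still held.
        holder = self.authorized.setdefault((stage, partition), committer_id)
        return holder == committer_id

    def task_completed(self, stage, partition, reported_id, succeeded):
        # On failure, release the authorization only if the reported
        # identifier matches the one that was recorded.
        key = (stage, partition)
        if not succeeded and self.authorized.get(key) == reported_id:
            del self.authorized[key]


def run(report_task_id):
    """Return True if a retry can commit after the first attempt fails."""
    coord = ToyCommitCoordinator()
    task_id, attempt_number = 1007, 0  # made-up values for the first attempt

    # The writer asks for permission using the unique task ID.
    assert coord.can_commit(0, 3, task_id)

    # The attempt fails after being authorized; the scheduler reports either
    # the task ID (consistent) or the attempt number (the mismatch).
    reported = task_id if report_task_id else attempt_number
    coord.task_completed(0, 3, reported, succeeded=False)

    # A retry with a new unique task ID asks for permission.
    return coord.can_commit(0, 3, 1012)


retry_with_mismatch = run(report_task_id=False)
retry_with_task_id = run(report_task_id=True)
```

In this model the retry is denied when the failure is reported by attempt number, because the stale authorization is never cleared, and it is allowed when the failure is reported with the same task ID the writer used.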