[ https://issues.apache.org/jira/browse/SPARK-37300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hujiahua updated SPARK-37300:
-----------------------------
    Description: 
`TaskSchedulerImpl` handles task-finished events in `handleSuccessfulTask` and `handleFailedTask`, but in some cases the task may

When an executor finishes a task of some stage, the driver receives a StatusUpdate event and handles it. If the driver finds at the same time that the executor's heartbeat has timed out, it also has to handle an ExecutorLost event. The two handlers race with each other, which can leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong values.

A more detailed description and discussion can be found at https://issues.apache.org/jira/browse/SPARK-36575 and https://github.com/apache/spark/pull/33872

  was:
TaskSchedulerImpl in some cases may handle task `handleSuccessfulTask` and `handleFailedTask`

When an executor finishes a task of some stage, the driver receives a StatusUpdate event and handles it. If the driver finds at the same time that the executor's heartbeat has timed out, it also has to handle an ExecutorLost event. The two handlers race with each other, which can leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong values.

A more detailed description and discussion can be found at https://issues.apache.org/jira/browse/SPARK-36575 and https://github.com/apache/spark/pull/33872


> TaskSchedulerImpl should ignore task finished event if its task was already finished state
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37300
>                 URL: https://issues.apache.org/jira/browse/SPARK-37300
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: hujiahua
>            Priority: Major
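The idea in the issue title, ignoring a task-finished event once the task is already in a finished state, can be sketched with a small toy model. This is a rough illustration only: it is written in Python rather than Spark's Scala, it is not the code from the linked pull request, and `ToyTaskSet`, `finished_attempts`, `pending` and the `tid`/`index` parameters are hypothetical names invented for the example. Only the names `successful` and `tasksSuccessful` echo fields mentioned in the description.

```python
class ToyTaskSet:
    """Toy bookkeeping for one task set (hypothetical; not Spark's TaskSetManager)."""

    def __init__(self, num_tasks):
        self.successful = [False] * num_tasks  # per-task-index success flag
        self.tasks_successful = 0              # count of indexes marked successful
        self.finished_attempts = set()         # attempt ids already in a terminal state
        self.pending = []                      # task indexes waiting to be re-run

    def handle_successful_task(self, tid, index):
        # Guard: a second terminal event for the same attempt is ignored.
        if tid in self.finished_attempts:
            return False
        self.finished_attempts.add(tid)
        if not self.successful[index]:
            self.successful[index] = True
            self.tasks_successful += 1
        return True

    def handle_failed_task(self, tid, index):
        # Same guard on the failure path, so the two handlers cannot both apply.
        if tid in self.finished_attempts:
            return False
        self.finished_attempts.add(tid)
        if not self.successful[index]:
            self.pending.append(index)  # schedule a new attempt for this index
        return True


if __name__ == "__main__":
    ts = ToyTaskSet(num_tasks=1)
    # The executor-lost path is processed first and fails attempt 0 of index 0 ...
    ts.handle_failed_task(tid=0, index=0)
    # ... then the late success report for the same attempt arrives and is dropped.
    ts.handle_successful_task(tid=0, index=0)
    print(ts.successful, ts.tasks_successful, ts.pending)  # [False] 0 [0]
```

Without the guard, the late success report would flip `successful[0]` to true and bump the counter while index 0 is still queued for a re-run, which is the kind of inconsistent bookkeeping the description refers to. In this sketch the first terminal event for an attempt wins, whichever handler delivers it.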