Thomas Graves created SPARK-24622:
-------------------------------------

             Summary: Task attempts in other stage attempts not killed when one task attempt succeeds
                 Key: SPARK-24622
                 URL: https://issues.apache.org/jira/browse/SPARK-24622
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.1.0
            Reporter: Thomas Graves
Looking through the code handling for https://github.com/apache/spark/pull/21577, I was checking how we kill task attempts. I don't see anywhere that we actually kill task attempts belonging to stage attempts other than the one that completed successfully.

For instance:

Stage 0.0 (stage id 0, attempt 0)
 - task 1.0 (task 1, attempt 0)

Stage 0.1 (stage id 0, attempt 1), started due to a fetch failure, for instance
 - task 1.0 (task 1, attempt 0). Equivalent to task 1.0 in stage 0.0, launched because task 1.0 in stage 0.0 hadn't finished and hadn't failed.

Now if task 1.0 in stage 0.0 succeeds, it gets committed and marked as successful. We mark the corresponding task in stage 0.1 as completed, but nowhere in the code do I see it actually kill task 1.0 in stage 0.1.

Note that the scheduler does handle the case of 2 attempts (speculation) within a single stage attempt: it kills the other attempt when one of them succeeds. See TaskSetManager.handleSuccessfulTask.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
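To make the gap concrete, here is a toy Scala sketch (these are NOT Spark's real classes; `ToyTaskSetManager` and `RunningAttempt` are simplified stand-ins invented for illustration). It models the behavior described above: a successful task kills sibling attempts only within its own stage attempt, so the equivalent attempt running under a different stage attempt is never killed.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of a running task attempt.
case class RunningAttempt(taskIndex: Int, attemptNumber: Int, var killed: Boolean = false)

// Hypothetical, simplified stand-in for one TaskSetManager, which in Spark
// manages exactly one stage attempt's tasks.
class ToyTaskSetManager(val stageId: Int, val stageAttemptId: Int) {
  val attempts = ArrayBuffer[RunningAttempt]()

  // Mirrors the speculation case the report mentions: on success, only other
  // attempts of the same task *within this stage attempt* are killed.
  // (In Spark the kill would go through the scheduler backend.)
  def handleSuccessfulTask(taskIndex: Int, attemptNumber: Int): Unit = {
    for (a <- attempts if a.taskIndex == taskIndex && a.attemptNumber != attemptNumber) {
      a.killed = true
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val stage00 = new ToyTaskSetManager(0, 0)
    val stage01 = new ToyTaskSetManager(0, 1) // retry after fetch failure

    val dupInRetry = RunningAttempt(taskIndex = 1, attemptNumber = 0)
    stage01.attempts += dupInRetry

    // Task 1.0 in stage 0.0 succeeds. Only stage 0.0's own attempts are
    // scanned, so the equivalent attempt in stage 0.1 keeps running -- the
    // behavior this issue reports.
    stage00.handleSuccessfulTask(taskIndex = 1, attemptNumber = 0)
    println(s"task 1.0 in stage 0.1 killed? ${dupInRetry.killed}")
  }
}
```

By contrast, if both attempts lived in the same `ToyTaskSetManager` (the in-stage speculation case), `handleSuccessfulTask` would kill the loser, which is the existing behavior the report says already works.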