[ https://issues.apache.org/jira/browse/SPARK-44179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-44179:
-----------------------------------
    Labels: pull-request-available  (was: )

> When a task fails while a speculative copy of it is still executing, the
> number of dynamically scheduled executors is calculated incorrectly
> -------------------------------------------------------------------------
>
>                 Key: SPARK-44179
>                 URL: https://issues.apache.org/jira/browse/SPARK-44179
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: liangyongyuan
>            Priority: Major
>              Labels: pull-request-available
>
> Assume a stage has Task 1, with attempt Task 1.0 and a speculative attempt
> Task 1.1 running concurrently. The dynamic allocation manager calculates the
> number of executors as 2 (pendingTask: 0, pendingSpeculative: 0, running: 2).
> At this point Task 1.0 fails, and the dynamic allocation manager recalculates
> the number of executors as 2 (pendingTask: 1, pendingSpeculative: 0,
> running: 1).
> Because Task 1.0 failed, copiesRunning(1) drops to 1. As a result, Task 1 is
> speculated again and a SparkListenerSpeculativeTaskSubmitted event is fired.
> The dynamic allocation manager's calculation then becomes 3 (pendingTask: 1,
> pendingSpeculative: 1, running: 1), which is clearly not as expected.
> Then Task 1.2 starts and is marked as a speculative task, but the dynamic
> allocation manager still calculates the number of executors as 3
> (pendingTask: 1, pendingSpeculative: 1, running: 1), which again is not as
> expected.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
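The double-counting described in the issue can be illustrated with a minimal sketch of the target-executor formula (roughly, ceil of total outstanding tasks over tasks per executor). This is a simplified stand-in, not Spark's actual ExecutorAllocationManager code; the object and method names below are illustrative assumptions.

```scala
// Hypothetical sketch of the dynamic-allocation target calculation.
// Not the real Spark API; names are illustrative only.
object ExecutorTargetSketch {
  // Total demand = pending regular tasks + pending speculative tasks
  // + currently running task attempts, divided over executor slots.
  def maxNeededExecutors(pendingTask: Int,
                         pendingSpeculative: Int,
                         running: Int,
                         tasksPerExecutor: Int): Int = {
    val total = pendingTask + pendingSpeculative + running
    math.ceil(total.toDouble / tasksPerExecutor).toInt
  }

  def main(args: Array[String]): Unit = {
    val tasksPerExecutor = 1
    // Task 1.0 and speculative Task 1.1 both running:
    println(maxNeededExecutors(0, 0, 2, tasksPerExecutor)) // 2
    // Task 1.0 fails, its retry becomes pending:
    println(maxNeededExecutors(1, 0, 1, tasksPerExecutor)) // 2
    // Task 1 is speculated again, so the same work is now counted
    // both as a pending task and as a pending speculative task:
    println(maxNeededExecutors(1, 1, 1, tasksPerExecutor)) // 3
  }
}
```

The last call shows the problem: only two slots' worth of work remains (the pending retry plus the running speculative attempt), yet the retry-triggered speculation inflates the target to 3.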