[ https://issues.apache.org/jira/browse/SPARK-30511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zebing Lin updated SPARK-30511:
-------------------------------
    External issue ID: SPARK-2840

> Spark marks ended speculative tasks as pending, which leads to holding idle executors
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-30511
>                 URL: https://issues.apache.org/jira/browse/SPARK-30511
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.3.0
>            Reporter: Zebing Lin
>            Priority: Major
>
> *TL;DR*
> When speculative tasks finish, fail, or are killed, they are still counted
> as pending and factor into the calculation of the number of executors needed.
> h3. Symptom
> In one of our production jobs (running 4 tasks per executor), we found it
> was holding 6 executors at the end with only 2 tasks running (1 of them
> speculative). With more logging enabled, we found the job printed:
> {code:java}
> pendingTasks is 0 pendingSpeculativeTasks is 17 totalRunningTasks is 2
> {code}
> while only 1 speculative task was running; the other 16 speculative tasks
> had been intentionally killed because their corresponding original tasks
> had finished. A sketch of the resulting executor-target arithmetic is
> appended at the end of this description.
> h3. The Bug
> Upon examining the code of _pendingSpeculativeTasks_:
> {code:java}
> stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
>   numTasks -
>     stageAttemptToSpeculativeTaskIndices.get(stageAttempt).map(_.size).getOrElse(0)
> }.sum
> {code}
> we found that _stageAttemptToNumSpeculativeTasks(stageAttempt)_ is
> incremented in _onSpeculativeTaskSubmitted_ but never decremented;
> _stageAttemptToNumSpeculativeTasks -= stageAttempt_ is only performed on
> stage completion. *This means Spark marks ended speculative tasks as
> pending, which leads Spark to hold more executors than it actually needs!*
> A sketch of a possible direction for the fix is also appended below.
> I will have a PR ready to fix this issue, along with SPARK-2840.
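> h3. Appendix: executor-target arithmetic (sketch)
> For context on why the stale count translates into held executors: the
> allocation target is derived from pending plus running tasks via a ceiling
> division by tasks-per-executor. Below is a minimal, self-contained sketch
> using the numbers from the log line above; the exact formula inside
> _ExecutorAllocationManager_ varies by Spark version, and the object and
> value names here are illustrative, not Spark's:
> {code:java}
> // Sketch of the ceiling-division shape of the executor target, with the
> // counters taken from the log line in the Symptom section.
> object ExecutorTargetSketch extends App {
>   val pendingTasks            = 0   // real pending tasks
>   val pendingSpeculativeTasks = 17  // stale: 16 of these already ended
>   val totalRunningTasks       = 2
>   val tasksPerExecutor        = 4   // as in our job
>
>   val total  = pendingTasks + pendingSpeculativeTasks + totalRunningTasks
>   val target = (total + tasksPerExecutor - 1) / tasksPerExecutor // ceiling division
>
>   println(s"executor target = $target") // 5, while 1 executor would suffice
> }
> {code}
> With correct accounting the sum would be only the 2 running tasks, i.e. a
> target of a single executor, which matches what the job actually needed at
> that point.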
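> h3. Appendix: direction of the fix (sketch)
> The sketch below models the accounting with plain collections to show the
> missing decrement on speculative task end. It mirrors the names in the
> snippet quoted in The Bug section, but it is an illustration under assumed
> plumbing, not the actual listener code, and the eventual PR may take a
> different shape:
> {code:java}
> import scala.collection.mutable
>
> object SpeculationAccountingSketch extends App {
>   type StageAttempt = (Int, Int) // (stageId, stageAttemptId)
>
>   val stageAttemptToNumSpeculativeTasks = mutable.Map.empty[StageAttempt, Int]
>   val stageAttemptToSpeculativeTaskIndices =
>     mutable.Map.empty[StageAttempt, mutable.Set[Int]]
>
>   // Incremented when the scheduler decides to speculate a task.
>   def onSpeculativeTaskSubmitted(sa: StageAttempt): Unit =
>     stageAttemptToNumSpeculativeTasks(sa) =
>       stageAttemptToNumSpeculativeTasks.getOrElse(sa, 0) + 1
>
>   // Recorded when the speculative copy actually starts running.
>   def onSpeculativeTaskStart(sa: StageAttempt, index: Int): Unit =
>     stageAttemptToSpeculativeTaskIndices
>       .getOrElseUpdate(sa, mutable.Set.empty) += index
>
>   // Today only the running index is dropped on task end; the submitted
>   // count is never decremented. The second statement is the missing piece.
>   def onSpeculativeTaskEnd(sa: StageAttempt, index: Int): Unit = {
>     stageAttemptToSpeculativeTaskIndices.get(sa).foreach(_ -= index)
>     stageAttemptToNumSpeculativeTasks.get(sa).foreach { n =>
>       if (n > 1) stageAttemptToNumSpeculativeTasks(sa) = n - 1
>       else stageAttemptToNumSpeculativeTasks -= sa
>     }
>   }
>
>   // The computation quoted in the description.
>   def pendingSpeculativeTasks: Int =
>     stageAttemptToNumSpeculativeTasks.map { case (stageAttempt, numTasks) =>
>       numTasks - stageAttemptToSpeculativeTaskIndices
>         .get(stageAttempt).map(_.size).getOrElse(0)
>     }.sum
>
>   val attempt = (0, 0)
>   onSpeculativeTaskSubmitted(attempt) // speculate task 7
>   onSpeculativeTaskStart(attempt, 7)  // the speculative copy starts
>   onSpeculativeTaskEnd(attempt, 7)    // original finished, copy killed
>   // Prints 0; without the decrement in onSpeculativeTaskEnd it prints 1,
>   // i.e. the ended task would still be counted as pending forever.
>   println(s"pendingSpeculativeTasks = $pendingSpeculativeTasks")
> }
> {code}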