Ngone51 commented on a change in pull request #27223: [SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as pending in ExecutorAllocationManager
URL: https://github.com/apache/spark/pull/27223#discussion_r373926875
##########
File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
##########
@@ -614,18 +615,30 @@ private[spark] class ExecutorAllocationManager(
           stageAttemptToNumRunningTask -= stageAttempt
         }
       }
-      // If the task failed, we expect it to be resubmitted later. To ensure we have
-      // enough resources to run the resubmitted task, we need to mark the scheduler
-      // as backlogged again if it's not already marked as such (SPARK-8366)
-      if (taskEnd.reason != Success) {
-        if (totalPendingTasks() == 0) {
-          allocationManager.onSchedulerBacklogged()
-        }
-        if (taskEnd.taskInfo.speculative) {
-          stageAttemptToSpeculativeTaskIndices.get(stageAttempt).foreach {_.remove(taskIndex)}
-        } else {
-          stageAttemptToTaskIndices.get(stageAttempt).foreach {_.remove(taskIndex)}
-        }
+
+      if (taskEnd.taskInfo.speculative) {
+        stageAttemptToSpeculativeTaskIndices.get(stageAttempt).foreach {_.remove{taskIndex}}
+        stageAttemptToNumSpeculativeTasks(stageAttempt) -= 1
+      }
+
+      taskEnd.reason match {
+        case Success | _: TaskKilled =>
+        case _ =>
+          if (totalPendingTasks() == 0) {
+            // If the task failed (not intentionally killed), we expect it to be resubmitted
+            // later. To ensure we have enough resources to run the resubmitted task, we need to
+            // mark the scheduler as backlogged again if it's not already marked as such
+            // (SPARK-8366)
+            allocationManager.onSchedulerBacklogged()

Review comment:

> If a speculative task fails, while it will not be directly resubmitted, a new speculative task will be launched in next speculation cycle. So it's OK for us to mark the scheduler as backlogged in this case.

Hmm... but the new speculative task may not be launched at all if the normal task finishes first. And even if it is launched, `ExecutorAllocationManager` could still handle it when it receives `SparkListenerSpeculativeTaskSubmitted`. That said, calling `onSchedulerBacklogged` here could reserve executor resources earlier and so reduce the delay. Fine!
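For reference, the branching under discussion can be sketched in isolation. This is a minimal model, not the real `ExecutorAllocationManager`: `EndReason`, `ExceptionFailure`, `BacklogModel`, `pendingTasks` and `backloggedCalls` are hypothetical stand-ins introduced only for illustration, and the speculative-task bookkeeping from the diff is left out.

```scala
// Simplified stand-ins for Spark's task end reasons (assumption: these are
// NOT the real org.apache.spark classes, just enough to show the match).
sealed trait EndReason
case object Success extends EndReason
final case class TaskKilled(reason: String) extends EndReason
final case class ExceptionFailure(message: String) extends EndReason

// Hypothetical miniature of the listener state touched by the diff.
final class BacklogModel {
  var pendingTasks: Int = 0     // stands in for totalPendingTasks()
  var backloggedCalls: Int = 0  // counts calls to onSchedulerBacklogged()

  private def onSchedulerBacklogged(): Unit = backloggedCalls += 1

  // Same shape as the match in the diff: only a real failure re-marks the
  // scheduler as backlogged, and only when nothing else is pending.
  def onTaskEnd(reason: EndReason): Unit = reason match {
    case Success | _: TaskKilled => // finished or intentionally killed: nothing to resubmit
    case _ =>
      if (pendingTasks == 0) onSchedulerBacklogged()
  }
}

object BacklogDemo {
  def main(args: Array[String]): Unit = {
    val model = new BacklogModel
    model.onTaskEnd(Success)
    model.onTaskEnd(TaskKilled("another attempt succeeded"))
    model.onTaskEnd(ExceptionFailure("boom"))
    // Only the failure triggered a backlog call.
    println(model.backloggedCalls)
  }
}
```

In this sketch a failed speculative task goes down the same `case _` path as a failed normal task, which is the behavior accepted above: the backlog call is made early rather than waiting for the next speculation cycle.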