[ https://issues.apache.org/jira/browse/SPARK-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393377#comment-15393377 ]
Saisai Shao commented on SPARK-16708:
-------------------------------------

This looks similar to SPARK-11334. I have a patch for it, though it has not been merged into the code base.

> ExecutorAllocationManager.numRunningTasks can be negative when stage retry
> --------------------------------------------------------------------------
>
>                 Key: SPARK-16708
>                 URL: https://issues.apache.org/jira/browse/SPARK-16708
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Hong Shen
>
> When a task fails with a fetch failure, the stage completes and is retried.
> When the stage completes, ExecutorAllocationManager.numRunningTasks is reset
> to 0. Here is the code:
> {code}
>     override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
>       val stageId = stageCompleted.stageInfo.stageId
>       allocationManager.synchronized {
>         stageIdToNumTasks -= stageId
>         stageIdToTaskIndices -= stageId
>         stageIdToExecutorPlacementHints -= stageId
>         // Update the executor placement hints
>         updateExecutorPlacementHints()
>         // If this is the last stage with pending tasks, mark the scheduler queue as empty
>         // This is needed in case the stage is aborted for any reason
>         if (stageIdToNumTasks.isEmpty) {
>           allocationManager.onSchedulerQueueEmpty()
>           if (numRunningTasks != 0) {
>             logWarning("No stages are running, but numRunningTasks != 0")
>             numRunningTasks = 0
>           }
>         }
>       }
>     }
> {code}
> But when a task of that stage that was still running finishes afterwards,
> numRunningTasks is decremented by 1, so it becomes negative, which can in
> turn make maxNeeded negative.
> {code}
>     override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>       val executorId = taskEnd.taskInfo.executorId
>       val taskId = taskEnd.taskInfo.taskId
>       val taskIndex = taskEnd.taskInfo.index
>       val stageId = taskEnd.stageId
>       allocationManager.synchronized {
>         numRunningTasks -= 1
> {code}
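
The event ordering described in the report can be replayed in isolation with a small sketch. Everything below is a hypothetical stand-in, not Spark's listener: `RunningTaskCounter`, `NegativeCounterDemo`, `replay`, and the `clampAtZero` flag are names invented for illustration. The clamp only shows one way to keep the counter from going below zero; it is not the patch attached to SPARK-11334, and it does not address which stage attempt a late task belongs to.

```scala
import scala.collection.mutable

// Hypothetical model of the counter bookkeeping; not Spark's actual class.
final class RunningTaskCounter(clampAtZero: Boolean) {
  private var numRunningTasks = 0
  private val activeStages = mutable.Set.empty[Int]

  def onStageSubmitted(stageId: Int): Unit = synchronized { activeStages += stageId }

  def onTaskStart(): Unit = synchronized { numRunningTasks += 1 }

  // Mirrors the reported behaviour: when no stages remain, force the counter to 0.
  def onStageCompleted(stageId: Int): Unit = synchronized {
    activeStages -= stageId
    if (activeStages.isEmpty && numRunningTasks != 0) numRunningTasks = 0
  }

  // Mirrors the unconditional decrement; the clamp is an assumed guard for comparison.
  def onTaskEnd(): Unit = synchronized {
    numRunningTasks -= 1
    if (clampAtZero && numRunningTasks < 0) numRunningTasks = 0
  }

  def running: Int = synchronized(numRunningTasks)
}

object NegativeCounterDemo {
  // Replays: two tasks start, one ends with a fetch failure, the stage
  // completes (counter reset), then the second task ends late.
  def replay(counter: RunningTaskCounter): Int = {
    counter.onStageSubmitted(0)
    counter.onTaskStart()
    counter.onTaskStart()
    counter.onTaskEnd()         // failed task ends; one task still running
    counter.onStageCompleted(0) // counter forced to 0 while a task is still running
    counter.onTaskEnd()         // late task end decrements past zero
    counter.running
  }

  def main(args: Array[String]): Unit = {
    println(s"unguarded: ${replay(new RunningTaskCounter(clampAtZero = false))}")
    println(s"clamped:   ${replay(new RunningTaskCounter(clampAtZero = true))}")
  }
}
```

With the unguarded counter the replay ends at -1, matching the report; with the assumed clamp it ends at 0.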