Github user GraceH commented on a diff in the pull request: https://github.com/apache/spark/pull/7888#discussion_r44490668

--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -509,6 +511,13 @@ private[spark] class ExecutorAllocationManager(
   private def onExecutorBusy(executorId: String): Unit = synchronized {
     logDebug(s"Clearing idle timer for $executorId because it is now running a task")
     removeTimes.remove(executorId)
+
+    // The executor may have been mistakenly marked for removal because the async
+    // listener reported it as idle. See SPARK-9552.
+    if (executorsPendingToRemove.contains(executorId)) {
--- End diff --

@vanzin Here is the code path:

1. Prepare the entire list of executor IDs to be killed (those meeting certain criteria).
2. `killExecutors` filters out the non-eligible ones (so some of them may not actually be killed).
3. Regardless of which executors were filtered out, if any of them are acknowledged (really killed), we add the entire executor-ID list to `executorsPendingToRemove`. There is no way to tell which ones were actually killed. That is why we need this kind of rescue.

Please let me know if it makes sense.
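The code path above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual Spark implementation: the names mirror `ExecutorAllocationManager`, but the method bodies are simplified, and since the body of the `contains` check is elided in the diff, the rescue shown here (dropping the executor from the pending-remove set) is only one plausible version of it.

```scala
import scala.collection.mutable

// Hypothetical sketch of the state involved; not the real ExecutorAllocationManager.
object AllocationSketch {
  // Executors we have asked the cluster manager to kill.
  val executorsPendingToRemove = mutable.Set[String]()
  // Idle timers: executorId -> expiry timestamp.
  val removeTimes = mutable.Map[String, Long]()

  // Step 3 above: after requesting kills, the *whole* candidate list is
  // recorded if the request is acknowledged, even though killExecutors may
  // have filtered some candidates out. We cannot tell which were killed.
  def afterKillRequest(candidates: Seq[String], acknowledged: Boolean): Unit = {
    if (acknowledged) executorsPendingToRemove ++= candidates
  }

  // The rescue: an executor that turns busy cannot have been killed, so it
  // must have been one of the filtered-out candidates; un-mark it (assumed
  // behavior, since the diff elides the body of the contains check).
  def onExecutorBusy(executorId: String): Unit = {
    removeTimes.remove(executorId)
    if (executorsPendingToRemove.contains(executorId)) {
      executorsPendingToRemove -= executorId
    }
  }
}
```

The point of the sketch is the asymmetry: `afterKillRequest` adds candidates wholesale, so the only signal that a particular executor survived the filter is that it later runs a task, which is exactly what `onExecutorBusy` observes.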