Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18739#discussion_r129849052

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -665,10 +667,15 @@ private[spark] class TaskSetManager(
             }
           }
           if (blacklistedEverywhere) {
    -        val partition = tasks(indexInTaskSet).partitionId
    -        abort(s"Aborting $taskSet because task $indexInTaskSet (partition $partition) " +
    -          s"cannot run anywhere due to node and executor blacklist. Blacklisting behavior " +
    -          s"can be configured via spark.blacklist.*.")
    +        val dynamicAllocationEnabled = conf.getBoolean("spark.dynamicAllocation.enabled", false)
    +        val mayAllocateNewExecutor =
    +          conf.getInt("spark.executor.instances", -1) > currentExecutorNumber
    +        if (!dynamicAllocationEnabled && !mayAllocateNewExecutor) {
    --- End diff --

    The reason we wait until the task set has finished is that before then, we have no idea whether a failure is the fault of the user code (or bad input data, etc.) or is actually a fault with the node / executor. Our only piece of information is this: when a task fails on one executor and then succeeds elsewhere, we assume the failure was the fault of the original executor. This heuristic also has false positives, but from what I've seen so far they seem tolerable.

    I have also thought of having this wait some amount of time rather than killing the taskset immediately, to see if another executor comes up. However, there are some complications with that as well. I think this is all captured in the discussion on SPARK-15815, which covers one of the trickiest cases: just one task remaining with Dynamic Allocation, and all other executors have been killed because they were idle.

    Take a look at that JIRA. If it summarizes things, then we can close SPARK-21539 as a duplicate and continue the discussion on SPARK-15815.
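To make the heuristic in the comment concrete, here is a minimal sketch of "fails on one executor, then succeeds elsewhere, so blame the original executor". It is not Spark's actual code: `FailureAttribution`, its methods, and the executor ids are hypothetical names chosen only for illustration.

```scala
import scala.collection.mutable

// Hypothetical illustration only; none of these names are Spark APIs.
final class FailureAttribution {
  // task index -> executors on which that task has failed so far
  private val failedOn = mutable.Map.empty[Int, mutable.Set[String]]
  // executor id -> number of task failures attributed to that executor
  private val blamed = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Record a failure. Nothing is blamed yet: the cause could still be
  // user code or bad input data rather than the executor.
  def taskFailed(task: Int, executor: String): Unit = {
    failedOn.getOrElseUpdate(task, mutable.Set.empty[String]) += executor
  }

  // A success elsewhere is the only evidence available, so every other
  // executor where this task failed earlier is now blamed.
  def taskSucceeded(task: Int, executor: String): Unit = {
    val suspects = failedOn.remove(task).getOrElse(mutable.Set.empty[String])
    for (e <- suspects if e != executor) blamed(e) += 1
  }

  def blameCount(executor: String): Int = blamed(executor)

  // Tasks that have failed but never succeeded: the cause is still unknown.
  def unresolved: Set[Int] = failedOn.keySet.toSet
}

object FailureAttributionDemo {
  def main(args: Array[String]): Unit = {
    val attribution = new FailureAttribution
    attribution.taskFailed(0, "exec-1")
    attribution.taskSucceeded(0, "exec-2") // exec-1 is now blamed once
    attribution.taskFailed(1, "exec-1")    // task 1 never succeeds,
    attribution.taskFailed(1, "exec-2")    // so nobody is blamed for it
    println(attribution.blameCount("exec-1"))
    println(attribution.unresolved)
  }
}
```

In this sketch, task 1 fails everywhere and stays unresolved, which mirrors the point above: until some attempt succeeds, there is no basis for deciding whether the executors or the task itself is at fault.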