Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/15249

I thought about this a little more and had some offline discussion with Imran. @mridulm, I re-read all of your comments and it sounds like there are two issues that are addressed by the old blacklisting mechanism. I've attempted to summarize the state of things here:

### (1) Executor Shutdown

**Problem**: Sometimes executors are in the process of being killed, and tasks get scheduled on them during this period. It's bad if we don't do anything about this, because (e.g., due to locality) a task could repeatedly get re-scheduled on that executor, eventually causing the task to exhaust its max number of failures and the job to be aborted.

**Approach in current Spark**: A per-task blacklist avoids re-scheduling tasks on the same executor for a configurable period of time (by default, 0).

**Approach in this PR**: After a configurable number of failures (default 1), the executor will be permanently blacklisted. For the executor that's shutting down, any tasks run on it will be permanently blacklisted from that executor, and the executor may eventually be permanently blacklisted for the task set. This seems like a non-issue since the executor is shutting down anyway, so at an executor level, this new approach works at least as well as the old approach.

On a HOST level, the new approach is different: the failures on that executor will count towards the max number of failures on the host where the executor is/was running. This could be problematic if there are other executors on the same host. For example, if the max failures per host is 1, or multiple executors on one host have this shutdown issue, the entire host will be permanently blacklisted for the task set. Does this seem like an issue to folks? I'm not very familiar with YARN's allocation policies, but it seems like if it's common to have many executors per host, a user would probably set max failures per host to be > max failures per executor. In this case, the new host-level behavior is only problematic if (1) multiple executors on the host have this being-shut-down issue AND (2) YARN allocates more executors on the host after that.

### (2) Temporary Resource Contention

**Problem**: Sometimes machines have temporary resource contention; e.g., disk or memory is temporarily full with data from another job. If we don't do anything about this, tasks will repeatedly get re-scheduled on the bad executor (e.g., due to locality), eventually causing the task to exhaust its max number of failures and the job to be aborted.

**Approach in current Spark**: A per-task blacklist avoids re-scheduling tasks on the same executor for a configurable period of time (by default, 0). This allows tasks to eventually get a chance to use the executor again (but as @tgravescs pointed out, this timeout may be hard to configure, and needs to be balanced with the locality wait time, since if the timeout is greater than the locality wait timeout, the task will probably get scheduled on a different machine anyway).

**Approach in this PR**: After a configurable number of attempts, tasks will be permanently blacklisted from the temporarily contended executor (or host) and will be run on a different machine, even though the task might succeed on that host later. The biggest consequence of this seems to be that the task may be forced to run on a non-local machine (and it may need to wait for the locality wait timer to expire before being scheduled).

@mridulm, are these issues summarized correctly above?
If so, can you elaborate on why the approach in this PR isn't sufficient? I agree with Imran that, if the approach in this PR doesn't seem sufficient for these two cases, we should just leave in the old mechanism.
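For concreteness, here's a rough sketch of the two sets of knobs being compared. The blacklist config keys below are my best guess at the names used by the new mechanism and may not match exactly what's in this PR; the values are illustrative only:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()

// Old mechanism: per-task, per-executor blacklist that expires after a timeout
// (default 0, i.e. effectively off). Has to be tuned against the locality wait,
// since a blacklist timeout longer than spark.locality.wait usually means the
// task falls back to a different machine anyway.
conf.set("spark.scheduler.executorTaskBlacklistTime", "5000") // milliseconds
conf.set("spark.locality.wait", "3s")

// New mechanism (as I understand this PR): after N failed attempts, the task is
// permanently blacklisted from that executor / host for the lifetime of the
// task set. Config names here are approximate.
conf.set("spark.blacklist.enabled", "true")
conf.set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
conf.set("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
```

The host-level concern in (1) above corresponds to the per-node threshold: if it's set no higher than the per-executor one, a single misbehaving host with multiple shutting-down executors can get blacklisted for the whole task set.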