Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/15249
  
    I thought about this a little more and had some offline discussion with 
Imran.  @mridulm, I re-read all of your comments and it sounds like there are 
two issues that are addressed by the old blacklisting mechanism.  I've 
attempted to summarize the state of things here:
    
    ### (1) Executor Shutdown
    
    **Problem**: Sometimes executors are in the process of being killed, and 
tasks get scheduled on them during this period.  It's bad if we don't do 
anything about this, because (e.g., due to locality) a task could repeatedly 
get re-scheduled on that executor, eventually causing the task to exhaust its 
max number of failures and the job to be aborted.
    
    **Approach in current Spark**: A per-task blacklist avoids re-scheduling 
tasks on the same executor for a configurable period of time (by default, 0).
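    
    For reference, here is a minimal sketch of how that knob is set today. I 
believe the legacy config key is `spark.scheduler.executorTaskBlacklistTime`, 
but treat the key and the value below as illustrative:
    
    ```scala
    import org.apache.spark.SparkConf
    
    // Time-based per-task blacklist: after a task fails on an executor, that
    // executor is not re-offered to the same task for this many milliseconds.
    // The default of 0 effectively disables the blacklist.
    val conf = new SparkConf()
      .set("spark.scheduler.executorTaskBlacklistTime", "5000")
    ```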
    
    **Approach in this PR**: After a configurable number of failures (default 
1), the executor will be permanently blacklisted.  For the executor that's 
shutting down, any tasks run on it will be permanently blacklisted from that 
executor, and the executor may eventually be permanently blacklisted for the 
task set.  This seems like a non-issue since the executor is shutting down 
anyway, so at the executor level, this new approach works at least as well as
the old approach.  
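    
    A rough sketch of what enabling the count-based behavior might look like 
(the config keys below are my approximation of what this PR adds, so the exact 
names may differ):
    
    ```scala
    import org.apache.spark.SparkConf
    
    // Count-based blacklisting: failures are counted per (task, executor) and
    // per executor within a task set, rather than being forgiven after a timeout.
    val conf = new SparkConf()
      .set("spark.blacklist.enabled", "true")
      // After this many failures of one task on one executor, that task is
      // permanently blacklisted from that executor for the task set (default 1).
      .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
      // After this many distinct tasks fail on one executor, the whole executor
      // is blacklisted for the task set.
      .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "2")
    ```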
    
    On a HOST level, the new approach is different: the failures on that 
executor will count towards the max number of failures on the host where the 
executor is/was running.  This could be problematic if there are other 
executors on the same host.  For example, if the max failures per host is 1, or 
multiple executors on one host have this shutdown issue, the entire host will 
be permanently blacklisted for the task set.  Does this seem like an issue to 
folks? I'm not very familiar with YARN's allocation policies, but it seems like 
if it's common to have many executors per host, a user would probably set max 
failures per host to be > max failures per executor.  In this case, the new 
host-level behavior is only problematic if (1) multiple executors on the host 
have this being-shut-down issue AND (2) YARN allocates more executors on the
host after that.
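    
    To make the multi-executor scenario concrete, here is a toy sketch (not 
Spark code; every name in it is made up for illustration) of how per-executor 
failures roll up to a per-host count for one task set:
    
    ```scala
    // Two executors on the same host each see one failure while being shut down.
    object HostBlacklistSketch extends App {
      val maxFailedTasksPerHost = 2   // set above the per-executor limit, as suggested
      val failuresByExecutor = Map("exec-1" -> 1, "exec-2" -> 1)
      val hostOfExecutor     = Map("exec-1" -> "host-A", "exec-2" -> "host-A")
    
      // Roll the executor-level counts up to their host.
      val failuresByHost = failuresByExecutor
        .groupBy { case (exec, _) => hostOfExecutor(exec) }
        .map { case (host, fails) => host -> fails.values.sum }
    
      // host-A hits the per-host limit even though each executor failed only once,
      // so executors YARN places on host-A later are unusable for this task set.
      failuresByHost.foreach { case (host, n) =>
        if (n >= maxFailedTasksPerHost)
          println(s"$host would be blacklisted for the task set ($n failures)")
      }
    }
    ```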
    
    ### (2) Temporary Resource Contention
    
    **Problem**: Sometimes machines have temporary resource contention; e.g., 
disk or memory is temporarily full with data from another job.  If we don't do 
anything about this, tasks will repeatedly get re-scheduled on the bad executor 
(e.g., due to locality), eventually causing the task to exhaust its max number 
of failures and the job to be aborted.
    
    **Approach in current Spark**: A per-task blacklist avoids re-scheduling 
tasks on the same executor for a configurable period of time (by default, 0).  
This allows tasks to eventually get a chance to use the executor again (but as 
@tgravescs pointed out, this timeout may be hard to configure, and needs to be 
balanced with the locality wait time, since if the timeout is > locality wait 
timeout, the task will probably get scheduled on a different machine anyway).
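    
    A sketch of that balancing act, using the same illustrative legacy key as 
above plus the `spark.locality.wait` setting:
    
    ```scala
    import org.apache.spark.SparkConf
    
    // As noted above, the two timers interact: if the blacklist timeout is longer
    // than spark.locality.wait, the task will likely be scheduled on a different,
    // non-local machine before the blacklist on its preferred executor expires.
    val conf = new SparkConf()
      .set("spark.locality.wait", "3s")                          // scheduler locality wait
      .set("spark.scheduler.executorTaskBlacklistTime", "1000")  // ms; kept below the wait
    ```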
    
    **Approach in this PR**: After a configurable number of attempts, tasks 
will be permanently blacklisted from the temporarily contended executor (or 
host) and will be run on a different machine, even though the task may succeed on
the host later.  The biggest consequence of this seems to be that the task may 
be forced to run on a non-local machine (and it may need to wait for the 
locality wait timer to expire before being scheduled).
    
    @mridulm are these issues summarized correctly above?  If so, can you 
elaborate on why the approach in this PR isn't sufficient?  I agree with Imran 
that, if the approach in this PR doesn't seem sufficient for these two cases, 
we should just leave in the old mechanism.

