Thomas Graves created SPARK-24413:
-------------------------------------

             Summary: Executor Blacklisting shouldn't immediately fail the 
application if dynamic allocation is enabled and it doesn't have any other 
active executors 
                 Key: SPARK-24413
                 URL: https://issues.apache.org/jira/browse/SPARK-24413
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 2.3.0
            Reporter: Thomas Graves


Currently, with executor blacklisting enabled and dynamic allocation on, if you 
have only 1 active executor (the spark.blacklist.killBlacklistedExecutors 
setting doesn't matter in this case; it can be on or off) and a task failure 
causes that executor to be blacklisted, the entire application fails. The error 
you get is something like:

Aborting TaskSet 0.0 because task 9 (partition 9)
cannot run anywhere due to node and executor blacklist.

This is very undesirable behavior: you may have a huge job where one task is 
the long tail, and if that task happens to run on a bad node whose executor 
then gets blacklisted, the entire job fails.

Ideally, since dynamic allocation is on, the scheduler should not fail 
immediately but should let dynamic allocation try to acquire more executors.
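To make the difference between the current and the proposed behavior concrete, 
here is a minimal Python sketch of the decision the scheduler faces when a task 
set has no usable executor. It is a simplified, hypothetical model: the names 
SchedulerState, current_decision and proposed_decision are illustrative only 
and are not Spark classes or APIs, and the sketch ignores node-level 
blacklisting and the killBlacklistedExecutors setting.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SchedulerState:
    # Hypothetical, simplified view of scheduler state (not Spark's classes).
    active_executors: set[str]
    blacklisted_executors: set[str] = field(default_factory=set)
    dynamic_allocation: bool = False


def schedulable_executors(state: SchedulerState) -> set[str]:
    # Executors that are alive and not blacklisted for this task set.
    return state.active_executors - state.blacklisted_executors


def current_decision(state: SchedulerState) -> str:
    # Behavior described in the issue: abort as soon as no executor can run the task.
    if not schedulable_executors(state):
        return "abort"
    return "schedule"


def proposed_decision(state: SchedulerState) -> str:
    # Proposed behavior: with dynamic allocation on, wait for new executors instead.
    if schedulable_executors(state):
        return "schedule"
    if state.dynamic_allocation:
        return "wait_for_new_executors"
    return "abort"


if __name__ == "__main__":
    # The scenario from the issue: one executor, and it has been blacklisted.
    state = SchedulerState({"exec-1"}, {"exec-1"}, dynamic_allocation=True)
    print(current_decision(state))   # abort
    print(proposed_decision(state))  # wait_for_new_executors
```

Note that the sketch waits indefinitely; a real change would presumably also 
need some bound on how long to wait for a usable executor before aborting.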

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
