Ryan Williams created SPARK-11120:
-------------------------------------

             Summary: maxNumExecutorFailures defaults to 3 under dynamic allocation
                 Key: SPARK-11120
                 URL: https://issues.apache.org/jira/browse/SPARK-11120
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.1
            Reporter: Ryan Williams
            Priority: Minor


With dynamic allocation, the {{spark.executor.instances}} config is not set and so defaults to 0, which means [this line|https://github.com/apache/spark/blob/4ace4f8a9c91beb21a0077e12b75637a4560a542/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L66-L68] computes {{maxNumExecutorFailures}} as {{3}}. In my case this has caused large dynamic-allocation jobs with hundreds of executors to die because a single bad node serially failed the handful of executors allocated on it.

I think basing the default on {{spark.dynamicAllocation.maxExecutors}} would make the most sense in this case. I frequently run shells that scale between 1 and 1000 executors, so using {{spark.dynamicAllocation.minExecutors}} or {{spark.dynamicAllocation.initialExecutors}} instead would still leave me with a failure threshold lower than makes sense. A rough sketch of what I have in mind follows.
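The sketch below is only illustrative (the names {{effectiveNumExecutors}} and {{proposedMaxNumExecutorFailures}} are made up here, not part of any patch); the point is just to pick the executor-count basis according to whether dynamic allocation is enabled.

{code:scala}
import org.apache.spark.SparkConf

// Sketch of the proposed default: when dynamic allocation is enabled, derive the
// failure threshold from the configured maximum executor count rather than the
// (zero) static instance count. Explicit overrides still win.
def proposedMaxNumExecutorFailures(sparkConf: SparkConf): Int = {
  val effectiveNumExecutors =
    if (sparkConf.getBoolean("spark.dynamicAllocation.enabled", false)) {
      sparkConf.getInt("spark.dynamicAllocation.maxExecutors", 0)
    } else {
      sparkConf.getInt("spark.executor.instances", 0)
    }
  sparkConf.getInt("spark.yarn.max.executor.failures",
    math.max(effectiveNumExecutors * 2, 3))
}
{code}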




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
