[jira] [Commented] (SPARK-11120) maxNumExecutorFailures defaults to 3 under dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960174#comment-14960174 ]

Apache Spark commented on SPARK-11120:
--------------------------------------

User 'ryan-williams' has created a pull request for this issue:
https://github.com/apache/spark/pull/9147

> maxNumExecutorFailures defaults to 3 under dynamic allocation
> -------------------------------------------------------------
>
>              Key: SPARK-11120
>              URL: https://issues.apache.org/jira/browse/SPARK-11120
>          Project: Spark
>       Issue Type: Bug
>       Components: Spark Core
> Affects Versions: 1.5.1
>         Reporter: Ryan Williams
>         Priority: Minor
>
> With dynamic allocation, the {{spark.executor.instances}} config is 0,
> meaning [this line|https://github.com/apache/spark/blob/4ace4f8a9c91beb21a0077e12b75637a4560a542/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L66-L68]
> ends up with {{maxNumExecutorFailures}} equal to {{3}}, which for me has
> resulted in large dynamic-allocation jobs with hundreds of executors dying
> due to one bad node serially failing the executors allocated on it.
> I think that using {{spark.dynamicAllocation.maxExecutors}} would make the
> most sense in this case; I frequently run shells that vary between 1 and
> 1000 executors, so using {{s.dA.minExecutors}} or {{s.dA.initialExecutors}}
> would still leave me with a value lower than makes sense.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960169#comment-14960169 ]

Ryan Williams commented on SPARK-11120:
---------------------------------------

Without dynamic allocation, you are allowed [twice the number of executors] failures, which seems reasonable. With dynamic allocation, {{spark.executor.instances}} doesn't get set, so you are allowed {{math.max(0 * 2, 3)}} failures no matter what your job's min, initial, and max executor settings are.
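The default Ryan describes can be sketched as follows. This is a minimal approximation of the logic at the linked {{ApplicationMaster.scala}} lines, not the verbatim source; the actual code reads {{spark.executor.instances}} from the Spark config (defaulting to 0 when unset), which is elided here:

```scala
// Sketch (simplified, assumed) of the default failure-tolerance computation:
// maxNumExecutorFailures = max(2 * spark.executor.instances, 3).
def maxNumExecutorFailures(executorInstances: Int): Int =
  math.max(executorInstances * 2, 3)

// Static allocation: spark.executor.instances is set explicitly, so a job
// with 100 executors tolerates 200 executor failures.
val staticDefault = maxNumExecutorFailures(100)   // 200

// Dynamic allocation: spark.executor.instances is unset and read as 0, so
// the floor of 3 applies no matter the min/initial/max executor settings.
val dynamicDefault = maxNumExecutorFailures(0)    // 3
```

Because the second argument to {{math.max}} is a hard-coded floor, any configuration in which {{spark.executor.instances}} resolves to 0 or 1 collapses to a tolerance of 3 failures, which is what makes one flaky node fatal to a large dynamic-allocation job.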
[ https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958762#comment-14958762 ]

Sean Owen commented on SPARK-11120:
-----------------------------------

Is this specific to dynamic allocation, though? You could have the same problem without it.