[jira] [Commented] (SPARK-11120) maxNumExecutorFailures defaults to 3 under dynamic allocation

2015-10-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960174#comment-14960174
 ] 

Apache Spark commented on SPARK-11120:
--

User 'ryan-williams' has created a pull request for this issue:
https://github.com/apache/spark/pull/9147

> maxNumExecutorFailures defaults to 3 under dynamic allocation
> -
>
> Key: SPARK-11120
> URL: https://issues.apache.org/jira/browse/SPARK-11120
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Ryan Williams
>Priority: Minor
>
> With dynamic allocation, the {{spark.executor.instances}} config defaults to 0, 
> meaning [this 
> line|https://github.com/apache/spark/blob/4ace4f8a9c91beb21a0077e12b75637a4560a542/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L66-L68]
>  ends up setting {{maxNumExecutorFailures}} to {{3}}. For me this has resulted 
> in large dynamic-allocation jobs with hundreds of executors dying because one 
> bad node serially fails every executor that gets allocated on it.
> I think that using {{spark.dynamicAllocation.maxExecutors}} would make the most 
> sense in this case; I frequently run shells that vary between 1 and 1000 
> executors, so using {{spark.dynamicAllocation.minExecutors}} or 
> {{spark.dynamicAllocation.initialExecutors}} would still leave me with a value 
> that is too low to be useful.
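For reference, a minimal sketch of the computation the linked line performs (paraphrased, not the exact Spark source; the {{spark.yarn.max.executor.failures}} override with a {{math.max(numExecutors * 2, 3)}} fallback is the assumed pattern):

{code:scala}
import org.apache.spark.SparkConf

// Sketch of the YARN ApplicationMaster default (paraphrased, not the exact source).
// With dynamic allocation enabled, spark.executor.instances is unset, so
// numExecutors falls back to 0 and the threshold bottoms out at math.max(0 * 2, 3) == 3.
val conf = new SparkConf()
val numExecutors = conf.getInt("spark.executor.instances", 0)
val maxNumExecutorFailures = conf.getInt("spark.yarn.max.executor.failures",
  math.max(numExecutors * 2, 3))
{code}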



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11120) maxNumExecutorFailures defaults to 3 under dynamic allocation

2015-10-15 Thread Ryan Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960169#comment-14960169
 ] 

Ryan Williams commented on SPARK-11120:
---

Without dynamic allocation, you are allowed [twice the number of executors] 
failures, which seems reasonable.

With dynamic allocation, {{spark.executor.instances}} doesn't get set, so you are 
allowed {{math.max(0 * 2, 3)}} failures, regardless of what your job's min, 
initial, and max executor settings are.
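
For illustration, a hedged sketch of the alternative proposed here (the actual change is in the pull request linked above; this only swaps the dynamic-allocation maximum in as the basis for the default):

{code:scala}
import org.apache.spark.SparkConf

// Sketch of the proposed default: when dynamic allocation is enabled, base the
// failure threshold on spark.dynamicAllocation.maxExecutors instead of the unset
// spark.executor.instances. Illustrative only; see the linked PR for the real patch.
val conf = new SparkConf()
val effectiveNumExecutors =
  if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
    conf.getInt("spark.dynamicAllocation.maxExecutors", 0)
  } else {
    conf.getInt("spark.executor.instances", 0)
  }
val maxNumExecutorFailures = conf.getInt("spark.yarn.max.executor.failures",
  math.max(effectiveNumExecutors * 2, 3))
{code}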





[jira] [Commented] (SPARK-11120) maxNumExecutorFailures defaults to 3 under dynamic allocation

2015-10-15 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958762#comment-14958762
 ] 

Sean Owen commented on SPARK-11120:
---

Is this specific to dynamic allocation, though? You could have the same problem 
without it.
