[jira] [Assigned] (SPARK-21960) Spark Streaming Dynamic Allocation should respect spark.executor.instances

2018-07-27 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-21960:
---------------------------------

Assignee: Karthik Palaniappan

> Spark Streaming Dynamic Allocation should respect spark.executor.instances
> --------------------------------------------------------------------------
>
> Key: SPARK-21960
> URL: https://issues.apache.org/jira/browse/SPARK-21960
> Project: Spark
> Issue Type: Improvement
> Components: DStreams
> Affects Versions: 2.2.0
> Reporter: Karthik Palaniappan
> Assignee: Karthik Palaniappan
> Priority: Minor
> Fix For: 2.4.0
>
>
> The check linked below requires spark.executor.instances (aka 
> --num-executors) to be either unset or explicitly set to 0: 
> https://github.com/apache/spark/blob/v2.2.0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala#L207
> If spark.executor.instances is unset, the check passes and the property 
> defaults to 2. Spark asks the cluster manager for 2 executors to start 
> with, then adds or removes executors as needed.
> However, if you explicitly set it to 0, the check also passes, but Spark 
> never asks the cluster manager for any executors. When running on YARN, I 
> repeatedly saw:
> {code:java}
> 17/08/22 19:35:21 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> 17/08/22 19:35:36 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> 17/08/22 19:35:51 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> {code}
> I noticed that at least Google Dataproc and Ambari explicitly set 
> spark.executor.instances to a positive number, so to use streaming dynamic 
> allocation you have to edit spark-defaults.conf and remove the 
> property. That's obnoxious.
> In addition, as of Spark 2.3, spark-submit refuses to accept "0" as a value 
> for --num-executors or --conf spark.executor.instances: 
> https://github.com/apache/spark/commit/0fd84b05dc9ac3de240791e2d4200d8bdffbb01a#diff-63a5d817d2d45ae24de577f6a1bd80f9
> It would be much more reasonable for Streaming DRA to honor 
> spark.executor.instances, just like Core DRA does. I'll open a pull request 
> to remove the check if there are no objections.
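
The behavior described in the issue can be illustrated with a small model. This is a hedged sketch in Python, not the Scala code from ExecutorAllocationManager: the function names and the dict-based configuration are hypothetical, and the only behavior modeled is what the description states, namely that the check passes when spark.executor.instances is unset or 0 and fails otherwise. The key spark.streaming.dynamicAllocation.enabled is assumed to be the switch for Streaming DRA.

```python
# Simplified model of the check described in the issue; NOT the Spark source.
# Function names and the dict-based conf are hypothetical.

STREAMING_DRA_KEY = "spark.streaming.dynamicAllocation.enabled"  # assumed switch
NUM_EXECUTORS_KEY = "spark.executor.instances"


def streaming_dra_check(conf):
    """Return True if streaming dynamic allocation would be turned on.

    Raises ValueError when spark.executor.instances is set to a non-zero
    value while streaming dynamic allocation is enabled.
    """
    enabled = conf.get(STREAMING_DRA_KEY, "false").lower() == "true"
    # In this model an unset property and an explicit "0" look the same.
    num_executors = int(conf.get(NUM_EXECUTORS_KEY, "0"))
    if enabled and num_executors != 0:
        raise ValueError(
            "streaming dynamic allocation cannot be enabled while "
            + NUM_EXECUTORS_KEY + " is set"
        )
    return enabled


def describe(conf):
    """Summarize the outcome of the check as a short string."""
    try:
        return "accepted" if streaming_dra_check(conf) else "disabled"
    except ValueError:
        return "rejected"
```

Under this model, a configuration that a distribution pre-populates with a positive spark.executor.instances is rejected as soon as Streaming DRA is enabled, while an explicit 0 is accepted even though, per the report, no executors are then requested.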



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21960) Spark Streaming Dynamic Allocation should respect spark.executor.instances

2017-09-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21960:


Assignee: Apache Spark

> Spark Streaming Dynamic Allocation should respect spark.executor.instances
> --------------------------------------------------------------------------
>
> Key: SPARK-21960
> URL: https://issues.apache.org/jira/browse/SPARK-21960
> Project: Spark
> Issue Type: Improvement
> Components: DStreams
> Affects Versions: 2.2.0
> Reporter: Karthik Palaniappan
> Assignee: Apache Spark
> Priority: Minor






[jira] [Assigned] (SPARK-21960) Spark Streaming Dynamic Allocation should respect spark.executor.instances

2017-09-10 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21960:


Assignee: (was: Apache Spark)

> Spark Streaming Dynamic Allocation should respect spark.executor.instances
> --------------------------------------------------------------------------
>
> Key: SPARK-21960
> URL: https://issues.apache.org/jira/browse/SPARK-21960
> Project: Spark
> Issue Type: Improvement
> Components: DStreams
> Affects Versions: 2.2.0
> Reporter: Karthik Palaniappan
> Priority: Minor


