[jira] [Assigned] (SPARK-21960) Spark Streaming Dynamic Allocation should respect spark.executor.instances
[ https://issues.apache.org/jira/browse/SPARK-21960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reassigned SPARK-21960:
---------------------------------
    Assignee: Karthik Palaniappan

> Spark Streaming Dynamic Allocation should respect spark.executor.instances
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-21960
>                 URL: https://issues.apache.org/jira/browse/SPARK-21960
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>    Affects Versions: 2.2.0
>            Reporter: Karthik Palaniappan
>            Assignee: Karthik Palaniappan
>            Priority: Minor
>             Fix For: 2.4.0
>
> This check enforces that spark.executor.instances (aka --num-executors) is either unset or explicitly set to 0:
> https://github.com/apache/spark/blob/v2.2.0/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala#L207
>
> If spark.executor.instances is unset, the check passes and the property defaults to 2. Spark asks the cluster manager for 2 executors to start with, then adds and removes executors as appropriate.
>
> However, if you explicitly set it to 0, the check also passes, but Spark never asks the cluster manager for any executors. When running on YARN, I repeatedly saw:
>
> {code:java}
> 17/08/22 19:35:21 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> 17/08/22 19:35:36 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> 17/08/22 19:35:51 WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> {code}
>
> I noticed that at least Google Dataproc and Ambari explicitly set spark.executor.instances to a positive number, meaning that to use dynamic allocation you would have to edit spark-defaults.conf to remove the property. That's obnoxious.
>
> In addition, in Spark 2.3, spark-submit will refuse to accept "0" as a value for --num-executors or --conf spark.executor.instances:
> https://github.com/apache/spark/commit/0fd84b05dc9ac3de240791e2d4200d8bdffbb01a#diff-63a5d817d2d45ae24de577f6a1bd80f9
>
> It is much more reasonable for Streaming DRA to use spark.executor.instances, just like Core DRA. I'll open a pull request to remove the check if there are no objections.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
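For context, here is a minimal sketch of the gating behaviour the issue describes, assuming the standard spark.executor.instances and spark.streaming.dynamicAllocation.enabled settings. The object and method names below are illustrative, and this is an approximation of the check at the linked line, not the exact upstream code:

{code:scala}
// Hedged sketch of the check described in the issue: streaming dynamic
// allocation is only treated as enabled when spark.executor.instances is
// unset (defaulting to 0 here) or explicitly 0.
import org.apache.spark.SparkConf

object StreamingDynAllocCheckSketch {
  def isStreamingDynamicAllocationEnabled(conf: SparkConf): Boolean = {
    val numExecutors = conf.getInt("spark.executor.instances", 0) // 0 also covers "unset"
    val enabled = conf.getBoolean("spark.streaming.dynamicAllocation.enabled", false)
    // The behaviour the issue objects to: any positive spark.executor.instances
    // (e.g. set by Dataproc or Ambari defaults) blocks streaming dynamic allocation.
    enabled && numExecutors == 0
  }
}
{code}

And a similarly hedged sketch of what "respecting spark.executor.instances, just like Core DRA" could look like, mirroring how core dynamic allocation folds spark.executor.instances into the initial executor count. Again, this is an illustration of the proposal, not the merged change:

{code:scala}
// Hedged sketch of the proposed behaviour: derive the initial executor count
// from the maximum of the relevant settings instead of rejecting a non-zero
// spark.executor.instances outright.
import org.apache.spark.SparkConf

object ProposedInitialExecutorsSketch {
  def initialExecutors(conf: SparkConf): Int = {
    Seq(
      conf.getInt("spark.dynamicAllocation.minExecutors", 0),
      conf.getInt("spark.dynamicAllocation.initialExecutors", 0),
      conf.getInt("spark.executor.instances", 0)
    ).max
  }
}
{code}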
[jira] [Assigned] (SPARK-21960) Spark Streaming Dynamic Allocation should respect spark.executor.instances
[ https://issues.apache.org/jira/browse/SPARK-21960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21960:
------------------------------------
    Assignee: Apache Spark
[jira] [Assigned] (SPARK-21960) Spark Streaming Dynamic Allocation should respect spark.executor.instances
[ https://issues.apache.org/jira/browse/SPARK-21960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-21960:
------------------------------------
    Assignee: (was: Apache Spark)