[ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968649#comment-14968649 ]
Saisai Shao commented on SPARK-11181:
-------------------------------------

I think you'd better backport this to branch-1.3, not to tag 1.3.1 :).

> Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11181
>                 URL: https://issues.apache.org/jira/browse/SPARK-11181
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core, YARN
>    Affects Versions: 1.3.1
>         Environment: Spark 1.3.1 on a hadoop-yarn-2.4.0 cluster.
> All servers in the cluster run Linux kernel 2.6.32.
> Job runs in yarn-client mode.
>            Reporter: prakhar jauhari
>
> The Spark driver reduces the total executor count even when Dynamic Allocation is not enabled.
> To reproduce this:
> 1. A two-node YARN setup: each DN has ~20 GB of memory and 4 cores.
> 2. After the application launches and acquires its required executors, one of the DNs loses connectivity and is timed out.
> 3. Spark issues a killExecutor for the executor on the DN that timed out.
> 4. Even with dynamic allocation off, Spark's scheduler reduces "targetNumExecutors".
> 5. Thus the job runs with a reduced executor count.
> Note: the severity of the issue increases if some of the DNs running the job's executors lose connectivity intermittently: the Spark scheduler keeps reducing "targetNumExecutors" and never asks for new executors on other nodes, causing the job to hang.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
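The reported behavior in the reproduction steps above can be modeled with a minimal sketch. Note this is a hypothetical simplification for illustration, not Spark's actual scheduler code; the class and method names are invented. It shows why unconditionally decrementing the target count on an executor kill starves the job when dynamic allocation is disabled, and what the guarded behavior would look like.

```python
# Hypothetical simplification of the reported bug; not Spark's actual
# scheduler code. Names (SchedulerBackendModel, kill_executor_*) are invented.

class SchedulerBackendModel:
    def __init__(self, initial_executors, dynamic_allocation_enabled):
        # Number of executors the driver asks the cluster manager for.
        self.target_num_executors = initial_executors
        self.dynamic_allocation_enabled = dynamic_allocation_enabled

    def kill_executor_buggy(self):
        # Reported behavior: the target is reduced even with dynamic
        # allocation off, so the lost executor is never replaced.
        self.target_num_executors -= 1

    def kill_executor_guarded(self):
        # Expected behavior: with dynamic allocation off, leave the target
        # unchanged so a replacement executor is requested from YARN.
        if self.dynamic_allocation_enabled:
            self.target_num_executors -= 1


buggy = SchedulerBackendModel(4, dynamic_allocation_enabled=False)
buggy.kill_executor_buggy()
print(buggy.target_num_executors)   # 3: job keeps running with fewer executors

guarded = SchedulerBackendModel(4, dynamic_allocation_enabled=False)
guarded.kill_executor_guarded()
print(guarded.target_num_executors)  # 4: replacement will be requested
```

With intermittent node losses, the buggy path decrements the target repeatedly, which matches the hang described in the note above.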