[ https://issues.apache.org/jira/browse/SPARK-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075353#comment-15075353 ]
Andrew Or commented on SPARK-12554:
-----------------------------------

[~jerrylead] The change you are proposing alters the semantics incorrectly. The right fix here is to adjust the wait behavior. Right now it doesn't take into account the fact that executors can have a fixed number of cores, such that we never reach `spark.cores.max`. All we need to do is use the right maximum when waiting for resources; e.g., in your example, instead of waiting for all 10 cores, we wait for the nearest multiple of 4, i.e. 8.

Your case 2 is not a bug at all: the user chose settings that are impossible to fulfill. Although there's nothing to fix, we could throw an exception to fail the application quickly, but no one really runs into this, so it's probably not worth doing.

> Standalone app scheduler will hang when app.coreToAssign < minCoresPerExecutor
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-12554
>                 URL: https://issues.apache.org/jira/browse/SPARK-12554
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Scheduler
>    Affects Versions: 1.5.2
>            Reporter: Lijie Xu
>
> In scheduleExecutorsOnWorker() in Master.scala,
> {{val keepScheduling = coresToAssign >= minCoresPerExecutor}} should be
> changed to {{val keepScheduling = coresToAssign > 0}}.
>
> Case 1:
> Suppose that an app's requested cores is 10 (i.e., {{spark.cores.max = 10}})
> and app.coresPerExecutor is 4 (i.e., {{spark.executor.cores = 4}}).
> After allocating two executors (each with 4 cores) to this app,
> {{app.coresToAssign = 2}} and {{minCoresPerExecutor = coresPerExecutor = 4}},
> so {{keepScheduling = false}} and no extra executor will be allocated to this
> app. If {{spark.scheduler.minRegisteredResourcesRatio}} is set to a large
> number (e.g., 0.8 in this case), the app will hang and never finish.
> Case 2: if a small app's coresPerExecutor is larger than its requested cores
> (e.g., {{spark.cores.max = 10}}, {{spark.executor.cores = 16}}),
> {{val keepScheduling = coresToAssign >= minCoresPerExecutor}} is always false.
> As a result, this app will never get an executor to run.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
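To make the two conditions under discussion concrete, here is a minimal, hypothetical Scala sketch. The names mirror, but simplify, the real Master.scala code; `attainableCores` is an illustration of the wait-side fix suggested in the comment, not an actual Spark API:

```scala
// Illustrative sketch only; not the actual Master.scala implementation.
object SchedulingSketch {
  // The current stop condition in scheduleExecutorsOnWorker(): scheduling
  // halts once the remaining cores fall below one executor's worth.
  def keepScheduling(coresToAssign: Int, minCoresPerExecutor: Int): Boolean =
    coresToAssign >= minCoresPerExecutor

  // The proposed wait-side fix: when waiting for registered resources,
  // target the largest attainable multiple of spark.executor.cores
  // rather than spark.cores.max itself.
  def attainableCores(coresMax: Int, coresPerExecutor: Int): Int =
    (coresMax / coresPerExecutor) * coresPerExecutor
}
```

Under this sketch, Case 1 (10 requested cores, 4 per executor) stops after two executors because `keepScheduling(2, 4)` is false, while `attainableCores(10, 4)` is 8, which is the number the app should wait for. In Case 2, `keepScheduling(10, 16)` is false from the start and `attainableCores(10, 16)` is 0, which is why failing fast with an exception would be a reasonable (if low-priority) response.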