[ https://issues.apache.org/jira/browse/SPARK-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074728#comment-15074728 ]
Lijie Xu commented on SPARK-12554:
----------------------------------

Case 2 may be categorized as misconfiguration. However, we'd better not blame users for misconfiguration; a reasonable solution is to give users a warning message.

For Case 1, I am sorry that I cannot quite understand "provision more resources". Currently, the app will starve for the extra cores (it cannot get them even when the cluster has enough cores and memory). My proposal is to allocate a *normal* executor (i.e., one with the same core and memory quota as the previously allocated executors) to the app, if the app's requested extra cores are reasonable. "Reasonable" means that the requested extra cores are close to a normal executor's cores (e.g., extra cores / spark.executor.cores >= 0.75) or the normal executor is fairly small (e.g., spark.executor.cores <= 8). As a result, the wasted resources are negligible and the app's execution time is reduced by 30s. Since we allocate a normal executor instead of a smaller one, the resource management mechanism (e.g., memory allocation) stays the same as before.

> Standalone app scheduler will hang when app.coreToAssign < minCoresPerExecutor
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-12554
>                 URL: https://issues.apache.org/jira/browse/SPARK-12554
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Scheduler
>    Affects Versions: 1.5.2
>            Reporter: Lijie Xu
>
> In scheduleExecutorsOnWorkers() in Master.scala,
> {{val keepScheduling = coresToAssign >= minCoresPerExecutor}} should be
> changed to {{val keepScheduling = coresToAssign > 0}}
>
> Case 1:
> Suppose that an app's requested cores is 10 (i.e., {{spark.cores.max = 10}})
> and app.coresPerExecutor is 4 (i.e., {{spark.executor.cores = 4}}).
> After allocating two executors (each with 4 cores) to this app,
> {{app.coresToAssign = 2}} and {{minCoresPerExecutor = coresPerExecutor = 4}},
> so {{keepScheduling = false}} and no extra executor will be allocated to this
> app.
> If {{spark.scheduler.minRegisteredResourcesRatio}} is set to a large
> number (e.g., > 0.8 in this case), the app will hang and never finish.
>
> Case 2:
> If a small app's coresPerExecutor is larger than its requested cores
> (e.g., {{spark.cores.max = 10}}, {{spark.executor.cores = 16}}),
> {{val keepScheduling = coresToAssign >= minCoresPerExecutor}} is always FALSE.
> As a result, this app will never get an executor to run.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
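To make the two failure modes concrete, here is a minimal, self-contained sketch of the {{keepScheduling}} predicate and the proposed "reasonable leftover" test. The object and method names are illustrative, not the actual Master.scala code:

```scala
// Hedged sketch: a simplified model of the scheduling loop's stop condition
// in the standalone Master (names are illustrative, not Spark internals).
object SchedulingSketch {
  // Current behavior: scheduling stops once the remaining cores fall below
  // the per-executor minimum, stranding the leftover cores.
  def keepSchedulingCurrent(coresToAssign: Int, minCoresPerExecutor: Int): Boolean =
    coresToAssign >= minCoresPerExecutor

  // Proposed test from the comment above: allocate one more normal executor
  // if the leftover is close to a full executor, or executors are small.
  def leftoverIsReasonable(extraCores: Int, executorCores: Int): Boolean =
    extraCores.toDouble / executorCores >= 0.75 || executorCores <= 8

  def main(args: Array[String]): Unit = {
    // Case 1: spark.cores.max = 10, spark.executor.cores = 4.
    // After two 4-core executors, 2 cores remain and scheduling stops.
    assert(!keepSchedulingCurrent(coresToAssign = 2, minCoresPerExecutor = 4))
    // Under the proposal, the 2 leftover cores still get a normal executor,
    // since executorCores (4) <= 8.
    assert(leftoverIsReasonable(extraCores = 2, executorCores = 4))

    // Case 2: spark.cores.max = 10, spark.executor.cores = 16.
    // keepScheduling is false from the start; the app never gets an executor.
    assert(!keepSchedulingCurrent(coresToAssign = 10, minCoresPerExecutor = 16))
    println("ok")
  }
}
```

With {{val keepScheduling = coresToAssign > 0}} (or the reasonableness check above), both cases would proceed to allocate an executor instead of stalling.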