[ https://issues.apache.org/jira/browse/SPARK-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074728#comment-15074728 ]

Lijie Xu commented on SPARK-12554:
----------------------------------

Case 2 may be categorized as misconfiguration. However, we should not blame 
users for misconfiguration; a more reasonable solution is to show users a 
warning message.

For case 1, I am sorry that I cannot quite understand "provision more 
resources". Currently, the app starves for the extra cores (it cannot get them 
even though the cluster has enough cores and memory).

My proposal is to allocate a *normal* executor (i.e., one with the same cores 
and memory quota as the previously allocated executors) to the app, if the 
app's requested extra cores are reasonable. Reasonable means that the requested 
extra cores are close to a normal executor's cores (e.g., extra cores / 
spark.executor.cores >= 0.75) or the normal executor is fairly small 
(spark.executor.cores <= 8). As a result, the wasted resources are negligible 
and the app's execution time is reduced by 30s. Since we allocate a normal 
executor instead of a smaller one, the resource management mechanism (e.g., 
memory allocation) stays the same as before.
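The proposed heuristic can be sketched as follows. This is an illustrative standalone snippet, not Spark's actual scheduler code; the object and method names are made up for this example, and only the two thresholds from the proposal (0.75 and 8 cores) come from the text above.

```scala
// Sketch of the proposed "round up to a normal executor" heuristic.
// Illustrative only -- names and structure are not from Master.scala.
object RoundUpHeuristic {
  // leftoverCores: the app's remaining requested cores after normal allocation
  // executorCores: spark.executor.cores (a "normal" executor's size)
  def shouldAllocateNormalExecutor(leftoverCores: Int, executorCores: Int): Boolean =
    leftoverCores > 0 &&
      (leftoverCores.toDouble / executorCores >= 0.75 || // leftover is close to a full executor
        executorCores <= 8)                              // or a normal executor is small anyway

  def main(args: Array[String]): Unit = {
    // Case 1 from the issue: spark.cores.max = 10, spark.executor.cores = 4,
    // two executors already allocated, 2 cores left over.
    println(shouldAllocateNormalExecutor(2, 4))  // small executor, so round up
    // Few leftover cores against a large executor: do not round up.
    println(shouldAllocateNormalExecutor(2, 16))
  }
}
```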



> Standalone app scheduler will hang when app.coreToAssign < minCoresPerExecutor
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-12554
>                 URL: https://issues.apache.org/jira/browse/SPARK-12554
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Scheduler
>    Affects Versions: 1.5.2
>            Reporter: Lijie Xu
>
> In scheduleExecutorsOnWorker() in Master.scala,
> {{val keepScheduling = coresToAssign >= minCoresPerExecutor}} should be 
> changed to {{val keepScheduling = coresToAssign > 0}}
> Case 1: 
> Suppose that an app's requested cores is 10 (i.e., {{spark.cores.max = 10}}) 
> and app.coresPerExecutor is 4 (i.e., {{spark.executor.cores = 4}}). 
> After allocating two executors (each has 4 cores) to this app, the 
> {{app.coresToAssign = 2}} and {{minCoresPerExecutor = coresPerExecutor = 4}}, 
> so {{keepScheduling = false}} and no extra executor will be allocated to this 
> app. If {{spark.scheduler.minRegisteredResourcesRatio}} is set to a large 
> number (e.g., > 0.8 in this case), the app will hang and never finish.
> Case 2: if a small app's coresPerExecutor is larger than its total requested 
> cores (e.g., {{spark.cores.max = 10}}, {{spark.executor.cores = 16}}), {{val 
> keepScheduling = coresToAssign >= minCoresPerExecutor}} is always false. As a 
> result, this app will never get an executor to run.
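The arithmetic behind both cases can be checked with a minimal standalone sketch (illustrative values only, not Spark scheduler code):

```scala
// Walk-through of the keepScheduling condition from the issue description.
object KeepSchedulingDemo {
  def main(args: Array[String]): Unit = {
    // Case 1: spark.cores.max = 10, spark.executor.cores = 4.
    val coresMax = 10
    val coresPerExecutor = 4
    val minCoresPerExecutor = coresPerExecutor
    // After two executors of 4 cores each are allocated:
    val coresToAssign = coresMax - 2 * coresPerExecutor // = 2
    // Current condition stops scheduling with 2 cores still unassigned,
    // so the app can never reach 10/10 registered cores.
    println(coresToAssign >= minCoresPerExecutor) // false

    // Case 2: spark.cores.max = 10, spark.executor.cores = 16.
    // coresToAssign starts at 10, so the condition is false from the start
    // and the app never gets any executor at all.
    println(10 >= 16) // false
  }
}
```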



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
