[ 
https://issues.apache.org/jira/browse/SPARK-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075353#comment-15075353
 ] 

Andrew Or commented on SPARK-12554:
-----------------------------------

[~jerrylead] The change you are proposing would alter the scheduling semantics incorrectly.

The right fix here is to adjust the wait behavior. Right now it doesn't take 
into account that executors can have a fixed number of cores, in which case we 
may never reach `spark.cores.max`. All we need to do is use the right maximum 
when waiting for resources: in your example, instead of waiting for all 10 
cores, we wait for the nearest multiple of 4, i.e. 8.
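Concretely, the adjustment could look like the following. This is a minimal 
sketch only; `adjustedCoreTarget` and its parameters are illustrative names, 
not the actual Master.scala internals:

```scala
// Hypothetical sketch of the proposed wait-target adjustment (not real Spark code).
def adjustedCoreTarget(coresMax: Int, coresPerExecutor: Option[Int]): Int =
  coresPerExecutor match {
    // With fixed-size executors, wait only for the largest multiple of
    // coresPerExecutor that fits under spark.cores.max.
    case Some(c) if c > 0 => (coresMax / c) * c
    // Without spark.executor.cores, cores are handed out one at a time,
    // so the full spark.cores.max is reachable.
    case _ => coresMax
  }
```

With the numbers from this issue, `adjustedCoreTarget(10, Some(4))` is 8, so 
the app waits for 8 cores instead of hanging on a target of 10 it can never 
reach.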

Your case 2 is not a bug at all: the user chose settings that are impossible 
to fulfill. There is nothing to fix there. We could throw an exception to fail 
the application quickly, but no one really runs into this, so it's probably 
not worth doing.

> Standalone app scheduler will hang when app.coreToAssign < minCoresPerExecutor
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-12554
>                 URL: https://issues.apache.org/jira/browse/SPARK-12554
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Scheduler
>    Affects Versions: 1.5.2
>            Reporter: Lijie Xu
>
> In scheduleExecutorsOnWorker() in Master.scala,
> {{val keepScheduling = coresToAssign >= minCoresPerExecutor}} should be 
> changed to {{val keepScheduling = coresToAssign > 0}}
> Case 1: 
> Suppose that an app's requested cores is 10 (i.e., {{spark.cores.max = 10}}) 
> and app.coresPerExecutor is 4 (i.e., {{spark.executor.cores = 4}}). 
> After allocating two executors (each has 4 cores) to this app, the 
> {{app.coresToAssign = 2}} and {{minCoresPerExecutor = coresPerExecutor = 4}}, 
> so {{keepScheduling = false}} and no extra executor will be allocated to this 
> app. If {{spark.scheduler.minRegisteredResourcesRatio}} is set to a high 
> ratio (e.g., > 0.8 in this case), the app will hang and never finish.
> Case 2: if a small app's {{coresPerExecutor}} is larger than its requested 
> cores (e.g., {{spark.cores.max = 10}}, {{spark.executor.cores = 16}}), {{val 
> keepScheduling = coresToAssign >= minCoresPerExecutor}} is always false. As a 
> result, this app will never get an executor to run.
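The arithmetic in both cases can be reproduced with a toy simulation of the 
keepScheduling loop. This is a deliberately simplified sketch, not the real 
scheduleExecutorsOnWorker (which also tracks per-worker free cores), and the 
function name is made up:

```scala
// Simplified model of the keepScheduling loop in Master.scheduleExecutorsOnWorker.
// Returns how many fixed-size executors the app can ever be granted.
def assignableExecutors(coresMax: Int, coresPerExecutor: Int): Int = {
  var coresToAssign = coresMax
  var executors = 0
  // keepScheduling = coresToAssign >= minCoresPerExecutor
  while (coresToAssign >= coresPerExecutor) {
    coresToAssign -= coresPerExecutor
    executors += 1
  }
  executors
}
```

Case 1: `assignableExecutors(10, 4)` stops at 2 executors (8 cores), stranding 
the last 2 cores below the 10-core target. Case 2: `assignableExecutors(10, 16)` 
is 0, so no executor is ever launched.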



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
