[ https://issues.apache.org/jira/browse/SPARK-25889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669384#comment-16669384 ]

DB Tsai commented on SPARK-25889:
---------------------------------

The problem sounds valid, and the solution could work. Since I'm not in the 
field of job scheduling, pinging [~vanzin] for an expert's input. Thanks for 
the write-up.

> Dynamic allocation load-aware ramp up
> -------------------------------------
>
>                 Key: SPARK-25889
>                 URL: https://issues.apache.org/jira/browse/SPARK-25889
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler, YARN
>    Affects Versions: 2.3.2
>            Reporter: Adam Kennedy
>            Priority: Major
>
> The time-based exponential ramp-up behavior for dynamic allocation is naive 
> and destructive, making it very difficult to run very large jobs.
> On a large (64,000 core) YARN cluster with a high number of input partitions 
> (200,000+), the default dynamic allocation approach of requesting containers 
> in waves, doubling exponentially once per second, results in 50% of the 
> entire cluster being requested in the final 1-second wave.
> This can easily overwhelm RPC processing, or trigger expensive Executor 
> startup steps at a scale that breaks downstream systems. With the interval 
> so short, many more containers may be requested than are actually needed; 
> they then complete very little work before sitting idle waiting to be 
> deallocated.
> Lengthening the interval between these fixed doublings has only limited 
> impact. Setting the doubling interval to once per minute would produce a 
> very slow ramp-up, at the end of which we still face large, potentially 
> crippling waves of executor startup.
> An alternative approach to spooling up large jobs appears to be needed: one 
> that is still relatively simple but more adaptable to different cluster 
> sizes and to varying cluster and job performance.
> I would like to propose a few different approaches based around the general 
> idea of capping outstanding requests for new containers by the number of 
> executors that are currently running, for some definition of "running".
> One example might be to limit requests to one new executor for every 
> existing executor that currently has an active task, or some ratio of that, 
> to allow for more or less aggressive spool-up. A lower ratio would 
> approximate something like Fibonacci ramp-up; a higher ratio, say 2x, would 
> spool up quickly while staying aligned with the rate at which broadcast 
> blocks can be distributed to new members.
>  
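The load-aware cap proposed in the quoted issue can be sketched as a toy simulation. This is a hypothetical illustration, not Spark code: the function names and the `ratio` parameter (standing in for the proposed configurable aggressiveness knob) are invented for this sketch, and it assumes one task per executor with every granted container becoming an active executor before the next wave.

```python
# Hypothetical sketch of load-aware ramp-up (not real Spark code or config).

def next_request(active_executors, pending_tasks, ratio=1.0, floor=1):
    """Cap new container requests at `ratio` new executors per executor
    that currently has an active task, never exceeding outstanding demand."""
    cap = max(floor, int(active_executors * ratio))
    return min(cap, pending_tasks)

def simulate(total_tasks, ratio=1.0, start=1):
    """Count ramp-up rounds until active executors cover all tasks
    (assumes one task per executor and instant container grants)."""
    active, rounds = start, 0
    while active < total_tasks:
        active += next_request(active, total_tasks - active, ratio)
        rounds += 1
    return rounds
```

With ratio=1.0 this reproduces per-round doubling, but the final wave is bounded by outstanding demand rather than by the previous wave's size; a lower ratio such as 0.5 grows each wave by roughly 1.5x, trading more rounds for smaller bursts of simultaneous executor startup.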



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org