[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160937#comment-14160937 ]

Sandy Ryza commented on SPARK-3174:
-----------------------------------

Thanks for posting the detailed design, Andrew.  A few comments.

I would expect properties under spark.executor.* to pertain to what goes on 
inside executors.  This is really more of a driver/scheduler feature, so a 
different prefix might make more sense.
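
For illustration only, here is roughly what a scheduler-side prefix could look 
like; the property names below are hypothetical, not something from the design 
doc:

    import org.apache.spark.SparkConf

    // Hypothetical property names, for illustration only.  Keeping these out
    // of spark.executor.* signals that they configure the driver/scheduler
    // rather than anything running inside the executors themselves.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "50")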

Because it's a large task and there's still significant value without it, I 
assume we'll hold off on implementing graceful decommissioning until we're 
done with the first parts?

This is probably out of scope for the first cut, but in the future it might be 
useful to include addition/removal policies that use what Spark knows about 
upcoming stages to anticipate the number of executors needed.  Can we structure 
the config property names in a way that will make sense if we choose to add 
more advanced functionality like this?
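
For example (again with purely hypothetical names), namespacing the properties 
by policy would leave room for a stage-aware policy later without renaming 
anything:

    spark.dynamicAllocation.policy.backlog.*          (the pending-task heuristic)
    spark.dynamicAllocation.policy.stageLookahead.*   (a possible future stage-aware policy)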

When cluster resources are constrained, we may find ourselves in situations 
where YARN is unable to allocate the additional resources we requested before 
the next time interval.  I haven't thought about this extremely deeply, but it 
seems like there may be some pathological situations in which we keep 
requesting an enormous number of additional executors while waiting.  It might 
make sense to avoid increasing the number requested until we've actually 
received some of what we asked for?
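
One rough way to bound this, sketched below with invented names (none of this 
is from the design doc), would be to track how many requested executors are 
still outstanding and only escalate once the previous request has been at 
least partly filled:

    import org.apache.spark.SparkContext

    // Sketch only: `numOutstanding`, `onExecutorGranted`, and
    // `maybeRequestMore` are invented names, and this assumes a
    // requestExecutors-style API on SparkContext.
    class AddRequestThrottle(sc: SparkContext) {
      private var numOutstanding = 0  // requested but not yet granted

      // Call when a previously requested executor actually registers.
      def onExecutorGranted(): Unit = {
        numOutstanding = math.max(0, numOutstanding - 1)
      }

      // Only grow the request once the outstanding ones have drained.
      def maybeRequestMore(desiredDelta: Int): Unit = {
        if (desiredDelta > 0 && numOutstanding == 0) {
          sc.requestExecutors(desiredDelta)
          numOutstanding += desiredDelta
        }
      }
    }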

Lastly, any thoughts on what reasonable intervals would be?  For the add 
interval, I imagine we want it to be at least as long as the time between 
invoking requestExecutors and being able to schedule tasks on the executors we 
requested.
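
Concretely, I picture the add path being gated on something like the sketch 
below (the names and the interval value are made up, just to illustrate the 
constraint that the add interval should cover that latency):

    // Sketch only: gate adds on an interval at least as long as the typical
    // time from calling requestExecutors to tasks being schedulable on the
    // new executors.
    val addIntervalMs = 60 * 1000L  // illustrative value, not a recommendation
    var lastAddTimeMs = 0L

    def shouldAddNow(nowMs: Long, numPendingTasks: Int): Boolean = {
      numPendingTasks > 0 && (nowMs - lastAddTimeMs) >= addIntervalMs
    }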

> Provide elastic scaling within a Spark application
> --------------------------------------------------
>
>                 Key: SPARK-3174
>                 URL: https://issues.apache.org/jira/browse/SPARK-3174
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.0.2
>            Reporter: Sandy Ryza
>            Assignee: Andrew Or
>         Attachments: SPARK-3174design.pdf, 
> dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that 
> applications have a fixed allocation that doesn't grow and shrink with their 
> resource needs.  We're blocked on YARN-1197 for dynamically changing the 
> resources within executors, but we can still allocate and discard whole 
> executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.



