[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161372#comment-14161372 ]

Andrew Or commented on SPARK-3174:
----------------------------------

@[~sandyr] Replying inline:

bq. I would expect properties underneath spark.executor.* to pertain to what 
goes on inside of executors. This is really more of a driver/scheduler feature, 
so a different prefix might make more sense.

Ah, I see. Makes sense. Maybe we should just call it 
`spark.dynamicAllocation.*`. Do you have other suggestions?
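
To make that concrete, here's a rough sketch of how such properties might be set 
on a SparkConf. The property names below are hypothetical (the naming is exactly 
what's still being discussed), so treat this as illustrative only:

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical property names under the proposed spark.dynamicAllocation.* prefix;
// the actual names are still up for discussion.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")     // turn the feature on
  .set("spark.dynamicAllocation.minExecutors", "2")   // lower bound on executor count
  .set("spark.dynamicAllocation.maxExecutors", "50")  // upper bound on executor count
{code}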

bq. Because it's a large task and there's still significant value without it, I 
assume we'll hold off on implementing the graceful decommission until we're 
done with the first parts?

I intend for all of these components to be in 1.2, and we can actually do the 
decommission part in parallel. Aaron is working on a more general service that 
serves shuffle files independently of the executors in SPARK-3796, and once 
that's ready I will integrate it into YARN as an auxiliary service in SPARK-3797. 
Meanwhile we can still work on the heuristics and mechanisms for scaling. (All 
of this assumes we only handle shuffle files and not cached blocks.)

bq. This is probably out of scope for the first cut, but in the future it might 
be useful to include addition/removal policies that use what Spark knows about 
upcoming stages to anticipate the number of executors needed. Can we structure 
the config property names in a way that will make sense if we choose to add 
more advanced functionality like this?

I think in general we should limit the number of things that affect 
adding/removing executors. Otherwise an application might gain or lose many 
executors all of a sudden without a good understanding of why. Also, 
anticipating what's needed in a future stage is usually fairly difficult, 
because you don't know a priori how long each stage will run. I don't see a 
good metric for deciding how far into the future to anticipate.
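
As a purely illustrative sketch (not the proposed implementation, and the names 
are made up), the kind of policy I have in mind scales only on the observable 
task backlog rather than on predictions about future stages:

{code:scala}
// Illustrative only: compute a target executor count from the current backlog of
// pending tasks, capped by the configured maximum. No attempt is made to predict
// what future stages will need.
def desiredExecutors(pendingTasks: Int, tasksPerExecutor: Int, maxExecutors: Int): Int = {
  val needed = math.ceil(pendingTasks.toDouble / tasksPerExecutor).toInt
  math.min(needed, maxExecutors)
}
{code}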

bq. When cluster resources are constrained, we may find ourselves in situations 
where YARN is unable to allocate the additional resources we requested before 
the next time interval. I haven't thought about it extremely deeply, but it 
seems like there may be some pathological situations in which we request an 
enormous number of additional executors while waiting. It might make sense to 
do something like avoid increasing the number requested until we've actually 
received some?

Yes, there is a config that limits the number of executors you can have. You 
raise a good point: if the resource manager keeps rejecting your requests 
for more executors, your application might want to back off a little before 
trying again so it doesn't flood the RM, though that adds some complexity.
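
For what it's worth, a back-off like that could look roughly like the sketch 
below. This is illustrative only and not part of the current design; all names 
and values are assumptions:

{code:scala}
// Illustrative back-off sketch: lengthen the wait between add requests while the
// resource manager is not granting any executors, and reset it once requests
// start being fulfilled again.
object AddRequestBackoff {
  val baseIntervalSeconds = 60
  val maxIntervalSeconds = 600
  private var currentIntervalSeconds = baseIntervalSeconds

  def onRequestNotGranted(): Unit =
    currentIntervalSeconds = math.min(currentIntervalSeconds * 2, maxIntervalSeconds)

  def onExecutorsGranted(): Unit =
    currentIntervalSeconds = baseIntervalSeconds

  def nextWaitSeconds: Int = currentIntervalSeconds
}
{code}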

bq. Last, any thoughts on what reasonable intervals would be? For the add 
interval, I imagine that we want it to be at least the amount of time required 
between invoking requestExecutors and being able to schedule tasks on the 
executors requested.

I think the intervals highly depend on your workload. I don't have a concrete 
number, but I think the time between sending a request and being able to 
schedule tasks on the new executor is on the order of tens of seconds. Since 
this feature also targets heavy workloads, the add interval should be on the 
order of minutes or tens of minutes.
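
As a sense of scale only (the property names here are assumptions, not settled), 
that would translate into something like:

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical interval settings; the names are assumptions, and the values
// just reflect the "minutes rather than seconds" scale discussed above.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.addExecutorIntervalSeconds", "120")
  .set("spark.dynamicAllocation.removeExecutorIntervalSeconds", "600")
{code}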

bq. My opinion is that, for a first cut without the graceful decommission, we 
should go with option 2 and only remove executors that have no cached blocks.

Maybe. I've been back and forth about this one. I suppose approach (b), 
removing executors only if they have no cached blocks, is more conservative. The 
worst that can happen with (b) is that you don't remove executors; since you 
already can't remove executors before 1.2.0, there's little possibility for 
regression there. The worst that can happen with (a), on the other hand, is that 
your application suddenly has worse performance. Note that if we throw an 
exception when the application attempts to cache blocks, then (a) and (b) are 
basically equivalent.
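
A minimal sketch of what the conservative option (b) amounts to (names are 
illustrative, not the actual scheduler API):

{code:scala}
// Illustrative only: under option (b), an executor is a removal candidate only if
// it has been idle past the remove interval AND it reports no cached blocks.
case class ExecutorState(id: String, idleSeconds: Long, cachedBlocks: Int)

def removalCandidates(executors: Seq[ExecutorState], removeIntervalSeconds: Long): Seq[String] =
  executors
    .filter(e => e.idleSeconds >= removeIntervalSeconds && e.cachedBlocks == 0)
    .map(_.id)
{code}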


> Provide elastic scaling within a Spark application
> --------------------------------------------------
>
>                 Key: SPARK-3174
>                 URL: https://issues.apache.org/jira/browse/SPARK-3174
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.0.2
>            Reporter: Sandy Ryza
>            Assignee: Andrew Or
>         Attachments: SPARK-3174design.pdf, 
> dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that 
> applications have a fixed allocation that doesn't grow and shrink with their 
> resource needs.  We're blocked on YARN-1197 for dynamically changing the 
> resources within executors, but we can still allocate and discard whole 
> executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.


