[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160937#comment-14160937 ]
Sandy Ryza commented on SPARK-3174:
-----------------------------------

Thanks for posting the detailed design, Andrew. A few comments.

I would expect properties under spark.executor.* to pertain to what goes on inside executors. This is really more of a driver/scheduler feature, so a different prefix might make more sense.

Because it's a large task and there's still significant value without it, I assume we'll hold off on implementing graceful decommission until the first parts are done?

This is probably out of scope for the first cut, but in the future it might be useful to include addition/removal policies that use what Spark knows about upcoming stages to anticipate the number of executors needed. Can we structure the config property names in a way that will still make sense if we choose to add more advanced functionality like this?

When cluster resources are constrained, we may find ourselves in situations where YARN is unable to allocate the additional resources we requested before the next time interval. I haven't thought about it extremely deeply, but there seem to be pathological situations in which we could request an enormous number of additional executors while waiting. It might make sense to avoid increasing the number requested until we've actually received some of those already outstanding.

Last, any thoughts on what reasonable intervals would be? For the add interval, I imagine we want it to be at least the time between invoking requestExecutors and being able to schedule tasks on the executors requested.

> Provide elastic scaling within a Spark application
> --------------------------------------------------
>
>                 Key: SPARK-3174
>                 URL: https://issues.apache.org/jira/browse/SPARK-3174
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.0.2
>            Reporter: Sandy Ryza
>            Assignee: Andrew Or
>         Attachments: SPARK-3174design.pdf, dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that applications have a fixed allocation that doesn't grow and shrink with their resource needs. We're blocked on YARN-1197 for dynamically changing the resources within executors, but we can still allocate and discard whole executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.
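As a rough illustration of the backoff idea raised in the comment above (this is not part of the design doc), here is a minimal Scala sketch of a "don't escalate until delivered" guard. It assumes a requestExecutors(n) hook along the lines of what the design describes; the class and field names are hypothetical:

{code:scala}
// Hypothetical sketch: only raise the executor request once the executors we
// already asked for have actually registered, so a constrained cluster doesn't
// see the outstanding request grow without bound between add intervals.
class AllocationBackoffSketch(sc: org.apache.spark.SparkContext) {
  private var numRequested = 0   // executors asked of the cluster manager so far
  private var numRegistered = 0  // executors that have actually come up

  /** Called once per add interval while pending tasks are building up. */
  def maybeRequestMore(delta: Int): Unit = {
    if (numRegistered >= numRequested) {
      numRequested += delta
      sc.requestExecutors(delta)  // assumed hook from the design doc
    }
  }

  /** Called when an executor registers with the driver. */
  def executorRegistered(): Unit = { numRegistered += 1 }
}
{code}

With a guard of this shape, the add interval would mainly need to cover the normal request-to-registration latency mentioned in the last question above, since slow allocation by YARN no longer inflates the outstanding request.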