[ https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162685#comment-14162685 ]
Sandy Ryza commented on SPARK-3174: ----------------------------------- bq. Maybe it makes sense to just call it `spark.dynamicAllocation.*` That sounds good to me. bq. I think in general we should limit the number of things that will affect adding/removing executors. Otherwise an application might get/lose many executors all of a sudden without a good understanding of why. Also anticipating what's needed in a future stage is usually fairly difficult, because you don't know a priori how long each stage is running. I don't see a good metric to decide how far in the future to anticipate for. Consider the (common) case of a user keeping a Hive session open and setting a low number of minimum executors in order to not sit on cluster resources when idle. Goal number 1 should be making queries return as fast as possible. A policy that, upon receiving a job, simply requested executors with enough slots to handle all the tasks required by the first stage would be a vast latency and user experience improvement over the exponential increase policy. Given that resource managers like YARN will mediate fairness between users and that Spark will be able to give executors back, there's not much advantage to being conservative or ramping up slowly in this case. Accurately anticipating resource needs is difficult, but not necessary. > Provide elastic scaling within a Spark application > -------------------------------------------------- > > Key: SPARK-3174 > URL: https://issues.apache.org/jira/browse/SPARK-3174 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN > Affects Versions: 1.0.2 > Reporter: Sandy Ryza > Assignee: Andrew Or > Attachments: SPARK-3174design.pdf, > dynamic-scaling-executors-10-6-14.pdf > > > A common complaint with Spark in a multi-tenant environment is that > applications have a fixed allocation that doesn't grow and shrink with their > resource needs. We're blocked on YARN-1197 for dynamically changing the > resources within executors, but we can still allocate and discard whole > executors. > It would be useful to have some heuristics that > * Request more executors when many pending tasks are building up > * Discard executors when they are idle > See the latest design doc for more information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org