[ 
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169830#comment-14169830
 ] 

Andrew Or edited comment on SPARK-3174 at 10/13/14 7:45 PM:
------------------------------------------------------------

[~vanzin]

bq. Are you proposing a change to the current semantics, where Yarn will 
request "--num-executors" up front? If you keep that, I think that would cover 
my above concerns. But switching to a slow start with no option to pre-allocate 
a certain number seems like it might harm certain jobs.

I'm actually not proposing to change the application start-up behavior. Spark 
will continue to request the same number of executors up front that it does 
today. The slow start only comes into play when you want to add executors 
after removing them. You can also control how quickly executors are added 
through a config, so an application that wants to request all of its executors 
at once can still do so.
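As an illustration of the slow-start policy described above, here is a minimal sketch in Python. The class name, the doubling ramp-up, and the config names are assumptions for illustration, not Spark's actual implementation or config keys:

```python
class SlowStartAllocator:
    """Illustrative sketch of a slow-start executor request policy.

    While a task backlog persists, each round doubles the number of
    executors requested, capped at a configured maximum. An application
    that wants all executors immediately can set initial_batch equal to
    max_executors, mirroring the "request all at once" behavior.
    """

    def __init__(self, max_executors=100, initial_batch=1):
        self.max_executors = max_executors
        self.initial_batch = initial_batch
        self.num_to_add = initial_batch   # slow start: begin small

    def executors_to_request(self, pending_tasks, current_executors):
        if pending_tasks == 0:
            self.num_to_add = self.initial_batch  # backlog cleared: reset
            return 0
        # Never request past the configured cap.
        request = min(self.num_to_add,
                      self.max_executors - current_executors)
        self.num_to_add *= 2                      # exponential ramp-up
        return max(request, 0)
```

Under this sketch, successive rounds with a persistent backlog request 1, 2, 4, ... executors, and the counter resets once the backlog drains.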

bq. My second is about the shuffle service you're proposing. Have you 
investigated whether it would be possible to make Hadoop's shuffle service more 
generic, so that Spark can benefit from it? It does mean that this feature 
might be constrained to certain versions of Hadoop, but maybe that's not 
necessarily a bad thing if it means more infrastructure is shared.

I have indeed. The main difficulty in integrating Spark and Yarn cleanly there 
stems from the hard-coded shuffle file paths and the index shuffle file format. 
Both are currently highly specific to MR, and although we could work around 
them by adapting Spark's shuffle behavior to MR's (non-trivial but certainly 
possible), we would only be able to use the feature on Yarn. If we later decide 
to extend this feature to standalone or Mesos mode, we would have to do what 
we're doing right now anyway, since we can't rely on the Yarn ShuffleHandler 
there.



> Provide elastic scaling within a Spark application
> --------------------------------------------------
>
>                 Key: SPARK-3174
>                 URL: https://issues.apache.org/jira/browse/SPARK-3174
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.0.2
>            Reporter: Sandy Ryza
>            Assignee: Andrew Or
>         Attachments: SPARK-3174design.pdf, 
> dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that 
> applications have a fixed allocation that doesn't grow and shrink with their 
> resource needs.  We're blocked on YARN-1197 for dynamically changing the 
> resources within executors, but we can still allocate and discard whole 
> executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
