[ 
https://issues.apache.org/jira/browse/SPARK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968374#comment-13968374
 ] 

Thomas Graves commented on SPARK-1453:
--------------------------------------

Actually there are two timeouts.  The one you mention is a maximum.  Then there 
is one in YarnClusterScheduler, which I think is basically always another 3 
seconds.

Ideally I think this should be b) by default, with possibly the option for c), 
where c) means I want x% but if I don't get it within a certain amount of time, 
go ahead and run, because I know my application will run ok with fewer 
resources (just not optimally). 
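
A rough sketch of what option c) could look like, written in Python purely for 
illustration.  None of these names (wait_for_executors, registered_count, 
min_fraction, max_wait_s) are Spark APIs; they are hypothetical, and the real 
change would live in the YARN scheduler code.  The idea is just: poll until x% 
of the requested executors have registered, and if the maximum wait passes 
first, start anyway with what we have.

```python
import math
import time


def wait_for_executors(registered_count, requested, min_fraction=0.8,
                       max_wait_s=30.0, poll_s=0.1,
                       clock=time.monotonic, sleep=time.sleep):
    """Wait for a minimum fraction of executors, bounded by a maximum wait.

    registered_count: hypothetical callable returning how many executors
                      have registered so far.
    Returns True if the threshold was reached, False if the wait timed out
    (in which case the caller proceeds with fewer resources, as in c)).
    """
    needed = math.ceil(requested * min_fraction)
    deadline = clock() + max_wait_s
    while True:
        if registered_count() >= needed:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The return value lets the caller log why it stopped waiting.  Option d) would 
differ only in what the caller does on False: exit instead of running.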

I don't see a reason to do d).  If you have submitted your application, then 
you want something to run.  If it exits, then you have wasted all that time 
waiting.  I would rather the user just kill it if they have SLAs that tight.  
Or they should get their own queue or reconfigure their queue.

I'd be ok with adding the option for d) if some power users want it.  I think 
for most normal users b) is the best default behavior, though.  If possible we 
should also tell the user why it's waiting.

> Improve the way Spark on Yarn waits for executors before starting
> -----------------------------------------------------------------
>
>                 Key: SPARK-1453
>                 URL: https://issues.apache.org/jira/browse/SPARK-1453
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 1.0.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> Currently Spark on Yarn just delays a few seconds between when the spark 
> context is initialized and when it allows the job to start.  If you are on a 
> busy hadoop cluster it might take longer to get the number of executors. 
> At the very least we could make this timeout a configurable value.  It's 
> currently hardcoded to 3 seconds.  
> Better yet would be to allow the user to give a minimum number of executors 
> to wait for, but that looks much more complex. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
