[jira] [Commented] (SPARK-1453) Improve the way Spark on Yarn waits for executors before starting

2014-04-14 Thread Thomas Graves (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968374#comment-13968374 ]

Thomas Graves commented on SPARK-1453:
--

Actually there are two timeouts: the one you mention, which is a max, and one 
in YarnClusterScheduler, which I think is basically always another 3 seconds.

Ideally I think this should be (b) by default, with possibly the option for 
(c), where (c) means: I want x%, but if I don't get it within a certain amount 
of time, go ahead and run anyway, because I know my application will run OK 
with fewer resources (just not run optimally). 

I don't see a reason to do (d).  If you have submitted your application, then 
you want something to run.  If it exits, then you have wasted all that time 
waiting.  I would rather the user just kill it if they have SLAs that tight.  
Or they should get their own queue, or reconfigure their queue.

I'd be OK with adding the option for (d) if some power users want it.  I think 
for most normal users (b) is the best default behavior, though.  If possible 
we should tell the user why it's waiting, too.

 Improve the way Spark on Yarn waits for executors before starting
 ------------------------------------------------------------------

 Key: SPARK-1453
 URL: https://issues.apache.org/jira/browse/SPARK-1453
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.0.0
Reporter: Thomas Graves
Assignee: Thomas Graves

 Currently Spark on Yarn just delays a few seconds between when the Spark 
 context is initialized and when it allows the job to start.  If you are on a 
 busy Hadoop cluster, it might take longer to get the requested number of 
 executors.  At the very least we could make this timeout a configurable 
 value; it's currently hardcoded to 3 seconds.  
 Better yet would be to allow the user to specify a minimum number of 
 executors to wait for, but that looks much more complex. 
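The minimal first step suggested above (making the hardcoded delay configurable) could be sketched roughly as follows. The property key `spark.yarn.scheduler.wait` and the config-lookup shape are illustrative assumptions for this sketch, not the actual Spark API or a real configuration key.

```java
import java.util.Map;

// Hypothetical sketch: read the scheduler wait time from configuration
// instead of hardcoding 3 seconds. The key name is illustrative only.
public class ConfigurableWait {
    static final long DEFAULT_WAIT_MS = 3000L; // mirrors the current hardcoded behavior

    static long waitTimeMs(Map<String, String> conf) {
        String v = conf.get("spark.yarn.scheduler.wait"); // assumed key, not a real Spark property
        return v == null ? DEFAULT_WAIT_MS : Long.parseLong(v);
    }

    public static void main(String[] args) {
        System.out.println(waitTimeMs(Map.of()));                                    // default: 3000
        System.out.println(waitTimeMs(Map.of("spark.yarn.scheduler.wait", "10000"))); // overridden: 10000
    }
}
```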



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1453) Improve the way Spark on Yarn waits for executors before starting

2014-04-14 Thread Mridul Muralidharan (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968390#comment-13968390 ]

Mridul Muralidharan commented on SPARK-1453:



(d) becomes relevant in the case of headless/cron'ed jobs.
If the job is user-initiated, then I agree: the user would typically kill and 
restart the job.



[jira] [Commented] (SPARK-1453) Improve the way Spark on Yarn waits for executors before starting

2014-04-11 Thread Mridul Muralidharan (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967193#comment-13967193 ]

Mridul Muralidharan commented on SPARK-1453:


The timeout gets hit only when we don't get the requested executors, right? So 
it is more like a max timeout (controlled by the number of times we loop, 
IIRC).
The reason for keeping it simple was that we have no guarantees about the 
number of containers that might be available to Spark in a busy cluster: at 
times it might not be practically possible to get even a fraction of the 
requested nodes (whether because the cluster is busy or because of a lack of 
resources), so an infinite wait is possible.

Ideally, I should have exposed the number of containers allocated, so that at 
least user code could use it as an SPI and decide how to proceed in more 
complex cases. Missed out on this one.

I am not sure which use cases make sense:
a) Wait for X seconds, or until the requested containers are allocated.
b) Wait until a minimum of Y containers is allocated (out of X requested).
c) (b) combined with (a): a minimum container count, with a timeout on it.
d) (c), but exit if the minimum containers are not allocated?

(d) is something I keep hitting (if I don't get my required minimum nodes and 
the job proceeds, I usually end up bringing down those nodes :-( )
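The decision logic for option (c) above could be sketched as the following predicate: start once a minimum fraction of the requested containers has been allocated, or once the maximum wait elapses. All names, thresholds, and numbers here are illustrative assumptions, not actual Spark code.

```java
// Hypothetical sketch of option (c): wait for a minimum share of the
// requested containers, bounded by a max wait time.
public class ExecutorWaitSketch {
    // minRatio: the fraction of requested containers we insist on (illustrative)
    static boolean shouldStart(int allocated, int requested,
                               double minRatio, long elapsedMs, long maxWaitMs) {
        int minNeeded = (int) Math.ceil(requested * minRatio);
        return allocated >= minNeeded || elapsedMs >= maxWaitMs;
    }

    public static void main(String[] args) {
        // 8 of 10 allocated with an 80% threshold: start immediately
        System.out.println(shouldStart(8, 10, 0.8, 1_000, 30_000));   // true
        // only 3 allocated, but max wait exceeded: start anyway
        System.out.println(shouldStart(3, 10, 0.8, 30_000, 30_000));  // true
        // 3 allocated, still within the wait window: keep waiting
        System.out.println(shouldStart(3, 10, 0.8, 1_000, 30_000));   // false
    }
}
```

Option (d) would differ only in the fallback branch: instead of starting when `maxWaitMs` expires below the threshold, the application would exit.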
