[ https://issues.apache.org/jira/browse/SPARK-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386235#comment-15386235 ]

Thomas Graves commented on SPARK-2666:
--------------------------------------

I think eventually adding prestart (a MapReduce slowstart-type setting) makes 
sense.  This is actually why I didn't change the map output statuses to go along 
with task launch: I wanted to be able to do this, or to get incremental map output 
status results.
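
Purely as illustration, here is a minimal sketch of what a slowstart-style gate 
could look like; the config name and object below are hypothetical, not actual 
Spark settings or APIs:

{code:scala}
object SlowStartGate {
  // Fraction of map outputs that must be registered before reduce tasks launch
  // (hypothetical spark.shuffle.reduceSlowStart setting, in the spirit of
  // MapReduce's mapreduce.job.reduce.slowstart.completedmaps).
  val slowStartFraction: Double = 0.8

  def canLaunchReducers(completedMapTasks: Int, totalMapTasks: Int): Boolean = {
    if (totalMapTasks == 0) true
    else completedMapTasks.toDouble / totalMapTasks >= slowStartFraction
  }
}
{code}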

But as far as keeping the remaining tasks running goes, I think it depends on the 
behavior, and I haven't had time to go look at it in more detail.

If the stage fails, which tasks does it rerun (both options are sketched below)?

1) Does it rerun all the tasks that have not yet succeeded in the failed stage 
(including the ones that could still be running)?
2) Does it rerun only the failed ones and wait for the ones still running in the 
failed stage?  If those succeed, it uses their results.
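
To make the two behaviors concrete, a rough sketch in terms of partition-id sets 
(the names here are illustrative, not the DAGScheduler's actual fields):

{code:scala}
object ResubmitPolicy {
  // Behavior 1: rerun every task that has not succeeded, even ones still running.
  def rerunAllUnfinished(allPartitions: Set[Int], succeeded: Set[Int]): Set[Int] =
    allPartitions -- succeeded

  // Behavior 2: rerun only the tasks that actually failed; leave running ones
  // alone and use their results if they finish successfully.
  def rerunOnlyFailed(allPartitions: Set[Int], succeeded: Set[Int],
      stillRunning: Set[Int]): Set[Int] =
    allPartitions -- succeeded -- stillRunning
}
{code}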

From what I saw with this job, I thought it was acting like number 1 above. The 
only reason to leave the remaining tasks running is to see whether they get 
FetchFailures, and that seems like a lot of overhead to find that out if a task 
takes a long time.

When a fetch failure happens, does the scheduler re-run all the maps that had run 
on that node, or just the ones specifically mentioned by the fetch failure?  Again, 
I thought it was just the specific map output the fetch failure failed to get, 
which is why it needs to know whether the other reducers get fetch failures.
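
A sketch of the two unregistration granularities in question; the class and 
method names are hypothetical stand-ins, not the real MapOutputTracker API:

{code:scala}
case class MapOutputLoc(mapId: Int, host: String)

class ShuffleOutputs(private var outputs: Map[Int, MapOutputLoc]) {
  // Narrow: invalidate only the single map output the FetchFailure named; other
  // reducers have to hit their own FetchFailures before more outputs are dropped.
  def unregisterOne(mapId: Int): Unit =
    outputs -= mapId

  // Broad: invalidate every map output that lived on the failing host in one shot.
  def unregisterHost(host: String): Unit =
    outputs = outputs.filterNot { case (_, loc) => loc.host == host }
}
{code}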

I can kind of understand letting them run to see if they hit fetch failures as 
well, but on a large job, or with tasks that take a long time, if we aren't 
counting them as successes then it is mostly a waste of resources. It also extends 
the job time and confuses the user, since the UI doesn't show the tasks that are 
still running.

In the case I was seeing, my tasks took roughly an hour.  One stage failed, so it 
restarted that stage, but since it didn't kill the tasks from the original stage 
attempt it had very few executors free to run new ones, so the job took a lot 
longer than it should have.  I don't remember the exact cause of the failures 
anymore.

Anyway, I think the results are going to vary a lot based on the type of job and 
the length of each stage (map vs. reduce). 

Personally, I think it would be better to change it to fail all the maps that ran 
on the host the fetch failed from and to kill the rest of the running reducers in 
that stage. But I would have to investigate the code more to fully understand it.
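
A sketch of the handling proposed above, assuming the scheduler exposed hooks 
like the hypothetical ones below:

{code:scala}
trait StageControl {
  def unregisterAllMapOutputsOn(host: String): Unit
  def runningReducerTaskIds(stageId: Int): Seq[Long]
  def killTask(taskId: Long, reason: String): Unit
}

object FetchFailurePolicy {
  def onFetchFailed(ctl: StageControl, reduceStageId: Int, failedHost: String): Unit = {
    // Fail (invalidate) every map output on the host the fetch failed from...
    ctl.unregisterAllMapOutputsOn(failedHost)
    // ...and kill the reducers still running in the now-zombie stage instead of
    // leaving them up just to surface more FetchFailures.
    ctl.runningReducerTaskIds(reduceStageId).foreach { tid =>
      ctl.killTask(tid, s"stage $reduceStageId marked zombie after fetch failure on $failedHost")
    }
  }
}
{code}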



> Always try to cancel running tasks when a stage is marked as zombie
> -------------------------------------------------------------------
>
>                 Key: SPARK-2666
>                 URL: https://issues.apache.org/jira/browse/SPARK-2666
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, Spark Core
>            Reporter: Lianhui Wang
>
> There are some situations in which the scheduler can mark a task set as a 
> "zombie" before the task set has completed all of its tasks.  For example:
> (a) When a task fails b/c of a {{FetchFailed}}
> (b) When a stage completes because two different attempts create all the 
> ShuffleMapOutput, though no attempt has completed all its tasks (at least, 
> this *should* result in the task set being marked as zombie, see SPARK-10370)
> (there may be others, I'm not sure if this list is exhaustive.)
> Marking a taskset as zombie prevents any *additional* tasks from getting 
> scheduled, however it does not cancel all currently running tasks.  We should 
> cancel all running to avoid wasting resources (and also to make the behavior 
> a little more clear to the end user).  Rather than canceling tasks in each 
> case piecemeal, we should refactor the scheduler so that these two actions 
> are always taken together -- canceling tasks should go hand-in-hand with 
> marking the taskset as zombie.
> Some implementation notes:
> * We should change {{taskSetManager.isZombie}} to be private and put it 
> behind a method like {{markZombie}} or something.
> * marking a stage as zombie before all of its tasks have completed does *not* 
> necessarily mean the stage attempt has failed.  In case (a), the stage 
> attempt has failed, but in case (b) we are not canceling b/c of a failure, 
> rather just b/c no more tasks are needed.
> * {{taskScheduler.cancelTasks}} always marks the task set as zombie.  
> However, it also has some side-effects like logging that the stage has failed 
> and creating a {{TaskSetFailed}} event, which we don't want eg. in case (b) 
> when nothing has failed.  So it may need some additional refactoring to go 
> along w/ {{markZombie}}.
> * {{SchedulerBackend}}s are free not to implement {{killTask}}, so we need 
> to be sure to catch the resulting {{UnsupportedOperationException}}s
> * Testing this *might* benefit from SPARK-10372
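
A minimal sketch of the markZombie refactoring described in the issue above; the 
classes below are simplified stand-ins for Spark's TaskSetManager and 
SchedulerBackend, not the actual implementation:

{code:scala}
import scala.collection.mutable

trait SchedulerBackendSketch {
  // Backends are free not to implement this and may throw UnsupportedOperationException.
  def killTask(taskId: Long, reason: String): Unit
}

class TaskSetManagerSketch(backend: SchedulerBackendSketch) {
  private val runningTaskIds = mutable.Set.empty[Long]
  private var zombie = false

  def isZombie: Boolean = zombie

  // Marking the task set as zombie and cancelling its running tasks happen together.
  def markZombie(reason: String): Unit = {
    zombie = true
    runningTaskIds.foreach { tid =>
      try {
        backend.killTask(tid, reason)
      } catch {
        case _: UnsupportedOperationException => // best effort; leave the task running
      }
    }
  }
}
{code}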


