[ 
https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556507#action_12556507
 ] 

Joydeep Sen Sarma commented on HADOOP-2141:
-------------------------------------------

we recently upgraded to 0.14.4 and had speculative execution on by default for a week 
or so. i am looking for a place to pass on some feedback, and it seems like we had 
a fine discussion going on here.

for starters - speculative execution is a good optimization for us to have - we 
frequently see tasks stuck in 'Initializing' state (dunno why) and when 
speculative execution was turned on by default - that problem was largely 
resolved.

however - we quickly realized that speculative execution was way too aggressive for 
us. almost always, we would have speculative reduce tasks even when all was going 
well. given that we have a pretty small cluster, all those extra processes 
really started slowing us down, and we eventually had to turn the default 
setting to off. looking at the code, it's clear that the protocol for 
speculative execution is not quite what we would want. 

the proposals in this thread seem interesting. there are a few additional 
things to consider though:

- speculative execution should consider overall system load and be more or less 
aggressive depending on how busy the cluster is. 
- if we wait until 90+% completion, our problematic case (tasks stuck for a long 
time in the 'Initializing' state) would not be handled well.
- finally - if the tracker decides to execute speculatively - shouldn't the 
slower of the two tasks be killed? 
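
to make the first and last points concrete, here is a rough sketch of what a load-aware policy could look like. it is not Hadoop code: `Attempt`, `lag_threshold`, `should_speculate`, `attempt_to_kill` and the `max_extra` knob are all made-up names for illustration. the only number taken from this issue is the 20% lag used as the baseline; scaling it linearly with slot utilization is just one assumed way to be "less aggressive when busy".

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One running attempt of a task (hypothetical model, not a Hadoop class)."""
    attempt_id: str
    progress: float  # fraction complete, 0.0 to 1.0


def lag_threshold(busy_slots: int, total_slots: int,
                  base: float = 0.2, max_extra: float = 0.6) -> float:
    """How far behind the average a task must be before we speculate.

    Starts at `base` (the 20% figure quoted in this issue) on an idle
    cluster and grows linearly with utilization, so a busy cluster
    speculates less. `max_extra` is an assumed tuning knob.
    """
    utilization = busy_slots / total_slots if total_slots else 1.0
    return base + max_extra * utilization


def should_speculate(progress: float, avg_progress: float,
                     busy_slots: int, total_slots: int) -> bool:
    """Decide whether to launch a speculative attempt for one task."""
    if busy_slots >= total_slots:
        return False  # no free slot: never add load to a saturated cluster
    return avg_progress - progress >= lag_threshold(busy_slots, total_slots)


def attempt_to_kill(a: Attempt, b: Attempt) -> str:
    """Return the id of the slower attempt; on a tie the second one is killed."""
    return a.attempt_id if a.progress < b.progress else b.attempt_id
```

a task stuck in 'Initializing' reports 0.0 progress, so under this sketch it becomes a candidate as soon as the average moves far enough ahead, without waiting for 90+% completion.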

looking forward to a discussion. i would like to try something out in 0.14 
itself since this is a pain point for us right now.

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Koji Noguchi
>            Assignee: Arun C Murthy
>             Fix For: 0.17.0
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative 
> instance of a task is that it must be at least 20% behind the average 
> progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop 
> making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for 
> tasks in the speculative execution check. 
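
a minimal sketch of the check described in the issue text above, for illustration only. it combines the existing condition (at least 20% behind average progress, read here as 0.2 absolute progress points, which is an assumption) with a stall condition. the function name and the `stall_timeout` default are hypothetical and do not come from the Hadoop source.

```python
def is_speculation_candidate(progress: float, avg_progress: float,
                             seconds_since_progress: float,
                             lag: float = 0.2,
                             stall_timeout: float = 600.0) -> bool:
    """Return True if a task should get a speculative attempt.

    lagging: the existing rule, task is at least `lag` behind the average.
    stalled: the proposed rule, an unfinished task has reported no
             progress for `stall_timeout` seconds (assumed value).
    """
    lagging = avg_progress - progress >= lag
    stalled = progress < 1.0 and seconds_since_progress >= stall_timeout
    return lagging or stalled
```

with these assumed defaults, a reduce stuck at 95% next to a 97% average is not lagging, but it still qualifies once it has gone 600 seconds without progress, which is the bad-disk case reported here.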

