[ 
https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542869
 ] 

Arun C Murthy commented on HADOOP-2141:
---------------------------------------

Thanks for your comments Runping, some thoughts of my own...

----

bq. 1. A speculative execution for a mapper (reducer) is started only if there 
are no pending non-speculative mappers (reducers).

I believe this is already the case for choosing speculative tasks... I'll 
double-check.


bq. 2. We should estimate the expected finish time for a mapper (reducer) based 
on its current progress and progress rate. A speculative execution for a 
mapper (reducer) is started only if the projected finish time is much later 
than the average execution time of mappers (reducers).

Hmm... I'm concerned this could lead to reduce tasks being spawned too 
aggressively in cases like the one Koji reported. Do you see a way to do this 
more conservatively and yet keep it simple?
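
To make proposal 2 concrete, here is a rough sketch in Python (not the 
JobTracker code). It assumes progress is linear in time, and the names 
projected_duration, should_speculate, behind_average and the slack factor are 
all hypothetical. behind_average is only one reading of the existing "20% 
behind the average progress" rule quoted in the issue description.

```python
def projected_duration(start, now, progress):
    """Estimate total run time, assuming progress stays linear in time."""
    return (now - start) / progress


def should_speculate(start, now, progress, avg_duration, slack=2.0):
    """Speculate if the projected run time exceeds slack * avg_duration.

    slack is a hypothetical knob: larger values are more conservative.
    """
    if progress <= 0.0:
        return False  # no rate to extrapolate from yet
    return projected_duration(start, now, progress) > slack * avg_duration


def behind_average(progress, avg_progress, gap=0.2):
    """One reading of the existing rule: at least 20% behind the average."""
    return avg_progress - progress >= gap


# A reduce stuck at 95% (as in Koji's report) is not 20% behind the average,
# but its projected run time keeps growing while it makes no progress.
print(behind_average(0.95, 0.99))
print(should_speculate(start=0, now=300, progress=0.95, avg_duration=100))
```

The slack factor is the obvious place to tune how aggressive this is. Whether 
a single multiplier is conservative enough for reduces is exactly the open 
question above.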


bq. 3. It is a bit tricky to compute the average execution time of reducers. 
If a reducer started before the map phase completed, then the overlap period 
should be taken out.

Ok, I agree in principle. Yet I'm concerned about whether this is overkill. 
We could subtract the time it took all mappers to finish... I'm not sure.
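
As a sketch of what taking out the overlap could mean (Python, with 
hypothetical names, and assuming we know a single timestamp at which the last 
map finished):

```python
def effective_reduce_duration(reduce_start, reduce_finish, map_phase_end):
    """Reducer run time, not counting time spent before the last map finished."""
    counted_start = max(reduce_start, map_phase_end)
    return max(0.0, reduce_finish - counted_start)


def average_reduce_duration(reducers, map_phase_end):
    """Average effective duration over (start, finish) pairs of finished reducers."""
    durations = [
        effective_reduce_duration(start, finish, map_phase_end)
        for start, finish in reducers
    ]
    return sum(durations) / len(durations) if durations else 0.0


# One reducer started at t=10 while maps were still running (they end at t=60),
# another started at t=70 after the map phase; both finished at t=100.
print(average_reduce_duration([(10, 100), (70, 100)], map_phase_end=60))
```

This only discounts the overlap. It says nothing about how much useful 
shuffling a reducer did while the maps were still running.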


bq. 4. If a reducer is stuck in the shuffle phase, the real reason for the 
stall may be related to the machine(s) where the needed map outputs sit. 
Launching a speculative execution of the reducer may not help. In this case, we 
may need to declare the affected mappers lost and re-run them.

I'm hoping HADOOP-1128, and more recently HADOOP-1984, take care of this, as 
long as we aren't too aggressive about starting speculative reduces.

----

Overall, I'm very concerned about keeping this reasonably simple, at least as a 
first pass, till we have a chance to see this in action in the real world. We 
can then iterate...


> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Koji Noguchi
>            Assignee: Arun C Murthy
>             Fix For: 0.16.0
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative 
> instance of a task is that it must be at least 20% behind the average 
> progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop 
> making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for 
> tasks in the speculative execution check. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
