[ 
https://issues.apache.org/jira/browse/HADOOP-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548003
 ] 

Doug Cutting commented on HADOOP-2327:
--------------------------------------

Wouldn't it be better to invest in a more configurable and extensible retry 
mechanism for maps?  If a task fails, we should have hooks that permit cleanup 
of any side-effect data before retry and/or to move final results into place on 
success.  If we had to choose, I'd rather have that than the ability to re-run 
particular tasks by hand.
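
As a rough illustration of the hook idea, here is a minimal sketch in plain
Python. It is not Hadoop's API: run_with_retry, cleanup, commit, and the
default of four attempts are all hypothetical names and values chosen for the
example, and it treats any exception as a task failure.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[int], T],
    cleanup: Callable[[int], None],
    commit: Callable[[T], None],
    max_attempts: int = 4,  # arbitrary default, an assumption for this sketch
) -> T:
    """Run task(attempt) until it succeeds or the attempts run out.

    cleanup(attempt) is called after each failed attempt so that side-effect
    data can be removed before the retry. commit(result) is called once, on
    success, to move final results into place.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            result = task(attempt)
        except Exception as err:  # assumption: any exception means failure
            last_error = err
            cleanup(attempt)
            continue
        commit(result)
        return result
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

Keeping cleanup and commit as separate callbacks means a job can plug in its
own handling of side-effect data without changing the retry loop itself.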

Another approach to this might be to bypass tasks whose output already exists.  
Then one can simply re-run the original job and only those tasks whose output 
does not exist would require execution.  For example, the InputFormat could 
check which outputs already exist and not generate input splits for those.  
Could that work?
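
The split-filtering idea can be modeled in a few lines of plain Python. This
is only a sketch under stated assumptions, not a Hadoop InputFormat:
pending_splits and output_name are hypothetical, and it assumes each split maps
to one predictable output file name that stays the same across runs.

```python
import os
from typing import Callable, List


def pending_splits(
    splits: List[str],
    output_dir: str,
    output_name: Callable[[str], str],
) -> List[str]:
    """Return only the splits whose expected output file does not exist yet.

    output_name maps a split identifier to the file name its task would
    write (a hypothetical, stable naming scheme).
    """
    return [
        s for s in splits
        if not os.path.exists(os.path.join(output_dir, output_name(s)))
    ]
```

An existence check like this treats a partially written file as finished, so
it would only be safe if output is moved into place on success alone. It also
would not catch a task that completed but wrote an incorrect result.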

> Streaming: need to be able to re-run specific map tasks (when -reducer NONE)
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2327
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2327
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
>
> Sometimes a few map tasks fail in a job run with -reducer NONE.
> It should be possible to rerun the failed map tasks.
> There are several failure modes:
>    * a task is hanging, so the job is killed
>    * from the infrastructure perspective, the task has completed successfully,
>      but it failed to produce the correct result
>    * failed in the proper Hadoop sense
> It is often too expensive to rerun the whole job.  And for larger jobs, 
> chances are each run will have a few failed tasks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
