[ 
https://issues.apache.org/jira/browse/MAPREDUCE-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reopened MAPREDUCE-460:
-------------------------------------

      Assignee:     (was: Owen O'Malley)

This is easy to support in MR2. I think we should take a crack at this; it 
sounds like a very useful feature.
                
> Should be able to re-run jobs, collecting only missing output
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-460
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-460
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Bryan Pendleton
>            Priority: Minor
>
> For jobs with no side effects (roughly == jobs with speculative execution 
> enabled), if partial output has been generated, it should be possible to 
> re-run the job, and fill in the missing pieces. I have now run the same job 
> twice, once finishing 42 of 44 reduce tasks, another time finishing only 17. 
> Each time, many nodes failed, causing a large number of tasks to fail (in one 
> case, 5k failures from 15k map tasks, 23 failures from 44 reduces), but some 
> valid output was generated. Since the output depends only on the input, 
> and both jobs used the same input, I can now combine the outputs of these two 
> failed runs to get a complete job's output. This should be something 
> that can be more automatic.
> In particular, it should be possible to resubmit a job, with a list of 
> partitions that should be ignored. A special Combiner, or pre-Combiner, would 
> throw out any map output for partitions that have already been successfully 
> completed, thus reducing the amount of data that needs to be reduced to 
> complete the job. It would, of course, be nice to support "filling in" 
> existing outputs, rather than having to do a move operation on completed 
> outputs.
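
The idea in the description can be sketched roughly as follows. This is a
standalone illustration in plain Python, not Hadoop code and not a proposed
implementation: the function names (partition_for, missing_partitions,
drop_completed) are hypothetical, and the CRC-based partitioner is only an
assumed stand-in for whatever deterministic partitioner the job really uses.
The sketch relies on the same premise as the description: the key-to-partition
mapping is identical across runs because the input and the number of reduces
are unchanged.

```python
import zlib


def partition_for(key: str, num_reduces: int) -> int:
    """Assumed stand-in for the job's partitioner: a deterministic hash
    of the key, modulo the number of reduce tasks."""
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def missing_partitions(num_reduces: int, *completed_runs: set) -> list:
    """Return the partitions that none of the earlier runs completed."""
    done = set().union(*completed_runs)
    return sorted(set(range(num_reduces)) - done)


def drop_completed(map_output, num_reduces: int, completed: set):
    """Play the role of the 'pre-Combiner': yield only the (key, value)
    pairs bound for partitions that still have to be reduced."""
    for key, value in map_output:
        if partition_for(key, num_reduces) not in completed:
            yield key, value
```

A resubmitted job would take the union of the partitions completed by earlier
runs as its ignore list, discard map output for those partitions before the
shuffle, and run reduces only for the result of missing_partitions.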

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
