[ https://issues.apache.org/jira/browse/MAPREDUCE-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy reopened MAPREDUCE-460:
-------------------------------------
Assignee: (was: Owen O'Malley)
This is easy to support in MR2, and it sounds like a very useful feature. I
think we should take a crack at it.
> Should be able to re-run jobs, collecting only missing output
> -------------------------------------------------------------
>
> Key: MAPREDUCE-460
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-460
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: mrv2
> Reporter: Bryan Pendleton
> Priority: Minor
>
> For jobs with no side effects (roughly == jobs with speculative execution
> enabled), if partial output has been generated, it should be possible to
> re-run the job, and fill in the missing pieces. I have now run the same job
> twice, once finishing 42 of 44 reduce tasks, another time finishing only 17.
> Each time, many nodes failed, causing a large number of tasks to fail (in one
> case, 5k failures from 15k map tasks and 23 failures from 44 reduces), but some
> valid output was generated. Since the output is only dependent on the input,
> and both jobs used the same input, I can now combine the outputs of these two
> failed jobs to get a completed job's output. This should be something that can
> be done more automatically.
> In particular, it should be possible to resubmit a job, with a list of
> partitions that should be ignored. A special Combiner, or pre-Combiner, would
> throw out any map output for partitions that have already been successfully
> completed, thus reducing the amount of data that needs to be reduced to
> complete the job. It would, of course, be nice to support "filling in"
> existing outputs, rather than having to do a move operation on completed
> outputs.
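
To make the proposal above concrete, here is a minimal sketch in plain Python
(not Hadoop code) of the two steps it describes: working out which partitions
are still missing, and throwing away map output destined for partitions that
are already complete. Everything named here is an assumption for illustration:
the helper names (completed_partitions, missing_partitions, partition_for,
drop_completed) are hypothetical, the CRC32 partitioner is only a stand-in for
the job's real partitioner, and the sketch assumes that a part-NNNNN or
part-r-NNNNN file in an output directory means that partition finished
successfully.

```python
import re
import zlib
from typing import Iterable, Iterator, List, Set, Tuple

# Reduce output files are assumed to be named part-NNNNN or part-r-NNNNN.
PART_FILE = re.compile(r"^part-(?:r-)?(\d{5})$")


def completed_partitions(filenames: Iterable[str]) -> Set[int]:
    """Return the partition numbers that already have an output file."""
    done = set()
    for name in filenames:
        match = PART_FILE.match(name)
        if match:
            done.add(int(match.group(1)))
    return done


def missing_partitions(num_reduces: int, done: Set[int]) -> List[int]:
    """Return the partitions a re-run still has to produce, in order."""
    return [p for p in range(num_reduces) if p not in done]


def partition_for(key: str, num_reduces: int) -> int:
    """Stand-in partitioner (an assumption, not the job's real one).

    It must be deterministic across runs, otherwise output from different
    runs cannot be combined safely.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def drop_completed(
    records: Iterable[Tuple[str, int]], num_reduces: int, done: Set[int]
) -> Iterator[Tuple[str, int]]:
    """The 'pre-Combiner' step: keep only records for unfinished partitions."""
    for key, value in records:
        if partition_for(key, num_reduces) not in done:
            yield key, value


if __name__ == "__main__":
    # Hypothetical directory listings from two failed runs of the same job.
    run_a = ["part-00000", "part-00001", "_logs"]
    run_b = ["part-r-00001", "part-r-00003"]
    done = completed_partitions(run_a) | completed_partitions(run_b)
    print("still missing:", missing_partitions(4, done))
```

The union of the two listings plays the role of the "list of partitions that
should be ignored"; a re-run that applies drop_completed on the map side only
shuffles and reduces data for the partitions that remain.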