[
https://issues.apache.org/jira/browse/OOZIE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731455#comment-13731455
]
Robert Kanter commented on OOZIE-1483:
--------------------------------------
Discussion thread on dev mailing list here:
http://mail-archives.apache.org/mod_mbox/oozie-dev/201308.mbox/%3cCAHz+ZFeCKbnx7v7=qmtxn90y7tfhmorg4yzgbfecpuj7tss...@mail.gmail.com%3e
> Support for Job Recoverability
> ------------------------------
>
> Key: OOZIE-1483
> URL: https://issues.apache.org/jira/browse/OOZIE-1483
> Project: Oozie
> Issue Type: Improvement
> Reporter: Robert Kanter
> Assignee: Robert Kanter
>
> To support the JobTracker recovering jobs on restart, we need to
> configure the launcher job to be restarted by the JT, but not any of the
> launched jobs ({{mapreduce.job.restart.recover}}). This way, the launcher
> job will simply start over when the JT recovers it; if we allow the JT to
> recover the actual jobs, then they will interfere. We'll also need this for
> the same ability in YARN.
> This should be fairly trivial except for the MapReduce action because of the
> optimization where the launcher finishes instead of waiting for the actual
> job and Oozie does an "id swap". Trying to add support for JT to recover the
> MR action doesn't seem feasible as we'd run into a lot of trickiness and some
> race conditions due to the id swap.
> Instead, I think we should remove the MR optimization because that will allow
> us to support recoverability for the MR action as well. This also has
> the benefit of simplifying the code because we'd be getting rid of all of the
> id swap stuff and also making the MR action consistent with the other
> actions. The only downside is that the MR action will take an extra Map slot
> just like the other actions.
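
The split described in the quoted issue (recovery on for the launcher job, off for the jobs it launches) can be sketched as below. This is only an illustration, not Oozie's implementation: plain dicts stand in for Hadoop job configurations, the helper names launcher_conf and launched_job_conf are hypothetical, and the "true"/"false" string values are an assumption about how the boolean property is written. Only the property name mapreduce.job.restart.recover comes from the issue text.

```python
# Property named in the issue description.
RECOVER_KEY = "mapreduce.job.restart.recover"


def launcher_conf(base):
    """Return a copy of base with JT recovery enabled (hypothetical helper).

    The launcher job should be restarted by the JobTracker so that it
    simply starts over after a JT restart.
    """
    conf = dict(base)  # copy so the caller's settings are not mutated
    conf[RECOVER_KEY] = "true"  # assumed string form of the boolean
    return conf


def launched_job_conf(base):
    """Return a copy of base with JT recovery disabled (hypothetical helper).

    Jobs started by the launcher must not be recovered by the JT;
    the restarted launcher submits them again.
    """
    conf = dict(base)
    conf[RECOVER_KEY] = "false"
    return conf


if __name__ == "__main__":
    shared = {"mapred.job.queue.name": "default"}
    print(launcher_conf(shared))
    print(launched_job_conf(shared))
```

Keeping the two cases in separate helpers makes it explicit that the same property takes opposite values depending on which side of the launcher the job sits.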
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira