Wouldn't Oozie poll for the job status, decide that it has failed, and, when
the JT comes back up, launch another one if retry is configured?

On Mon, Aug 5, 2013 at 3:11 PM, Robert Kanter <[email protected]> wrote:

> Hi,
>
> We looked into how to support Job Recoverability (i.e., the JT is restarted
> and should restart the jobs that were running; similarly for YARN) and
> have a pretty simple solution for all of the action types except
> MapReduce. If we set mapreduce.job.restart.recover=true for the launcher
> job and mapreduce.job.restart.recover=false for the jobs launched by the
> launcher, then when the JT restarts, it will recover the launcher job but
> not the child jobs; the launcher job will then take care of relaunching
> the child jobs.
>
> For MapReduce, because of the optimization with the id swap, this won't
> work. It would be very tricky, if it's even practical, to do something
> similar for the MR action. Instead, we think it would be best to simply
> remove the MR optimization and make the MR action just like the other
> action types. I know we normally don't want to remove optimizations, but
> there are many advantages in this case, and the optimization only saves a
> single map slot, and only for MR jobs.
>
> I've created OOZIE-1483 <https://issues.apache.org/jira/browse/OOZIE-1483>
> with more details and should have a patch soon.
>
> Thoughts?
>
>
> thanks
> - Robert
>
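
For illustration, here is a minimal sketch of the per-job split proposed in the quoted message, written in Python purely as a model. The property name `mapreduce.job.restart.recover` comes from the message; the helpers `launcher_conf`, `child_conf`, and `recovered_after_restart`, and the use of a plain dict in place of a Hadoop `Configuration`, are hypothetical and not Oozie's actual code. The "restart" function is a toy assumption that the JT recovers exactly the jobs whose config says `true`.

```python
RECOVER_KEY = "mapreduce.job.restart.recover"  # property name from the proposal


def launcher_conf(base: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: copy of `base` for the launcher job, which the JT should recover."""
    conf = dict(base)  # copy so the shared base config is not mutated
    conf[RECOVER_KEY] = "true"
    return conf


def child_conf(base: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: copy of `base` for a job started by the launcher.

    The JT should NOT recover it; the recovered launcher relaunches it instead.
    """
    conf = dict(base)
    conf[RECOVER_KEY] = "false"
    return conf


def recovered_after_restart(jobs: dict[str, dict[str, str]]) -> list[str]:
    """Toy model of a JT restart: names of jobs whose config asks to be recovered."""
    return sorted(name for name, conf in jobs.items() if conf.get(RECOVER_KEY) == "true")


if __name__ == "__main__":
    base = {"mapreduce.job.queuename": "default"}
    jobs = {"launcher": launcher_conf(base), "child": child_conf(base)}
    print(recovered_after_restart(jobs))  # ['launcher']
```

Only the launcher survives the modeled restart; relaunching the child is then left to the launcher, as described above. The sketch copies the base config for each job so that setting the flag on one job cannot leak into another.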
