Re: Job Recoverability

Rohini Palaniswamy Mon, 05 Aug 2013 17:28:52 -0700

Wouldn't Oozie poll for the job status, decide that the job has failed, and
launch another one when the JT comes back up, if retry is configured?


On Mon, Aug 5, 2013 at 3:11 PM, Robert Kanter <[email protected]> wrote:

> Hi,
>
> We looked into how to support Job Recoverability (i.e. the JT is restarted
> and it wants to restart the jobs that were running; similarly for YARN) and
> have a pretty simple solution for all of the action types except for
> MapReduce.  If we set mapreduce.job.restart.recover=true for the launcher
> job and mapreduce.job.restart.recover=false for the jobs launched by the
> launcher, then when the JT restarts, it will recover the launcher job but
> not the child jobs -- the launcher job will then take care of relaunching
> the child jobs.
>
> For MapReduce, because of the optimization with the id swap, this won't
> work.  It would be very tricky, if it's even practical, to do something
> similar for the MR action.  Instead, we think it would be best if we simply
> remove the MR optimization and make it just like the other action types.  I
> know we normally don't want to remove optimizations, but there are many
> advantages in this case, and the optimization saves only a single Map slot,
> and only for MR jobs.
>
> I've created OOZIE-1483 <https://issues.apache.org/jira/browse/OOZIE-1483>
> with more details and should have a patch soon.
>
> Thoughts?
>
>
> thanks
> - Robert
>
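
For reference, the property split described in the quoted message can be
sketched roughly as follows. This is only a toy model in Python using plain
dicts: the property name mapreduce.job.restart.recover comes from the thread,
but the helper names (launcher_conf, child_conf, jobs_recovered_on_restart)
are hypothetical and are not Oozie or Hadoop APIs. It only illustrates which
jobs a restarted JT would pick up under that scheme.

```python
# Property name taken from the thread; everything else is illustrative.
RECOVER_KEY = "mapreduce.job.restart.recover"


def launcher_conf(base):
    """Copy the base config and mark the launcher job as recoverable."""
    conf = dict(base)
    conf[RECOVER_KEY] = "true"
    return conf


def child_conf(base):
    """Copy the base config and mark a child job as NOT recoverable."""
    conf = dict(base)
    conf[RECOVER_KEY] = "false"
    return conf


def jobs_recovered_on_restart(jobs):
    """Model of a JT restart: return the names of jobs flagged for recovery.

    `jobs` maps a job name to its config dict. Jobs without the flag are
    treated as not recoverable here, which is an assumption of this sketch.
    """
    return [name for name, conf in jobs.items()
            if conf.get(RECOVER_KEY) == "true"]


# Hypothetical base settings shared by the launcher and its child job.
base = {"mapreduce.job.name": "example"}
jobs = {
    "launcher": launcher_conf(base),
    "child": child_conf(base),
}
recovered = jobs_recovered_on_restart(jobs)
```

In this model only the launcher is recovered after a restart; relaunching
the child jobs is then left to the recovered launcher, as the quoted message
describes.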
