Wouldn't Oozie poll for the job status, decide that it has failed, and then, when the JT comes back up, launch another one if retry is configured?
On Mon, Aug 5, 2013 at 3:11 PM, Robert Kanter <[email protected]> wrote:
> Hi,
>
> We looked into how to support Job Recoverability (i.e. the JT is restarted
> and it wants to restart the jobs that were running; similarly for YARN) and
> have a pretty simple solution for all of the action types except for
> MapReduce. If we set mapreduce.job.restart.recover=true for the launcher
> job and mapreduce.job.restart.recover=false for the jobs launched by the
> launcher, then when the JT restarts, it will recover the launcher job but
> not the child jobs -- the launcher job will then take care of relaunching
> the child jobs.
>
> For MapReduce, because of the optimization with the id swap, this won't
> work. It would be very tricky, if it's even practical, to do something
> similar for the MR action. Instead, we think it would be best if we simply
> remove the MR optimization and make it just like the other action types. I
> know we normally don't want to remove optimizations, but there are many
> advantages in this case, and it's only saving a single Map slot for MR jobs
> only.
>
> I've created OOZIE-1483 <https://issues.apache.org/jira/browse/OOZIE-1483>
> with more details and should have a patch soon.
>
> Thoughts?
>
>
> thanks
> - Robert
>
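To make the proposed split concrete, here is a rough sketch of the two settings described in the quoted mail: the launcher job opts in to recovery, and each child job it submits opts out, so after a JT restart only the launcher comes back and it relaunches its children. Only the property name `mapreduce.job.restart.recover` comes from the mail. The class and helper names are hypothetical, and plain `java.util.Properties` stands in for Hadoop's `Configuration` (where one would presumably call `setBoolean(...)`) so the snippet runs without Hadoop on the classpath. This is not how Oozie actually wires it up.

```java
import java.util.Properties;

public class RecoverySettings {
    // Property name taken from the proposal; everything else here is illustrative.
    public static final String RECOVER_KEY = "mapreduce.job.restart.recover";

    // Hypothetical helper: settings for the Oozie launcher job.
    // The JT should recover this job after a restart.
    public static Properties launcherConf() {
        Properties p = new Properties();
        p.setProperty(RECOVER_KEY, "true");
        return p;
    }

    // Hypothetical helper: settings for a child job submitted by the launcher.
    // The JT should NOT recover this job; the recovered launcher relaunches it.
    public static Properties childConf() {
        Properties p = new Properties();
        p.setProperty(RECOVER_KEY, "false");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("launcher: " + launcherConf().getProperty(RECOVER_KEY));
        System.out.println("child:    " + childConf().getProperty(RECOVER_KEY));
    }
}
```

The MR action is the odd one out because, with the id swap optimization, the launcher does not remain as a separate recoverable job wrapping the MR job, so there is no launcher left to relaunch it.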
