Robert,

That's a break in backward compatibility. Until now, users have been used to clicking the link to go to the MR page.
Is there a better way to handle this?

Thanks,
Mayank

On Tue, Aug 6, 2013 at 10:42 AM, Robert Kanter <[email protected]> wrote:

> Mona,
> As far as I'm aware, the "retry" that Oozie is doing is just retrying to
> connect to the JT (which is why, when the JT comes back up, Oozie can
> continue monitoring the Hadoop job if it still has the same ID); it doesn't
> try to submit the job again as part of the "retry".
>
> Mayank,
> We can put the ID for the actual job in the Child IDs tab (like with Pig).
>
> - Robert
>
> On Tue, Aug 6, 2013 at 10:41 AM, Mayank Bansal <[email protected]> wrote:
>
> > I agree, we should handle these two scenarios. I am OK with changing the
> > launcher behavior for MR; however, if we remove the ID swap, how do we
> > navigate to MR jobs from the UI as we do right now?
> >
> > Thanks,
> > Mayank
> >
> > On Tue, Aug 6, 2013 at 10:24 AM, Robert Kanter <[email protected]> wrote:
> >
> > > Suppose we leave the MR ID swap as is but set the launcher recover to 0
> > > and the job recover to 1; then consider these two scenarios:
> > >
> > > 1. The JT gets restarted during the launcher job, but before the
> > >    launcher job actually launches the real job:
> > >    - The launcher job won't be recovered because we told it not to
> > >    - The real job was never launched
> > >    ---> The action never completes and Oozie marks it as failed
> > >
> > > 2. The launcher job submits the real job, but the JT gets restarted
> > >    before the Oozie server has a chance to swap IDs (it's not an atomic
> > >    operation):
> > >    - The launcher job won't be recovered because we told it not to
> > >    - The real job will be recovered and finish successfully
> > >    ---> Oozie marks the action as failed even though the actual job
> > >         succeeded, because it didn't know about the ID swap
> > >
> > > It would only work for the case where the JT gets restarted after the
> > > ID swap occurs.
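The two scenarios above, plus the post-swap case, can be checked with a small truth-table model. This is a sketch under stated assumptions, not Oozie code: `Phase` and `mr_action_outcome` are hypothetical names, the recovery flags are booleans standing in for the 1/0 values discussed in the thread, a recovered launcher is assumed to resubmit the real job (as it does for the other action types), and duplicate submissions are ignored.

```python
from enum import Enum


class Phase(Enum):
    """When the JT restarts relative to the MR action's lifecycle (toy model)."""
    BEFORE_CHILD_LAUNCH = 1  # launcher running, real MR job not yet submitted
    BEFORE_ID_SWAP = 2       # real job submitted, Oozie still tracks the launcher ID
    AFTER_ID_SWAP = 3        # Oozie now tracks the real job's ID


def mr_action_outcome(restart_at: Phase, launcher_recover: bool,
                      child_recover: bool, id_swap: bool = True) -> str:
    """Return what Oozie would report for the MR action in this toy model.

    Hypothetical model: Oozie's verdict depends only on whether the job ID it
    is currently tracking survives the JT restart.
    """
    tracks_child = id_swap and restart_at is Phase.AFTER_ID_SWAP
    if tracks_child:
        # Oozie polls the real job's ID, so only that job's recovery matters.
        return "SUCCEEDED" if child_recover else "FAILED"
    # Oozie polls the launcher's ID (always the case without the ID swap);
    # a recovered launcher is assumed to resubmit the real job.
    return "SUCCEEDED" if launcher_recover else "FAILED"


for launcher, child in [(False, True), (True, False)]:
    outcomes = [mr_action_outcome(p, launcher, child) for p in Phase]
    print(f"launcher={launcher}, child={child}:", outcomes)
```

With launcher recover 0 and job recover 1, only a restart after the swap succeeds, which matches the conclusion above. With the setting used for the other action types (launcher 1, job 0), the post-swap restart is the one that fails; passing `id_swap=False` makes every phase depend on the launcher, which is why removing the swap makes the MR action behave like the others.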
> > >
> > > - Robert
> > >
> > > On Tue, Aug 6, 2013 at 10:17 AM, Mayank Bansal <[email protected]> wrote:
> > >
> > > > Hi Robert,
> > > >
> > > > +1 for Oozie setting recovery to 1 for the launcher and 0 for the jobs
> > > > in all cases except MR.
> > > >
> > > > After the ID swap, Oozie only knows about the MR job, doesn't it? Then
> > > > there should not be any problem.
> > > >
> > > > If we set the MR launcher recover to 0 and the job recover to 1, then
> > > > the job will succeed in case of a JT restart.
> > > >
> > > > Am I missing something?
> > > >
> > > > Thanks,
> > > > Mayank
> > > >
> > > > On Tue, Aug 6, 2013 at 9:59 AM, Robert Kanter <[email protected]> wrote:
> > > >
> > > > > I think you usually just get the "Unknown Hadoop Job" error message
> > > > > because Oozie tries to look up the Hadoop job ID it already has, but
> > > > > the JT no longer has that ID because it was restarted. With JT
> > > > > Recoverability turned on, the JT will restart the job using the same
> > > > > ID, so Oozie continues just fine.
> > > > >
> > > > > - Robert
> > > > >
> > > > > On Mon, Aug 5, 2013 at 5:27 PM, Rohini Palaniswamy <[email protected]> wrote:
> > > > >
> > > > > > Wouldn't Oozie poll for the job status, decide that it has failed,
> > > > > > and, when the JT comes up, launch another one if retry is
> > > > > > configured?
> > > > > >
> > > > > > On Mon, Aug 5, 2013 at 3:11 PM, Robert Kanter <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We looked into how to support Job Recoverability (i.e. the JT is
> > > > > > > restarted and wants to restart the jobs that were running;
> > > > > > > similarly for YARN) and have a pretty simple solution for all of
> > > > > > > the action types except MapReduce.
> > > > > > > If we set mapreduce.job.restart.recover=true for the launcher job
> > > > > > > and mapreduce.job.restart.recover=false for the jobs launched by
> > > > > > > the launcher, then when the JT restarts, it will recover the
> > > > > > > launcher job but not the child jobs; the launcher job will then
> > > > > > > take care of relaunching the child jobs.
> > > > > > >
> > > > > > > For MapReduce, this won't work because of the ID swap
> > > > > > > optimization. It would be very tricky, if it's even practical, to
> > > > > > > do something similar for the MR action. Instead, we think it
> > > > > > > would be best to simply remove the MR optimization and make the
> > > > > > > MR action just like the other action types. I know we normally
> > > > > > > don't want to remove optimizations, but there are many
> > > > > > > advantages in this case, and the optimization only saves a
> > > > > > > single map slot, and only for MR jobs.
> > > > > > >
> > > > > > > I've created OOZIE-1483
> > > > > > > <https://issues.apache.org/jira/browse/OOZIE-1483> with more
> > > > > > > details and should have a patch soon.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > - Robert
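The proposed split (recover the launcher, not its children) and the "Unknown Hadoop Job" behavior described in the thread can be illustrated with a toy JT registry. This is a minimal sketch assuming a JT that, on restart, keeps only the jobs whose conf sets `mapreduce.job.restart.recover=true` (under their original IDs) and forgets the rest. `ToyJobTracker`, `with_recover`, and the job IDs are hypothetical; this is not the Hadoop or Oozie API, and the launcher's relaunching of its children after recovery is not modeled.

```python
RECOVER_KEY = "mapreduce.job.restart.recover"  # property name as quoted in the thread


class ToyJobTracker:
    """Hypothetical in-memory stand-in for the JT's job registry (not a Hadoop API)."""

    def __init__(self) -> None:
        self.jobs: dict[str, dict[str, str]] = {}

    def submit(self, job_id: str, conf: dict[str, str]) -> None:
        self.jobs[job_id] = conf

    def restart(self) -> None:
        # Only jobs that opted into recovery survive, and they keep the same ID.
        self.jobs = {jid: conf for jid, conf in self.jobs.items()
                     if conf.get(RECOVER_KEY) == "true"}

    def status(self, job_id: str) -> str:
        # Oozie polls by the ID it already holds; a forgotten ID is an error.
        return "RUNNING" if job_id in self.jobs else "Unknown Hadoop Job"


def with_recover(base: dict[str, str], recover: bool) -> dict[str, str]:
    """Copy a job conf and set the recovery flag (Hadoop conf values are strings)."""
    conf = dict(base)
    conf[RECOVER_KEY] = "true" if recover else "false"
    return conf


jt = ToyJobTracker()
jt.submit("launcher-1", with_recover({}, True))  # hypothetical launcher job ID
jt.submit("child-1", with_recover({}, False))    # hypothetical child job ID
jt.restart()
print(jt.status("launcher-1"))  # RUNNING
print(jt.status("child-1"))     # Unknown Hadoop Job
```

If Oozie is tracking the launcher's ID, the restart is harmless; if it has already swapped to the child's ID (the MR optimization), the lookup fails, which is the case the thread proposes to eliminate.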
