[ https://issues.apache.org/jira/browse/OOZIE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907637#comment-14907637 ]
Srikanth Sundarrajan commented on OOZIE-2314:
---------------------------------------------

Good catch [~jaydeepvishwakarma]. Thanks for the patch. One minor nit: I have left my comments in RB.

> Unable to kill old instance child job by workflow or coord rerun by Launcher
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-2314
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2314
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Jaydeep Vishwakarma
>            Assignee: Jaydeep Vishwakarma
>            Priority: Blocker
>         Attachments: OOZIE-2314.patch
>
>
> The Oozie launcher kills all child jobs that were launched by an old instance of the
> same launcher, workflow, or coord action, to avoid duplicate child jobs running at the
> same time. To do so, it searches for application ids by tag and time, and kills all
> the matching AMs. You can find more detail in OOZIE-2129.
> This works fine when a Launcher attempt gets killed and is retried. It does not work
> when the Yarn container that contains the AM gets killed for some reason and we rerun
> the workflow/coord action.
> This happens because of a time filter applied while finding the app ids, which
> always takes the current time from the server.
> {{LauncherMapperHelper.java}}
> {code}
>     public static void setupYarnRestartHandling(JobConf launcherJobConf, Configuration actionConf, String launcherTag)
>             throws NoSuchAlgorithmException {
>         launcherJobConf.setLong(LauncherMainHadoopUtils.OOZIE_JOB_LAUNCH_TIME, System.currentTimeMillis());
>         // Tags are limited to 100 chars so we need to hash them to make sure (the actionId otherwise doesn't have a max length)
>         String tag = getTag(launcherTag);
>         // keeping the oozie.child.mapreduce.job.tags instead of mapreduce.job.tags to avoid killing launcher itself.
>         // mapreduce.job.tags should only go to child job launch by launcher.
>         actionConf.set(LauncherMainHadoopUtils.CHILD_MAPREDUCE_JOB_TAGS, tag);
>     }
> {code}
> When a user reruns the workflow or coord action, the Launcher picks up the current
> system time along with the tags, then searches for running application ids to kill.
> It ends up finding no app id, because the previous instance of the same
> workflow/coord started before the new system time.
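
The failure mode described in the issue can be reproduced in isolation. The sketch below is not Oozie code: the class `LaunchTimeFilterSketch`, the `App` type, and `findChildren` are hypothetical stand-ins, and it assumes the lookup keeps only applications whose start time falls between the recorded launch time and "now". Under that assumption, a child started by the first run is invisible to a rerun that records a fresh launch time.

```java
import java.util.ArrayList;
import java.util.List;

public class LaunchTimeFilterSketch {

    /** Hypothetical stand-in for an application report: id, tag, and start time. */
    public static final class App {
        public final String id;
        public final String tag;
        public final long startTime;

        public App(String id, String tag, long startTime) {
            this.id = id;
            this.tag = tag;
            this.startTime = startTime;
        }
    }

    /**
     * Returns the ids of apps that carry the given tag and whose start time
     * lies in [rangeStart, rangeEnd]. The range check models the assumed time filter.
     */
    public static List<String> findChildren(List<App> apps, String tag,
                                            long rangeStart, long rangeEnd) {
        List<String> ids = new ArrayList<>();
        for (App app : apps) {
            boolean tagMatches = app.tag.equals(tag);
            boolean inRange = app.startTime >= rangeStart && app.startTime <= rangeEnd;
            if (tagMatches && inRange) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<App> running = new ArrayList<>();
        // Child job started by the first run of the action, still running.
        running.add(new App("app_old_child", "tag-abc", 1_000L));

        long rerunLaunchTime = 5_000L; // launch time recorded again at rerun
        long now = 6_000L;

        // Lower bound taken from the rerun: the old child is filtered out.
        System.out.println(findChildren(running, "tag-abc", rerunLaunchTime, now)); // prints []

        // Lower bound that predates the first run: the old child is found.
        System.out.println(findChildren(running, "tag-abc", 0L, now)); // prints [app_old_child]
    }
}
```

The second call only illustrates that the tag match alone is sufficient to locate the old child once the lower bound predates the original run; which lower bound the actual patch uses is not stated in this message.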