[ https://issues.apache.org/jira/browse/OOZIE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jaydeep Vishwakarma reassigned OOZIE-2314:
------------------------------------------
Assignee: Jaydeep Vishwakarma
> Unable to kill old instance child job by workflow or coord rerun by Launcher
> ----------------------------------------------------------------------------
>
> Key: OOZIE-2314
> URL: https://issues.apache.org/jira/browse/OOZIE-2314
> Project: Oozie
> Issue Type: Bug
> Reporter: Jaydeep Vishwakarma
> Assignee: Jaydeep Vishwakarma
>
> The Oozie launcher kills all child jobs that were launched by an old instance
> of the same launcher, workflow or coord action, to avoid duplicate child jobs
> running at the same time. To do so, it searches for the application ids by tag
> and time, and kills all of their AMs. You can find more detail in OOZIE-2129.
> This works fine when a Launcher attempt gets killed and is retried. However,
> when the Yarn container that contains the AM gets killed for some reason and
> we rerun the workflow/coord action, this patch does not work.
> This happens because of a time filter applied while finding the app ids,
> which always takes the current time from the server.
> {{LauncherMapperHelper.java}}
> {code}
>     public static void setupYarnRestartHandling(JobConf launcherJobConf, Configuration actionConf, String launcherTag)
>             throws NoSuchAlgorithmException {
>         launcherJobConf.setLong(LauncherMainHadoopUtils.OOZIE_JOB_LAUNCH_TIME, System.currentTimeMillis());
>         // Tags are limited to 100 chars so we need to hash them to make sure (the actionId otherwise doesn't have a max length)
>         String tag = getTag(launcherTag);
>         // keeping the oozie.child.mapreduce.job.tags instead of mapreduce.job.tags to avoid killing launcher itself.
>         // mapreduce.job.tags should only go to child job launch by launcher.
>         actionConf.set(LauncherMainHadoopUtils.CHILD_MAPREDUCE_JOB_TAGS, tag);
>     }
> {code}
> When a user reruns the workflow or coord action, the Launcher picks up the
> current system time along with the tags, then searches for running application
> ids to kill. It does not find any app id, because the previous instance of the
> same workflow/coord action started before the new system time.
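
The effect of the time filter described above can be illustrated with a small standalone model. This is only a sketch under assumptions: the class RerunTimeFilterSketch, the App type and findChildApps are hypothetical names that do not exist in Oozie or YARN, and the lookup is assumed to keep only applications that carry the tag and started at or after the recorded launch time.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the tag-and-time lookup described above.
// None of these names come from Oozie or YARN.
class RerunTimeFilterSketch {

    /** Minimal stand-in for a running YARN application. */
    static final class App {
        final String id;
        final String tag;
        final long startTime;

        App(String id, String tag, long startTime) {
            this.id = id;
            this.tag = tag;
            this.startTime = startTime;
        }
    }

    /**
     * Returns the ids of apps that carry the tag and started at or after
     * launchTime (assumed filter semantics, not taken from the Oozie source).
     */
    static List<String> findChildApps(List<App> running, String tag, long launchTime) {
        List<String> ids = new ArrayList<>();
        for (App app : running) {
            if (app.tag.equals(tag) && app.startTime >= launchTime) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<App> running = new ArrayList<>();
        // Child job started by the first run of the action at t = 1000.
        running.add(new App("application_0001", "tag-abc", 1_000L));

        // Rerun at t = 5000: the launch time is read from the clock again,
        // so the old child started before the lower bound of the filter.
        long rerunLaunchTime = 5_000L;
        System.out.println("filter from rerun time: "
                + findChildApps(running, "tag-abc", rerunLaunchTime));

        // Filtering from a time before the first run still sees the old child.
        long originalLaunchTime = 900L;
        System.out.println("filter from original time: "
                + findChildApps(running, "tag-abc", originalLaunchTime));
    }
}
```

In this model the first lookup returns an empty list and the second returns application_0001. Filtering from an earlier time is shown only for contrast with the failing case and is not necessarily the fix adopted for this issue.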
--
This message was sent by Atlassian JIRA (v6.3.4#6332)