[ 
https://issues.apache.org/jira/browse/OOZIE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907637#comment-14907637
 ] 

Srikanth Sundarrajan commented on OOZIE-2314:
---------------------------------------------

Good catch [~jaydeepvishwakarma]. Thanks for the patch. One minor nit: I have 
left my comments in RB.

> Unable to kill old instance child job by workflow or coord rerun by Launcher
> ----------------------------------------------------------------------------
>
>                 Key: OOZIE-2314
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2314
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Jaydeep Vishwakarma
>            Assignee: Jaydeep Vishwakarma
>            Priority: Blocker
>         Attachments: OOZIE-2314.patch
>
>
> The Oozie launcher kills all child jobs that were launched by an old 
> instance of the same launcher, workflow, or coord action, to avoid duplicate 
> child jobs running at the same time. To do so, it searches for application 
> ids by tag and time and kills all matching AMs. You can find more detail in 
> OOZIE-2129. 
> This works fine when a Launcher attempt gets killed and is retried. However, 
> when the Yarn container that contains the AM gets killed for some reason and 
> we rerun the workflow/coord action, that patch does not work.
>    This happens because of the time filter applied when finding the app ids, 
> which always takes the current time from the server.
>    {{LauncherMapperHelper.java}}
>    {code}
>        public static void setupYarnRestartHandling(JobConf launcherJobConf,
>                Configuration actionConf, String launcherTag)
>                throws NoSuchAlgorithmException {
>            launcherJobConf.setLong(LauncherMainHadoopUtils.OOZIE_JOB_LAUNCH_TIME,
>                    System.currentTimeMillis());
>            // Tags are limited to 100 chars so we need to hash them to make
>            // sure (the actionId otherwise doesn't have a max length)
>            String tag = getTag(launcherTag);
>            // keeping the oozie.child.mapreduce.job.tags instead of
>            // mapreduce.job.tags to avoid killing launcher itself.
>            // mapreduce.job.tags should only go to child job launch by
>            // launcher.
>            actionConf.set(LauncherMainHadoopUtils.CHILD_MAPREDUCE_JOB_TAGS, tag);
>        }
>    {code}
> When a user reruns the workflow or coord action, the Launcher picks up the 
> current system time along with the tags and searches for running application 
> ids to kill. It ends up not finding any app id, because the previous instance 
> of the same workflow/coord action started before the new system time.
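
To make the failure mode described above concrete, here is a minimal, self-contained sketch. It is not Oozie code: the class LaunchTimeFilterSketch, the App type, and findChildApps are hypothetical names, and the filter is assumed to select applications that carry the tag and started at or after the recorded launch time.

```java
import java.util.ArrayList;
import java.util.List;

public class LaunchTimeFilterSketch {

    /** Hypothetical stand-in for a running YARN application (not a real YARN type). */
    public static final class App {
        final String id;
        final String tag;
        final long startTimeMillis;

        public App(String id, String tag, long startTimeMillis) {
            this.id = id;
            this.tag = tag;
            this.startTimeMillis = startTimeMillis;
        }
    }

    /**
     * Assumed filter shape: return ids of apps that carry the given tag and
     * started at or after launchTimeMillis.
     */
    public static List<String> findChildApps(List<App> running, String tag,
                                             long launchTimeMillis) {
        List<String> ids = new ArrayList<>();
        for (App app : running) {
            if (app.tag.equals(tag) && app.startTimeMillis >= launchTimeMillis) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        long firstRunLaunchTime = 1_000L; // launch time recorded by the first run
        long rerunLaunchTime = 5_000L;    // "current time" taken again on rerun

        // A child job left over from the first run, still running.
        List<App> running = new ArrayList<>();
        running.add(new App("application_0001", "hashed-tag", firstRunLaunchTime + 100));

        // Filter anchored at the rerun's time: the old child is missed.
        System.out.println(findChildApps(running, "hashed-tag", rerunLaunchTime));
        // prints []

        // Filter anchored at the original launch time: the old child is found.
        System.out.println(findChildApps(running, "hashed-tag", firstRunLaunchTime));
        // prints [application_0001]
    }
}
```

The sketch only shows why a launch time taken fresh on every rerun excludes children started by the earlier instance. It does not show how the attached patch resolves this.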



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
