[ 
https://issues.apache.org/jira/browse/OOZIE-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729999#comment-13729999
 ] 

Robert Kanter commented on OOZIE-1025:
--------------------------------------

I looked into this a bit more recently.  

Oozie kills a job by telling Hadoop to kill the launcher job.  The launcher 
job doesn't write the child IDs until the child jobs have finished, right 
before the launcher itself finishes (e.g. Pig gets run, then the launcher 
writes the IDs of all of the jobs launched by Pig to a file).  However, when 
Hadoop kills the launcher job, that file never gets written, so Oozie doesn't 
have the child IDs and can't kill the children.  
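
To make the ordering problem concrete, here is a toy model of it.  This is 
not Oozie code: the class name, the in-memory list standing in for the ID 
file, and the killAfter parameter are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the launcher's current ordering; not Oozie code.
final class EndOfRunLauncher {
    // Stands in for the file Oozie reads the child IDs from.
    final List<String> idFile = new ArrayList<>();

    // Submits each job, then records the IDs only after the whole action
    // has returned. killAfter simulates Hadoop killing the launcher once
    // that many jobs have been submitted; a negative value means no kill.
    void run(List<String> jobsToSubmit, int killAfter) {
        List<String> inMemory = new ArrayList<>();
        for (String jobId : jobsToSubmit) {
            if (inMemory.size() == killAfter) {
                return; // killed: inMemory is lost, nothing reaches idFile
            }
            inMemory.add(jobId);
        }
        idFile.addAll(inMemory); // only reached on a normal finish
    }
}
```

With a normal finish the "file" ends up with every ID; with a kill partway 
through it stays empty, even though some children were already submitted.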

So in order to get this to work, I think we'd have to do some non-trivial 
refactoring of how the launcher jobs work.  Some ideas I had were:
# Make the launcher job multithreaded so that a second thread can pick up the 
child job IDs as soon as they become available, write them to the file, and 
keep updating the file.  This way, when the launcher is killed, Oozie will 
have the child IDs (or at least most of them).  This may not be possible for 
all action types.
# Have the launcher job listen on a port, accept a REST call, or something 
similar.  Instead of asking Hadoop to kill the launcher job, Oozie would send 
it a command over that port/REST/etc. so that the launcher could kill the job 
more "nicely", including any children and then itself.  This would require a 
lot of changes and make things really complicated, and would probably also 
open up some security concerns.  
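
A rough sketch of what the first idea might look like, under several 
assumptions: ChildIdRecorder and all of its methods are hypothetical, the 
action is assumed to be able to report each child ID as it is submitted 
(which may not hold for every action type), and a local file stands in for 
wherever the launcher really writes the IDs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Oozie: keeps a child-ID file up to date
// while the action is still running.
final class ChildIdRecorder implements AutoCloseable {
    private final Path idFile;
    private final Set<String> childIds = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "child-id-flusher");
                t.setDaemon(true); // must not keep the launcher JVM alive
                return t;
            });

    ChildIdRecorder(Path idFile, long flushIntervalMillis) {
        this.idFile = idFile;
        // The second thread: rewrites the file at a fixed delay.
        flusher.scheduleWithFixedDelay(this::flush, flushIntervalMillis,
                flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Called by the action as soon as it learns a child job ID.
    void jobSubmitted(String jobId) {
        childIds.add(jobId);
    }

    // Writes a sorted snapshot to a temp file, then moves it over the real
    // file, so the previous snapshot stays in place until the new one is
    // fully written.
    synchronized void flush() {
        List<String> snapshot = new ArrayList<>(childIds);
        Collections.sort(snapshot);
        Path tmp = idFile.resolveSibling(idFile.getFileName() + ".tmp");
        try {
            Files.write(tmp, snapshot, StandardCharsets.UTF_8);
            Files.move(tmp, idFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back whatever the last flush wrote.
    List<String> readIds() {
        try {
            return Files.readAllLines(idFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Normal finish: stop the background thread and write one last snapshot.
    @Override
    public void close() {
        flusher.shutdownNow();
        flush();
    }
}
```

If the launcher is killed between flushes, the file still holds the IDs from 
the last flush, so only the most recently submitted children could be missed.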

I don't really see a clean solution or one that we can easily apply to all 
action types :(
                
> Killing oozie job kills oozie launcher job alone in hadoop.
> -----------------------------------------------------------
>
>                 Key: OOZIE-1025
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1025
>             Project: Oozie
>          Issue Type: Bug
>         Environment: Centos-5.8,Hadoop 2.0.0-cdh4.0.1
>            Reporter: PriyaSundararajan
>
> In release build version 3.1.3-cdh4.0.1, killing an Oozie job with the kill 
> command kills only the Oozie launcher job in Hadoop; all other jobs 
> associated with that workflow keep running until they complete. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
