[ 
https://issues.apache.org/jira/browse/OOZIE-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577251#comment-13577251
 ] 

Virag Kothari commented on OOZIE-1205:
--------------------------------------

Good find, Robert.

On the patch: instead of adding a new command, can we not just call the 
failJob() of ActionX? That will set the status of the job, instance, and 
action to failed and queue the WfKill to kill the other actions of the job. 
I believe the new FailX command does the same thing.
If a new command is added, then recovery for it should also be handled.
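
To make the suggestion concrete, here is a rough sketch of the flow I have in mind. It is only a model: FailJobSketch, Job, Action, killRunningActions and the command queue are hypothetical stand-ins, not Oozie classes, and the KILLED status for the sibling actions is an assumption about what the queued kill would do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: these types only model the idea and are not Oozie classes.
class FailJobSketch {
    enum Status { RUNNING, FAILED, KILLED }

    static final class Action {
        final String name;
        Status status = Status.RUNNING;
        Action(String name) { this.name = name; }
    }

    static final class Job {
        Status jobStatus = Status.RUNNING;
        Status instanceStatus = Status.RUNNING;
        final List<Action> actions = new ArrayList<>();
    }

    // Marks the job, the instance and the failing action as FAILED,
    // then queues a kill for whatever else is still running.
    static void failJob(Job job, Action failed, Queue<Runnable> commandQueue) {
        job.jobStatus = Status.FAILED;
        job.instanceStatus = Status.FAILED;
        failed.status = Status.FAILED;
        commandQueue.add(() -> killRunningActions(job));
    }

    // Stand-in for the queued kill: ends every action that is still RUNNING.
    // Assumption: such actions end up KILLED rather than FAILED.
    static void killRunningActions(Job job) {
        for (Action a : job.actions) {
            if (a.status == Status.RUNNING) {
                a.status = Status.KILLED;
            }
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        Action first = new Action("fork-path-1");
        Action second = new Action("fork-path-2");
        job.actions.add(first);
        job.actions.add(second);

        Queue<Runnable> commandQueue = new ArrayDeque<>();
        failJob(job, first, commandQueue);

        // Drain the queue, as an asynchronous command executor would.
        while (!commandQueue.isEmpty()) {
            commandQueue.poll().run();
        }
        System.out.println(first.status + " " + second.status + " " + job.jobStatus);
    }
}
```

The point of the sketch is that the sibling action is cleaned up by the queued kill rather than by a second check, so it does not depend on the job still being in a running state.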


                
> If the JobTracker is restarted during a Fork, Oozie doesn't fail all of the 
> currently running actions
> -----------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-1205
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1205
>             Project: Oozie
>          Issue Type: Bug
>          Components: action
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>             Fix For: trunk, 3.3.2
>
>         Attachments: OOZIE-1205.patch
>
>
> If you have a workflow with a fork and restart the JobTracker while it's 
> executing the paths in the fork, those two jobs will be lost (as expected).  
> Once the timeout occurs on the {{ActionCheckXCommand}}, it will check both 
> actions sequentially.  While checking the first action, it sets the status to 
> FAILED and also sets the workflow's status to FAILED.  It then moves on to 
> the other action that was running concurrently, but it cannot pass the 
> precondition check because the workflow was already FAILED (the check 
> requires that the workflow is RUNNING).  It will retry this every time 
> the timeout hits (the default is 10 minutes) and print a WARN message in 
> the log.  That action will also remain in the RUNNING state forever, even 
> though the underlying job isn't running and the workflow is FAILED.
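
The sequence described above can be reproduced in a few lines. This is a simplified model under stated assumptions, not the real ActionCheckXCommand: ForkCheckSketch, Workflow, Action and checkLostAction are hypothetical names, and the precondition is reduced to a single status comparison.

```java
// Hypothetical model of the reported behaviour; none of these types are Oozie classes.
class ForkCheckSketch {
    enum Status { RUNNING, FAILED }

    static final class Workflow {
        Status status = Status.RUNNING;
    }

    static final class Action {
        final String name;
        Status status = Status.RUNNING;
        Action(String name) { this.name = name; }
    }

    // Models one check of an action whose underlying job was lost.
    // Returns true if the check ran, false if the precondition rejected it.
    static boolean checkLostAction(Workflow wf, Action action) {
        // Precondition: the workflow must still be RUNNING.
        if (wf.status != Status.RUNNING) {
            return false; // the action is left untouched
        }
        // The job is gone, so both the action and the workflow are failed.
        action.status = Status.FAILED;
        wf.status = Status.FAILED;
        return true;
    }

    public static void main(String[] args) {
        Workflow wf = new Workflow();
        Action first = new Action("fork-path-1");
        Action second = new Action("fork-path-2");

        boolean firstChecked = checkLostAction(wf, first);   // passes the precondition
        boolean secondChecked = checkLostAction(wf, second); // rejected: workflow already FAILED

        System.out.println(firstChecked + " " + secondChecked);
        System.out.println(first.status + " " + second.status + " " + wf.status);
    }
}
```

In this model, repeating the check for the second action never changes anything: the workflow stays FAILED, so the precondition keeps rejecting the check and the action stays RUNNING.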

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
