[ https://issues.apache.org/jira/browse/OOZIE-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646852#comment-16646852 ]

Satish Subhashrao Saley commented on OOZIE-3365:
------------------------------------------------

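1.
On a rerun with the fail-nodes option (OozieClient.RERUN_FAIL_NODES), ReRunXCommand skips these nodes, including the sub-workflow action, because we don't want to create a new sub-workflow on rerun. We only want to rerun the failed actions inside that sub-workflow.
Because it is skipped, the sub-workflow action is not added to the delete list, so its record is not deleted.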

https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/wf/ReRunXCommand.java#L219-L223

{code}
// Delete every action for re-run except the nodes to skip and, on a
// fail-nodes rerun, sub-workflow actions (they are reused, not recreated).
if (!nodesToSkip.contains(actions.get(i).getName()) &&
        !(conf.getBoolean(OozieClient.RERUN_FAIL_NODES, false) &&
                SubWorkflowActionExecutor.ACTION_TYPE.equals(actions.get(i).getType()))) {
    deleteList.add(actions.get(i));
    LOG.info("Deleting Action[{0}] for re-run", actions.get(i).getId());
}
{code}
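To make the selection rule concrete, here is a minimal, self-contained Java sketch of the condition above. It is not Oozie code: `Action`, `selectForDeletion` and the `rerunFailNodes` flag are hypothetical stand-ins for `WorkflowActionBean`, the loop in `ReRunXCommand` and `conf.getBoolean(OozieClient.RERUN_FAIL_NODES, false)`, and the `"sub-workflow"` type string is assumed to match `SubWorkflowActionExecutor.ACTION_TYPE`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the delete-list selection in
// ReRunXCommand; not Oozie code.
class RerunDeleteListSketch {

    // Assumed to match SubWorkflowActionExecutor.ACTION_TYPE.
    static final String SUB_WORKFLOW_TYPE = "sub-workflow";

    // Stand-in for WorkflowActionBean: only the fields the condition reads.
    static final class Action {
        final String id;
        final String name;
        final String type;

        Action(String id, String name, String type) {
            this.id = id;
            this.name = name;
            this.type = type;
        }
    }

    // Returns the actions that would be deleted before the rerun.
    // rerunFailNodes stands in for conf.getBoolean(OozieClient.RERUN_FAIL_NODES, false).
    static List<Action> selectForDeletion(List<Action> actions, Set<String> nodesToSkip,
                                          boolean rerunFailNodes) {
        List<Action> deleteList = new ArrayList<>();
        for (Action action : actions) {
            // On a fail-nodes rerun, sub-workflow actions are kept and reused.
            boolean keptSubWorkflow = rerunFailNodes && SUB_WORKFLOW_TYPE.equals(action.type);
            if (!nodesToSkip.contains(action.name) && !keptSubWorkflow) {
                deleteList.add(action);
            }
        }
        return deleteList;
    }
}
```

With `rerunFailNodes` set, a sub-workflow action survives the cleanup even though it is not in `nodesToSkip`; that surviving record is what the lookup in step 2 later finds.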

2.
SignalXCommand then checks whether the action ID already exists in the database, to avoid inserting the same node more than once (for example, the transition node of a forked action).
For the sub-workflow action, the action ID does exist (because it was not deleted in step 1 above), so further processing is skipped. This leaves the workflow in RUNNING state.
The solution is to add a check for sub-workflow actions here.


https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/wf/SignalXCommand.java#L369-L380
{code}
if (!skipAction) {
    try {
        // Make sure that transition node for a forked action
        // is inserted only once
        WorkflowActionQueryExecutor.getInstance().get(WorkflowActionQuery.GET_ACTION_ID_TYPE_LASTCHECK,
                newAction.getId());
        // The action is already in the database: skip further processing.
        continue;
    } catch (JPAExecutorException jee) {
        // Lookup failed (action not in the database): continue processing it.
    }
}
{code}
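The effect of the lookup above, and one possible shape of the proposed sub-workflow check, can be sketched in a few lines of Java. This is a hypothetical model, not the actual patch: `existingIds` stands in for the records that `GET_ACTION_ID_TYPE_LASTCHECK` would find, `skipAction` is left out, and the `withSubWorkflowCheck` flag assumes the fix means "do not short-circuit on an existing record when it is a sub-workflow action kept by a fail-nodes rerun".

```java
import java.util.Set;

// Hypothetical model of the "insert only once" guard in SignalXCommand.
// existingIds stands in for action records already in the database;
// "sub-workflow" is assumed to equal SubWorkflowActionExecutor.ACTION_TYPE.
class SignalGuardSketch {

    static final String SUB_WORKFLOW_TYPE = "sub-workflow";

    /**
     * Returns true if the signal should go on to process the action,
     * false if it would hit the {@code continue} in the snippet above.
     *
     * @param withSubWorkflowCheck false models the current behaviour,
     *                             true models the assumed fix
     */
    static boolean shouldProcess(String actionId, String actionType, Set<String> existingIds,
                                 boolean rerunFailNodes, boolean withSubWorkflowCheck) {
        // Sub-workflow action that the rerun deliberately kept in the database.
        boolean keptSubWorkflow = withSubWorkflowCheck && rerunFailNodes
                && SUB_WORKFLOW_TYPE.equals(actionType);
        if (existingIds.contains(actionId) && !keptSubWorkflow) {
            // Record already exists: skip, as for a forked action's transition node.
            return false;
        }
        // Record is new, or it is the kept sub-workflow action: keep processing.
        return true;
    }
}
```

Without the extra check, the kept sub-workflow record short-circuits the signal, which matches the stuck RUNNING state in this issue; an existing non-sub-workflow node is still deduplicated either way.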

 

> Workflow and Coord Action status remains RUNNING after rerun
> ------------------------------------------------------------
>
>                 Key: OOZIE-3365
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3365
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>
> A user reran a workflow job that had a sub-workflow action. The sub-workflow
> action failed, but the status of the workflow and of the corresponding coord
> action was not updated from RUNNING to FAILED.



