[ 
https://issues.apache.org/jira/browse/OOZIE-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647181#comment-16647181
 ] 

Satish Subhashrao Saley commented on OOZIE-3366:
------------------------------------------------

I correlated the logs with the relevant code. It seems we are not suspending 
the parent WF when a subworkflow gets suspended.

Logs:

{code}
2018-04-23 02:15:25,620 WARN ActionStartXCommand:523 [pool-12-thread-224] - SERVER[wf322] USER[saley] GROUP[users] TOKEN[] APP[saleyapp] JOB[123-123-oozie-saley--W] ACTION[123-123-oozie-saley--W@saleyapp] Error starting action [saleyapp]. ErrorType [NON_TRANSIENT], ErrorCode [JA002], Message [JA002: User: wrkflow is not allowed to impersonate saley]
2018-04-23 02:15:25,620 WARN ActionStartXCommand:523 [pool-12-thread-224] - SERVER[wf322] USER[saley] GROUP[users] TOKEN[] APP[saleyapp] JOB[123-123-oozie-saley--W] ACTION[123-123-oozie-saley--W@saleyapp] Suspending Workflow Job id=123-123-oozie-saley--W
2018-04-23 02:15:25,622 DEBUG LiteWorkflowInstance:526 [pool-12-thread-224] - SERVER[wf322] USER[saley] GROUP[users] TOKEN[] APP[saleyapp] JOB[123-123-oozie-saley--W] ACTION[123-123-oozie-saley--W@saleyapp] Suspending job
{code}

While starting the action, we get a non-transient exception.
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/wf/ActionStartXCommand.java#L290-L305
{code}
ActionStartXCommand.java

catch (ActionExecutorException ex) {
    LOG.warn("Error starting action [{0}]. ErrorType [{1}], ErrorCode [{2}], Message [{3}]",
            wfAction.getName(), ex.getErrorType(), ex.getErrorCode(), ex.getMessage(), ex);
    wfAction.setErrorInfo(ex.getErrorCode(), ex.getMessage());
    switch (ex.getErrorType()) {
        case TRANSIENT:
            if (!handleTransient(context, executor, WorkflowAction.Status.START_RETRY)) {
                handleNonTransient(context, executor, WorkflowAction.Status.START_MANUAL);
                wfAction.setPendingAge(new Date());
                wfAction.setRetries(0);
                wfAction.setStartTime(null);
            }
            break;
        case NON_TRANSIENT:
            handleNonTransient(context, executor, WorkflowAction.Status.START_MANUAL);
{code}

We put the workflow action in START_MANUAL and suspend the workflow. 
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/wf/ActionXCommand.java#L125-L144
{code}
ActionXCommand.java

protected void handleNonTransient(ActionExecutor.Context context, ActionExecutor executor,
        WorkflowAction.Status status) throws CommandException {
    ActionExecutorContext aContext = (ActionExecutorContext) context;
    WorkflowActionBean action = (WorkflowActionBean) aContext.getAction();
    incrActionErrorCounter(action.getType(), "nontransient", 1);
    WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
    String id = workflow.getId();
    action.setStatus(status);
    action.resetPendingOnly();
    LOG.warn("Suspending Workflow Job id=" + id);
    try {
        SuspendXCommand.suspendJob(Services.get().get(JPAService.class), workflow, id,
                action.getId(), null);
    }
    catch (Exception e) {
        throw new CommandException(ErrorCode.E0727, id, e.getMessage());
    }
    finally {
        updateParentIfNecessary(workflow, 3);
    }
}
{code}

While updating the parent's status, we don't consider the case where a 
workflow's parent can be another workflow.
https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/wf/WorkflowXCommand.java#L92-L97
{code}
WorkflowXCommand.java

protected void updateParentIfNecessary(WorkflowJobBean wfjob, int maxRetries)
        throws CommandException {
    // update coordinator action if the wf was actually started by a coord
    if (wfjob.getParentId() != null && wfjob.getParentId().contains("-C@")) {
        new CoordActionUpdateXCommand(wfjob, maxRetries).call();
    }
}
{code}
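
To make the gap concrete, here is a minimal standalone sketch of the parent-ID 
check. It is not Oozie code: the class ParentKindSketch, the enum ParentKind 
and the method classify are hypothetical names used only for illustration. It 
assumes that a workflow parent ID looks like the job and action IDs in the 
logs above (ending in "-W", or containing "-W@" if an action ID is recorded); 
the "-C@" test is the one taken from updateParentIfNecessary.

```java
public class ParentKindSketch {

    // Hypothetical classification of a workflow job's parent, for illustration only.
    enum ParentKind { NONE, COORD_ACTION, WORKFLOW }

    static ParentKind classify(String parentId) {
        if (parentId == null) {
            return ParentKind.NONE;
        }
        // Same test as the existing updateParentIfNecessary: coordinator action parent.
        if (parentId.contains("-C@")) {
            return ParentKind.COORD_ACTION;
        }
        // Assumption: a parent workflow is identified by a job ID ending in "-W"
        // or by a workflow action ID containing "-W@". The current code has no
        // branch for this case, so nothing is propagated to a parent workflow.
        if (parentId.endsWith("-W") || parentId.contains("-W@")) {
            return ParentKind.WORKFLOW;
        }
        return ParentKind.NONE;
    }

    public static void main(String[] args) {
        // Made-up IDs in the style of the logs above.
        System.out.println(classify("123-123-oozie-saley--C@1"));
        System.out.println(classify("123-123-oozie-saley--W"));
        System.out.println(classify(null));
    }
}
```

With a check like this, the WORKFLOW case is the one that would need its own 
handling when the child workflow is suspended; what that handling should be is 
left open here.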

> Update workflow status and subworkflow status on suspend command
> ----------------------------------------------------------------
>
>                 Key: OOZIE-3366
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3366
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>
> Currently, when subworkflow gets suspended, its corresponding workflow status 
> is not updated correctly. Also, when a coord is suspended, the subworkflows 
> are not suspended. We need to fix this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
