[ https://issues.apache.org/jira/browse/OOZIE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033243#comment-14033243 ]
Robert Kanter commented on OOZIE-1879:
--------------------------------------

You're right, I'll change it to a {{>}} when committing.

> Workflow Rerun causes error depending on the order of forked nodes
> ------------------------------------------------------------------
>
>                 Key: OOZIE-1879
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1879
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Blocker
>         Attachments: OOZIE-1879.patch
>
>
> Suppose you have a workflow like this:
> {noformat}
> start --> fork
> fork --> shell1, shell2
> shell1 --> join
> shell2 --> join
> join --> shell3
> shell3 --> end
> {noformat}
> And all but shell3 are successful.
> Assuming you fix the problem with shell3, if you do a rerun, the following
> two outcomes can happen:
> # If shell1 finished before shell2, then the rerun succeeds
> # If shell2 finished before shell1, then the rerun fails
> The error in the second outcome is simply this log message:
> {noformat}
> 2014-05-29 17:17:03,735 ERROR
> org.apache.oozie.workflow.lite.LiteWorkflowInstance:
> SERVER[cdh5-1.cloudera.local] USER[pdvorak] GROUP[-] TOKEN[]
> APP[test-rerun-wf] JOB[0000004-140521220856264-oozie-oozi-W]
> ACTION[0000004-140521220856264-oozie-oozi-W@join] invalid execution path
> [/shell1/]
> {noformat}
> After a bunch of digging, I discovered that during a rerun with the above
> workflow or similar workflows, LiteWorkflowInstance#signal gets called for
> each action in the fork node in the order that they are listed in the fork
> node's XML; however, during the original run, LiteWorkflowInstance#signal
> gets called for each action in the order that they complete (i.e. endTime).
> When these don't match, you get the above error. The general fix for this is
> therefore to ensure that during a rerun, LiteWorkflowInstance#signal gets
> called for each action in the fork node in the order that they originally ran
> in. And if you think about it, that is more correct than the current
> behavior anyway.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
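
The ordering idea in the description can be illustrated with a small sketch. It is written in Python rather than Oozie's Java, and it is not the contents of OOZIE-1879.patch. The function name `rerun_signal_order` and the timestamps are hypothetical, chosen only to mirror the failing case where shell2 finished before shell1.

```python
from datetime import datetime


def rerun_signal_order(xml_order, end_times):
    """Return forked action names in the order they originally completed.

    xml_order: action names as listed in the fork node's XML.
    end_times: mapping of action name -> endTime from the original run.
    sorted() is stable, so actions with equal end times keep XML order.
    """
    return sorted(xml_order, key=lambda name: end_times[name])


# Failing case from the issue: shell2 finished before shell1.
xml_order = ["shell1", "shell2"]
end_times = {
    "shell1": datetime(2014, 5, 29, 17, 16, 40),  # made-up timestamp
    "shell2": datetime(2014, 5, 29, 17, 16, 25),  # made-up timestamp
}

# A rerun that signals in this order matches the original run,
# instead of following the XML listing order.
print(rerun_signal_order(xml_order, end_times))  # ['shell2', 'shell1']
```

Under these assumptions, a rerun would signal shell2 first and then shell1, which is the same sequence the original run used.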