[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Janos Makai updated OOZIE-3670: ------------------------------- Attachment: job.properties > Actions can stuck while running in a Fork-Join workflow > ------------------------------------------------------- > > Key: OOZIE-3670 > URL: https://issues.apache.org/jira/browse/OOZIE-3670 > Project: Oozie > Issue Type: Bug > Components: core > Affects Versions: 5.2.1 > Reporter: Janos Makai > Assignee: Janos Makai > Priority: Major > Attachments: OOZIE-3670-001.patch, OOZIE-3670-002.patch, > forkjoin.xml, helloworld.sh, job.properties > > > Fork node splits one path of execution into multiple concurrent paths of > execution and the join node waits until every concurrent execution path of a > previous fork node arrives to it. Given a scenario, when one of the paths > [action] fails for some exotic reason - in our case (see attachment) with an > EL Error - then the workflow job itself will fail as well, however the other > actions running parallelly under the same workflow job will stuck in RUNNING > state until they are purged, which can lead to Oozie slow-down in extreme > cases. > This behaviour can be reproduced using the attached > [forkjoin.xml{^}!https://jira.cloudera.com/images/icons/link_attachment_7.gif|width=7,height=7,align=absmiddle!{^}|https://jira.cloudera.com/secure/attachment/531918/531918_forkjoin.xml], > > [job.properties{^}!https://jira.cloudera.com/images/icons/link_attachment_7.gif|width=7,height=7,align=absmiddle!{^}|https://jira.cloudera.com/secure/attachment/531916/531916_job.properties], > and > [helloworld.sh{^}!https://jira.cloudera.com/images/icons/link_attachment_7.gif|width=7,height=7,align=absmiddle!{^}|https://jira.cloudera.com/secure/attachment/531917/531917_helloworld.sh]. > In the above workflow, [action2] will fail due to ELError because > {code:java} > <value>${variableThatWillCauseELError}</value> {code} > could not be evaluated, but at the same time [action1] tries to complete > itself but remains in RUNNING state. > We have examined the situation at surface level, but we need to get a deeper > understanding regarding the mechanism of fork-join workflows to proceed > further. > Suspected classes are for starting point: > - org.apache.oozie.workflow.lite.LiteWorkflowInstance > - org.apache.oozie.command.wf.ActionCheckXCommand > - what if we do not throw Exception in > org.apache.oozie.command.wf.ActionCheckXCommand#verifyPrecondition ? -- This message was sent by Atlassian Jira (v8.20.10#820010)