[ https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979354#comment-16979354 ]

Peter Bacsko commented on OOZIE-3561:
-------------------------------------

So, as we discussed in private, the problem is that the "error" path might lead
back into the main flow of the workflow. Usually it's a very short sequence of
actions, e.g. sending an email and then killing the execution. When the flow is
redirected back to the "normal" path from an action node, every subsequent node
becomes reachable via two different paths.
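To make the path doubling concrete, here is a minimal Java sketch under assumed names: the `PathCountDemo` class, `buildChain`, `countVisits` and the `eN` error-handler nodes are all hypothetical, not Oozie code. It models each action as having an "ok" transition to the next action and an "error" transition to a handler node (e.g. an email action) that rejoins the normal path. It then counts how often a naive depth-first walk with no memoization reaches a given node:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an ok/error chain; not Oozie's actual classes.
final class PathCountDemo {
    // Adjacency list: node name -> successor node names.
    static Map<String, List<String>> buildChain(int actions) {
        Map<String, List<String>> graph = new HashMap<>();
        for (int i = 1; i <= actions; i++) {
            String next = (i < actions) ? "a" + (i + 1) : "end";
            String errorHandler = "e" + i;                    // e.g. an email action
            graph.put("a" + i, List.of(next, errorHandler));  // ok -> next, error -> handler
            graph.put(errorHandler, List.of(next));           // handler rejoins the normal path
        }
        graph.put("end", List.of());
        return graph;
    }

    // Naive DFS without memoization: counts how many times 'target' is reached.
    // Assumes an acyclic graph; on a cycle this would never terminate.
    static long countVisits(Map<String, List<String>> graph, String node, String target) {
        long visits = node.equals(target) ? 1 : 0;
        for (String successor : graph.get(node)) {
            visits += countVisits(graph, successor, target);
        }
        return visits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = buildChain(10);
        System.out.println(countVisits(graph, "a1", "a4"));  // 8
        System.out.println(countVisits(graph, "a1", "end")); // 1024
    }
}
```

With this shape, the number of visits doubles for every action in front of a node: `a4` is reached 8 times from `a1`, and `end` is reached 1024 times in a 10-action chain. An 80-action chain would mean 2^80 paths to `end`.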

So in your example, "a4" is reachable in 8 different ways ([ok, ok, ok], [ok,
ok, error], [ok, error, ok], ... [error, error, error]), so the runtime is
exponential, which is pretty sad. I believe we have to use memoization: simply
store the nodes that have already been validated. But we have to be careful and
think about the edge cases.
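Here is a minimal sketch of the memoization idea. It uses hypothetical names (`MemoizedValidator`, `validate`, `checkNode`) and the same assumed ok/error chain. `checkNode` stands in for whatever per-node checks the real validator performs. Each node is validated at most once, so the work grows with the number of nodes instead of the number of paths:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of memoized validation; not Oozie's actual validator.
final class MemoizedValidator {
    private final Map<String, List<String>> graph;         // node -> successors
    private final Set<String> validated = new HashSet<>(); // memo: nodes already visited
    private int checks = 0;                                // how many times checkNode ran

    MemoizedValidator(Map<String, List<String>> graph) {
        this.graph = graph;
    }

    // Assumed shape: ok -> next action, error -> handler -> next action.
    static Map<String, List<String>> buildChain(int actions) {
        Map<String, List<String>> graph = new HashMap<>();
        for (int i = 1; i <= actions; i++) {
            String next = (i < actions) ? "a" + (i + 1) : "end";
            graph.put("a" + i, List.of(next, "e" + i));
            graph.put("e" + i, List.of(next));
        }
        graph.put("end", List.of());
        return graph;
    }

    // Depth-first validation that skips any node already reached via another path.
    void validate(String node) {
        if (!validated.add(node)) {
            return; // add() returned false: node was seen before, skip it
        }
        checkNode(node);
        for (String successor : graph.get(node)) {
            validate(successor);
        }
    }

    // Placeholder for the real per-node checks (hypothetical).
    private void checkNode(String node) {
        checks++;
        if (!graph.containsKey(node)) {
            throw new IllegalStateException("Transition to unknown node: " + node);
        }
    }

    int checks() {
        return checks;
    }

    public static void main(String[] args) {
        MemoizedValidator validator = new MemoizedValidator(buildChain(80));
        validator.validate("a1");
        System.out.println(validator.checks()); // 161
    }
}
```

For the 80-action chain, this runs `checkNode` 161 times (80 actions, 80 handlers and `end`). One edge case to watch: keying the memo on the node alone is only safe if a node's validation result does not depend on how it was reached. If a check depends on path context (for example, which fork a node sits under), that context would have to become part of the memo key.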

> Forkjoin validation is slow when there are many actions in chain
> ----------------------------------------------------------------
>
>                 Key: OOZIE-3561
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3561
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.1.0
>            Reporter: Denes Bodo
>            Assignee: Denes Bodo
>            Priority: Critical
>              Labels: performance
>
> In case we have a workflow which has, let's say, 80 actions after each other:
> {{a1 -> a2 -> ... a80}}
> then the validator code "never" finishes.
> Currently the validation (in my understanding) does depth-first checks from
> the start node and runs in O(n!) time. This is confirmed by the fact that when
> we split this huge workflow into two 40-element workflows, we get 2 x ~40!
> validation steps instead of ~80! steps.
> Guys, could you please confirm or disprove my theory?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
