[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-27 Thread Andras Salamon (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983439#comment-16983439
 ] 

Andras Salamon commented on OOZIE-3561:
---

[~dionusos] [~pbacsko] Thanks for the contributions, +1, committed to master.

> Forkjoin validation is slow when there are many actions in chain
> 
>
> Key: OOZIE-3561
> URL: https://issues.apache.org/jira/browse/OOZIE-3561
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.1.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: performance
> Attachments: OOZIE-3561-002.patch, OOZIE-3561-003.patch, 
> OOZIE-3561_001.patch
>
>
> Suppose we have a workflow with, let's say, 80 actions chained one after another:
> {{a1 -> a2 -> ... a80}}
> Then the validator code "never" finishes.
> As I understand it, the validation currently does depth-first checks from 
> the start node and takes on the order of n! steps. This appears to be confirmed: 
> when we split this huge workflow into two 40-element workflows, validation 
> takes 2x ~40! steps instead of ~80! steps.
> Guys, could you please confirm or disprove my theory?
> Thanks
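A minimal Java sketch of why unmemoized path exploration blows up, under an assumed topology that is not taken from the attached patches: each action's ok and error transitions both lead to the next action. A DFS that re-expands a node once per path reaching it then performs 2^(n+1) - 1 expansions, while a visited set caps the work at one expansion per node. Under this assumption the growth is exponential rather than factorial. All class, method, and node names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PathBlowupSketch {
    // Hypothetical chain a1 -> ... -> aN -> end in which each action's
    // "ok" and "error" transitions BOTH lead to the next node (assumed topology).
    static Map<String, List<String>> buildChain(int n) {
        Map<String, List<String>> graph = new HashMap<>();
        for (int i = 1; i <= n; i++) {
            String next = (i < n) ? "a" + (i + 1) : "end";
            graph.put("a" + i, List.of(next, next)); // ok transition, error transition
        }
        graph.put("end", List.of());
        return graph;
    }

    // Naive DFS: re-expands a node once for every path that reaches it.
    static long naiveVisits(Map<String, List<String>> graph, String node) {
        long visits = 1;
        for (String next : graph.get(node)) {
            visits += naiveVisits(graph, next);
        }
        return visits;
    }

    // Memoized DFS: a node already in seenNodes is not expanded again.
    static long memoVisits(Map<String, List<String>> graph, String node, Set<String> seenNodes) {
        if (!seenNodes.add(node)) {
            return 0;
        }
        long visits = 1;
        for (String next : graph.get(node)) {
            visits += memoVisits(graph, next, seenNodes);
        }
        return visits;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20}) {
            Map<String, List<String>> graph = buildChain(n);
            System.out.println(n + " actions: naive=" + naiveVisits(graph, "a1")
                    + " memoized=" + memoVisits(graph, "a1", new HashSet<>()));
        }
        // prints:
        // 10 actions: naive=2047 memoized=11
        // 20 actions: naive=2097151 memoized=21
    }
}
```

In this model, splitting 80 actions into two chains of 40 turns one run of 2^81 - 1 expansions into two runs of 2^41 - 1, in line with the observation that splitting helps; the visited set removes the blowup altogether.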



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983437#comment-16983437
 ] 

ASF subversion and git services commented on OOZIE-3561:


Commit 06cf2cf005b3f98bcd40a934a02b530fac07 in oozie's branch 
refs/heads/master from Andras Salamon
[ https://gitbox.apache.org/repos/asf?p=oozie.git;h=06cf2cf ]

OOZIE-3561 Forkjoin validation is slow when there are many actions in chain 
(dionusos, pbacsko via asalamon74)



[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-27 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983414#comment-16983414
 ] 

Peter Bacsko commented on OOZIE-3561:
-

Good to hear.

[~asalamon74], please review patch v3 and commit it if you think it's good.


[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982620#comment-16982620
 ] 

Hadoop QA commented on OOZIE-3561:
--


Testing JIRA OOZIE-3561

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [4] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:red}-1{color} There are [4] new bugs found below threshold in 
[core] that must be fixed.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.java/io/File.<init>(Ljava/lang/String;Ljava/lang/String;)V reads a 
file whose location might be specified by user input: At 
BulkJPAExecutor.java:[line 206]
.At AuthorizationService.java:[line 189]: At 
AuthorizationService.java:[line 192]
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3197
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/1254/




[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982576#comment-16982576
 ] 

Hadoop QA commented on OOZIE-3561:
--


Testing JIRA OOZIE-3561

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [6] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:red}-1{color} There are [6] new bugs found below threshold in 
[core] that must be fixed, listing only the first [5] ones.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The top [5] most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection: At BulkJPAExecutor.java:[line 206]
.At BulkJPAExecutor.java:[line 111]: At BulkJPAExecutor.java:[line 127]
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:orange}0{color} There are [4] new bugs found in [server] that would 
be nice to have fixed.
.You can find the SpotBugs diff here: server/findbugs-new.html
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [webapp].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3197
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/1252/




[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-26 Thread Denes Bodo (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982500#comment-16982500
 ] 

Denes Bodo commented on OOZIE-3561:
---

Thank you for the fix, [~pbacsko]. I checked your change against the original 
workflow that caused the slowness, and it worked well and quickly. 
Unfortunately that workflow contains sensitive data, so I cannot share it here, 
but the provided 80-action workflow is almost identical to the original one.


[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982493#comment-16982493
 ] 

Hadoop QA commented on OOZIE-3561:
--

PreCommit-OOZIE-Build started


> Forkjoin validation is slow when there are many actions in chain
> 
>
> Key: OOZIE-3561
> URL: https://issues.apache.org/jira/browse/OOZIE-3561
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.1.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: performance
> Attachments: OOZIE-3561-002.patch, OOZIE-3561-003.patch, 
> OOZIE-3561-004.patch, OOZIE-3561_001.patch
>

[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982457#comment-16982457
 ] 

Hadoop QA commented on OOZIE-3561:
--

PreCommit-OOZIE-Build started



[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981562#comment-16981562
 ] 

Hadoop QA commented on OOZIE-3561:
--


Testing JIRA OOZIE-3561

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:red}-1{color} the patch contains 5 line(s) longer than 132 
characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [4] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:red}-1{color} There are [4] new bugs found below threshold in 
[core] that must be fixed.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.java/io/File.<init>(Ljava/lang/String;Ljava/lang/String;)V reads a 
file whose location might be specified by user input: At 
BulkJPAExecutor.java:[line 206]
.At AuthorizationService.java:[line 189]: At 
AuthorizationService.java:[line 192]
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [webapp].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3196
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/1251/



> Forkjoin validation is slow when there are many actions in chain
> 
>
> Key: OOZIE-3561
> URL: https://issues.apache.org/jira/browse/OOZIE-3561
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.1.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: performance
> Attachments: OOZIE-3561-002.patch, OOZIE-3561_001.patch
>

[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981482#comment-16981482
 ] 

Hadoop QA commented on OOZIE-3561:
--

PreCommit-OOZIE-Build started


> Forkjoin validation is slow when there are many actions in chain
> 
>
> Key: OOZIE-3561
> URL: https://issues.apache.org/jira/browse/OOZIE-3561
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.1.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: performance
> Attachments: OOZIE-3561-002.patch, OOZIE-3561_001.patch
>

[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981480#comment-16981480
 ] 

Peter Bacsko commented on OOZIE-3561:
-

_"Just store the {{NodeDef}} object in the set, not a string. That should 
exhibit the exact same behavior."_

After trying this locally, it turned out that this alone is not sufficient. We 
also have to:
 # Move the memoization step to a slightly different place
 # Not store End and Join nodes in the set

Now all existing tests pass. 
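The comment does not say why End and Join nodes must stay out of the set. One plausible reason, sketched here purely as an assumption: if the validator does per-branch bookkeeping when a branch reaches a join, memoizing the join lets only the first branch through, so the remaining branches are never accounted for. The graph, the arrival counter, and every name below are hypothetical and are not taken from the patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JoinMemoSketch {
    // Hypothetical workflow: fork "f" starts branches "x" and "y",
    // both branches end at join "j", which transitions to "end".
    static final Map<String, List<String>> GRAPH = Map.of(
            "f", List.of("x", "y"),
            "x", List.of("j"),
            "y", List.of("j"),
            "j", List.of("end"),
            "end", List.of());

    // Returns how many fork branches the traversal sees arriving at join "j".
    static int joinArrivals(boolean memoizeJoinAndEnd) {
        int[] arrivals = {0};
        walk("f", new HashSet<>(), memoizeJoinAndEnd, arrivals);
        return arrivals[0];
    }

    private static void walk(String node, Set<String> seenNodes,
                             boolean memoizeJoinAndEnd, int[] arrivals) {
        boolean exempt = !memoizeJoinAndEnd && (node.equals("j") || node.equals("end"));
        // Memoization check: a non-exempt node already in seenNodes is not processed again.
        if (!exempt && !seenNodes.add(node)) {
            return;
        }
        if (node.equals("j")) {
            arrivals[0]++; // hypothetical per-branch bookkeeping done at the join
        }
        for (String next : GRAPH.get(node)) {
            walk(next, seenNodes, memoizeJoinAndEnd, arrivals);
        }
    }

    public static void main(String[] args) {
        System.out.println("join/end memoized:     " + joinArrivals(true));  // 1
        System.out.println("join/end not memoized: " + joinArrivals(false)); // 2
    }
}
```

Because the memoization check runs before the join bookkeeping here, its position matters as much as which nodes it covers; the real validator may differ.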

> Forkjoin validation is slow when there are many actions in chain
> 
>
> Key: OOZIE-3561
> URL: https://issues.apache.org/jira/browse/OOZIE-3561
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.1.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: performance
> Attachments: OOZIE-3561-002.patch, OOZIE-3561_001.patch
>

[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980123#comment-16980123
 ] 

Peter Bacsko commented on OOZIE-3561:
-

[~dionusos], thanks for the patch; I believe this is the approach we need.
As we discussed in person, let's improve it further:

1. Just store the {{NodeDef}} object in the set, not a string. That should 
exhibit the exact same behavior.
2. Name the set something like "seenNodes" or "visitedNodes".
3. Next week, let's come up with some more edge cases, e.g. the "errorTo" of a node 
inside a fork pointing to a node located in another fork.

> Forkjoin validation is slow when there are many actions in chain
> 
>
> Key: OOZIE-3561
> URL: https://issues.apache.org/jira/browse/OOZIE-3561
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.1.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: performance
> Attachments: OOZIE-3561_001.patch
>

[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980108#comment-16980108
 ] 

Hadoop QA commented on OOZIE-3561:
--


Testing JIRA OOZIE-3561

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:red}-1{color} the patch contains 2 line(s) longer than 132 
characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [6] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:red}-1{color} There are [6] new bugs found below threshold in 
[core] that must be fixed, listing only the first [5] ones.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The top [5] most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection: At BulkJPAExecutor.java:[line 206]
.At BulkJPAExecutor.java:[line 111]: At BulkJPAExecutor.java:[line 127]
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [webapp].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3196
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://builds.apache.org/job/PreCommit-OOZIE-Build/1250/



> Forkjoin validation is slow when there are many actions in chain
> 
>
> Key: OOZIE-3561
> URL: https://issues.apache.org/jira/browse/OOZIE-3561
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.1.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: performance
> Attachments: OOZIE-3561_001.patch
>

[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-22 Thread Andras Salamon (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980093#comment-16980093
 ] 

Andras Salamon commented on OOZIE-3561:
---

[~dionusos] Just a quick question about the {{family}} variable:
{noformat}
String family = parents + nodeName;
if (investigateds.contains(family)) {
    return;
} else {
    investigateds.add(family);
}
{noformat}
What if I have a node "a" followed by a node "b" on the same path, and also a 
separate node named "ab"? I think that after investigating "a" and "b", {{investigateds}} 
will contain "ab", and the algorithm will then skip node "ab".
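To make the concern concrete, here is a small sketch (not the patch itself). The snippet above does not show the type of {{parents}}, so it is assumed here to be a string of concatenated node names; the helpers {{concatKey}} and {{listKey}} are hypothetical. Plain concatenation maps two different situations to the same key, while a list of node names keeps them apart. Joining the names with a separator that cannot occur in a node name would be another option, assuming such a character exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FamilyKeyDemo {
    // Key built by plain string concatenation, as in the snippet under review
    // (assumes 'parents' is a string of concatenated node names).
    static String concatKey(String parents, String nodeName) {
        return parents + nodeName;
    }

    // Hypothetical alternative: keep the path as a list of node names,
    // so the path [a, b] and the single node [ab] are different keys.
    static List<String> listKey(List<String> parents, String nodeName) {
        List<String> key = new ArrayList<>(parents);
        key.add(nodeName);
        return key;
    }

    public static void main(String[] args) {
        // Node "b" reached after "a" vs. a separate node "ab" with no parents.
        System.out.println(concatKey("a", "b").equals(concatKey("", "ab"))); // true

        Set<List<String>> investigateds = new HashSet<>();
        investigateds.add(listKey(Arrays.asList("a"), "b"));
        System.out.println(investigateds.contains(listKey(new ArrayList<String>(), "ab"))); // false
    }
}
```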



[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980047#comment-16980047
 ] 

Hadoop QA commented on OOZIE-3561:
--

PreCommit-OOZIE-Build started




[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979354#comment-16979354
 ] 

Peter Bacsko commented on OOZIE-3561:
-

So, as we discussed in private, the problem is that the "error" path might lead 
back into the main flow of the workflow. Usually the error path is a very short 
sequence of actions, e.g. sending an email and then killing the execution. When the 
flow is redirected back to the "normal" path from an action node, essentially every 
subsequent node becomes reachable via two different paths.

So in your example, "a4" is reachable in 8 different ways ([ok, ok, ok], [ok, 
ok, error], [ok, error, ok], ... [error, error, error]). So we have an 
exponential runtime, which is pretty sad. I believe we have to use memoization: 
simply store the nodes that have already been validated. But we have to be 
careful and think about edge cases.
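The path counting above can be reproduced with a toy model. This is a hypothetical sketch, not Oozie code: each action i gets two transitions to action i + 1, the "ok" one and an "error" one that rejoins the normal flow (the short email-style handler in between is collapsed into the edge). A plain DFS reaches the last action once per path, i.e. 2^(n-1) times, whereas a DFS that remembers already expanded nodes touches each node once. Memoizing on the node alone is only valid in this plain-reachability model; the real fork-join validator also has to know which fork context a path belongs to, which is the kind of edge case mentioned above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PathExplosionDemo {
    // Hypothetical model of a1 -> a2 -> ... -> an: node i has two transitions
    // to node i + 1, one for "ok" and one for an "error" path that rejoins
    // the normal flow.
    static List<List<Integer>> chainWithRejoiningErrors(int n) {
        List<List<Integer>> edges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> out = new ArrayList<>();
            if (i + 1 < n) {
                out.add(i + 1); // ok transition
                out.add(i + 1); // error transition leading back to the normal path
            }
            edges.add(out);
        }
        return edges;
    }

    // Plain DFS: nothing remembers walked nodes, so 'target' is reached once
    // per distinct path from 'node'.
    static long naiveArrivals(List<List<Integer>> edges, int node, int target) {
        if (node == target) {
            return 1;
        }
        long arrivals = 0;
        for (int next : edges.get(node)) {
            arrivals += naiveArrivals(edges, next, target);
        }
        return arrivals;
    }

    // Memoized DFS: a node already in 'done' is not expanded again.
    static int memoizedExpansions(List<List<Integer>> edges, int node, Set<Integer> done) {
        if (!done.add(node)) {
            return 0;
        }
        int expansions = 1;
        for (int next : edges.get(node)) {
            expansions += memoizedExpansions(edges, next, done);
        }
        return expansions;
    }

    public static void main(String[] args) {
        List<List<Integer>> g = chainWithRejoiningErrors(4); // a1..a4
        System.out.println(naiveArrivals(g, 0, 3));                             // 8
        System.out.println(memoizedExpansions(g, 0, new HashSet<Integer>()));   // 4

        List<List<Integer>> g40 = chainWithRejoiningErrors(40);
        System.out.println(memoizedExpansions(g40, 0, new HashSet<Integer>())); // 40
    }
}
```

In this model, naiveArrivals(g40, 0, 39) would be 2^39 (about 5.5 * 10^11), so the growth is 2^n rather than n!, and that order of magnitude is consistent with the 40-action test not finishing within 10 minutes.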



[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-21 Thread Denes Bodo (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979311#comment-16979311
 ] 

Denes Bodo commented on OOZIE-3561:
---

I managed to reproduce the issue by using only unit tests:
 # Create a workflow with similar content:
{noformat}









{noformat}
 # Create a test like this:
{code:java}
public void test40ActionsInARow() throws WorkflowException, IOException {
    LiteWorkflowAppParser parser = newLiteWorkflowAppParser();
    try {
        // validateAndParse() runs the workflow validation that hangs on this input
        parser.validateAndParse(IOUtils.getResourceAsReader(
                "wf-actions-40.xml", -1), new Configuration());
    } catch (final WorkflowException e) {
        e.printStackTrace();
        Assert.fail("This workflow has to be correct.");
    }
}
{code}

With 40 actions, the check couldn't finish within 10 minutes.



[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979310#comment-16979310
 ] 

Peter Bacsko commented on OOZIE-3561:
-

I refactored the validator 3 years ago, so I had to check again how it works:

1. Basic validation makes sure that the workflow is acyclic. That's definitely 
fast.
2. Fork-join validation: this was trickier. Multiple fork-joins did cause 
problems because paths were re-walked unnecessarily; this had exponential 
runtime with regard to the number of fork-join pairs. However, OOZIE-1978 made 
sure that no unnecessary walks take place by stopping the recursion when we 
encounter a join.

Right now I don't see what could go wrong.
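As a rough illustration of what "stop the recursion when we encounter a join" means, here is a simplified sketch. It is not Oozie's validator: the node model ({{Kind}}, {{Node}}, {{walkToJoin}}, {{matchingJoin}}) is hypothetical, actions follow only their "ok" transition, and decision nodes and error transitions are left out, although error transitions are where the exponential behaviour discussed in this issue comes from. It also assumes the acyclicity check from step 1 has already passed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ForkJoinWalkSketch {
    enum Kind { ACTION, FORK, JOIN, END }

    // Hypothetical, simplified node: an action lists only its "ok" successor.
    static final class Node {
        final Kind kind;
        final List<String> next; // for a fork: the heads of all branches
        Node(Kind kind, String... next) {
            this.kind = kind;
            this.next = Arrays.asList(next);
        }
    }

    // Walks forward from 'start' and stops at the first join instead of
    // walking past it. A nested fork is resolved to its own join and the walk
    // continues after that join. Returns null if END is reached.
    static String walkToJoin(Map<String, Node> graph, String start) {
        String current = start;
        while (true) {
            Node node = graph.get(current);
            switch (node.kind) {
                case JOIN:
                    return current;
                case END:
                    return null;
                case FORK:
                    current = graph.get(matchingJoin(graph, current)).next.get(0);
                    break;
                default:
                    current = node.next.get(0);
            }
        }
    }

    // Every branch of a fork must stop at the same join.
    static String matchingJoin(Map<String, Node> graph, String fork) {
        String join = null;
        for (String branch : graph.get(fork).next) {
            String reached = walkToJoin(graph, branch);
            if (reached == null || (join != null && !join.equals(reached))) {
                throw new IllegalStateException("Branches of " + fork + " do not meet in one join");
            }
            join = reached;
        }
        return join;
    }

    public static void main(String[] args) {
        // f1 -> {a, f2}; f2 -> {b, c} -> j2 -> d; a and d -> j1 -> end
        Map<String, Node> g = new HashMap<>();
        g.put("f1", new Node(Kind.FORK, "a", "f2"));
        g.put("a", new Node(Kind.ACTION, "j1"));
        g.put("f2", new Node(Kind.FORK, "b", "c"));
        g.put("b", new Node(Kind.ACTION, "j2"));
        g.put("c", new Node(Kind.ACTION, "j2"));
        g.put("j2", new Node(Kind.JOIN, "d"));
        g.put("d", new Node(Kind.ACTION, "j1"));
        g.put("j1", new Node(Kind.JOIN, "end"));
        g.put("end", new Node(Kind.END));
        System.out.println(matchingJoin(g, "f1")); // j1
    }
}
```

Each branch is walked only up to its join, so in this simplified model nested fork-join pairs are not re-walked from the outside.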



[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979305#comment-16979305
 ] 

Peter Bacsko commented on OOZIE-3561:
-

[~dionusos] I don't quite understand the theory. In your example, you have a 
graph of 80 nodes, which is basically a list without forks. There's no way that 
the runtime is O(n!). What do the nodes represent in your example? What are 
"a1", "a2", etc.?



[jira] [Commented] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

2019-11-21 Thread Denes Bodo (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979281#comment-16979281
 ] 

Denes Bodo commented on OOZIE-3561:
---

cc [~pbacsko], [~rkanter]
