[ https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396859#comment-15396859 ]
Hadoop QA commented on OOZIE-1978:
----------------------------------

Testing JIRA OOZIE-1978

Cleaning local git workspace

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 132
.    {color:green}+1{color} the patch does adds/modifies 4 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:red}-1 COMPILE{color}
.    {color:red}-1{color} HEAD does not compile
.    {color:red}-1{color} patch does not compile
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at
https://builds.apache.org/job/oozie-trunk-precommit-build/3081/

> Forkjoin validation code is ridiculously slow in some cases
> -----------------------------------------------------------
>
>                 Key: OOZIE-1978
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1978
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: trunk, 4.0.1
>            Reporter: Robert Kanter
>            Assignee: Peter Bacsko
>             Fix For: trunk
>
>         Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch,
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch,
> OOZIE-1978-005.patch, OOZIE-1978_wip.001.patch, workflow.xml
>
>
> We've had a few users who have run into problems where submitting a workflow
> appears to hang (in the case of a subworkflow, it's similar but stuck in
> PREP). It turns out that if you wait long enough, it will actually go
> through and the workflow will run normally. The problem is that the forkjoin
> validation code is taking a really long time.
> The attached example has a series of 20 forks where each fork has 6 actions
> (it's based on an actual workflow, but all of the names were changed and the
> actions were all replaced by simple shell actions). One of our support guys
> said it took 1-2 hours, but on my computer it was taking {color:red}*15+
> hours*{color} (I had to cancel it).
> While this example doesn't have any nested forks, those can also take a long
> time.
> It's easy to verify that it's the forkjoin validation code that's taking so
> long by looking at a jstack of the Oozie server and seeing deep recursive
> calls to
> {{org.apache.oozie.workflow.lite.LiteWorkflowAppParser.validateForkJoin}}. I
> also noticed a lot of time being spent in calls to {{LinkedList.contains}}.
> I think we have 3 options:
> # See if we can make the existing code faster somehow. Perhaps there's a way
> to parallelize it? Maybe there's some redundant checking that we can
> identify and skip? Change some data structures? etc.
> # See if we can write a new way to do this validation. I had originally
> completely rewritten this code a while ago, and we've since made a few fixes
> to catch edge cases and things. Perhaps it needs another rewrite?
> # Try to identify when it's taking a long time and at least let the user know
> what's happening or something. Right now, it just appears that the Oozie CLI
> has hung and the job doesn't show up in the Oozie server. Most users aren't
> going to wait more than a minute or two.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
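
For readers following option 1 in the description above, here is a minimal sketch of why a chain of forks can blow up under a plain recursive walk, and how remembering visited nodes in a hash-based set avoids it. This is not the Oozie code: the class `ForkJoinWalkSketch`, its methods, and the node names are all hypothetical, and it only assumes (from the jstack observation) that the validation recurses through every path and checks membership in a list. `LinkedList.contains` is a linear scan, whereas `HashSet.add` and `HashSet.contains` take expected constant time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not taken from LiteWorkflowAppParser.
class ForkJoinWalkSketch {

    // Builds fork1 -> {actions} -> join1 -> fork2 -> ... -> joinN -> end
    // as a map from node name to the names of its successors.
    static Map<String, List<String>> buildChain(int forks, int branches) {
        Map<String, List<String>> next = new HashMap<>();
        for (int f = 1; f <= forks; f++) {
            String join = "join" + f;
            List<String> actions = new ArrayList<>();
            for (int b = 1; b <= branches; b++) {
                String action = "action" + f + "_" + b;
                actions.add(action);
                next.put(action, Collections.singletonList(join));
            }
            next.put("fork" + f, actions);
            String afterJoin = (f < forks) ? "fork" + (f + 1) : "end";
            next.put(join, Collections.singletonList(afterJoin));
        }
        next.put("end", Collections.<String>emptyList());
        return next;
    }

    // Walks every path: each join (and everything after it) is re-walked
    // once per branch of its fork, so work multiplies at every fork.
    static long naiveVisits(Map<String, List<String>> next, String node) {
        long visits = 1;
        for (String child : next.get(node)) {
            visits += naiveVisits(next, child);
        }
        return visits;
    }

    // Same walk, but a node already in the set is not expanded again,
    // so each node is processed exactly once.
    static long memoizedVisits(Map<String, List<String>> next, String node, Set<String> seen) {
        if (!seen.add(node)) {
            return 0;
        }
        long visits = 1;
        for (String child : next.get(node)) {
            visits += memoizedVisits(next, child, seen);
        }
        return visits;
    }

    public static void main(String[] args) {
        // 8 forks of 6 actions keeps the naive walk short enough to run.
        Map<String, List<String>> graph = buildChain(8, 6);
        System.out.println("naive visits:    " + naiveVisits(graph, "fork1"));
        System.out.println("memoized visits: "
                + memoizedVisits(graph, "fork1", new HashSet<String>()));
    }
}
```

In this sketch the naive count multiplies by the branch count at every fork, so a workflow shaped like the attached one (20 forks of 6 actions) would need on the order of 6^20 visits, while the memoized walk touches each node once. Whether the real validation can skip already-validated nodes this way depends on what it must check per path, which the description does not spell out.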