[jira] [Commented] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966019#comment-14966019 ]

Mona Chitnis commented on OOZIE-1976:

Thanks [~puru] for your patch. I did a first pass as well and have a few comments. Waiting for your replies.

> Specifying coordinator input datasets in more logical ways
>
> Key: OOZIE-1976
> URL: https://issues.apache.org/jira/browse/OOZIE-1976
> Project: Oozie
> Issue Type: New Feature
> Components: coordinator
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Assignee: Purshotam Shah
> Fix For: trunk
>
> Attachments: Input-check.docx, OOZIE-1976-WIP.patch, OOZIE-1976-rough-design-2.pdf, OOZIE-1976-rough-design.pdf
>
> All dataset instances specified as input to a coordinator currently work on AND logic, i.e. ALL of them must be available for the workflow to start. We should enhance this to include more logical ways of specifying availability criteria, e.g.
> * OR between instances
> * minimum N out of K instances
> * delta datasets (process data incrementally)
> Use-cases for this:
> * Different datasets are BCP, and the workflow can run with either, whichever arrives earlier.
> * Data is not guaranteed, and while $coord:latest allows skipping to available ones, the workflow will never trigger unless the specified number of instances are found.
> * The workflow is like a ‘refining’ algorithm which should run after the minimum required datasets are ready, and should only process the delta for efficiency.
> This JIRA is to discuss the design and then review the implementation for some or all of the above features.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Review Request 38474: OOZIE-1976- Specifying coordinator input datasets in more logical ways
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38474/#review103325
---

I'm having a little trouble zeroing in on the code that checks whether the dependencies have logically been met. Can you point me to that class? Or is that abstracted by the JexlEngine? (I'm not familiar with it.)


client/src/main/resources/oozie-coordinator-0.5.xsd (line 110)
<https://reviews.apache.org/r/38474/#comment161313>

    There is a limit through 'maxOccurs', but I think this patch supports any arbitrary depth.

client/src/main/resources/oozie-coordinator-0.5.xsd (line 126)
<https://reviews.apache.org/r/38474/#comment161314>

    What is 'combine' used for?

core/src/main/java/org/apache/oozie/CoordinatorActionBean.java (line 172)
<https://reviews.apache.org/r/38474/#comment161315>

    This is part of a different change.

core/src/main/java/org/apache/oozie/CoordinatorActionBean.java (line 859)
<https://reviews.apache.org/r/38474/#comment161316>

    Log the exception here.

core/src/main/java/org/apache/oozie/CoordinatorActionBean.java (line 882)
<https://reviews.apache.org/r/38474/#comment161318>

    Same as above, log the exception here.

core/src/main/java/org/apache/oozie/command/coord/CoordActionInputCheckXCommand.java (line 155)
<https://reviews.apache.org/r/38474/#comment161319>

    Let's make "input-check" a private static final String and use it in the multiple places in the code, so there is a single place to update in case the name changes later.

core/src/main/java/org/apache/oozie/command/coord/CoordPushDependencyCheckXCommand.java (line 139)
<https://reviews.apache.org/r/38474/#comment161320>

    Typo: availableList

core/src/main/java/org/apache/oozie/coord/dependency/CoordDependenciesInputCheck.java (line 75)
<https://reviews.apache.org/r/38474/#comment161322>

    Ditto.

core/src/main/java/org/apache/oozie/coord/dependency/CoordDependency.java (line 242)
<https://reviews.apache.org/r/38474/#comment161323>

    Doesn't calling close() and flush() on the enclosing OutputStream suffice, and close/flush the enclosed stream too? Safeguard against closing a stream that's already closed.

core/src/main/java/org/apache/oozie/coord/dependency/CoordInputCheckerPhaseOne.java (line 97)
<https://reviews.apache.org/r/38474/#comment161324>

    Typo: getFirst (not getFist)


- Mona Chitnis


On Sept. 18, 2015, 12:20 a.m., Purshotam Shah wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38474/
> ---
> 
> (Updated Sept. 18, 2015, 12:20 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1976
>     https://issues.apache.org/jira/browse/OOZIE-1976
> 
> 
> Repository: oozie-git
> 
> 
> Description
> ---
> 
> There are three components in this patch.
> 
> 1. User interface
> A new tag, input-check, is added to coordinator.xml.
> input-check supports nested and/or/combine operations. min and wait can be specified either at the operator level or on a data-in.
> If the input-check tag is missing, the old approach is used, where all data dependencies are required.
> 
> 2. Processing
> input-check is converted into a logical expression, e.g.
> (A && B) || (C && D)
> We use JEXL to parse the logical expression.
> 
> There are three phases in parsing:
> phase 1: only resolved datasets (current only) are parsed.
> phase 2: once all current instances are resolved, future/latest instances are parsed.
> phase 3: doesn't do any file check; it just returns what was parsed by phase 1 and phase 2. It is used for EL functions.
> 
> 3. Storage
> If input-check is enabled, push_missing_dependencies and missing_dependencies are serialized and stored in the DB.
> If not, the old approach is used, where they are stored as plain text. This is backward compatible.
> > > Diffs > - > > client/src/main/resources/oozie-coordinator-0.5.xsd > e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 > core/pom.xml ca40e2e22293a3df2841764ce725420857425139 > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java > 188b70e2e76858228b4d42e5798952383719a93d > core/src/main/java/org/apache/oozie/action/ActionExecutor.java > ff83
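The Processing step in this review request converts input-check into a logical expression that JEXL evaluates. As a dependency-free illustration of that idea only (not the patch's code, and without JEXL), here is a minimal Java sketch in which a nested and/or tree of data-ins is rendered as an expression string and evaluated against the set of data-ins whose instances are available. All names (InputCheckSketch, DataIn, Op, isMet) are hypothetical; min/wait handling and the three parsing phases are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, dependency-free sketch of an input-check tree; the names are illustrative
// and not taken from the OOZIE-1976 patch (which uses JEXL for evaluation).
public class InputCheckSketch {

    interface Node {
        String toExpression();                 // render as a logical expression string
        boolean isMet(Set<String> available);  // evaluate against the available data-ins
    }

    // Leaf: one data-in, satisfied when its name is in the available set.
    static final class DataIn implements Node {
        private final String name;
        DataIn(String name) { this.name = name; }
        @Override public String toExpression() { return name; }
        @Override public boolean isMet(Set<String> available) { return available.contains(name); }
    }

    // Inner node: an AND or OR over its children.
    static final class Op implements Node {
        private final boolean isAnd;
        private final List<Node> children;
        Op(boolean isAnd, Node... children) {
            this.isAnd = isAnd;
            this.children = Arrays.asList(children);
        }
        @Override public String toExpression() {
            // Parenthesize nested operators so precedence is explicit, e.g. (A && B) || (C && D).
            return children.stream()
                    .map(c -> c instanceof Op ? "(" + c.toExpression() + ")" : c.toExpression())
                    .collect(Collectors.joining(isAnd ? " && " : " || "));
        }
        @Override public boolean isMet(Set<String> available) {
            return isAnd
                    ? children.stream().allMatch(c -> c.isMet(available))
                    : children.stream().anyMatch(c -> c.isMet(available));
        }
    }

    static Node and(Node... children) { return new Op(true, children); }
    static Node or(Node... children) { return new Op(false, children); }
    static Node in(String name) { return new DataIn(name); }

    public static void main(String[] args) {
        Node check = or(and(in("A"), in("B")), and(in("C"), in("D")));
        System.out.println(check.toExpression());          // (A && B) || (C && D)
        System.out.println(check.isMet(Set.of("C", "D")));  // true
        System.out.println(check.isMet(Set.of("A", "C")));  // false
    }
}
```

Walking the tree directly keeps the sketch self-contained; the patch instead hands the generated string to JEXL, which also lets the same expression be re-evaluated as more instances resolve.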
[jira] [Commented] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513118#comment-14513118 ]

Mona Chitnis commented on OOZIE-1976:

Thanks for taking it up, Jaydeep. I will keep watching this JIRA for when it's ready for review.

> Specifying coordinator input datasets in more logical ways
>
> Key: OOZIE-1976
> URL: https://issues.apache.org/jira/browse/OOZIE-1976
> Project: Oozie
> Issue Type: New Feature
> Components: coordinator
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Assignee: Jaydeep Vishwakarma
> Fix For: trunk
>
> Attachments: OOZIE-1976-WIP.patch, OOZIE-1976-rough-design-2.pdf, OOZIE-1976-rough-design.pdf
>
> All dataset instances specified as input to a coordinator currently work on AND logic, i.e. ALL of them must be available for the workflow to start. We should enhance this to include more logical ways of specifying availability criteria, e.g.
> * OR between instances
> * minimum N out of K instances
> * delta datasets (process data incrementally)
> Use-cases for this:
> * Different datasets are BCP, and the workflow can run with either, whichever arrives earlier.
> * Data is not guaranteed, and while $coord:latest allows skipping to available ones, the workflow will never trigger unless the specified number of instances are found.
> * The workflow is like a ‘refining’ algorithm which should run after the minimum required datasets are ready, and should only process the delta for efficiency.
> This JIRA is to discuss the design and then review the implementation for some or all of the above features.
Re: VOTE Release Oozie 4.1.0 (candidate 1)
Downloaded the tarball, built and installed Oozie (-DskipTests), and ran an example M-R Oozie job against my Hadoop 2.4 cluster. +1 for that.

I was unable to verify the md5 and gpg signatures: I can't find the key used for signing in the list of public keys on the KEYS page. Please let me know if I'm missing the right procedure.

Regards,
Mona Chitnis


On Monday, November 17, 2014 8:08 AM, Shwetha GS wrote:

+1

On Fri, Nov 14, 2014 at 6:49 AM, bowen zhang <bowenzhang...@yahoo.com.invalid> wrote:

> Hi,
>
> I have created a build for Oozie 4.1.0, candidate 1.
>
> Keys to verify the signature of the release artifact are available at
>
> http://www.apache.org/dist/oozie/KEYS
>
> Please download, test, and try it out:
>
> http://people.apache.org/~bzhang/oozie-4.1.0-rc1
>
> The release, md5 signature, gpg signature, and rat report can all be found at the above address.
>
> Vote closes on Monday EOD, the 17th.
>
> Bowen
[jira] [Commented] (OOZIE-1913) Devise a way to turn off SLA alerts for bundle/coordinator flexibly
[ https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195508#comment-14195508 ]

Mona Chitnis commented on OOZIE-1913:

The Review Board revision is fairly up to date except for a couple of unit tests. I will update those and would then appreciate a review.

> Devise a way to turn off SLA alerts for bundle/coordinator flexibly
>
> Key: OOZIE-1913
> URL: https://issues.apache.org/jira/browse/OOZIE-1913
> Project: Oozie
> Issue Type: Improvement
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Assignee: Mona Chitnis
> Fix For: trunk
>
> From user:
> Need to turn off the SLA miss alerts in jobs when the bundle is suspended for grid upgrades and similar work, so that when it's resumed we aren't flooded with a bunch of alerts.
[jira] [Commented] (OOZIE-2034) Disable SSLv3 (POODLEbleed vulnerability)
[ https://issues.apache.org/jira/browse/OOZIE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183087#comment-14183087 ]

Mona Chitnis commented on OOZIE-2034:

+1. Pretty straightforward. Thanks for checking the bit about TLSv1 being supported, but not TLSv1.1. Can you paste your doc references here for the record?

> Disable SSLv3 (POODLEbleed vulnerability)
>
> Key: OOZIE-2034
> URL: https://issues.apache.org/jira/browse/OOZIE-2034
> Project: Oozie
> Issue Type: Bug
> Components: security
> Affects Versions: 4.0.1
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Priority: Blocker
> Fix For: 4.1.0
>
> Attachments: OOZIE-2034.patch, OOZIE-2034.patch
>
> We should disable SSLv3 to protect against the POODLEbleed vulnerability. See [CVE-2014-3566|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566]
> We have {{sslProtocol="TLS"}} set to only allow TLS in ssl-server.xml, but when I checked, I could still connect with SSLv3. From what I can tell, there's some ambiguity in the Tomcat configs between {{sslProtocol}}, {{sslProtocols}}, and {{sslEnabledProtocols}}, so we probably have the wrong thing here.
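On the sslProtocol/sslEnabledProtocols ambiguity: as far as I understand Tomcat's JSSE connector, sslProtocol only picks the protocol name used to obtain the SSLContext (and on JDKs of that era a "TLS" context could still have SSLv3 enabled), while sslEnabledProtocols is what restricts the protocols actually enabled for handshakes. The minimal Java sketch below shows the equivalent restriction directly at the JSSE level, using only standard javax.net.ssl calls; DisableSslv3Sketch and withoutSsl are illustrative names, not part of the patch, and this is not how Oozie's Tomcat config is applied.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Illustrative sketch: restrict an engine to TLS versions only, mirroring what a
// protocol allow-list such as Tomcat's sslEnabledProtocols is meant to achieve.
public class DisableSslv3Sketch {

    /** Drops SSLv3 and the SSLv2Hello pseudo-protocol, keeping only the TLS versions. */
    static String[] withoutSsl(String[] protocols) {
        return Arrays.stream(protocols)
                .filter(p -> !p.startsWith("SSL"))
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Whatever protocol name the SSLContext was created with, the engine's
        // enabled-protocols list is what limits the versions a handshake may negotiate.
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setEnabledProtocols(withoutSsl(engine.getSupportedProtocols()));
        System.out.println(Arrays.toString(engine.getEnabledProtocols()));
    }
}
```

The printed list depends on the JDK in use; the point is only that no entry starting with "SSL" remains enabled.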
[jira] [Commented] (OOZIE-2034) Disable SSLv3 (POODLEbleed vulnerability)
[ https://issues.apache.org/jira/browse/OOZIE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183074#comment-14183074 ]

Mona Chitnis commented on OOZIE-2034:

Starting to look at this now.
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24487/
---

(Updated Oct. 20, 2014, 11:20 p.m.)


Review request for oozie.


Changes
---

Review comments addressed. Minor changes are still required in 2 unit tests; I will update those next.


Bugs: OOZIE-1913
    https://issues.apache.org/jira/browse/OOZIE-1913


Repository: oozie-git


Description
---

See Jira


Diffs (updated)
---

  client/src/main/java/org/apache/oozie/cli/OozieCLI.java 9c2d14b
  client/src/main/java/org/apache/oozie/client/OozieClient.java 5e53a18
  client/src/main/java/org/apache/oozie/client/event/jms/JMSHeaderConstants.java 801ad7e
  client/src/main/java/org/apache/oozie/client/rest/RestConstants.java 4cc6606
  core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 759e643
  core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 2362084
  core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java 070cee5
  core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java de78ab7
  core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java 05b7a62
  core/src/main/java/org/apache/oozie/coord/CoordUtils.java 4643d73
  core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java e6ab09b
  core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 4bccef4
  core/src/main/java/org/apache/oozie/jms/JMSSLAEventListener.java c19839f
  core/src/main/java/org/apache/oozie/service/CoordMaterializeTriggerService.java ee1085a
  core/src/main/java/org/apache/oozie/service/EventHandlerService.java 244c048
  core/src/main/java/org/apache/oozie/servlet/BaseJobServlet.java c94d1e2
  core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 2578e41
  core/src/main/java/org/apache/oozie/servlet/V0JobServlet.java b160b46
  core/src/main/java/org/apache/oozie/servlet/V1JobServlet.java 8dc9608
  core/src/main/java/org/apache/oozie/servlet/V2JobServlet.java da81b49
  core/src/main/java/org/apache/oozie/sla/BundleChangeSlaXCommand.java PRE-CREATION
  core/src/main/java/org/apache/oozie/sla/BundleDisableSlaAlertsXCommand.java PRE-CREATION
  core/src/main/java/org/apache/oozie/sla/BundleEnableSlaAlertsXCommand.java PRE-CREATION
  core/src/main/java/org/apache/oozie/sla/CoordChangeSlaXCommand.java PRE-CREATION
  core/src/main/java/org/apache/oozie/sla/CoordDisableSlaAlertsXCommand.java PRE-CREATION
  core/src/main/java/org/apache/oozie/sla/CoordEnableSlaAlertsXCommand.java PRE-CREATION
  core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 189d5ea
  core/src/main/java/org/apache/oozie/sla/SLACalculator.java 20f93b5
  core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 188144e
  core/src/main/java/org/apache/oozie/sla/SLAOperations.java f5fc826
  core/src/main/java/org/apache/oozie/sla/service/SLAService.java 89615bc
  core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java 7c2620c
  core/src/main/resources/oozie-default.xml 26eb7e0
  core/src/test/java/org/apache/oozie/command/coord/TestCoordSubmitXCommand.java f13e48f
  core/src/test/java/org/apache/oozie/coord/TestCoordUtils.java ae3f18d
  core/src/test/java/org/apache/oozie/jms/TestJMSSLAEventListener.java 30fd151
  core/src/test/java/org/apache/oozie/servlet/DagServletTestCase.java 48193c7
  core/src/test/java/org/apache/oozie/servlet/TestV2JobServlet.java fb203a6
  core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java db3f6eb
  core/src/test/java/org/apache/oozie/store/TestCoordinatorStore.java b8b2405

Diff: https://reviews.apache.org/r/24487/diff/


Testing
---

Unit tests added; end-to-end test with the CLI command done.


Thanks,

Mona Chitnis
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
> On Oct. 17, 2014, 7:38 p.m., Rohini Palaniswamy wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordUtils.java, lines 146-147
> > <https://reviews.apache.org/r/24487/diff/4/?file=692718#file692718line146>
> >
> >     What happens to other commands?

The other commands calling this util method are kill and rerun. In both cases, we should allow a superset of actions and the ability to skip over actions in the range that do not exist.


> On Oct. 17, 2014, 7:38 p.m., Rohini Palaniswamy wrote:
> > core/src/main/java/org/apache/oozie/coord/CoordUtils.java, line 258
> > <https://reviews.apache.org/r/24487/diff/4/?file=692718#file692718line258>
> >
> >     private

It is referenced in another class too, so it can't be made private.


- Mona


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24487/#review57158
---


On Sept. 17, 2014, 6:59 p.m., Mona Chitnis wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24487/
> ---
> 
> (Updated Sept. 17, 2014, 6:59 p.m.)
> 
> 
> Review request for oozie.
> > > Bugs: OOZIE-1913 > https://issues.apache.org/jira/browse/OOZIE-1913 > > > Repository: oozie-git > > > Description > --- > > See Jira > > > Diffs > - > > client/src/main/java/org/apache/oozie/cli/OozieCLI.java f3ffd1f > client/src/main/java/org/apache/oozie/client/OozieClient.java d6ff2d0 > > client/src/main/java/org/apache/oozie/client/event/jms/JMSHeaderConstants.java > 801ad7e > client/src/main/java/org/apache/oozie/client/rest/RestConstants.java > 4b393c8 > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java cc5596b > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 71a9ab4 > core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java > 070cee5 > > core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java > de78ab7 > > core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java > 05b7a62 > core/src/main/java/org/apache/oozie/coord/CoordUtils.java 4643d73 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > 0aee0e4 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 2c9e00e > core/src/main/java/org/apache/oozie/jms/JMSSLAEventListener.java c19839f > > core/src/main/java/org/apache/oozie/service/CoordMaterializeTriggerService.java > ee1085a > core/src/main/java/org/apache/oozie/service/EventHandlerService.java > 244c048 > core/src/main/java/org/apache/oozie/servlet/BaseJobServlet.java 11835ed > core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 2578e41 > core/src/main/java/org/apache/oozie/servlet/V0JobServlet.java eb699e6 > core/src/main/java/org/apache/oozie/servlet/V1JobServlet.java 396661a > core/src/main/java/org/apache/oozie/servlet/V2JobServlet.java de4f865 > core/src/main/java/org/apache/oozie/sla/BundleDisableSlaAlertsXCommand.java > PRE-CREATION > core/src/main/java/org/apache/oozie/sla/BundleEnableSlaAlertsXCommand.java > PRE-CREATION > 
core/src/main/java/org/apache/oozie/sla/CoordDisableSlaAlertsXCommand.java > PRE-CREATION > core/src/main/java/org/apache/oozie/sla/CoordEnableSlaAlertsXCommand.java > PRE-CREATION > core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 189d5ea > core/src/main/java/org/apache/oozie/sla/SLACalculator.java 20f93b5 > core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java cdf8b73 > core/src/main/java/org/apache/oozie/sla/SLAOperations.java f5fc826 > core/src/main/java/org/apache/oozie/sla/service/SLAService.java 89615bc > core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java > 7c2620c > core/src/main/resources/oozie-default.xml 6a91dc6 > > core/src/test/java/org/apache/oozie/command/coord/TestCoordSubmitXCommand.java > f13e48f > core/src/test/java/org/apache/oozie/coord/TestCoordUtils.java ae3f18d > core/src/test/java/org/apache/oozie/jms/TestJMSSLAEventListener.java > 30fd151 > core/src/test/java/org/apache/oozie/servlet/DagServletTestCase.java 48193c7 > core/src/test/java/org/apache/oozie/servlet/TestV2JobServlet.java db9c594 > core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java > db3f6eb > > Diff: https://reviews.apache.org/r/24487/diff/ > > > Testing > --- > > unit tests added, e-2-e test with CLI command done > > > Thanks, > > Mona Chitnis > >
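The reply about kill and rerun describes a tolerant way of resolving a range of coordinator actions: accept a range that is a superset of the existing actions and skip the numbers that have no action, instead of failing. A minimal Java sketch of that behavior, assuming a numeric scope such as "3-8"; ActionRangeSketch and resolveRange are hypothetical names, this is not the CoordUtils code, and date-range scopes are not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: resolve an action-number scope like "3-8" against the actions
// that actually exist, skipping missing numbers instead of rejecting the whole range.
public class ActionRangeSketch {

    static List<Integer> resolveRange(String scope, Set<Integer> existing) {
        String[] parts = scope.split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected a scope like 3-8, got: " + scope);
        }
        int start = Integer.parseInt(parts[0].trim());
        int end = Integer.parseInt(parts[1].trim());
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end: " + scope);
        }
        List<Integer> result = new ArrayList<>();
        for (int n = start; n <= end; n++) {
            if (existing.contains(n)) {  // skip numbers with no materialized action
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Only actions 1-5 exist; the requested range 3-8 is a superset of what is there.
        System.out.println(resolveRange("3-8", Set.of(1, 2, 3, 4, 5)));  // [3, 4, 5]
    }
}
```

Whether an entirely empty result should then be an error is a separate policy choice that the sketch leaves to the caller.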
[jira] [Commented] (OOZIE-1954) Add a way for the MapReduce action to be configured by Java code
[ https://issues.apache.org/jira/browse/OOZIE-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153721#comment-14153721 ]

Mona Chitnis commented on OOZIE-1954:

Good work, Robert!

> Add a way for the MapReduce action to be configured by Java code
>
> Key: OOZIE-1954
> URL: https://issues.apache.org/jira/browse/OOZIE-1954
> Project: Oozie
> Issue Type: New Feature
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Fix For: trunk
>
> Attachments: OOZIE-1954.patch, OOZIE-1954.patch, OOZIE-1954.patch
>
> With certain other components (e.g. Avro, HFileOutputFormat (HBase), etc.), it becomes impractical to use the MapReduce action, and users must instead use the Java action. The problem is that these components require a lot of extra configuration that is often hidden from the user in Java code (e.g. {{HFileOutputFormat.configureIncrementalLoad(job, table);}}), which can also include decision logic, serialization, and other things that we can't do in an XML file directly.
> One way to solve this problem is to allow the user to give the MR action some Java code that would do this configuration, similar to how we allow the {{}} field to specify an external XML file of configuration properties.
> In more detail, we could have an interface, something like this:
> {code}
> public interface OozieActionConfigurator {
>     public void updateOozieActionConfiguration(Configuration conf);
> }
> {code}
> that the user can implement, package in a jar, and include with their MR action (i.e. add a "{{}}" field that lets them specify the class name). To protect the Oozie server from running user code (which could really do anything it wants), it would have to be run in the Launcher Job. The Launcher Job could call this method after it loads the configuration prepared by the Oozie server.
> Another case where this will be helpful is users who use the Java action to launch MR jobs and expect a bunch of things to be done for them that are not (e.g. delegation token propagation, config loading, returning the Hadoop job to Oozie, etc.). These are all done by the MR action, so the more users we can move from the Java action to the MR action, the fewer of these difficulties they'll run into.
> Some of this may change slightly as I try to actually implement this (e.g. having to handle thrown exceptions, etc.). One thing I may do is keep this general enough to be compatible with all action types, in case we want to add this to any of them in the future; for now, though, the schema would only accept it for the MapReduce action.
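To make the proposed flow concrete, here is a minimal Java sketch of a user implementation and of the launcher-side call sequence the description outlines (load the prepared configuration, load the user's class by name, call the update method). The proposal's interface takes Hadoop's org.apache.hadoop.conf.Configuration; so that the sketch runs without Hadoop, a tiny hypothetical stand-in (SimpleConf) is substituted. MyConfigurator and the property example.input.size.gb are likewise illustrative assumptions; mapreduce.job.reduces is a standard Hadoop property.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the OOZIE-1954 proposal. SimpleConf is a hypothetical stand-in for
// org.apache.hadoop.conf.Configuration so the example runs without Hadoop.
public class ActionConfiguratorSketch {

    /** Minimal stand-in for Hadoop's Configuration (illustration only). */
    static final class SimpleConf {
        private final Map<String, String> props = new HashMap<>();
        String get(String key) { return props.get(key); }
        void set(String key, String value) { props.put(key, value); }
    }

    /** Shape of the proposed interface, with the stand-in type substituted. */
    interface OozieActionConfigurator {
        void updateOozieActionConfiguration(SimpleConf conf);
    }

    /** A user implementation: decision logic that static XML cannot express. */
    static final class MyConfigurator implements OozieActionConfigurator {
        @Override
        public void updateOozieActionConfiguration(SimpleConf conf) {
            // "example.input.size.gb" is a hypothetical property used only for this sketch.
            boolean large = Integer.parseInt(conf.get("example.input.size.gb")) > 100;
            conf.set("mapreduce.job.reduces", large ? "50" : "5");
        }
    }

    public static void main(String[] args) throws Exception {
        // Launcher side: start from the server-prepared configuration...
        SimpleConf conf = new SimpleConf();
        conf.set("example.input.size.gb", "250");
        // ...load the user's class by name and let it update the configuration.
        Class<?> cls = Class.forName(ActionConfiguratorSketch.class.getName() + "$MyConfigurator");
        OozieActionConfigurator configurator =
                (OozieActionConfigurator) cls.getDeclaredConstructor().newInstance();
        configurator.updateOozieActionConfiguration(conf);
        System.out.println(conf.get("mapreduce.job.reduces"));  // 50
    }
}
```

Running the user's class inside the launcher, as the description proposes, keeps arbitrary user code out of the Oozie server process; the reflection step is where a misconfigured class name would surface as an exception.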
Re: Java 7
+1

Mona Chitnis


On Tuesday, September 23, 2014 6:21 PM, Rohini Palaniswamy wrote:

+1 to drop support for Java 6 from Oozie trunk. I am just closing a vote on that for Pig now. From Hadoop 2.7, Hadoop is planning to publish Maven artifacts built with JDK 1.7, so it is better that we drop support. Can you also open up a vote for dropping support for Hadoop 0.20 along with the JDK 7 one?

On Tue, Sep 23, 2014 at 10:58 AM, Robert Kanter wrote:

> Hi all,
>
> I wanted to open a discussion about Java 7. Hadoop is planning on dropping support for JDK 6 with Hadoop 2.7. Should we switch Oozie trunk to do the same?
>
> On a related note, for OOZIE-1793, I'm trying to improve the findbugs reporting for Oozie, and the latest findbugs requires Java 7. So there are 3 options:
> - Drop support for Java 6 for Oozie trunk (the overall question above)
> - Switch to an older version of findbugs that does support Java 6. I'd have to double check, but this may make it more difficult to get human-readable HTML reports instead of XML
> - Leave it as is. Findbugs only runs with the "verify" target, so all other maven commands still work with Java 6, including compiling.
>
> Thoughts?
>
> thanks
> - Robert
[jira] [Commented] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147856#comment-14147856 ]

Mona Chitnis commented on OOZIE-1976:

Thanks [~rkanter] for the comments.
* We are thinking of using a serialize/deserialize technique (protobuf is one option) to convert back and forth from the object. I've created a class LogicalDependencySet for this object, which contains either of the subclass objects LogicalDependencyAndSet or LogicalDependencyOrSet; the leaf level is Dependency, which has the lists of resolved and unresolved instances. I have yet to see what the cost of protobuf serde is here.
* Yes, it is possible to do nested combinations, but we will limit it to a depth of 2, i.e. both of your examples are depth 2 and are the most common cases that we should satisfy in the first go. An important thing to note here is that OR can have two 'strategies':
** 'Combined': In the case of {{A || B}}, instances of A and B can be interleaved to give the final "combined" set of total instances. For this, the requirement is that the user considers both as equivalent, and that they have the same frequency, initial instance, etc.
** 'Exclusive': In the same case as above, either A is used completely or B is used completely. No interleaving.
* Yes, a better API output would display which OR datasets' instances the action is waiting on.

> Specifying coordinator input datasets in more logical ways
>
> Key: OOZIE-1976
> URL: https://issues.apache.org/jira/browse/OOZIE-1976
> Project: Oozie
> Issue Type: New Feature
> Components: coordinator
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Assignee: Mona Chitnis
> Fix For: trunk
>
> Attachments: OOZIE-1976-WIP.patch, OOZIE-1976-rough-design-2.pdf, OOZIE-1976-rough-design.pdf
>
> All dataset instances specified as input to a coordinator currently work on AND logic, i.e. ALL of them must be available for the workflow to start. We should enhance this to include more logical ways of specifying availability criteria, e.g.
> * OR between instances
> * minimum N out of K instances
> * delta datasets (process data incrementally)
> Use-cases for this:
> * Different datasets are BCP, and the workflow can run with either, whichever arrives earlier.
> * Data is not guaranteed, and while $coord:latest allows skipping to available ones, the workflow will never trigger unless the specified number of instances are found.
> * The workflow is like a ‘refining’ algorithm which should run after the minimum required datasets are ready, and should only process the delta for efficiency.
> This JIRA is to discuss the design and then review the implementation for some or all of the above features.
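The two OR strategies for {{A || B}} can be illustrated with a minimal Java sketch. It is hypothetical throughout (OrStrategySketch, resolve, and the boolean-array representation are not LogicalDependencyOrSet's API) and assumes A and B share frequency and initial instance, so slot i denotes the same nominal time in both datasets, which is the stated precondition for 'Combined'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the 'Combined' and 'Exclusive' OR strategies for A || B.
public class OrStrategySketch {

    enum Strategy { COMBINED, EXCLUSIVE }

    /**
     * aAvailable[i] / bAvailable[i] say whether instance slot i of dataset A / B exists.
     * Returns the chosen source ("A" or "B") for each slot, or null if the OR is not satisfied.
     */
    static List<String> resolve(boolean[] aAvailable, boolean[] bAvailable, Strategy strategy) {
        if (aAvailable.length != bAvailable.length) {
            throw new IllegalArgumentException("datasets must cover the same instance slots");
        }
        int n = aAvailable.length;
        List<String> chosen = new ArrayList<>();
        if (strategy == Strategy.EXCLUSIVE) {
            // All slots must come from a single dataset; no interleaving.
            if (allTrue(aAvailable)) { fill(chosen, "A", n); return chosen; }
            if (allTrue(bAvailable)) { fill(chosen, "B", n); return chosen; }
            return null;
        }
        // COMBINED: each slot may be served by either dataset (A preferred when both exist).
        for (int i = 0; i < n; i++) {
            if (aAvailable[i]) chosen.add("A");
            else if (bAvailable[i]) chosen.add("B");
            else return null;
        }
        return chosen;
    }

    private static boolean allTrue(boolean[] xs) {
        for (boolean x : xs) if (!x) return false;
        return true;
    }

    private static void fill(List<String> out, String source, int n) {
        for (int i = 0; i < n; i++) out.add(source);
    }

    public static void main(String[] args) {
        boolean[] a = {true, false, true};
        boolean[] b = {false, true, false};
        System.out.println(resolve(a, b, Strategy.COMBINED));   // [A, B, A]
        System.out.println(resolve(a, b, Strategy.EXCLUSIVE));  // null
    }
}
```

The example data shows why the distinction matters: neither dataset is complete on its own, so only the interleaving strategy lets the action proceed.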
[jira] [Updated] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated OOZIE-1976:
Attachment: OOZIE-1976-WIP.patch

Attaching the WIP patch for the record. I will upload the v1 patch by tomorrow, once I have a fairly working version ready.
[jira] [Updated] (OOZIE-1932) Services should load CallableQueueService after MemoryLocksService
[ https://issues.apache.org/jira/browse/OOZIE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated OOZIE-1932:
Attachment: OOZIE-1932-4-amendment.patch

{code}
2014-09-22 22:02:34,148 INFO ShareLibService:539 [main] - USER[-] GROUP[-] oozie-hadoop-utils-2.3.0.oozie-4.4.1.1.jar uploaded to hdfs:/tmp/hdfs_shared_lib_path/launcher_2014090233/oozie
2014-09-22 22:02:34,198 INFO ShareLibService:539 [main] - USER[-] GROUP[-] oozie-sharelib-hcatalog-4.4.1.1.jar uploaded to hdfs:/tmp/hdfs_shared_lib_path/launcher_2014090233/oozie
2014-09-22 22:02:34,199 ERROR ShareLibService:536 [main] - USER[-] GROUP[-] Sharelib initialization fails
java.lang.NullPointerException
    at org.apache.oozie.service.ShareLibService.setupLauncherLibPath(ShareLibService.java:178)
    at org.apache.oozie.service.ShareLibService.updateLauncherLib(ShareLibService.java:158)
    at org.apache.oozie.service.ShareLibService.init(ShareLibService.java:111)
    at org.apache.oozie.service.Services.setServiceInternal(Services.java:368)

ShareLibService is dependent on ActionService.

private void setupLauncherLibPath(FileSystem fs, Path tmpLauncherLibPath) throws IOException {
    ActionService actionService = Services.get().get(ActionService.class);
    List classes = JavaActionExecutor.getCommonLauncherClasses();
    Path baseDir = new Path(tmpLauncherLibPath, JavaActionExecutor.OOZIE_COMMON_LIBDIR);
    copyJarContainingClasses(classes, fs, baseDir, JavaActionExecutor.OOZIE_COMMON_LIBDIR);
    Set actionTypes = actionService.getActionTypes();
{code}

Attaching amendment patch.

> Services should load CallableQueueService after MemoryLocksService
>
> Key: OOZIE-1932
> URL: https://issues.apache.org/jira/browse/OOZIE-1932
> Project: Oozie
> Issue Type: Bug
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Assignee: Mona Chitnis
> Fix For: 4.1.0
>
> Attachments: OOZIE-1932-2.patch, OOZIE-1932-3.patch, OOZIE-1932-4-amendment.patch, OOZIE-1932-4.patch, OOZIE-1932-addendum.patch, OOZIE-1932.patch
>
> This is not a problem during startup, but it is during shutdown, as services are destroyed in reverse order of initialization. Hence, when MemoryLocksService's destroy sets it to null while commands are still executing (because CallableQueueService is still active), they all encounter NPEs during locking.
> This is a simple fix in oozie-default.xml: place MemoryLocksService earlier in the service loading order.
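Both the original fix and the amendment follow one ordering rule: services are initialized in the order listed and destroyed in reverse, so a service must be listed after every service it uses (CallableQueueService after MemoryLocksService; ShareLibService after ActionService). A minimal, hypothetical Java sketch of that check, using plain service names rather than Oozie's Services class:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; plain names stand in for Oozie's Services machinery.
public class ServiceOrderSketch {

    /** Services are destroyed in the reverse of the order in which they were initialized. */
    static List<String> destroyOrder(List<String> initOrder) {
        List<String> reversed = new ArrayList<>(initOrder);
        Collections.reverse(reversed);
        return reversed;
    }

    /**
     * 'user' can rely on 'dependency' at startup and at shutdown only if the dependency is
     * initialized first; with reverse-order destruction it is then also destroyed after 'user'.
     */
    static boolean isSafe(List<String> initOrder, String dependency, String user) {
        int dep = initOrder.indexOf(dependency);
        int usr = initOrder.indexOf(user);
        return dep >= 0 && usr >= 0 && dep < usr;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("CallableQueueService", "MemoryLocksService");
        List<String> after = Arrays.asList("MemoryLocksService", "CallableQueueService");
        System.out.println(destroyOrder(before));  // [MemoryLocksService, CallableQueueService]
        System.out.println(isSafe(before, "MemoryLocksService", "CallableQueueService"));  // false
        System.out.println(isSafe(after, "MemoryLocksService", "CallableQueueService"));   // true
    }
}
```

With the original order, MemoryLocksService is destroyed first while queued commands may still be running, which matches the NPEs during locking described in the issue.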
[jira] [Commented] (OOZIE-1932) Services should load CallableQueueService after MemoryLocksService
[ https://issues.apache.org/jira/browse/OOZIE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143595#comment-14143595 ] Mona Chitnis commented on OOZIE-1932: - {quote} . -1 the patch does not add/modify any testcase {quote} This is a simple config change in oozie-default.xml, and there is no applicable test case just for checking the relative order in which services are loaded. {quote} . The patch failed the following testcases: . testBundleStatusNotTransitionFromKilled(org.apache.oozie.service.TestStatusTransitService) . testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService) {quote} These test failures are unrelated to my patch. I reran the tests in my local environment and they pass consistently. Committed the patch to trunk and branch-4.1. Thanks Puru for the review! > Services should load CallableQueueService after MemoryLocksService > -- > > Key: OOZIE-1932 > URL: https://issues.apache.org/jira/browse/OOZIE-1932 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1932-2.patch, OOZIE-1932-3.patch, > OOZIE-1932-4.patch, OOZIE-1932-addendum.patch, OOZIE-1932.patch > > > This is not a problem during startup but is during shutdown, as services are > destroyed in reverse order of initialization. Hence, when MemoryLocksService > destroy sets it to null, and commands are still executing due to > CallableQueueService still active, they all encounter NPEs during locking. > This is a simple fix in oozie-default.xml to set MemoryLocksService before in > the order of services loading.
Re: issue after OOZIE-1807
bq. if a bundle with two actions (one FAILED due to coordinator submission error, other KILLED), bundle is supposed to be KILLED
bq. Bundle should be FAILED and not KILLED. Only when the user has KILLED the bundle should its status be KILLED.
Thanks for the minor correction. What I was getting at is that the bundle will _not_ be DONEWITHERROR, which is what Bowen said he's observing.
Mona Chitnis
Software Engineer, Hadoop Team
Yahoo!

On Thursday, September 18, 2014 5:09 PM, Rohini Palaniswamy wrote:
bq. Shouldn't oozie be intelligent enough to do a no-op on a killed coord job?
There are options now to resume a killed coord job. If the new end time was applied to the other coord jobs but not to that one, the user needs to know.
bq. if a bundle with two actions (one FAILED due to coordinator submission error, other KILLED), bundle is supposed to be KILLED
Bundle should be FAILED and not KILLED. Only when the user has KILLED the bundle should its status be KILLED.
-Rohini

On Thu, Sep 18, 2014 at 4:09 PM, Purshotam Shah wrote:
Bowen, the JIRA has an explanation. Please update the JIRA if you see any issue with the approach.
>Why is it a good idea to throw an exception if one of the coord jobs is
>in "killed" state? In the BundleJobChangeXCommand, the code doesn't even
>attempt to change the coord job. Shouldn't oozie be intelligent enough
>to do a no-op on a killed coord job?
To let the user know the list of coord jobs for which the change is not applied.
Puru.

On 9/18/14, 2:11 PM, "bowen zhang" wrote:
>Hi Purshotam,
>Why is it a good idea to throw an exception if one of the coord jobs is
>in "killed" state? In the BundleJobChangeXCommand, the code doesn't even
>attempt to change the coord job. Shouldn't oozie be intelligent enough to
>do a no-op on a killed coord job?
>Bowen
>
>From: Purshotam Shah
>To: "dev@oozie.apache.org" ; Mona Chitnis ; bowen zhang
>Sent: Wednesday, September 17, 2014 6:17 PM
>Subject: Re: issue after OOZIE-1807
>
>Hi Bowen,
>BundleJobChangeXCommand will get applied to bundle and coord jobs. It will aggregate the messages for all killed coord jobs and throw them as an exception.
>It is similar to the chmod command.
>
>JIRA has more details. Let me know if you need any other information.
>
>Puru.
>
>On 9/17/14, 6:05 PM, "Mona Chitnis" wrote:
>>Puru,
>>Bowen just gave me a call regarding this issue. Can you answer his
>>question? That'll be faster than me digging through the code.
>>Mona Chitnis
>>Yahoo!
>>
>>On Wednesday, September 17, 2014 5:51 PM, bowen zhang wrote:
>>Hi guys,
>>Purshotam, I see you checked OOZIE-1807 into trunk. I have a
>>question: why does it need to throw an exception when someone wants to
>>change a bundle job where one of its coord jobs is in KILLED state? Due to
>>the change in BundleJobChangeXCommand, this is throwing exceptions when
>>trying to change a RUNNING bundle job where some of the coord jobs are
>>intentionally killed by the user.
>>Thanks,
>>Bowen
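Puru's explanation (apply the change wherever it can be applied, then report the KILLED coord jobs in one exception, like `chmod`) can be illustrated with a small sketch. `BundleChangeSketch`, `ChangeNotAppliedException` and the string statuses are hypothetical stand-ins, not the actual `BundleJobChangeXCommand` code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical exception carrying the IDs of coord jobs the change was not applied to.
class ChangeNotAppliedException extends Exception {
    final List<String> skippedIds;

    ChangeNotAppliedException(List<String> skippedIds) {
        super("Change not applied to KILLED coord jobs: " + String.join(", ", skippedIds));
        this.skippedIds = skippedIds;
    }
}

class BundleChangeSketch {
    // Applies a new end time to every non-KILLED coord job, then reports all skipped jobs at once.
    static void changeEndTime(Map<String, String> coordStatus, Map<String, String> endTimes,
                              String newEndTime) throws ChangeNotAppliedException {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, String> coord : coordStatus.entrySet()) {
            if ("KILLED".equals(coord.getValue())) {
                skipped.add(coord.getKey());          // do not stop; keep processing the rest
            } else {
                endTimes.put(coord.getKey(), newEndTime);
            }
        }
        if (!skipped.isEmpty()) {
            throw new ChangeNotAppliedException(skipped);
        }
    }

    public static void main(String[] args) {
        Map<String, String> coordStatus = new LinkedHashMap<>();
        coordStatus.put("coord-1", "RUNNING");
        coordStatus.put("coord-2", "KILLED");
        Map<String, String> endTimes = new HashMap<>();
        try {
            changeEndTime(coordStatus, endTimes, "2014-10-01T00:00Z");
        } catch (ChangeNotAppliedException e) {
            System.out.println(e.getMessage() + "; applied to " + endTimes.keySet());
        }
    }
}
```

The point of the pattern is that one KILLED coord job neither blocks the change for the others nor disappears silently, which matches Rohini's note that the user needs to know which jobs did not get the new end time.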
Re: issue after OOZIE-1807
Bowen, regarding the other issue (if a bundle has two actions, one FAILED due to a coordinator submission error and the other KILLED), the bundle is supposed to be KILLED. I see this is also taken care of as part of OOZIE-1940 (StatusTransitService), but it is not committed to Apache yet. Puru can help track down whether any of his other patches changed this desired behavior. Mona Chitnis Yahoo! On Wednesday, September 17, 2014 6:05 PM, Mona Chitnis wrote: Puru, Bowen just gave me a call regarding this issue. Can you answer his question? That'll be faster than me digging through the code. Mona Chitnis Yahoo! On Wednesday, September 17, 2014 5:51 PM, bowen zhang wrote: Hi guys, Purshotam, I see you checked OOZIE-1807 into trunk. I have a question: why does it need to throw an exception when someone wants to change a bundle job where one of its coord jobs is in KILLED state? Due to the change in BundleJobChangeXCommand, this is throwing exceptions when trying to change a RUNNING bundle job where some of the coord jobs are intentionally killed by the user. Thanks, Bowen
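The bundle status rule argued over in this thread can be written down as a small function. Mona's message above says KILLED; this sketch instead follows Rohini's correction (one FAILED plus one KILLED child gives FAILED, not KILLED or DONEWITHERROR, and KILLED only after an explicit user kill). The enums are hypothetical, and the SUCCEEDED and DONEWITHERROR branches, as well as treating every non-user kill as a failure, are assumptions that have not been checked against `StatusTransitService`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical enums for the sketch; not Oozie's Job.Status.
enum ChildStatus { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

enum BundleStatus { SUCCEEDED, DONEWITHERROR, FAILED, KILLED }

class BundleStatusSketch {
    static boolean isTerminal(ChildStatus s) {
        return s == ChildStatus.SUCCEEDED || s == ChildStatus.FAILED || s == ChildStatus.KILLED;
    }

    // Empty while any child is still active.
    static Optional<BundleStatus> terminalStatus(List<ChildStatus> children, boolean killedByUser) {
        if (killedByUser) {
            return Optional.of(BundleStatus.KILLED);          // only an explicit user kill gives KILLED
        }
        if (!children.stream().allMatch(BundleStatusSketch::isTerminal)) {
            return Optional.empty();
        }
        long succeeded = children.stream().filter(s -> s == ChildStatus.SUCCEEDED).count();
        if (succeeded == children.size()) {
            return Optional.of(BundleStatus.SUCCEEDED);       // assumption
        }
        if (succeeded > 0) {
            return Optional.of(BundleStatus.DONEWITHERROR);   // assumption: partial success
        }
        // Nothing succeeded and the user did not kill the bundle, e.g. FAILED + KILLED children.
        return Optional.of(BundleStatus.FAILED);
    }

    public static void main(String[] args) {
        System.out.println(terminalStatus(List.of(ChildStatus.FAILED, ChildStatus.KILLED), false));
    }
}
```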
Re: issue after OOZIE-1807
Puru, Bowen just gave me a call regarding this issue. Can you answer his question? That'll be faster than me digging through the code. Mona Chitnis Yahoo! On Wednesday, September 17, 2014 5:51 PM, bowen zhang wrote: Hi guys, Purshotam, I see you checked OOZIE-1807 into trunk. I have a question: why does it need to throw an exception when someone wants to change a bundle job where one of its coord jobs is in KILLED state? Due to the change in BundleJobChangeXCommand, this is throwing exceptions when trying to change a RUNNING bundle job where some of the coord jobs are intentionally killed by the user. Thanks, Bowen
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/ --- (Updated Sept. 17, 2014, 6:59 p.m.) Review request for oozie. Changes --- Addressed Puru's comment to make separate bundle/coord disable/enable commands. Bugs: OOZIE-1913 https://issues.apache.org/jira/browse/OOZIE-1913 Repository: oozie-git Description --- See Jira Diffs (updated) - client/src/main/java/org/apache/oozie/cli/OozieCLI.java f3ffd1f client/src/main/java/org/apache/oozie/client/OozieClient.java d6ff2d0 client/src/main/java/org/apache/oozie/client/event/jms/JMSHeaderConstants.java 801ad7e client/src/main/java/org/apache/oozie/client/rest/RestConstants.java 4b393c8 core/src/main/java/org/apache/oozie/CoordinatorActionBean.java cc5596b core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 71a9ab4 core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java 070cee5 core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java de78ab7 core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java 05b7a62 core/src/main/java/org/apache/oozie/coord/CoordUtils.java 4643d73 core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java 0aee0e4 core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 2c9e00e core/src/main/java/org/apache/oozie/jms/JMSSLAEventListener.java c19839f core/src/main/java/org/apache/oozie/service/CoordMaterializeTriggerService.java ee1085a core/src/main/java/org/apache/oozie/service/EventHandlerService.java 244c048 core/src/main/java/org/apache/oozie/servlet/BaseJobServlet.java 11835ed core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 2578e41 core/src/main/java/org/apache/oozie/servlet/V0JobServlet.java eb699e6 core/src/main/java/org/apache/oozie/servlet/V1JobServlet.java 396661a core/src/main/java/org/apache/oozie/servlet/V2JobServlet.java de4f865 core/src/main/java/org/apache/oozie/sla/BundleDisableSlaAlertsXCommand.java
PRE-CREATION core/src/main/java/org/apache/oozie/sla/BundleEnableSlaAlertsXCommand.java PRE-CREATION core/src/main/java/org/apache/oozie/sla/CoordDisableSlaAlertsXCommand.java PRE-CREATION core/src/main/java/org/apache/oozie/sla/CoordEnableSlaAlertsXCommand.java PRE-CREATION core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 189d5ea core/src/main/java/org/apache/oozie/sla/SLACalculator.java 20f93b5 core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java cdf8b73 core/src/main/java/org/apache/oozie/sla/SLAOperations.java f5fc826 core/src/main/java/org/apache/oozie/sla/service/SLAService.java 89615bc core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java 7c2620c core/src/main/resources/oozie-default.xml 6a91dc6 core/src/test/java/org/apache/oozie/command/coord/TestCoordSubmitXCommand.java f13e48f core/src/test/java/org/apache/oozie/coord/TestCoordUtils.java ae3f18d core/src/test/java/org/apache/oozie/jms/TestJMSSLAEventListener.java 30fd151 core/src/test/java/org/apache/oozie/servlet/DagServletTestCase.java 48193c7 core/src/test/java/org/apache/oozie/servlet/TestV2JobServlet.java db9c594 core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java db3f6eb Diff: https://reviews.apache.org/r/24487/diff/ Testing --- unit tests added, e-2-e test with CLI command done Thanks, Mona Chitnis
Re: oozie on oracle issue
Yes, that sounds like an OpenJPA schema-creation limitation. We use Oracle SQL scripts directly for table creation. In HA mode, I think we use the same SID for multiple schemas, similar to the setup you described here, but we haven't faced this issue because we run the SQL directly. Sent from Yahoo Mail for iPhone
RE: oozie on oracle issue
No, we have not faced this issue before. That might be because we have an Oracle instance dedicated to the Oozie database. Do you have multiple db_owners because the instance is shared between, say, Oozie and other projects' schemas?
Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24948/#review52789 --- Ship it! Looks good now. Thanks for the 2 clarifications above. - Mona Chitnis On Sept. 8, 2014, 9:15 p.m., Purshotam Shah wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24948/ > --- > > (Updated Sept. 8, 2014, 9:15 p.m.) > > > Review request for oozie. > > > Bugs: OOZIE-1940 > https://issues.apache.org/jira/browse/OOZIE-1940 > > > Repository: oozie-git > > > Description > --- > > StatusTransitService has race condition > > > Diffs > - > > core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d > core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 > core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java > 36cd968 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > 3008393 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 04e6e29 > core/src/main/java/org/apache/oozie/service/StatusTransitService.java > 21ac25f > core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java > bb99138 > > Diff: https://reviews.apache.org/r/24948/diff/ > > > Testing > --- > > UTC > > > Thanks, > > Purshotam Shah > >
[jira] [Updated] (OOZIE-1932) Services should load CallableQueueService after MemoryLocksService
[ https://issues.apache.org/jira/browse/OOZIE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1932: Attachment: OOZIE-1932-4.patch addressed Puru's comment > Services should load CallableQueueService after MemoryLocksService > -- > > Key: OOZIE-1932 > URL: https://issues.apache.org/jira/browse/OOZIE-1932 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1932-2.patch, OOZIE-1932-3.patch, > OOZIE-1932-4.patch, OOZIE-1932-addendum.patch, OOZIE-1932.patch > > > This is not a problem during startup but is during shutdown, as services are > destroyed in reverse order of initialization. Hence, when MemoryLocksService > destroy sets it to null, and commands are still executing due to > CallableQueueService still active, they all encounter NPEs during locking. > This is a simple fix in oozie-default.xml to set MemoryLocksService before in > the order of services loading.
[jira] [Updated] (OOZIE-1932) Services should load CallableQueueService after MemoryLocksService
[ https://issues.apache.org/jira/browse/OOZIE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1932: Attachment: OOZIE-1932-3.patch uploaded new patch OOZIE-1932-3.patch. > Services should load CallableQueueService after MemoryLocksService > -- > > Key: OOZIE-1932 > URL: https://issues.apache.org/jira/browse/OOZIE-1932 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1932-2.patch, OOZIE-1932-3.patch, > OOZIE-1932-addendum.patch, OOZIE-1932.patch > > > This is not a problem during startup but is during shutdown, as services are > destroyed in reverse order of initialization. Hence, when MemoryLocksService > destroy sets it to null, and commands are still executing due to > CallableQueueService still active, they all encounter NPEs during locking. > This is a simple fix in oozie-default.xml to set MemoryLocksService before in > the order of services loading.
Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition
> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java, line 177
> > <https://reviews.apache.org/r/24948/diff/1/?file=668667#file668667line177>
> >
> > related question, is this situation possible? - job status is PAUSED ||
> > PWE, and bundle action status is RWE?
>
> Purshotam Shah wrote:
>     Yes, BundlePauseXCommand only set bundle status to pause. Bundle status
>     can still be in running state.

Ok. I just checked that BundlePauseXCommand and CoordPauseXCommand have empty implementations of pauseChildren():

    @Override
    public void pauseChildren() throws CommandException {
        // TODO - need revisit when revisiting coord job status redesign;
    }

- Mona
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24948/#review52346 --- On Sept. 8, 2014, 9:15 p.m., Purshotam Shah wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24948/ > --- > > (Updated Sept. 8, 2014, 9:15 p.m.) > > > Review request for oozie.
> > > Bugs: OOZIE-1940 > https://issues.apache.org/jira/browse/OOZIE-1940 > > > Repository: oozie-git > > > Description > --- > > StatusTransitService has race condition > > > Diffs > - > > core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d > core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 > core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java > 36cd968 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > 3008393 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 04e6e29 > core/src/main/java/org/apache/oozie/service/StatusTransitService.java > 21ac25f > core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java > bb99138 > > Diff: https://reviews.apache.org/r/24948/diff/ > > > Testing > --- > > UTC > > > Thanks, > > Purshotam Shah > >
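Purshotam's answer follows from the empty `pauseChildren()` quoted above: if pausing only updates the parent, a PAUSED job can sit above a child that is still RUNNINGWITHERROR. A minimal sketch of that template-method shape, with hypothetical class names and plain string statuses (this is not the actual `BundlePauseXCommand` hierarchy):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical template: pause() updates the parent job, then delegates to pauseChildren().
abstract class PauseSketch {
    String jobStatus = "RUNNING";
    final List<String> childStatuses = new ArrayList<>(List.of("RUNNINGWITHERROR"));

    final void pause() {
        jobStatus = "PAUSED";     // only the parent status changes here
        pauseChildren();
    }

    abstract void pauseChildren();
}

// Mirrors the empty pauseChildren() quoted above: child statuses are left untouched.
class NoOpChildrenPause extends PauseSketch {
    @Override
    void pauseChildren() {
        // intentionally empty, like the TODO in BundlePauseXCommand/CoordPauseXCommand
    }

    public static void main(String[] args) {
        NoOpChildrenPause cmd = new NoOpChildrenPause();
        cmd.pause();
        System.out.println(cmd.jobStatus + " / children " + cmd.childStatuses);
    }
}
```

So a status-transit pass has to tolerate a parent/child combination like PAUSED over RUNNINGWITHERROR until the children are paused explicitly.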
Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24948/#review52722 --- core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91699> But this is executed only if bAction.getCoordId() == null, so the case you mention will not occur. core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91703> Then I think we should skip loading SKIPPED actions from the DB. The idea of SKIPPED is that its outcome would not make any difference: if all other actions are terminal, you will mark the parent coord/bundle as terminal; if some are non-terminal, it won't be. OK to optimize this from the earlier code. - Mona Chitnis On Sept. 8, 2014, 9:15 p.m., Purshotam Shah wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24948/ > --- > > (Updated Sept. 8, 2014, 9:15 p.m.) > > > Review request for oozie.
> > > Bugs: OOZIE-1940 > https://issues.apache.org/jira/browse/OOZIE-1940 > > > Repository: oozie-git > > > Description > --- > > StatusTransitService has race condition > > > Diffs > - > > core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d > core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 > core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java > 36cd968 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > 3008393 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 04e6e29 > core/src/main/java/org/apache/oozie/service/StatusTransitService.java > 21ac25f > core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java > bb99138 > > Diff: https://reviews.apache.org/r/24948/diff/ > > > Testing > --- > > UTC > > > Thanks, > > Purshotam Shah > >
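The proposal to ignore SKIPPED actions can be sketched as a filter applied before the terminal check. The `ActionStatus` enum below is a simplified, hypothetical subset of coordinator action statuses, and the filtering happens in memory here, whereas the review suggests not loading SKIPPED actions from the DB at all.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical subset of coordinator action statuses.
enum ActionStatus { WAITING, READY, SUBMITTED, RUNNING, SUCCEEDED, FAILED, KILLED, TIMEDOUT, SKIPPED }

class SkippedFilterSketch {
    private static final Set<ActionStatus> TERMINAL = EnumSet.of(
            ActionStatus.SUCCEEDED, ActionStatus.FAILED, ActionStatus.KILLED, ActionStatus.TIMEDOUT);

    // SKIPPED actions cannot change the outcome, so they are dropped before the check.
    static boolean parentIsTerminal(List<ActionStatus> actions) {
        return actions.stream()
                .filter(s -> s != ActionStatus.SKIPPED)
                .allMatch(TERMINAL::contains);
    }

    public static void main(String[] args) {
        System.out.println(parentIsTerminal(List.of(ActionStatus.SUCCEEDED, ActionStatus.SKIPPED)));
        System.out.println(parentIsTerminal(List.of(ActionStatus.RUNNING, ActionStatus.SKIPPED)));
    }
}
```

One case the thread does not settle: if every action is SKIPPED, `allMatch` on the now-empty stream returns `true`, so this sketch would report the parent as terminal.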
Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24948/#review52346 --- core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91109> Do we need to execute this synchronously? None of the counts here are affected by the outcome of this command, so we can queue it to release the lock faster. core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91115> Related question: is this situation possible? Job status is PAUSED || PWE, and bundle action status is RWE? core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91116> Is this a change from current behavior? A mix of SUSPENDED, FAILED, KILLED, DWE and SPE = SPE? I'm not sure. It sounds reasonable to me, but we need to check whether this could be confusing to anyone. core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91118> I don't understand this. Are the other bundle actions RUNNING? Then why is the status PREP and not RUNNING? core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91117> Why is getPrepStatus calling getRunningStatus? core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91119> Same comment as for BundleStatusTransitX: just make sure SuspendedWithError is the right status here. core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment91120> Why don't we filter out the SKIPPED actions altogether for status transit processing?
core/src/main/java/org/apache/oozie/service/StatusTransitService.java <https://reviews.apache.org/r/24948/#comment91122> Some form of batching done right away would be good, so we can execute bundle/coord update queries on multiple bundles/coords in one batched query. However, then we won't be able to use per-job locking for strong consistency. I guess it's OK to defer this until we have a better idea. - Mona Chitnis On Aug. 26, 2014, 12:30 a.m., Purshotam Shah wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24948/ > --- > > (Updated Aug. 26, 2014, 12:30 a.m.) > > > Review request for oozie. > > > Bugs: OOZIE-1940 > https://issues.apache.org/jira/browse/OOZIE-1940 > > > Repository: oozie-git > > > Description > --- > > StatusTransitService has race condition > > > Diffs > - > > core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d > core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 > core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java > 36cd968 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > 3008393 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 04e6e29 > core/src/main/java/org/apache/oozie/service/StatusTransitService.java > 21ac25f > core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java > bb99138 > > Diff: https://reviews.apache.org/r/24948/diff/ > > > Testing > --- > > UTC > > >
Thanks, > > Purshotam Shah > >
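The first review comment (queue the follow-up command instead of running it synchronously, so the lock is released sooner) can be sketched with a `ReentrantLock` standing in for the per-job lock and a single-thread executor standing in for the command queue. `QueueAfterUnlockSketch` is hypothetical; it only illustrates the hand-off, not how `BundleStatusTransitXCommand` is actually written.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: compute status under a per-job lock, but only enqueue the
// follow-up command, so the lock is not held while that command runs.
class QueueAfterUnlockSketch {
    final ReentrantLock jobLock = new ReentrantLock();   // stands in for a per-job lock
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    void transit(Runnable followUp) {
        jobLock.lock();
        try {
            // ... compute and persist the new job status here (omitted) ...
            queue.execute(followUp);    // asynchronous hand-off instead of running it inline
        } finally {
            jobLock.unlock();
        }
    }

    // Stops accepting work and waits briefly for queued commands; true if all finished.
    boolean shutdownAndWait() {
        queue.shutdown();
        try {
            return queue.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        QueueAfterUnlockSketch sketch = new QueueAfterUnlockSketch();
        CountDownLatch release = new CountDownLatch(1);
        sketch.transit(() -> {
            try {
                release.await();               // simulate a slow follow-up command
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("lock held after transit returned: " + sketch.jobLock.isLocked());
        release.countDown();
        System.out.println("follow-up finished: " + sketch.shutdownAndWait());
    }
}
```

This is only safe under the reviewer's premise that none of the counts computed under the lock depend on the follow-up command's outcome.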
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/#review52265 --- There are extra (spurious) line changes in BundleSubmitX and SubmitTransitionX; I'll remove them in the final/next version. - Mona Chitnis On Sept. 4, 2014, 1:05 a.m., Mona Chitnis wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24487/ > --- > > (Updated Sept. 4, 2014, 1:05 a.m.) > > > Review request for oozie. > > > Bugs: OOZIE-1913 > https://issues.apache.org/jira/browse/OOZIE-1913 > > > Repository: oozie-git > > > Description > --- > > See Jira > > > Diffs > - > > client/src/main/java/org/apache/oozie/cli/OozieCLI.java 79a9b68 > client/src/main/java/org/apache/oozie/client/OozieClient.java 363ebd2 > > client/src/main/java/org/apache/oozie/client/event/jms/JMSHeaderConstants.java > 801ad7e > client/src/main/java/org/apache/oozie/client/rest/RestConstants.java > 4b393c8 > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java cc5596b > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 14fd74c > core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java > 070cee5 > > core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java > d479086 > > core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java > a13fe83 > core/src/main/java/org/apache/oozie/coord/CoordUtils.java 4643d73 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > 0aee0e4 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 25953bf > core/src/main/java/org/apache/oozie/jms/JMSSLAEventListener.java c19839f > > core/src/main/java/org/apache/oozie/service/CoordMaterializeTriggerService.java > 7a688b1 > core/src/main/java/org/apache/oozie/service/EventHandlerService.java > 244c048 > core/src/main/java/org/apache/oozie/servlet/BaseJobServlet.java f651d5c > core/src/main/java/org/apache/oozie/servlet/SLAServlet.java
2578e41 > core/src/main/java/org/apache/oozie/servlet/V0JobServlet.java 508538d > core/src/main/java/org/apache/oozie/servlet/V1JobServlet.java 6427989 > core/src/main/java/org/apache/oozie/servlet/V2JobServlet.java b7b9be9 > core/src/main/java/org/apache/oozie/sla/SLAAlertsXCommand.java PRE-CREATION > core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 189d5ea > core/src/main/java/org/apache/oozie/sla/SLACalculator.java 20f93b5 > core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java cdf8b73 > core/src/main/java/org/apache/oozie/sla/SLAOperations.java f5fc826 > core/src/main/java/org/apache/oozie/sla/service/SLAService.java 89615bc > core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java > 7c2620c > core/src/main/resources/oozie-default.xml 3a957d0 > > core/src/test/java/org/apache/oozie/command/coord/TestCoordSubmitXCommand.java > f13e48f > core/src/test/java/org/apache/oozie/coord/TestCoordUtils.java ae3f18d > core/src/test/java/org/apache/oozie/jms/TestJMSSLAEventListener.java > 30fd151 > core/src/test/java/org/apache/oozie/servlet/DagServletTestCase.java 48193c7 > core/src/test/java/org/apache/oozie/servlet/TestV2JobServlet.java db9c594 > core/src/test/java/org/apache/oozie/servlet/TestV2SLAServlet.java 5f51b22 > core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java > db3f6eb > > Diff: https://reviews.apache.org/r/24487/diff/ > > > Testing > --- > > unit tests added, e-2-e test with CLI command done > > > Thanks, > > Mona Chitnis > >
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/ --- (Updated Sept. 4, 2014, 1:05 a.m.) Review request for oozie. Changes --- updated patch with review comments Bugs: OOZIE-1913 https://issues.apache.org/jira/browse/OOZIE-1913 Repository: oozie-git Description --- See Jira Diffs (updated) - client/src/main/java/org/apache/oozie/cli/OozieCLI.java 79a9b68 client/src/main/java/org/apache/oozie/client/OozieClient.java 363ebd2 client/src/main/java/org/apache/oozie/client/event/jms/JMSHeaderConstants.java 801ad7e client/src/main/java/org/apache/oozie/client/rest/RestConstants.java 4b393c8 core/src/main/java/org/apache/oozie/CoordinatorActionBean.java cc5596b core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 14fd74c core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java 070cee5 core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java d479086 core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java a13fe83 core/src/main/java/org/apache/oozie/coord/CoordUtils.java 4643d73 core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java 0aee0e4 core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 25953bf core/src/main/java/org/apache/oozie/jms/JMSSLAEventListener.java c19839f core/src/main/java/org/apache/oozie/service/CoordMaterializeTriggerService.java 7a688b1 core/src/main/java/org/apache/oozie/service/EventHandlerService.java 244c048 core/src/main/java/org/apache/oozie/servlet/BaseJobServlet.java f651d5c core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 2578e41 core/src/main/java/org/apache/oozie/servlet/V0JobServlet.java 508538d core/src/main/java/org/apache/oozie/servlet/V1JobServlet.java 6427989 core/src/main/java/org/apache/oozie/servlet/V2JobServlet.java b7b9be9 core/src/main/java/org/apache/oozie/sla/SLAAlertsXCommand.java PRE-CREATION 
core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 189d5ea core/src/main/java/org/apache/oozie/sla/SLACalculator.java 20f93b5 core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java cdf8b73 core/src/main/java/org/apache/oozie/sla/SLAOperations.java f5fc826 core/src/main/java/org/apache/oozie/sla/service/SLAService.java 89615bc core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java 7c2620c core/src/main/resources/oozie-default.xml 3a957d0 core/src/test/java/org/apache/oozie/command/coord/TestCoordSubmitXCommand.java f13e48f core/src/test/java/org/apache/oozie/coord/TestCoordUtils.java ae3f18d core/src/test/java/org/apache/oozie/jms/TestJMSSLAEventListener.java 30fd151 core/src/test/java/org/apache/oozie/servlet/DagServletTestCase.java 48193c7 core/src/test/java/org/apache/oozie/servlet/TestV2JobServlet.java db9c594 core/src/test/java/org/apache/oozie/servlet/TestV2SLAServlet.java 5f51b22 core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java db3f6eb Diff: https://reviews.apache.org/r/24487/diff/ Testing --- unit tests added, e-2-e test with CLI command done Thanks, Mona Chitnis
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
> On Aug. 29, 2014, 4:52 p.m., Purshotam Shah wrote: > > client/src/main/java/org/apache/oozie/client/OozieClient.java, line 152 > > <https://reviews.apache.org/r/24487/diff/2/?file=660982#file660982line152> > > > > Do you need to say new (newshouldend ) ? > > > > When we specify end time for coord, we just say endtime=<>, better to > > keep same convention. It's consistent with the current SLA terminology (should-start, should-end), and I don't see a major benefit in deviating from it. Also, endtime=<> and should-end deal with different values: the former specifies a date, and the latter specifies a number of minutes relative to a nominal time for SLA purposes. > On Aug. 29, 2014, 4:52 p.m., Purshotam Shah wrote: > > core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java, > > line 91 > > <https://reviews.apache.org/r/24487/diff/2/?file=660987#file660987line91> > > > > We are reading from same conf and setting to same conf. why?? I don't remember the rationale now; removing it. - Mona --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/#review51473 --- On Aug. 14, 2014, 11:13 p.m., Mona Chitnis wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24487/ > --- > > (Updated Aug. 14, 2014, 11:13 p.m.) > > > Review request for oozie.
> > > Bugs: OOZIE-1913 > https://issues.apache.org/jira/browse/OOZIE-1913 > > > Repository: oozie-git > > > Description > --- > > See Jira > > > Diffs > - > > client/src/main/java/org/apache/oozie/cli/OozieCLI.java 33935d3 > client/src/main/java/org/apache/oozie/client/OozieClient.java b468186 > > client/src/main/java/org/apache/oozie/client/event/jms/JMSHeaderConstants.java > 2f0a45c > client/src/main/java/org/apache/oozie/client/rest/RestConstants.java > 5d3fc62 > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java > 5d3b6af > > core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java > ffb2d08 > > core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java > b4b2fef > core/src/main/java/org/apache/oozie/command/coord/CoordSubmitXCommand.java > 02b30ef > core/src/main/java/org/apache/oozie/coord/CoordUtils.java 26db068 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > cd26e07 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 42a0968 > core/src/main/java/org/apache/oozie/jms/JMSSLAEventListener.java 8296a6c > > core/src/main/java/org/apache/oozie/service/CoordMaterializeTriggerService.java > 3fbd092 > core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 8ca2e81 > core/src/main/java/org/apache/oozie/servlet/V2SLAServlet.java 8620af5 > core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 67d6237 > core/src/main/java/org/apache/oozie/sla/SLACalculator.java 132d4df > core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 3801325 > core/src/main/java/org/apache/oozie/sla/SLAOperations.java 0cad071 > core/src/main/java/org/apache/oozie/sla/service/SLAService.java 2349329 > core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java > fd21c45 > 
core/src/main/resources/oozie-default.xml ebceaa7 > core/src/test/java/org/apache/oozie/client/TestWorkflowClient.java e2e0f11 > > core/src/test/java/org/apache/oozie/command/coord/TestCoordSubmitXCommand.java > fedf4a8 > core/src/test/java/org/apache/oozie/coord/TestCoordUtils.java a39efe3 > core/src/test/java/org/apache/oozie/jms/TestJMSSLAEventListener.java > fa26935 > core/src/test/java/org/apache/oozie/servlet/TestV2SLAServlet.java 5a35fdb > core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java > 210c99e > > Diff: https://reviews.apache.org/r/24487/diff/ > > > Testing > --- > > unit tests added, e-2-e test with CLI command done > > > Thanks, > > Mona Chitnis > >
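To make the distinction in the reply concrete: a coordinator end time is an absolute date, whereas should-end is an offset in minutes from an action's nominal time. The sketch below shows only that arithmetic using java.time. `SlaDeadlines`, `deadline` and `missedEnd` are hypothetical names for illustration, not Oozie's SLA classes.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not an Oozie class): SLA offsets such as should-start and
// should-end are minutes relative to the nominal time, unlike the coordinator's
// absolute end time.
final class SlaDeadlines {
    private SlaDeadlines() {}

    /** Absolute deadline = nominal time + offset in minutes. */
    static Instant deadline(Instant nominalTime, long offsetMinutes) {
        return nominalTime.plus(Duration.ofMinutes(offsetMinutes));
    }

    /** True if the job ended after nominal time + should-end minutes. */
    static boolean missedEnd(Instant nominalTime, long shouldEndMinutes, Instant actualEnd) {
        return actualEnd.isAfter(deadline(nominalTime, shouldEndMinutes));
    }
}
```

For a nominal time of 23:00 and should-end of 30, the deadline is 23:30, so an end at 23:45 counts as a miss while an end at 23:20 does not.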
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/#review52110 --- client/src/main/java/org/apache/oozie/cli/OozieCLI.java <https://reviews.apache.org/r/24487/#comment90856> changing this to hasArgs=false as it's not mandatory client/src/main/java/org/apache/oozie/cli/OozieCLI.java <https://reviews.apache.org/r/24487/#comment90853> this got left behind. thanks - Mona Chitnis
[jira] [Resolved] (OOZIE-1984) SLACalculator in HA mode performs duplicate operations on records with completed jobs
[ https://issues.apache.org/jira/browse/OOZIE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis resolved OOZIE-1984. - Resolution: Fixed Committed to trunk and 4.1.0. Thanks for the review, Ryota. > SLACalculator in HA mode performs duplicate operations on records with completed jobs > - > > Key: OOZIE-1984 > URL: https://issues.apache.org/jira/browse/OOZIE-1984 > Project: Oozie > Issue Type: Bug > Affects Versions: trunk > Reporter: Mona Chitnis > Fix For: trunk, 4.1.0 > > Attachments: OOZIE-1984-1.patch, OOZIE-1984.patch
>
> Scenario:
> The SLA periodic run has already processed start, duration and end for a job's SLA entry, but the job notification for that job arrives after this and triggers the SLA listener.
> Buggy part:
> {code}
> // SLACalculatorMemory.java
>     else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
>         // jobid might not exist in slaMap in HA Setting
>         SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
>                 SLARegQuery.GET_SLA_REG_ALL, jobId);
>         if (slaRegBean != null) { // filter out jobs picked by SLA job event listener
>                                   // but not actually configured for SLA
>             SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
>                     SLASummaryQuery.GET_SLA_SUMMARY, jobId);
>             slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
>             if (slaCalc.getEventProcessed() < 7) {
>                 slaMap.put(jobId, slaCalc);
>             }
>         }
>     }
> }
> if (slaCalc != null) {
>     ..
>     Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
>             .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
>     byte eventProc = ((Byte) eventProcObj).byteValue();
>     ..
>     processJobEndSuccessSLA(slaCalc, startTime, endTime);
> {code}
> Method processJobEndSuccessSLA then checks the second least-significant bit of eventProc and sends the duration event _again_.
> So the bug here is two-fold:
> * even if all events are already processed, this function is still invoked
> * eventProcessed is 8 (1000), so the second LSB is unset and the duration event is processed again.
> Fix: do not invoke the function when eventProc = 8 (1000).
-- This message was sent by Atlassian JIRA (v6.2#6252)
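A minimal sketch of the bitmask logic described in OOZIE-1984. It assumes start, duration and end map to the first three bits; only "duration = second LSB" and the all-processed value 8 (1000) come from the issue text. `SlaEventBits` and its methods are hypothetical. They only show why a bare bit test resends the duration event for 8, and how an equality guard avoids it.

```java
// Hypothetical illustration of the eventProcessed guard; not Oozie's code.
final class SlaEventBits {
    private SlaEventBits() {}

    static final byte START_PROCESSED = 0b0001;    // assumed: first LSB
    static final byte DURATION_PROCESSED = 0b0010; // second LSB, per the issue text
    static final byte END_PROCESSED = 0b0100;      // assumed: third LSB
    static final byte ALL_PROCESSED = 0b1000;      // 8 (1000): every event already handled

    /** Buggy check: for 8 (1000) the duration bit is 0, so it says "send again". */
    static boolean shouldSendDurationBuggy(byte eventProc) {
        return (eventProc & DURATION_PROCESSED) == 0;
    }

    /** Guarded check: skip entirely once the record is marked fully processed. */
    static boolean shouldSendDurationFixed(byte eventProc) {
        if (eventProc == ALL_PROCESSED) {
            return false;
        }
        return (eventProc & DURATION_PROCESSED) == 0;
    }
}
```

With eventProc = 8, the buggy check returns true (duration resent) and the guarded one returns false; with eventProc = 1 (only start processed), both still allow the duration event.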
[jira] [Updated] (OOZIE-1984) SLACalculator in HA mode performs duplicate operations on records with completed jobs
[ https://issues.apache.org/jira/browse/OOZIE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1984: Attachment: OOZIE-1984-1.patch updated patch v-1 -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 25166: OOZIE-1984 SLACalculator in HA mode performs duplicate operations on records with completed jobs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25166/ --- (Updated Aug. 28, 2014, 10:55 p.m.) Review request for oozie. Changes --- Addressed Ryota's comment, following an offline clarification. Bugs: OOZIE-1984 https://issues.apache.org/jira/browse/OOZIE-1984 Repository: oozie-git Description --- see jira Diffs (updated) - core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 3801325 Diff: https://reviews.apache.org/r/25166/diff/ Testing --- Existing tests pass. An end-to-end test will follow in QA. Thanks, Mona Chitnis
[jira] [Updated] (OOZIE-1984) SLACalculator in HA mode performs duplicate operations on records with completed jobs
[ https://issues.apache.org/jira/browse/OOZIE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1984: Attachment: OOZIE-1984.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 25166: OOZIE-1984 SLACalculator in HA mode performs duplicate operations on records with completed jobs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25166/ --- Review request for oozie. Bugs: OOZIE-1984 https://issues.apache.org/jira/browse/OOZIE-1984 Repository: oozie-git Description --- see jira Diffs - core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 3801325 Diff: https://reviews.apache.org/r/25166/diff/ Testing --- Existing tests pass. An end-to-end test will follow in QA. Thanks, Mona Chitnis
[jira] [Created] (OOZIE-1984) SLACalculator in HA mode performs duplicate operations on records with completed jobs
Mona Chitnis created OOZIE-1984: --- Summary: SLACalculator in HA mode performs duplicate operations on records with completed jobs Key: OOZIE-1984 URL: https://issues.apache.org/jira/browse/OOZIE-1984 Project: Oozie Issue Type: Bug Affects Versions: trunk Reporter: Mona Chitnis Fix For: trunk, 4.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24948/#review51661 --- This is good cleanup and refactoring. I did a cursory review to understand the structural changes; will review more carefully for any introduced bugs later today. core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment90178> If the bundleActionStatus map has some action in RUNNINGWITHERROR, why are we setting the bundle job to PAUSED? core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment90177> Typo at the bottom. core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java <https://reviews.apache.org/r/24948/#comment90179> Same typo. - Mona Chitnis On Aug. 26, 2014, 12:30 a.m., Purshotam Shah wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24948/ > --- > > (Updated Aug. 26, 2014, 12:30 a.m.) > > > Review request for oozie. 
> > > Bugs: OOZIE-1940 > https://issues.apache.org/jira/browse/OOZIE-1940 > > > Repository: oozie-git > > > Description > --- > > StatusTransitService has race condition > > > Diffs > - > > core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d > core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 > core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java > e69de29 > > core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java > 36cd968 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > 3008393 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 04e6e29 > core/src/main/java/org/apache/oozie/service/StatusTransitService.java > 21ac25f > core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java > bb99138 > > Diff: https://reviews.apache.org/r/24948/diff/ > > > Testing > --- > > UTC > > > Thanks, > > Purshotam Shah > >
[jira] [Commented] (OOZIE-1940) StatusTransitService has race condition
[ https://issues.apache.org/jira/browse/OOZIE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112362#comment-14112362 ] Mona Chitnis commented on OOZIE-1940: - Agree with the approach. I believe each run of StatusTransitService currently takes multiple seconds. If it is going to hold the lock for that long, we have to assess the consequences for the other commands waiting for the lock, e.g. the change command appearing to "hang" on the user-facing CLI because it is synchronously trying to acquire the lock held by STS. OOZIE-1885 should ideally reduce the overall time STS needs to hold the lock. > StatusTransitService has race condition > --- > > Key: OOZIE-1940 > URL: https://issues.apache.org/jira/browse/OOZIE-1940 > Project: Oozie > Issue Type: Bug > Reporter: Purshotam Shah
>
> StatusTransitService doesn't acquire a lock while updating the DB.
> We noticed one such issue while doing HA testing, thanks to [~mchiang].
> We issued a change command to change the pause time, which got executed on one server. While the change command was running on that server, the other server started executing StatusTransitService.
> Server 1 log
> {code}
> 2014-07-16 17:28:05,268 INFO StatusTransitService$StatusTransitRunnable:539 [pool-1-thread-13] - USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
> 2014-07-16 17:28:09,694 INFO StatusTransitService$StatusTransitRunnable:539 [pool-1-thread-13] - USER[-] GROUP[-] Set coordinator job [0011385-140716042555-oozie-oozi-C] status to 'SUCCEEDED' from 'RUNNING'
> 2014-07-16 17:28:15,416 INFO StatusTransitService$StatusTransitRunnable:539 [pool-1-thread-13] - USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
> {code}
> Server 2 log
> {code}
> 2014-07-16 17:28:06,499 DEBUG CoordChangeXCommand:545 [http-0.0.0.0-4443-5] - USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB180] JOB[0011385-140716042555-oozie-oozi-C] ACTION[-] New pause/end date is : Wed Jul 16 17:30:00 UTC 2014 and last action number is : 3
> 2014-07-16 17:28:06,508 INFO CoordChangeXCommand:539 [http-0.0.0.0-4443-5] - USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB180] JOB[0011385-140716042555-oozie-oozi-C] ACTION[-] ENDED CoordChangeXCommand for jobId=0011385-140716042555-oozie-oozi-C
> {code}
> CoordMaterializeTransitionXCommand had created all actions (some in waiting and some in running state) and set doneMaterialization to true.
> The change command deletes all waiting coord actions except the 3 running/SUCCEEDED ones, and resets doneMaterialization.
> StatusTransitService first loads a set of pending jobs, and for each job it makes DB calls to check the coord action status. Coord jobs are loaded only once, at the beginning.
> This is what happened:
> 1. StatusTransitService loads the coord job, whose doneMaterialization is set to true, at 17:28:05,268 (server 1).
> 2. The change command deletes the waiting actions and resets doneMaterialization at 17:28:06,508 (server 2).
> 3. StatusTransitService loads the actions for the job: only 3, all in SUCCEEDED status. It never reloads doneMaterialization, so at 17:28:09,694 (server 1) StatusTransitService overrides the job status to SUCCEEDED, because its stale doneMaterialization is true and all actions are SUCCEEDED.
-- This message was sent by Atlassian JIRA (v6.2#6252)
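A minimal single-JVM sketch of the fix direction implied by the analysis above: the status decision re-reads doneMaterialization while holding the same per-job lock that the change command takes, instead of trusting a value loaded at the start of the run. `CoordJobRow`, `StatusTransitSketch` and the use of `ReentrantLock` are hypothetical stand-ins. In Oozie HA the lock would need to be a distributed one shared across servers, which this sketch does not model.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory stand-in for a coordinator job row (not CoordinatorJobBean).
final class CoordJobRow {
    final ReentrantLock lock = new ReentrantLock(); // stands in for the per-job lock
    boolean doneMaterialization;
    String status = "RUNNING";
}

final class StatusTransitSketch {
    private StatusTransitSketch() {}

    /** Racy version: decides from a flag value captured before the change command ran. */
    static void transitRacy(CoordJobRow job, boolean snapshotDoneMaterialization,
                            boolean allActionsSucceeded) {
        if (snapshotDoneMaterialization && allActionsSucceeded) {
            job.status = "SUCCEEDED";
        }
    }

    /** Locked version: re-reads doneMaterialization while holding the job lock. */
    static void transitLocked(CoordJobRow job, boolean allActionsSucceeded) {
        job.lock.lock();
        try {
            if (job.doneMaterialization && allActionsSucceeded) {
                job.status = "SUCCEEDED";
            }
        } finally {
            job.lock.unlock();
        }
    }

    /** Change command: resets materialization under the same job lock. */
    static void change(CoordJobRow job) {
        job.lock.lock();
        try {
            job.doneMaterialization = false;
        } finally {
            job.lock.unlock();
        }
    }
}
```

Replaying the three steps from the issue, the racy path ends in SUCCEEDED while the locked path keeps RUNNING. Taking the lock per job, rather than for the whole STS run, is one way to keep hold times short, which relates to the concern about the change command appearing to hang.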
[jira] [Commented] (OOZIE-1940) StatusTransitService has race condition
[ https://issues.apache.org/jira/browse/OOZIE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112359#comment-14112359 ] Mona Chitnis commented on OOZIE-1940: - linking this as dependent of OOZIE-1885 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1885) Query optimization for StatusTransitService
[ https://issues.apache.org/jira/browse/OOZIE-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112357#comment-14112357 ] Mona Chitnis commented on OOZIE-1885: - A join query is always more CPU- and memory-intensive, but it will probably cut down the overall time, since the current code runs multiple queries in a loop. The approach is fine, but we should vet the performance gains with an end-to-end test. > Query optimization for StatusTransitService > --- > > Key: OOZIE-1885 > URL: https://issues.apache.org/jira/browse/OOZIE-1885 > Project: Oozie > Issue Type: Bug > Reporter: Purshotam Shah
>
> {code}
> private void coordTransit() throws JPAExecutorException, CommandException {
>     List<CoordinatorJobBean> pendingJobCheckList = null;
>     if (lastInstanceStartTime == null) {
>         LOG.info("Running coordinator status service first instance");
>         // this is the first instance, we need to check for all pending jobs;
>         pendingJobCheckList = jpaService.execute(new CoordJobsGetPendingJPAExecutor(limit));
>     }
>     else {
>         LOG.info("Running coordinator status service from last instance time = "
>                 + DateUtils.formatDateOozieTZ(lastInstanceStartTime));
>         // this is not the first instance, we should only check jobs
>         // that have actions or jobs been
>         // updated >= start time of last service run;
>         List<CoordinatorActionBean> actionsList = CoordActionQueryExecutor.getInstance().getList(
>                 CoordActionQuery.GET_COORD_ACTIONS_BY_LAST_MODIFIED_TIME, lastInstanceStartTime);
>         Set<String> coordIds = new HashSet<String>();
>         for (CoordinatorActionBean action : actionsList) {
>             coordIds.add(action.getJobId());
>         }
>         pendingJobCheckList = new ArrayList<CoordinatorJobBean>();
>         for (String coordId : coordIds.toArray(new String[coordIds.size()])) {
>             CoordinatorJobBean coordJob;
>             try {
>                 coordJob = CoordJobQueryExecutor.getInstance().get(CoordJobQuery.GET_COORD_JOB, coordId);
>             }
>             catch (JPAExecutorException jpaee) {
>                 if (jpaee.getErrorCode().equals(ErrorCode.E0604)) {
>                     LOG.warn("Exception happened during StatusTransitRunnable; Coordinator Job doesn't exist", jpaee);
>                     continue;
>                 } else {
>                     throw jpaee;
>                 }
>             }
>             // Running coord job might have pending false
>             Job.Status coordJobStatus = coordJob.getStatus();
>             if ((coordJob.isPending() || coordJobStatus.equals(Job.Status.PAUSED)
>                     || coordJobStatus.equals(Job.Status.RUNNING)
>                     || coordJobStatus.equals(Job.Status.RUNNINGWITHERROR)
>                     || coordJobStatus.equals(Job.Status.PAUSEDWITHERROR))
>                     && !coordJobStatus.equals(Job.Status.IGNORED)) {
>                 pendingJobCheckList.add(coordJob);
>             }
>         }
>         pendingJobCheckList.addAll(CoordJobQueryExecutor.getInstance().getList(
>                 CoordJobQuery.GET_COORD_JOBS_CHANGED, lastInstanceStartTime));
>     }
>     aggregateCoordJobsStatus(pendingJobCheckList);
> }
> {code}
> This could be done in one SQL query, something like:
> SELECT w.id, w.status, w.pending FROM CoordinatorJobBean w
> WHERE w.startTimestamp <= :matTime
> AND (w.statusStr = 'PREP' OR w.statusStr = 'RUNNING' OR w.statusStr = 'RUNNINGWITHERROR' OR w.statusStr = 'PAUSEDWITHERROR')
> AND w.statusStr <> 'IGNORED'
> AND w.id IN (SELECT a.jobId FROM CoordinatorActionBean a WHERE a.lastModifiedTimestamp >= :lastModifiedTime GROUP BY a.jobId)
> Same for bundleTransit().
-- This message was sent by Atlassian JIRA (v6.2#6252)
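An in-memory sketch of what the proposed single query computes, for comparing it against the per-job loop in coordTransit(). It models only the action-driven part: distinct job ids with a recently modified action (the IN subquery), then one filter pass using the loop's condition. It leaves out GET_COORD_JOBS_CHANGED and the matTime bound. `PendingCoordFilter`, `ActionRow` and `JobRow` are hypothetical stand-ins, not Oozie's JPA beans or executors.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed query's semantics; not Oozie code.
final class PendingCoordFilter {
    private PendingCoordFilter() {}

    static final class ActionRow {
        final String jobId;
        final long lastModified;

        ActionRow(String jobId, long lastModified) {
            this.jobId = jobId;
            this.lastModified = lastModified;
        }
    }

    static final class JobRow {
        final String id;
        final String status;
        final boolean pending;

        JobRow(String id, String status, boolean pending) {
            this.id = id;
            this.status = status;
            this.pending = pending;
        }
    }

    // Same statuses as the Java loop in coordTransit().
    private static final Set<String> ACTIVE =
            Set.of("PAUSED", "RUNNING", "RUNNINGWITHERROR", "PAUSEDWITHERROR");

    /** (pending || active status) && status != IGNORED, as in the loop. */
    static boolean needsCheck(JobRow job) {
        return (job.pending || ACTIVE.contains(job.status)) && !"IGNORED".equals(job.status);
    }

    /** Distinct job ids with an action modified at or after lastRun, then one filter pass. */
    static List<JobRow> pendingJobs(List<ActionRow> actions, Map<String, JobRow> jobsById, long lastRun) {
        Set<String> ids = new HashSet<>(); // plays the role of the IN (... GROUP BY a.jobId) subquery
        for (ActionRow action : actions) {
            if (action.lastModified >= lastRun) {
                ids.add(action.jobId);
            }
        }
        List<JobRow> result = new ArrayList<>();
        for (String id : ids) {
            JobRow job = jobsById.get(id); // a missing job is skipped, like the E0604 branch
            if (job != null && needsCheck(job)) {
                result.add(job);
            }
        }
        return result;
    }
}
```

In the real proposal the database does both steps in one round trip. Here, the point is only that the result should match the loop's output, which is what an end-to-end comparison would need to confirm.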
[jira] [Commented] (OOZIE-1847) HA - Oozie servers should shutdown (or go in safe mode) in case of ZK failure
[ https://issues.apache.org/jira/browse/OOZIE-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112333#comment-14112333 ] Mona Chitnis commented on OOZIE-1847: - ^^ in case of timeout > 3 seconds resulting in server shutdown and job failure > HA - Oozie servers should shutdown (or go in safe mode) in case of ZK failure > - > > Key: OOZIE-1847 > URL: https://issues.apache.org/jira/browse/OOZIE-1847 > Project: Oozie > Issue Type: Bug > Components: HA >Reporter: Purshotam Shah >Assignee: Purshotam Shah > Attachments: OOZIE-1847-V1.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1847) HA - Oozie servers should shutdown (or go in safe mode) in case of ZK failure
[ https://issues.apache.org/jira/browse/OOZIE-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112331#comment-14112331 ] Mona Chitnis commented on OOZIE-1847: - Pretty straightforward patch, and I agree it's needed. But in addition to printing it in the logs, should we bubble it up to the action error message too? That way the reason for a workflow failing can be pulled up from any of the client-facing APIs as well, e.g. job-info, web console, RESTful API, etc. > HA - Oozie servers should shutdown (or go in safe mode) in case of ZK failure > - > > Key: OOZIE-1847 > URL: https://issues.apache.org/jira/browse/OOZIE-1847 > Project: Oozie > Issue Type: Bug > Components: HA >Reporter: Purshotam Shah >Assignee: Purshotam Shah > Attachments: OOZIE-1847-V1.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1976: Attachment: OOZIE-1976-rough-design-2.pdf New design spec uploaded (rough-design-2) with additions about: * Wait-for in action * EL functions: initial thoughts; implementation details will follow in the code patch * HCatDependencyCache changes (for the in-memory, push-based HCat dependencies) * Job info API (coord-action) changes for displaying missing dependencies. This risks being verbose if an optional dataset has a lot of instances; it needs thought about how to possibly truncate there. > Specifying coordinator input datasets in more logical ways > -- > > Key: OOZIE-1976 > URL: https://issues.apache.org/jira/browse/OOZIE-1976 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Affects Versions: trunk >Reporter: Mona Chitnis > Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1976-rough-design-2.pdf, > OOZIE-1976-rough-design.pdf > > > All dataset instances specified as input to a coordinator currently work on AND logic, i.e. ALL of them must be available for the workflow to start. We should enhance this to include more logical ways of specifying availability criteria, e.g. > * OR between instances > * minimum N out of K instances > * delta datasets (process data incrementally) > Use-cases for this: > * Different datasets are BCP, and the workflow can run with either, whichever arrives earlier. > * Data is not guaranteed, and while $coord:latest allows skipping to available ones, the workflow will never trigger unless the specified number of instances is found. > * The workflow is like a ‘refining’ algorithm which should run after the minimum required datasets are ready, and should only process the delta for efficiency. > This JIRA is to discuss the design and then review the implementation for some or all of the above features. -- This message was sent by Atlassian JIRA (v6.2#6252)
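The three availability criteria listed in the description (the current AND, OR between instances, minimum N out of K) can be sketched as simple checks over a set of instances known to exist. `InputCheck` and its methods are hypothetical names for illustration only. The design doc's actual XML syntax, nesting and classes are not shown here and may differ.

```java
import java.util.List;
import java.util.Set;

// Hypothetical availability rules; not the classes or syntax proposed in the design doc.
final class InputCheck {
    private InputCheck() {}

    /** Current behaviour: every declared instance must exist (AND). */
    static boolean all(List<String> instances, Set<String> available) {
        return available.containsAll(instances);
    }

    /** OR: any one instance is enough, e.g. BCP copies of the same data. */
    static boolean any(List<String> instances, Set<String> available) {
        for (String instance : instances) {
            if (available.contains(instance)) {
                return true;
            }
        }
        return false;
    }

    /** Minimum N out of K: at least n of the declared instances exist. */
    static boolean atLeast(int n, List<String> instances, Set<String> available) {
        int found = 0;
        for (String instance : instances) {
            if (available.contains(instance)) {
                found++;
            }
        }
        return found >= n;
    }
}
```

With instances d1, d2, d3 and only d1 and d3 present, AND fails, OR passes, and "at least 2" passes while "at least 3" fails.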
[jira] [Commented] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104618#comment-14104618 ] Mona Chitnis commented on OOZIE-1976: - Regarding Ryota's comment about priority: I think it complicates the missing dependencies field, since we would now need a structure like {{P0=dep1,dep2#P1=dep3,dep4}}, which in turn is nested under the AND/OR structure. When dependencies are checked and found to exist, the action would start only when all P0's are satisfied, and so on. I think this is essentially the same as putting them in the block instead of the optional block. For the N out of M case, the action will start when _any_ n or more instances are available, using all M if all of them are there, rather than limiting itself to N. Good pointer about EL functions; that one's going to be important and will probably need a few new ones. > Specifying coordinator input datasets in more logical ways > -- > > Key: OOZIE-1976 > URL: https://issues.apache.org/jira/browse/OOZIE-1976 > Project: Oozie > Issue Type: New Feature > Components: coordinator > Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1976-rough-design.pdf > > > All dataset instances specified as input to coordinator, currently work on > AND logic i.e. ALL of them should be available for workflow to start. We > should enhance this to include more logical ways of specifying availability > criteria e.g. > * OR between instances > * minimum N out of K instances > * delta datasets (process data incrementally) > Use-cases for this: > * Different datasets are BCP, and workflow can run with either, whichever > arrives earlier. > * Data is not guaranteed, and while $coord:latest allows skipping to > available ones, workflow will never trigger unless mentioned number of > instances are found. 
> * Workflow is like a ‘refining’ algorithm which should run after minimum > required datasets are ready, and should only process the delta for efficiency. > This JIRA is to discuss the design and then the review the implementation for > some or all of the above features. -- This message was sent by Atlassian JIRA (v6.2#6252)
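The N out of M semantics described in the comment above (start once at least n instances exist, but hand over every available instance rather than only n) could look roughly like the following. `MinNOfMCheck` and its signature are hypothetical; instances are modeled as plain URI strings and availability as the set of URIs already found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical "minimum n of M" readiness check; not Oozie code. */
final class MinNOfMCheck {
    private MinNOfMCheck() {}

    /**
     * Returns all available instances (in declared order) once at least n of
     * the M declared instances exist; returns empty while fewer than n exist.
     */
    static Optional<List<String>> select(List<String> instances, Set<String> available, int n) {
        List<String> ready = new ArrayList<>();
        for (String instance : instances) {
            if (available.contains(instance)) {
                ready.add(instance);
            }
        }
        // Not capped at n: every instance that is present gets used.
        return ready.size() >= n ? Optional.of(ready) : Optional.empty();
    }
}
```

Returning the full list rather than the first n matches the "using all M if all there" behaviour stated in the comment.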
[jira] [Commented] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104227#comment-14104227 ] Mona Chitnis commented on OOZIE-1976: - Thanks Puru and Ryota. I will incorporate your comments and come up with a new design specification. As for 'explain', this can be done as part of the 'info' command's display of missing dependencies, rather than by introducing another command. > Specifying coordinator input datasets in more logical ways > -- > > Key: OOZIE-1976 > URL: https://issues.apache.org/jira/browse/OOZIE-1976 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1976-rough-design.pdf > > > All dataset instances specified as input to coordinator, currently work on > AND logic i.e. ALL of them should be available for workflow to start. We > should enhance this to include more logical ways of specifying availability > criteria e.g. > * OR between instances > * minimum N out of K instances > * delta datasets (process data incrementally) > Use-cases for this: > * Different datasets are BCP, and workflow can run with either, whichever > arrives earlier. > * Data is not guaranteed, and while $coord:latest allows skipping to > available ones, workflow will never trigger unless mentioned number of > instances are found. > * Workflow is like a ‘refining’ algorithm which should run after minimum > required datasets are ready, and should only process the delta for efficiency. > This JIRA is to discuss the design and then the review the implementation for > some or all of the above features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1976: Attachment: OOZIE-1976-rough-design.pdf Attaching rough design doc (pdf) > Specifying coordinator input datasets in more logical ways > -- > > Key: OOZIE-1976 > URL: https://issues.apache.org/jira/browse/OOZIE-1976 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Affects Versions: trunk >Reporter: Mona Chitnis > Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1976-rough-design.pdf > > > All dataset instances specified as input to coordinator, currently work on > AND logic i.e. ALL of them should be available for workflow to start. We > should enhance this to include more logical ways of specifying availability > criteria e.g. > * OR between instances > * minimum N out of K instances > * delta datasets (process data incrementally) > Use-cases for this: > * Different datasets are BCP, and workflow can run with either, whichever > arrives earlier. > * Data is not guaranteed, and while $coord:latest allows skipping to > available ones, workflow will never trigger unless mentioned number of > instances are found. > * Workflow is like a ‘refining’ algorithm which should run after minimum > required datasets are ready, and should only process the delta for efficiency. > This JIRA is to discuss the design and then the review the implementation for > some or all of the above features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1976: Description: All dataset instances specified as input to coordinator, currently work on AND logic i.e. ALL of them should be available for workflow to start. We should enhance this to include more logical ways of specifying availability criteria e.g. * OR between instances * minimum N out of K instances * delta datasets (process data incrementally) Use-cases for this: * Different datasets are BCP, and workflow can run with either, whichever arrives earlier. * Data is not guaranteed, and while $coord:latest allows skipping to available ones, workflow will never trigger unless mentioned number of instances are found. * Workflow is like a ‘refining’ algorithm which should run after minimum required datasets are ready, and should only process the delta for efficiency. This JIRA is to discuss the design and then the review the implementation for some or all of the above features. was: All dataset instances specified as input to coordinator, currently work on AND logic i.e. ALL of them should be available for workflow to start. We should enhance this to include more logical ways of specifying availability criteria e.g. * OR between instances * minimum N out of K instances * delta datasets (process data incrementally) Use-cases for this: Different datasets are BCP, and workflow can run with either, whichever arrives earlier. Data is not guaranteed, and while $coord:latest allows skipping to available ones, workflow will never trigger unless mentioned number of instances are found. Workflow is like a ‘refining’ algorithm which should run after minimum required datasets are ready, and should only process the delta for efficiency. This JIRA is to discuss the design and then the review the implementation for some or all of the above features. 
> Specifying coordinator input datasets in more logical ways > -- > > Key: OOZIE-1976 > URL: https://issues.apache.org/jira/browse/OOZIE-1976 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Affects Versions: trunk >Reporter: Mona Chitnis > Assignee: Mona Chitnis > Fix For: trunk > > > All dataset instances specified as input to coordinator, currently work on > AND logic i.e. ALL of them should be available for workflow to start. We > should enhance this to include more logical ways of specifying availability > criteria e.g. > * OR between instances > * minimum N out of K instances > * delta datasets (process data incrementally) > Use-cases for this: > * Different datasets are BCP, and workflow can run with either, whichever > arrives earlier. > * Data is not guaranteed, and while $coord:latest allows skipping to > available ones, workflow will never trigger unless mentioned number of > instances are found. > * Workflow is like a ‘refining’ algorithm which should run after minimum > required datasets are ready, and should only process the delta for efficiency. > This JIRA is to discuss the design and then the review the implementation for > some or all of the above features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1976: Description: All dataset instances specified as input to coordinator, currently work on AND logic i.e. ALL of them should be available for workflow to start. We should enhance this to include more logical ways of specifying availability criteria e.g. * OR between instances * minimum N out of K instances * delta datasets (process data incrementally) Use-cases for this: Different datasets are BCP, and workflow can run with either, whichever arrives earlier. Data is not guaranteed, and while $coord:latest allows skipping to available ones, workflow will never trigger unless mentioned number of instances are found. Workflow is like a ‘refining’ algorithm which should run after minimum required datasets are ready, and should only process the delta for efficiency. This JIRA is to discuss the design and then the review the implementation for some or all of the above features. was: All dataset instances specified as input to coordinator, currently work on AND logic i.e. ALL of them should be available for workflow to start. We should enhance this to include more logical ways of specifying availability criteria e.g. * OR between instances * minimum N out of K instances * delta datasets (process data incrementally) This JIRA is to discuss the design and then the review the implementation for some or all of the above features. > Specifying coordinator input datasets in more logical ways > -- > > Key: OOZIE-1976 > URL: https://issues.apache.org/jira/browse/OOZIE-1976 > Project: Oozie > Issue Type: New Feature > Components: coordinator >Affects Versions: trunk >Reporter: Mona Chitnis > Assignee: Mona Chitnis > Fix For: trunk > > > All dataset instances specified as input to coordinator, currently work on > AND logic i.e. ALL of them should be available for workflow to start. 
We > should enhance this to include more logical ways of specifying availability > criteria e.g. > * OR between instances > * minimum N out of K instances > * delta datasets (process data incrementally) > Use-cases for this: > Different datasets are BCP, and workflow can run with either, whichever > arrives earlier. > Data is not guaranteed, and while $coord:latest allows skipping to available > ones, workflow will never trigger unless mentioned number of instances are > found. > Workflow is like a ‘refining’ algorithm which should run after minimum > required datasets are ready, and should only process the delta for efficiency. > This JIRA is to discuss the design and then the review the implementation for > some or all of the above features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (OOZIE-1976) Specifying coordinator input datasets in more logical ways
Mona Chitnis created OOZIE-1976: --- Summary: Specifying coordinator input datasets in more logical ways Key: OOZIE-1976 URL: https://issues.apache.org/jira/browse/OOZIE-1976 Project: Oozie Issue Type: New Feature Components: coordinator Affects Versions: trunk Reporter: Mona Chitnis Assignee: Mona Chitnis Fix For: trunk All dataset instances specified as input to coordinator, currently work on AND logic i.e. ALL of them should be available for workflow to start. We should enhance this to include more logical ways of specifying availability criteria e.g. * OR between instances * minimum N out of K instances * delta datasets (process data incrementally) This JIRA is to discuss the design and then the review the implementation for some or all of the above features. -- This message was sent by Atlassian JIRA (v6.2#6252)
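To make the requested OR and "minimum N out of K" criteria concrete, here is a minimal sketch of a nested dependency tree evaluated against the set of instances found so far. `DepNode` and its factory methods are hypothetical names, not Oozie classes, and delta datasets are not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical model of nested input-dependency logic; not Oozie's actual classes. */
@FunctionalInterface
interface DepNode {
    boolean isSatisfied(Set<String> available);

    /** Leaf: a single dataset instance URI. */
    static DepNode instance(String uri) {
        return available -> available.contains(uri);
    }

    /** All children must be satisfied (the current default coordinator behaviour). */
    static DepNode and(DepNode... children) {
        List<DepNode> list = Arrays.asList(children);
        return available -> list.stream().allMatch(c -> c.isSatisfied(available));
    }

    /** At least one child must be satisfied, e.g. BCP copies of the same data. */
    static DepNode or(DepNode... children) {
        List<DepNode> list = Arrays.asList(children);
        return available -> list.stream().anyMatch(c -> c.isSatisfied(available));
    }

    /** At least n of the children must be satisfied. */
    static DepNode atLeast(int n, DepNode... children) {
        List<DepNode> list = Arrays.asList(children);
        return available -> list.stream().filter(c -> c.isSatisfied(available)).count() >= n;
    }
}
```

Because every combinator accepts other nodes, the tree can nest to any depth, e.g. `and(instance(raw), or(instance(colo1), instance(colo2)))` for the BCP use case.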
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/ --- (Updated Aug. 14, 2014, 11:13 p.m.) Review request for oozie. Changes --- updated patch to include unit tests, and fixes uncovered in the process Bugs: OOZIE-1913 https://issues.apache.org/jira/browse/OOZIE-1913 Repository: oozie-git Description --- See Jira Diffs (updated) - client/src/main/java/org/apache/oozie/cli/OozieCLI.java 33935d3 client/src/main/java/org/apache/oozie/client/OozieClient.java b468186 client/src/main/java/org/apache/oozie/client/event/jms/JMSHeaderConstants.java 2f0a45c client/src/main/java/org/apache/oozie/client/rest/RestConstants.java 5d3fc62 core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java 5d3b6af core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java ffb2d08 core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java b4b2fef core/src/main/java/org/apache/oozie/command/coord/CoordSubmitXCommand.java 02b30ef core/src/main/java/org/apache/oozie/coord/CoordUtils.java 26db068 core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java cd26e07 core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 42a0968 core/src/main/java/org/apache/oozie/jms/JMSSLAEventListener.java 8296a6c core/src/main/java/org/apache/oozie/service/CoordMaterializeTriggerService.java 3fbd092 core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 8ca2e81 core/src/main/java/org/apache/oozie/servlet/V2SLAServlet.java 8620af5 core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 67d6237 core/src/main/java/org/apache/oozie/sla/SLACalculator.java 132d4df core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 3801325 core/src/main/java/org/apache/oozie/sla/SLAOperations.java 0cad071 
core/src/main/java/org/apache/oozie/sla/service/SLAService.java 2349329 core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java fd21c45 core/src/main/resources/oozie-default.xml ebceaa7 core/src/test/java/org/apache/oozie/client/TestWorkflowClient.java e2e0f11 core/src/test/java/org/apache/oozie/command/coord/TestCoordSubmitXCommand.java fedf4a8 core/src/test/java/org/apache/oozie/coord/TestCoordUtils.java a39efe3 core/src/test/java/org/apache/oozie/jms/TestJMSSLAEventListener.java fa26935 core/src/test/java/org/apache/oozie/servlet/TestV2SLAServlet.java 5a35fdb core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java 210c99e Diff: https://reviews.apache.org/r/24487/diff/ Testing (updated) --- unit tests added, e-2-e test with CLI command done Thanks, Mona Chitnis
[jira] [Commented] (OOZIE-1913) Devise a way to turn off SLA alerts for bundle/coordinator flexibly
[ https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096118#comment-14096118 ] Mona Chitnis commented on OOZIE-1913: - I want to mention another point: this API also allows disabling alerts for "ALL" SLA instances of a coordinator or bundle. For a bundle, that means all actions of all its coordinators. SLARegistrationBean stores a 'parentId' if the SLA object pertains to a coord-action/wf-action/bundle-action. To avoid a heavy DB query in the suspend-ALL-for-bundle(s) case, I want to change this 'parentId' to point directly to the bundle jobId if the coordinator is part of a bundle; if not, it will remain the coord job id as it is now. This affects JMSSLAEventListener, where topicName is set to this parentId: topicName will be set to the top-level bundle id, and users will have to change the topic name they listen to. Please give feedback on whether this is a reasonable approach. I will make sure appropriate JMS selector options are available if a user subscribes to the bundle-id topicName but still wants to limit messages per coordinator job id. > Devise a way to turn off SLA alerts for bundle/coordinator flexibly > --- > > Key: OOZIE-1913 > URL: https://issues.apache.org/jira/browse/OOZIE-1913 > Project: Oozie > Issue Type: Improvement >Affects Versions: trunk >Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > > From user: > Need to turn off the SLA miss alerts in jobs when the bundle is suspended for > grid upgrades and similar work so that when it's resumed we aren't flooded > with a bunch of alerts. -- This message was sent by Atlassian JIRA (v6.2#6252)
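The proposed parentId rule and its effect on the JMS topic name can be sketched as follows. `SlaParentIds` is a hypothetical helper, and the `coordJobId` message property used in the selector is an assumed name, since the comment only promises that suitable selector options will be provided.

```java
/** Hypothetical helpers illustrating the proposal; not Oozie's SLARegistrationBean code. */
final class SlaParentIds {
    private SlaParentIds() {}

    /** Bundle job id if the coordinator belongs to a bundle, else the coord job id. */
    static String resolveParentId(String coordJobId, String bundleId) {
        boolean inBundle = bundleId != null && !bundleId.isEmpty();
        return inBundle ? bundleId : coordJobId;
    }

    /** Per the proposal, JMSSLAEventListener would publish to a topic named after parentId. */
    static String topicName(String coordJobId, String bundleId) {
        return resolveParentId(coordJobId, bundleId);
    }

    /**
     * JMS message selector narrowing a bundle-level topic to one coordinator.
     * "coordJobId" is an assumed property name; quotes are escaped by doubling.
     */
    static String coordSelector(String coordJobId) {
        return "coordJobId = '" + coordJobId.replace("'", "''") + "'";
    }
}
```

A subscriber that previously listened on a coordinator-id topic would switch to the bundle-id topic and pass a selector like the one above to filter messages.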
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/#review50269 --- core/src/main/resources/oozie-default.xml <https://reviews.apache.org/r/24487/#comment87956> this change is part of OOZIE-1932; I will remove it in the next patch version - Mona Chitnis On Aug. 8, 2014, 2:20 a.m., Mona Chitnis wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24487/ > --- > > (Updated Aug. 8, 2014, 2:20 a.m.) > > > Review request for oozie. > > > Bugs: OOZIE-1913 > https://issues.apache.org/jira/browse/OOZIE-1913 > > > Repository: oozie-git > > > Description > --- > > See Jira > > > Diffs > - > > client/src/main/java/org/apache/oozie/cli/OozieCLI.java 33935d3 > client/src/main/java/org/apache/oozie/client/OozieClient.java b468186 > client/src/main/java/org/apache/oozie/client/rest/RestConstants.java > 5d3fc62 > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java > 5d3b6af > > core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java > ffb2d08 > > core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java > b4b2fef > core/src/main/java/org/apache/oozie/command/coord/CoordSubmitXCommand.java > 02b30ef > core/src/main/java/org/apache/oozie/coord/CoordUtils.java 26db068 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > cd26e07 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 42a0968 > core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 8ca2e81 > core/src/main/java/org/apache/oozie/servlet/V2SLAServlet.java 8620af5 > core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 67d6237 > core/src/main/java/org/apache/oozie/sla/SLACalculator.java 132d4df > 
core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 3801325 > core/src/main/java/org/apache/oozie/sla/SLAOperations.java 0cad071 > core/src/main/java/org/apache/oozie/sla/service/SLAService.java 2349329 > core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java > fd21c45 > core/src/main/resources/oozie-default.xml ebceaa7 > core/src/test/java/org/apache/oozie/client/TestWorkflowClient.java e2e0f11 > > Diff: https://reviews.apache.org/r/24487/diff/ > > > Testing > --- > > ongoing > > > Thanks, > > Mona Chitnis > >
Re: Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/#review50101 --- client/src/main/java/org/apache/oozie/cli/OozieCLI.java <https://reviews.apache.org/r/24487/#comment87652> this has been removed client/src/main/java/org/apache/oozie/cli/OozieCLI.java <https://reviews.apache.org/r/24487/#comment87653> this has been removed. The action ids/date range is now read as the argument of the -suspendalerts option itself - Mona Chitnis On Aug. 8, 2014, 2:20 a.m., Mona Chitnis wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24487/ > --- > > (Updated Aug. 8, 2014, 2:20 a.m.) > > > Review request for oozie. > > > Bugs: OOZIE-1913 > https://issues.apache.org/jira/browse/OOZIE-1913 > > > Repository: oozie-git > > > Description > --- > > See Jira > > > Diffs > - > > client/src/main/java/org/apache/oozie/cli/OozieCLI.java 33935d3 > client/src/main/java/org/apache/oozie/client/OozieClient.java b468186 > client/src/main/java/org/apache/oozie/client/rest/RestConstants.java > 5d3fc62 > core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 > core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 > core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java > 5d3b6af > > core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java > ffb2d08 > > core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java > b4b2fef > core/src/main/java/org/apache/oozie/command/coord/CoordSubmitXCommand.java > 02b30ef > core/src/main/java/org/apache/oozie/coord/CoordUtils.java 26db068 > > core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java > cd26e07 > core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java > 42a0968 > core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 8ca2e81 > core/src/main/java/org/apache/oozie/servlet/V2SLAServlet.java 8620af5 > 
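The review comment says the action ids or date range are now passed directly as the argument of `-suspendalerts`. A minimal sketch of expanding such an argument into coordinator action ids, assuming (hypothetically) the same `1-3,5` action-number syntax that `oozie job -rerun ... -action` accepts and the `jobId@actionNumber` action-id form; date ranges are not handled, and `ActionScopeParser` is an illustrative name, not a class from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for an action-number scope such as "1-3,5"; not the patch's code. */
final class ActionScopeParser {
    private ActionScopeParser() {}

    /** Expands "1-3,5" into [1, 2, 3, 5]; whitespace around tokens is ignored. */
    static List<Integer> parseActionNumbers(String scope) {
        List<Integer> numbers = new ArrayList<>();
        for (String token : scope.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue;
            }
            int dash = t.indexOf('-');
            if (dash < 0) {
                numbers.add(Integer.parseInt(t));
            } else {
                int start = Integer.parseInt(t.substring(0, dash).trim());
                int end = Integer.parseInt(t.substring(dash + 1).trim());
                if (start > end) {
                    throw new IllegalArgumentException("Invalid range: " + t);
                }
                for (int n = start; n <= end; n++) {
                    numbers.add(n);
                }
            }
        }
        return numbers;
    }

    /** Builds coordinator action ids of the form jobId@actionNumber. */
    static List<String> toActionIds(String coordJobId, String scope) {
        List<String> ids = new ArrayList<>();
        for (int n : parseActionNumbers(scope)) {
            ids.add(coordJobId + "@" + n);
        }
        return ids;
    }
}
```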
core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 67d6237 > core/src/main/java/org/apache/oozie/sla/SLACalculator.java 132d4df > core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 3801325 > core/src/main/java/org/apache/oozie/sla/SLAOperations.java 0cad071 > core/src/main/java/org/apache/oozie/sla/service/SLAService.java 2349329 > core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java > fd21c45 > core/src/main/resources/oozie-default.xml ebceaa7 > core/src/test/java/org/apache/oozie/client/TestWorkflowClient.java e2e0f11 > > Diff: https://reviews.apache.org/r/24487/diff/ > > > Testing > --- > > ongoing > > > Thanks, > > Mona Chitnis > >
Re: Problem with Oozie DeadLock
Hi Fabiano, You should definitely be able to execute multiple Oozie jobs simultaneously. The issue comes up when you have a small Hadoop cluster, and thus a very small number of queue slots for submitting jobs to the ResourceManager. Can you look into adding an additional queue by configuring your Hadoop cluster through capacity-scheduler.xml? Then you can use the approach mentioned in OOZIE-1673 and specify in your workflow's properties oozie.launcher.mapreduce.job.queuename=queue1 and mapreduce.job.queuename=queue2 to avoid the deadlock situation. You can also avoid deadlocks by tuning the memory requirements of your Oozie launcher and child jobs so that they request lower-memory container slots, which increases the number of jobs you can submit and execute. See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html Mona Chitnis Yahoo! On Friday, August 8, 2014 6:58 AM, Gaetano Fabiano wrote: Hi All, this is my first post on this mailing list I hope to stay here for long time. I saw the project and I love it. But recently I'm going crazy about issues. Our problem is about Oozie Deadlock, we would like to execute more than one Oozie job at the same time but when we try to execute its the entire environment becames locked all. We read a lot of post about this issue but without solution. How is possible to set two different queue? We read a lot of post where people suggest to use different queue for different execution Where we could set these queue setting? Is this the correct way to have different execution at the same time? Our bewilderment is reading this issue https://issues.apache.org/jira/browse/OOZIE-1673 and if is as described the big doubt is about the usefulness of the enteire Oozie framework. I hope someone can help us to resolve it and clear our mind about. Any suggestion is welcome. Regards Gaetano Dott.
Gaetano Fabiano via Timpone n° 79 87055 San Giovanni in Fiore (Cs) ITALY mobile: +39 328 9469919 phone: +39 0984 991980 email: fabiano.gaet...@gmail.com skype: deepyoudeep skype:: gaetano.fab msn: deep...@hotmail.it twitter: @gaetanofabiano gtalk/hangoout/Google+ fg.pa...@gmail.com
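The queue split suggested in the reply can be expressed as two job properties. A minimal sketch using `java.util.Properties`; the helper name is hypothetical, the queue names `queue1`/`queue2` come from the reply, and both queues must already be defined in `capacity-scheduler.xml`. Whether these keys take effect from the job properties directly or must be referenced from each action's `<configuration>` depends on how the workflow is written, so treat that as an assumption to check against OOZIE-1673.

```java
import java.util.Properties;

/** Hypothetical helper that routes launcher and child jobs to different queues. */
final class QueueSplitConfig {
    private QueueSplitConfig() {}

    static Properties withSeparateQueues(Properties jobProps, String launcherQueue, String actionQueue) {
        if (launcherQueue.equals(actionQueue)) {
            // Sharing one queue brings back the launcher-vs-child contention.
            throw new IllegalArgumentException("launcher and action queues should differ");
        }
        Properties p = new Properties();
        p.putAll(jobProps); // copy; the caller's Properties is left untouched
        // Queue for the Oozie launcher job.
        p.setProperty("oozie.launcher.mapreduce.job.queuename", launcherQueue);
        // Queue for the child jobs that the launcher submits.
        p.setProperty("mapreduce.job.queuename", actionQueue);
        return p;
    }
}
```

Keeping launchers in their own queue means they cannot occupy every slot that their child jobs need, which is the deadlock described in the original question.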
[jira] [Commented] (OOZIE-1913) Devise a way to turn off SLA alerts for bundle/coordinator flexibly
[ https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090999#comment-14090999 ] Mona Chitnis commented on OOZIE-1913: - Okay, let me remove the "-id" requirement. Regarding treating this as a job operation, I think it becomes ambiguous what type of alerts it refers to, so it is better to be explicit with the 'sla' command. It also removes the need for an additional 'actions' param. But I can rework this if there's consensus on which API usage is more intuitive. I'm asking users for feedback too. > Devise a way to turn off SLA alerts for bundle/coordinator flexibly > --- > > Key: OOZIE-1913 > URL: https://issues.apache.org/jira/browse/OOZIE-1913 > Project: Oozie > Issue Type: Improvement >Affects Versions: trunk >Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > > From user: > Need to turn off the SLA miss alerts in jobs when the bundle is suspended for > grid upgrades and similar work so that when it's resumed we aren't flooded > with a bunch of alerts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1913) Devise a way to turn off SLA alerts for bundle/coordinator flexibly
[ https://issues.apache.org/jira/browse/OOZIE-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1913: Summary: Devise a way to turn off SLA alerts for bundle/coordinator flexibly (was: Devise a way to turn off SLA alerts when bundle/coordinator suspended) > Devise a way to turn off SLA alerts for bundle/coordinator flexibly > --- > > Key: OOZIE-1913 > URL: https://issues.apache.org/jira/browse/OOZIE-1913 > Project: Oozie > Issue Type: Improvement >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > > From user: > Need to turn off the SLA miss alerts in jobs when the bundle is suspended for > grid upgrades and similar work so that when it's resumed we aren't flooded > with a bunch of alerts. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24487: OOZIE-1913 Devise a way to turn off SLA alerts for bundle/coordinator flexibly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24487/ --- Review request for oozie. Bugs: OOZIE-1913 https://issues.apache.org/jira/browse/OOZIE-1913 Repository: oozie-git Description --- See Jira Diffs - client/src/main/java/org/apache/oozie/cli/OozieCLI.java 33935d3 client/src/main/java/org/apache/oozie/client/OozieClient.java b468186 client/src/main/java/org/apache/oozie/client/rest/RestConstants.java 5d3fc62 core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 core/src/main/java/org/apache/oozie/command/SubmitTransitionXCommand.java 5d3b6af core/src/main/java/org/apache/oozie/command/bundle/BundleSubmitXCommand.java ffb2d08 core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java b4b2fef core/src/main/java/org/apache/oozie/command/coord/CoordSubmitXCommand.java 02b30ef core/src/main/java/org/apache/oozie/coord/CoordUtils.java 26db068 core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java cd26e07 core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 42a0968 core/src/main/java/org/apache/oozie/servlet/SLAServlet.java 8ca2e81 core/src/main/java/org/apache/oozie/servlet/V2SLAServlet.java 8620af5 core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 67d6237 core/src/main/java/org/apache/oozie/sla/SLACalculator.java 132d4df core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 3801325 core/src/main/java/org/apache/oozie/sla/SLAOperations.java 0cad071 core/src/main/java/org/apache/oozie/sla/service/SLAService.java 2349329 core/src/main/java/org/apache/oozie/util/CoordActionsInDateRange.java fd21c45 core/src/main/resources/oozie-default.xml ebceaa7 core/src/test/java/org/apache/oozie/client/TestWorkflowClient.java e2e0f11 Diff: https://reviews.apache.org/r/24487/diff/ Testing --- ongoing Thanks, Mona Chitnis
Re: 4.1 release
I will fix OOZIE-1932 in a couple of days. Thanks, Mona Chitnis On Thursday, August 7, 2014 10:25 AM, bowen zhang wrote: Hi guys, The following link shows all the unresolved issues that will go into 4.1 release. If anyone has another ticket that needs to go into 4.1, please make the fix version "4.1.0". Issue Navigator - ASF JIRA Thanks, Bowen
Re: Review Request 24187: OOZIE-1958 address duplication of env variables in oozie.launcher.yarn.app.mapreduce.am.env when running with uber mode
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24187/#review49628 --- +1, pending the minor comment about naming and verification via an e-2-e test core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java <https://reviews.apache.org/r/24187/#comment86870> keep naming consistent, i.e. launcherEnvMap and launcherEnvMapStr core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java <https://reviews.apache.org/r/24187/#comment86865> false formatting change, but it's ok to include core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java <https://reviews.apache.org/r/24187/#comment86866> same as above - Mona Chitnis On Aug. 1, 2014, 5:56 p.m., Ryota Egashira wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/24187/ > --- > > (Updated Aug. 1, 2014, 5:56 p.m.) > > > Review request for oozie. > > > Bugs: OOZIE-1958 > https://issues.apache.org/jira/browse/OOZIE-1958 > > > Repository: oozie-git > > > Description > --- > > https://issues.apache.org/jira/browse/OOZIE-1958?filter=-1 > > > Diffs > - > > core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java > 94b55cf > > core/src/test/java/org/apache/oozie/action/hadoop/TestJavaActionExecutor.java > 72a137c > > Diff: https://reviews.apache.org/r/24187/diff/ > > > Testing > --- > > > Thanks, > > Ryota Egashira > >
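To illustrate the naming comment, here is a minimal sketch of de-duplicating a comma-separated `KEY=VALUE` launcher environment string, using the `launcherEnvMap`/`launcherEnvMapStr` names the review asks for. This is not the OOZIE-1958 patch; `LauncherEnvMerger` is a hypothetical class, and the sketch assumes values contain no commas and that a later definition of a key should win.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical merge of launcher env strings without duplicate keys; not Oozie code. */
final class LauncherEnvMerger {
    private LauncherEnvMerger() {}

    /** Parses "K1=V1,K2=V2" into an insertion-ordered map; malformed entries are skipped. */
    static Map<String, String> parse(String envStr) {
        Map<String, String> launcherEnvMap = new LinkedHashMap<>();
        if (envStr == null || envStr.trim().isEmpty()) {
            return launcherEnvMap;
        }
        for (String entry : envStr.split(",")) {
            String e = entry.trim();
            int eq = e.indexOf('='); // first '=' only, so values may contain '='
            if (eq <= 0) {
                continue;
            }
            launcherEnvMap.put(e.substring(0, eq).trim(), e.substring(eq + 1).trim());
        }
        return launcherEnvMap;
    }

    /** Merges two env strings; keys stay unique and the later value wins. */
    static String merge(String existing, String additions) {
        Map<String, String> launcherEnvMap = parse(existing);
        launcherEnvMap.putAll(parse(additions));
        StringBuilder launcherEnvMapStr = new StringBuilder();
        for (Map.Entry<String, String> e : launcherEnvMap.entrySet()) {
            if (launcherEnvMapStr.length() > 0) {
                launcherEnvMapStr.append(',');
            }
            launcherEnvMapStr.append(e.getKey()).append('=').append(e.getValue());
        }
        return launcherEnvMapStr.toString();
    }
}
```

A `LinkedHashMap` keeps the original key order, so an overridden key stays in its first position instead of moving to the end.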
Re: Thanks for fixing UTC.
No problem. OOZIE-1811 did fix the root cause of non-flaky tests failing randomly, but there's one that is timing-sensitive and still needs to be fixed: org.apache.oozie.service.TestCallableQueueService.testConcurrencyReachedAndChooseNextEligible. And OOZIE-1952 will fix TestPurgeXCommand, which uses old StoreService code. Then we're done! :) Mona Chitnis Software Engineer, Hadoop Team Yahoo! On Friday, August 1, 2014 4:25 PM, Purshotam Shah wrote: Thanks Mona for fixing testcases. It feel so good to see no UTC failure. On 8/1/14, 4:13 PM, "Hadoop QA (JIRA)" wrote: > > [ >https://issues.apache.org/jira/browse/OOZIE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083151#comment-14083151 ] > >Hadoop QA commented on OOZIE-1939: >-- > >Testing JIRA OOZIE-1939 > >Cleaning local git workspace > > > >{color:green}+1 PATCH_APPLIES{color} >{color:green}+1 CLEAN{color} >{color:green}+1 RAW_PATCH_ANALYSIS{color} >. {color:green}+1{color} the patch does not introduce any @author tags >. {color:green}+1{color} the patch does not introduce any tabs >. {color:green}+1{color} the patch does not introduce any trailing >spaces >. {color:green}+1{color} the patch does not introduce any line longer >than 132 >. {color:green}+1{color} the patch does adds/modifies 1 testcase(s) >{color:green}+1 RAT{color} >. {color:green}+1{color} the patch does not seem to introduce new RAT >warnings >{color:green}+1 JAVADOC{color} >. {color:green}+1{color} the patch does not seem to introduce new >Javadoc warnings >{color:green}+1 COMPILE{color} >. {color:green}+1{color} HEAD compiles >. {color:green}+1{color} patch compiles >. {color:green}+1{color} the patch does not seem to introduce new >javac warnings >{color:green}+1 BACKWARDS_COMPATIBILITY{color} >. {color:green}+1{color} the patch does not change any JPA >Entity/Colum/Basic/Lob/Transient annotations >. {color:green}+1{color} the patch does not modify JPA files >{color:green}+1 TESTS{color} >. 
Tests run: 1506 >{color:green}+1 DISTRO{color} >. {color:green}+1{color} distro tarball builds with the patch > > >{color:green}*+1 Overall result, good!, no -1s*{color} > > >The full output of the test-patch run is available at > >. https://builds.apache.org/job/oozie-trunk-precommit-build/1377/ > >> Incorrect job information is set while logging >> -- >> >> Key: OOZIE-1939 >> URL: https://issues.apache.org/jira/browse/OOZIE-1939 >> Project: Oozie >> Issue Type: Bug >> Reporter: Purshotam Shah >> Assignee: Azrael >> Attachments: OOZIE-1939.1.patch, OOZIE-1939.2.patch >> >> >> {code} >> 2014-07-16 17:28:06,422 DEBUG CoordChangeXCommand:545 >>[http-0.0.0.0-4443-5] - USER[hadoopqa] GROUP[users] TOKEN[] >>APP[coordB236] JOB[0011514-140716042555-oozie-oozi-C] ACTION[-] Acquired >>lock for [0011385-140716042555-oozie-oozi-C] in [coord_change] >> 2014-07-16 17:28:06,422 TRACE CoordChangeXCommand:548 >>[http-0.0.0.0-4443-5] - USER[hadoopqa] GROUP[users] TOKEN[] >>APP[coordB236] JOB[0011514-140716042555-oozie-oozi-C] ACTION[-] Load >>state for [0011385-140716042555-oozie-oozi-C] >> {code} >> {code} >> protected void loadState() throws CommandException { >> jpaService = Services.get().get(JPAService.class); >> if (jpaService == null) { >> LOG.error(ErrorCode.E0610); >> } >> try { >> coordJob = >>CoordJobQueryExecutor.getInstance().get(CoordJobQuery.GET_COORD_JOB_MATERIALIZE, jobId); >> prevStatus = coordJob.getStatus(); >> } >> catch (JPAExecutorException jex) { >> throw new CommandException(jex); >> } >> // calculate start materialize and end materialize time >> calcMatdTime(); >> LogUtils.setLogInfo(coordJob, logInfo); >> } >> {code} >> Most of the commands set the job info after loadState(); because of that, a few log statements (like acquiring the lock and loading state) log with the previous job info. > > > >-- >This message was sent by Atlassian JIRA >(v6.2#6252)
[jira] [Commented] (OOZIE-1932) Services should load CallableQueueService after MemoryLocksService
[ https://issues.apache.org/jira/browse/OOZIE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082590#comment-14082590 ] Mona Chitnis commented on OOZIE-1932: - Okay, thanks. Will revise the order. > Services should load CallableQueueService after MemoryLocksService > -- > > Key: OOZIE-1932 > URL: https://issues.apache.org/jira/browse/OOZIE-1932 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1932-2.patch, OOZIE-1932-addendum.patch, > OOZIE-1932.patch > > > This is not a problem during startup, but it is during shutdown, because services are destroyed in the reverse order of initialization. Hence, when MemoryLocksService.destroy() nulls out its locks while commands are still executing (because CallableQueueService is still active), they all encounter NPEs during locking. This is a simple fix in oozie-default.xml: load MemoryLocksService before CallableQueueService in the services loading order. -- This message was sent by Atlassian JIRA (v6.2#6252)
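The shutdown hazard described above comes from destroying services in the reverse order of initialization. The sketch below illustrates that ordering with hypothetical `SketchService`, `NamedService` and `SketchServiceRegistry` types (not Oozie's actual `Services` or `Service` classes): whichever service is listed last is destroyed first, which is why loading CallableQueueService after MemoryLocksService stops the queue before the lock service is torn down.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified service contract (not Oozie's real Service interface).
interface SketchService {
    String name();
    void init();
    void destroy();
}

// A no-op service that only carries a name, enough to observe ordering.
class NamedService implements SketchService {
    private final String name;
    NamedService(String name) { this.name = name; }
    public String name() { return name; }
    public void init() { }
    public void destroy() { }
}

// Initializes services in list order and destroys them in reverse order,
// recording each step so the ordering can be inspected.
class SketchServiceRegistry {
    private final Deque<SketchService> started = new ArrayDeque<>();
    private final List<String> events = new ArrayList<>();

    void initAll(List<? extends SketchService> ordered) {
        for (SketchService s : ordered) {
            s.init();
            started.push(s); // the most recently initialized service sits on top
            events.add("init " + s.name());
        }
    }

    void destroyAll() {
        while (!started.isEmpty()) {
            SketchService s = started.pop(); // last initialized, first destroyed
            s.destroy();
            events.add("destroy " + s.name());
        }
    }

    List<String> events() { return events; }
}

class ServiceOrderDemo {
    public static void main(String[] args) {
        SketchServiceRegistry registry = new SketchServiceRegistry();
        // Listing CallableQueueService last makes it the first one destroyed,
        // so the queue stops before the lock service goes away.
        registry.initAll(Arrays.asList(
                new NamedService("MemoryLocksService"),
                new NamedService("CallableQueueService")));
        registry.destroyAll();
        registry.events().forEach(System.out::println);
    }
}
```

Running `ServiceOrderDemo` prints the two init lines, then `destroy CallableQueueService`, then `destroy MemoryLocksService`.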
[jira] [Updated] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1811: Attachment: (was: OOZIE-1811-3.patch) > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch, > OOZIE-1811-3.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1811: Attachment: OOZIE-1811-3.patch good catch! uploaded new patch > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch, > OOZIE-1811-3.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1811: Attachment: OOZIE-1811-3.patch > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch, > OOZIE-1811-3.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1811: Attachment: (was: OOZIE-1811-3.patch) > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch, > OOZIE-1811-3.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1811: Attachment: OOZIE-1811-3.patch addressed review comments and fixed a couple of classes missed in the earlier patch (BatchQueryExecutor, SLA*QueryExecutors) to be consistent > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch, > OOZIE-1811-3.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1939) Incorrect job information is set while logging
[ https://issues.apache.org/jira/browse/OOZIE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082317#comment-14082317 ] Mona Chitnis commented on OOZIE-1939: - It will still work with thread-local params. The fix was done to minimize the overall change: it just clears the log prefix and sets it to the object the thread is handling now, and the same approach applies with thread-local params. > Incorrect job information is set while logging > -- > > Key: OOZIE-1939 > URL: https://issues.apache.org/jira/browse/OOZIE-1939 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Azrael > Attachments: OOZIE-1939.1.patch, OOZIE-1939.2.patch > > > {code} > 2014-07-16 17:28:06,422 DEBUG CoordChangeXCommand:545 [http-0.0.0.0-4443-5] - > USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB236] > JOB[0011514-140716042555-oozie-oozi-C] ACTION[-] Acquired lock for > [0011385-140716042555-oozie-oozi-C] in [coord_change] > 2014-07-16 17:28:06,422 TRACE CoordChangeXCommand:548 [http-0.0.0.0-4443-5] - > USER[hadoopqa] GROUP[users] TOKEN[] APP[coordB236] > JOB[0011514-140716042555-oozie-oozi-C] ACTION[-] Load state for > [0011385-140716042555-oozie-oozi-C] > {code} > {code} > protected void loadState() throws CommandException { > jpaService = Services.get().get(JPAService.class); > if (jpaService == null) { > LOG.error(ErrorCode.E0610); > } > try { > coordJob = > CoordJobQueryExecutor.getInstance().get(CoordJobQuery.GET_COORD_JOB_MATERIALIZE, > jobId); > prevStatus = coordJob.getStatus(); > } > catch (JPAExecutorException jex) { > throw new CommandException(jex); > } > // calculate start materialize and end materialize time > calcMatdTime(); > LogUtils.setLogInfo(coordJob, logInfo); > } > {code} > Most of the commands set the job info after loadState(); because of that, a few log statements (like acquiring the lock and loading state) log with the previous job info. -- This message was sent by Atlassian JIRA (v6.2#6252)
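A minimal sketch of the approach described in the comment, assuming a per-thread log prefix: the command clears whatever prefix the previous command left on the thread and sets it from its own job id before acquiring the lock and loading state. `SketchLogContext` and `SketchCommand` are hypothetical names; Oozie's real `XLog`/`LogUtils` API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-thread log context holding a JOB[...] prefix.
final class SketchLogContext {
    private static final ThreadLocal<String> PREFIX =
            ThreadLocal.withInitial(() -> "JOB[-]");
    // Collected log lines, so the prefixes can be inspected.
    static final List<String> LINES = Collections.synchronizedList(new ArrayList<>());

    private SketchLogContext() { }

    static void setJob(String jobId) { PREFIX.set("JOB[" + jobId + "]"); }
    static void clear() { PREFIX.remove(); }
    static void log(String msg) { LINES.add(PREFIX.get() + " " + msg); }
}

// Hypothetical command skeleton: the job id is known at construction time,
// so the prefix can be set before locking and loading state.
abstract class SketchCommand {
    private final String jobId;

    SketchCommand(String jobId) { this.jobId = jobId; }

    final void call() {
        // Drop whatever the previous command left on this (pooled) thread,
        // then tag log lines with the job this command is handling now.
        SketchLogContext.clear();
        SketchLogContext.setJob(jobId);
        SketchLogContext.log("Acquired lock for [" + jobId + "]");
        SketchLogContext.log("Load state for [" + jobId + "]");
        execute();
    }

    protected abstract void execute();
}

class LogPrefixDemo {
    public static void main(String[] args) {
        // Two commands run back to back on the same thread, as on a pooled worker.
        new SketchCommand("job-A") { protected void execute() { } }.call();
        new SketchCommand("job-B") { protected void execute() { } }.call();
        SketchLogContext.LINES.forEach(System.out::println);
    }
}
```

With this ordering, the lock and load-state lines of the second command carry `JOB[job-B]` rather than the previous job's id.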
[jira] [Commented] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081820#comment-14081820 ] Mona Chitnis commented on OOZIE-1811: - {{. -1 the patch contains 2 line(s) with trailing spaces}} located and fixed in the xml file - {{coord-action-sla.xml}} > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1932) Services should load CallableQueueService after MemoryLocksService
[ https://issues.apache.org/jira/browse/OOZIE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1932: Attachment: OOZIE-1932-addendum.patch Attaching a simple change to initialize CallableQueueService at the very end, so that it's destroyed first. The unit test TestBulkMonitorWebServiceAPI failed on my local machine, but I'm not able to determine whether the cause is related to this change. I will let the pre-commit build test run decide. > Services should load CallableQueueService after MemoryLocksService > -- > > Key: OOZIE-1932 > URL: https://issues.apache.org/jira/browse/OOZIE-1932 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1932-2.patch, OOZIE-1932-addendum.patch, > OOZIE-1932.patch > > > This is not a problem during startup, but it is during shutdown, because services are destroyed in the reverse order of initialization. Hence, when MemoryLocksService.destroy() nulls out its locks while commands are still executing (because CallableQueueService is still active), they all encounter NPEs during locking. This is a simple fix in oozie-default.xml: load MemoryLocksService before CallableQueueService in the services loading order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078158#comment-14078158 ] Mona Chitnis commented on OOZIE-1811: - The above failures are due to a strange network error on the host. It happened before at https://builds.apache.org/job/oozie-trunk-precommit-build/1363/ too. I ran the whole suite locally and only 1 test failed, which, as I've mentioned, is going to be part of OOZIE-1952. > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (OOZIE-1932) Services should load CallableQueueService after MemoryLocksService
[ https://issues.apache.org/jira/browse/OOZIE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis reopened OOZIE-1932: - Reopening the issue to fix a similar problem: URIHandlerService should be loaded before CallableQueueService, so that it is closed after CallableQueueService. This JIRA's scope now includes a permanent fix to the services ordering that works for all cases and avoids NPEs and other issues with services during server shutdown/startup. > Services should load CallableQueueService after MemoryLocksService > -- > > Key: OOZIE-1932 > URL: https://issues.apache.org/jira/browse/OOZIE-1932 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1932-2.patch, OOZIE-1932.patch > > > This is not a problem during startup, but it is during shutdown, because services are destroyed in the reverse order of initialization. Hence, when MemoryLocksService.destroy() nulls out its locks while commands are still executing (because CallableQueueService is still active), they all encounter NPEs during locking. This is a simple fix in oozie-default.xml: load MemoryLocksService before CallableQueueService in the services loading order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1811: Attachment: OOZIE-1811-2.patch updated patch to apply cleanly to trunk HEAD > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch, OOZIE-1811-2.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1811: Attachment: OOZIE-1811-1.patch attaching patch which fixes the QueryExecutors and TestSLAEventGeneration. Errors related to StoreService usage in tests can be fixed as part of overall StoreService fix in OOZIE-1952 > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > Attachments: OOZIE-1811-1.patch > > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (OOZIE-1952) Cleanup duplicate/obsolete code - Command, StoreService
Mona Chitnis created OOZIE-1952: --- Summary: Cleanup duplicate/obsolete code - Command, StoreService Key: OOZIE-1952 URL: https://issues.apache.org/jira/browse/OOZIE-1952 Project: Oozie Issue Type: Task Reporter: Mona Chitnis StoreService has been superseded by JPAService, and Command has been superseded by XCommand. These old classes have been lying around long enough and are probably only referenced through unit tests, creating some confusion when tests have to be fixed for flaky failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074117#comment-14074117 ] Mona Chitnis commented on OOZIE-1811: - I'd suggest getting rid of the static reference to JPAService in each of the Query Executors. We can always get the reference from the Services singleton while executing the query. By keeping another static reference and manipulating it through the constructor and destroy(), we run the risk of nullifying it inadvertently. This is why so many tests are suddenly becoming flaky, and it is very tough to detect exact patterns or even fix tests in a foolproof way. I ran the whole suite with the static reference removed and only 2 tests failed, which is quite an improvement! {code} Results : Failed tests: testBundleId(org.apache.oozie.servlet.TestBulkMonitorWebServiceAPI): expected: but was: Tests in error: testSucCoordPurgeXCommand(org.apache.oozie.command.TestPurgeXCommand): E0604: Job does not exist [000-140724213655573-oozie-chit-C] {code} Test #2 here is failing with the error that StoreService cannot work without JPAService. We can replace usage of StoreService completely, as it is superseded by JPAService anyway. Test #1 doesn't really have any error except a random assert failure, and this test is not usually flaky, so we can ignore it. > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g.
https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
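The risk described in the comment on OOZIE-1811 above (a static JPAService reference set in the constructor and cleared in destroy()) can be reproduced with a small sketch. `SketchJpaService`, `SketchServices`, `StaticRefExecutor` and `LookupExecutor` are hypothetical stand-ins for JPAService, the Services singleton and a query executor; the point is only that a static field is shared by every instance, so one teardown invalidates all holders, while looking the service up at query time does not.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for JPAService.
final class SketchJpaService {
    String query(String name) { return "result of " + name; }
}

// Hypothetical stand-in for the Services singleton that owns the service.
final class SketchServices {
    private static final AtomicReference<SketchJpaService> JPA = new AtomicReference<>();

    private SketchServices() { }

    static void start() { JPA.set(new SketchJpaService()); }
    static void stop() { JPA.set(null); }

    static SketchJpaService jpa() {
        SketchJpaService s = JPA.get();
        if (s == null) {
            throw new IllegalStateException("JPAService is not available");
        }
        return s;
    }
}

// Fragile pattern: a second, static copy of the reference, set in the
// constructor and cleared in destroy(). Destroying any one instance breaks
// every other instance, because they all share the static field.
final class StaticRefExecutor {
    private static SketchJpaService jpaService;

    StaticRefExecutor() { jpaService = SketchServices.jpa(); }
    void destroy() { jpaService = null; }
    String get(String name) { return jpaService.query(name); } // NPE once nulled
}

// Suggested pattern: look the service up from the singleton at query time.
final class LookupExecutor {
    String get(String name) { return SketchServices.jpa().query(name); }
}

class StaticRefDemo {
    public static void main(String[] args) {
        SketchServices.start();
        StaticRefExecutor first = new StaticRefExecutor();
        StaticRefExecutor second = new StaticRefExecutor();
        first.destroy(); // e.g. teardown performed for one test
        try {
            second.get("someQuery");
        } catch (NullPointerException e) {
            System.out.println("static reference was nulled by another instance");
        }
        System.out.println(new LookupExecutor().get("someQuery"));
    }
}
```

The design choice is to keep a single owner of the service reference (the singleton), so teardown in one place cannot silently invalidate copies held elsewhere.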
[jira] [Updated] (OOZIE-1944) Recursive variable resolution broken when same parameter name in config-default and action conf
[ https://issues.apache.org/jira/browse/OOZIE-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1944: Attachment: OOZIE-1944-2.patch adding null check for configDefault which was causing TestWorkflowAppParser tests to fail > Recursive variable resolution broken when same parameter name in > config-default and action conf > --- > > Key: OOZIE-1944 > URL: https://issues.apache.org/jira/browse/OOZIE-1944 > Project: Oozie > Issue Type: Bug > Components: workflow >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1944-1.patch, OOZIE-1944-2.patch > > > Hitting error > {code} > can not create DagEngine for submitting jobs > org.apache.oozie.DagEngineException: E0803: IO error, Variable > substitution depth too large: 20 ${param}/000 > {code} > when config-default.xml has > {{param=default}} > and action conf has > {code} > > ... > > > param > ${param}/000 > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1944) Recursive variable resolution broken when same parameter name in config-default and action conf
[ https://issues.apache.org/jira/browse/OOZIE-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1944: Attachment: OOZIE-1944-1.patch Attaching patch. The approach is to switch from the XConfiguration.injectDefaults() method to copy(), since the former does a Configuration.get(), which tries to recursively resolve params. So simply copy over the defaults, then global, and finally the action configuration, in this order of precedence. > Recursive variable resolution broken when same parameter name in > config-default and action conf > --- > > Key: OOZIE-1944 > URL: https://issues.apache.org/jira/browse/OOZIE-1944 > Project: Oozie > Issue Type: Bug > Components: workflow >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > Attachments: OOZIE-1944-1.patch > > > Hitting error > {code} > can not create DagEngine for submitting jobs > org.apache.oozie.DagEngineException: E0803: IO error, Variable > substitution depth too large: 20 ${param}/000 > {code} > when config-default.xml has > {{param=default}} > and action conf has > {code} > > ... > > > param > ${param}/000 > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
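To illustrate why injectDefaults() tripped over the same parameter name while a plain copy does not, here is a sketch using plain `Map`s. `resolvingGet` only mimics the idea of Hadoop-style recursive `${name}` expansion with a depth limit of 20 (it is not Hadoop's implementation), and `copyLayers` is a hypothetical helper that applies defaults, global and action settings in precedence order without resolving anything. Skipping `null` layers loosely mirrors the null check for configDefault added in OOZIE-1944-2.patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConfLayeringSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$ ]+)\\}");
    private static final int MAX_SUBST = 20;

    private ConfLayeringSketch() { }

    // Recursive ${name} expansion in the spirit of a resolving get():
    // expand one variable per pass until none remain or the limit is hit.
    static String resolvingGet(Map<String, String> conf, String key) {
        String expr = conf.get(key);
        if (expr == null) {
            return null;
        }
        String value = expr;
        for (int depth = 0; depth < MAX_SUBST; depth++) {
            Matcher m = VAR.matcher(value);
            if (!m.find()) {
                return value;
            }
            String replacement = conf.get(m.group(1));
            if (replacement == null) {
                return value; // unknown variable: leave the expression as is
            }
            value = value.substring(0, m.start()) + replacement + value.substring(m.end());
        }
        throw new IllegalStateException(
                "Variable substitution depth too large: " + MAX_SUBST + " " + expr);
    }

    // Raw copy: later layers override earlier ones and no value is resolved.
    @SafeVarargs
    static Map<String, String> copyLayers(Map<String, String>... layers) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            if (layer != null) { // e.g. no config-default.xml present
                merged.putAll(layer);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("param", "default");
        Map<String, String> action = new LinkedHashMap<>();
        action.put("param", "${param}/000");

        // defaults, then global (absent here), then action: the action value wins, unresolved.
        System.out.println(copyLayers(defaults, null, action).get("param"));
        try {
            resolvingGet(action, "param");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Resolving the action's `param` against itself expands `${param}/000` to `${param}/000/000` and so on until the limit is reached, producing the same kind of "Variable substitution depth too large" message quoted in the issue; the raw copy simply keeps the action's value, `${param}/000`, leaving its evaluation to a later step.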
[jira] [Updated] (OOZIE-1944) Recursive variable resolution broken when same parameter name in config-default and action conf
[ https://issues.apache.org/jira/browse/OOZIE-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1944: Fix Version/s: (was: trunk) > Recursive variable resolution broken when same parameter name in > config-default and action conf > --- > > Key: OOZIE-1944 > URL: https://issues.apache.org/jira/browse/OOZIE-1944 > Project: Oozie > Issue Type: Bug > Components: workflow >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: 4.1.0 > > > Hitting error > {code} > can not create DagEngine for submitting jobs > org.apache.oozie.DagEngineException: E0803: IO error, Variable > substitution depth too large: 20 ${param}/000 > {code} > when config-default.xml has > {{param=default}} > and action conf has > {code} > > ... > > > param > ${param}/000 > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1872) TestCoordActionInputCheckXCommand.testActionInputCheckLatestActionCreationTime is failing for past couple of builds
[ https://issues.apache.org/jira/browse/OOZIE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1872: Fix Version/s: (was: trunk) > TestCoordActionInputCheckXCommand.testActionInputCheckLatestActionCreationTime > is failing for past couple of builds > --- > > Key: OOZIE-1872 > URL: https://issues.apache.org/jira/browse/OOZIE-1872 > Project: Oozie > Issue Type: Bug > Components: tests >Affects Versions: trunk, 4.1.0 >Reporter: Rohini Palaniswamy > Fix For: 4.1.0 > > Attachments: OOZIE-1872-1.patch > > > https://builds.apache.org/job/oozie-trunk-precommit-build/1291/testReport/junit/org.apache.oozie.command.coord/TestCoordActionInputCheckXCommand/testActionInputCheckLatestActionCreationTime/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1872) TestCoordActionInputCheckXCommand.testActionInputCheckLatestActionCreationTime is failing for past couple of builds
[ https://issues.apache.org/jira/browse/OOZIE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1872: Component/s: tests > TestCoordActionInputCheckXCommand.testActionInputCheckLatestActionCreationTime > is failing for past couple of builds > --- > > Key: OOZIE-1872 > URL: https://issues.apache.org/jira/browse/OOZIE-1872 > Project: Oozie > Issue Type: Bug > Components: tests >Affects Versions: trunk, 4.1.0 >Reporter: Rohini Palaniswamy > Fix For: trunk, 4.1.0 > > Attachments: OOZIE-1872-1.patch > > > https://builds.apache.org/job/oozie-trunk-precommit-build/1291/testReport/junit/org.apache.oozie.command.coord/TestCoordActionInputCheckXCommand/testActionInputCheckLatestActionCreationTime/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1872) TestCoordActionInputCheckXCommand.testActionInputCheckLatestActionCreationTime is failing for past couple of builds
[ https://issues.apache.org/jira/browse/OOZIE-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1872: Attachment: OOZIE-1872-1.patch Attaching a fix to the test case. The root causes of the failure were: * The materialize command was queuing the input-check command directly, with zero delay, before the updated action actual time could take effect. Hence the actual time was not updated to the desired value; it remained at the current time and gave wrong dependency results. * The explicitly invoked input-check command was failing precondition verification because the earlier command had already transitioned the action to FAILED. * The flakiness was due to timing issues between the directly queued and the explicitly invoked input-check commands. > TestCoordActionInputCheckXCommand.testActionInputCheckLatestActionCreationTime > is failing for past couple of builds > --- > > Key: OOZIE-1872 > URL: https://issues.apache.org/jira/browse/OOZIE-1872 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk, 4.1.0 >Reporter: Rohini Palaniswamy > Fix For: trunk, 4.1.0 > > Attachments: OOZIE-1872-1.patch > > > https://builds.apache.org/job/oozie-trunk-precommit-build/1291/testReport/junit/org.apache.oozie.command.coord/TestCoordActionInputCheckXCommand/testActionInputCheckLatestActionCreationTime/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (OOZIE-1945) NPE in JaveActionExecutor#check()
Mona Chitnis created OOZIE-1945: --- Summary: NPE in JaveActionExecutor#check() Key: OOZIE-1945 URL: https://issues.apache.org/jira/browse/OOZIE-1945 Project: Oozie Issue Type: Bug Affects Versions: trunk Reporter: Mona Chitnis Priority: Trivial Fix For: trunk, 4.1.0 In method check():
{code}
String errorCode = props.getProperty("error.code");
if (errorCode.equals("0")) {
    errorCode = "JA018";
}
if (errorCode.equals("-1")) {
    errorCode = "JA019";
}
errorReason = props.getProperty("error.reason");
{code}
If error.code is null, this leads to an NPE. Easy fix:
{code}
if ("0".equals(errorCode)) ...
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
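A sketch of the null-safe version of the snippet above, assuming the error-code mapping is pulled into a standalone helper (`ErrorCodeMapper.mapErrorCode` is a hypothetical name, not a method of `JavaActionExecutor`). Putting the literal on the left of `equals()` means a missing `error.code` simply falls through instead of throwing.

```java
import java.util.Properties;

// Hypothetical helper mirroring the check() snippet; not the actual JavaActionExecutor code.
final class ErrorCodeMapper {
    // Maps the launcher's raw error code to an Oozie error code.
    // Returns the raw value unchanged (possibly null) when no mapping applies.
    static String mapErrorCode(Properties props) {
        String errorCode = props.getProperty("error.code");
        // "0".equals(null) is false, so a missing property cannot cause an NPE here.
        if ("0".equals(errorCode)) {
            errorCode = "JA018";
        }
        else if ("-1".equals(errorCode)) {
            errorCode = "JA019";
        }
        return errorCode;
    }
}
```

The second check is written as `else if`, which behaves the same as the original pair of `if`s because "JA018" can never equal "-1".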
[jira] [Updated] (OOZIE-1536) Coordinator action reruns start a new workflow
[ https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1536: Assignee: (was: Mona Chitnis) > Coordinator action reruns start a new workflow > -- > > Key: OOZIE-1536 > URL: https://issues.apache.org/jira/browse/OOZIE-1536 > Project: Oozie > Issue Type: Improvement >Reporter: Srikanth Sundarrajan > > Coordinator action reruns start a new workflow without checking whether > the existing workflow for the action is still running. Coord rerun could > instead do a workflow re-run to prevent this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1944) Recursive variable resolution broken when same parameter name in config-default and action conf
[ https://issues.apache.org/jira/browse/OOZIE-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1944: Fix Version/s: 4.1.0 > Recursive variable resolution broken when same parameter name in > config-default and action conf > --- > > Key: OOZIE-1944 > URL: https://issues.apache.org/jira/browse/OOZIE-1944 > Project: Oozie > Issue Type: Bug > Components: workflow >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk, 4.1.0 > > > Hitting error > {code} > can not create DagEngine for submitting jobs > org.apache.oozie.DagEngineException: E0803: IO error, Variable > substitution depth too large: 20 ${param}/000 > {code} > when config-default.xml has > {{param=default}} > and action conf has > {code} > <configuration> > ... > <property> > <name>param</name> > <value>${param}/000</value> > </property> > </configuration> > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (OOZIE-1944) Recursive variable resolution broken when same parameter name in config-default and action conf
Mona Chitnis created OOZIE-1944: --- Summary: Recursive variable resolution broken when same parameter name in config-default and action conf Key: OOZIE-1944 URL: https://issues.apache.org/jira/browse/OOZIE-1944 Project: Oozie Issue Type: Bug Components: workflow Affects Versions: trunk Reporter: Mona Chitnis Assignee: Mona Chitnis Fix For: trunk Hitting error
{code}
can not create DagEngine for submitting jobs
org.apache.oozie.DagEngineException: E0803: IO error, Variable substitution depth too large: 20 ${param}/000
{code}
when config-default.xml has {{param=default}} and action conf has
{code}
<configuration>
  ...
  <property>
    <name>param</name>
    <value>${param}/000</value>
  </property>
</configuration>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1933: Attachment: OOZIE-1933-unit-tests-fix.patch Updated the patch so that it applies cleanly to trunk. > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch, > OOZIE-1933-unit-tests-fix.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1933: Attachment: (was: sla_unit_tests-1.patch) > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch, > OOZIE-1933-unit-tests-fix.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis resolved OOZIE-1933. - Resolution: Fixed failing unit tests fix committed to trunk after review. Thanks! > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch, > OOZIE-1933-unit-tests-fix.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1933: Attachment: (was: sla_unit_tests.patch) > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch, > OOZIE-1933-unit-tests-fix.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1811) Current test failures in trunk
[ https://issues.apache.org/jira/browse/OOZIE-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066771#comment-14066771 ] Mona Chitnis commented on OOZIE-1811: - {{org.apache.oozie.command.coord.TestCoordActionInputCheckXCommand.testActionInputCheckLatestCurrentTime}} is also failing because JPAService is null. The test in the same class that uses the latest calculation with respect to action creation time (old behavior), {{org.apache.oozie.command.coord.TestCoordActionInputCheckXCommand.testActionInputCheckLatestActionCreationTime}}, is however failing with a dependency mismatch problem: OOZIE-1872 > Current test failures in trunk > -- > > Key: OOZIE-1811 > URL: https://issues.apache.org/jira/browse/OOZIE-1811 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Robert Kanter >Assignee: Mona Chitnis >Priority: Critical > > There's a bunch of test failures currently in trunk; I'm not sure what > commit(s) is the cause, but I think it was somewhat recent. > e.g. 
https://builds.apache.org/job/oozie-trunk-precommit-build/1199/ > Reproducible by running these tests, instead of having to run them all, which > takes a lot longer :) > {noformat} > mvn clean test > -Dtest=TestSubWorkflowActionExecutor,TestBunldeChangeXCommand,TestCoordUpdateXCommand,TestCoordJobQueryExecutor,TestStatusTransitService,TestSLAEventGeneration > {noformat} > {noformat} > Results : > Failed tests: > testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration): > expected:<...11921-oozie-rkan-C@1[]> but was:<...11921-oozie-rkan-C@1[2]> > > testCoordStatusTransitServiceDoneWithError(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > > testBundleStatusTransitRunningFromKilled(org.apache.oozie.service.TestStatusTransitService): > expected: but was: > Tests in error: > testGetList(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > testInsert(org.apache.oozie.executor.jpa.TestCoordJobQueryExecutor) > Tests run: 62, Failures: 3, Errors: 2, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1933: Attachment: sla_unit_tests-1.patch Updated the patch to include another broken test case. All other failing tests pass locally and are known to be flaky. Test run: {code} Results : Failed tests: testCoordinatorActionCommandsSubmitAndStart(org.apache.oozie.sla.TestSLAEventGeneration) testRecovery(org.apache.oozie.action.hadoop.TestJavaActionExecutor): expected:<[SUCCEED]ED> but was:<[FAILED/KILL]ED> testCoordStatusTransitServiceBackwardSupport(org.apache.oozie.service.TestStatusTransitService) Tests in error: testOnJobEvent(org.apache.oozie.sla.TestSLAJobEventListener): invalid child id [wa1] testActionReuseWfJobAppPath(org.apache.oozie.command.wf.TestActionStartXCommand): E0607: Other error in operation [action.start], null testWorkflowRun(org.apache.oozie.command.wf.TestLastModified): org.apache.oozie.DagEngineException: E0607: Other error in operation [start], null testSucJobPurgeXCommand(org.apache.oozie.command.TestPurgeXCommand): E0604: Job does not exist [001-140717193440158-oozie-chit-W] testSucCoordPurgeXCommand(org.apache.oozie.command.TestPurgeXCommand): E0604: Job does not exist [000-140717193442386-oozie-chit-C] {code} > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk >Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch, > sla_unit_tests-1.patch, sla_unit_tests.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > 
SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1933: Attachment: sla_unit_tests.patch > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch, > sla_unit_tests.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis reopened OOZIE-1933: - Reopening to add fixes for test cases broken by the patch. > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis resolved OOZIE-1933. - Resolution: Fixed committed to trunk. thanks for review! > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (OOZIE-1938) Fork-join job does not execute join node sometimes during HA failover
[ https://issues.apache.org/jira/browse/OOZIE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064458#comment-14064458 ] Mona Chitnis commented on OOZIE-1938: - More context: all actions have completed, some via server 1 and others via server 2. 1) Checking the SignalXCommand code, and also the WF_ACTIONS table for all actions of this job: all of them have pending=0. This probably explains why they weren't recovered by ActionCheckerRunnable. 2) As each forked action finishes, two signals are sent: signal value OK and signal value :sync:. The :sync: is needed to maintain the fork-join count: the count is incremented as the fork starts each path (signal :sync:) and decremented as each path reaches the join (signal :sync:). I think that, because one of the servers was down for a while, some of these :sync: signals were lost or failed to get processed. We don't see this problem in a different scenario, where both servers were up before the actions finished and started signaling :sync:. I'm not very confident about changing the way we handle :sync:, so I would like to discuss the best approach here. The easier approach would be to set the action's pending flag during this process, so that recovery will pick up the action and help restore the correct :sync: count. Feedback/corrections? > Fork-join job does not execute join node sometimes during HA failover > - > > Key: OOZIE-1938 > URL: https://issues.apache.org/jira/browse/OOZIE-1938 > Project: Oozie > Issue Type: Bug > Components: HA >Affects Versions: trunk >Reporter: Mona Chitnis > Fix For: trunk > > > Reported by [~mchiang]. > Scenario: (2 Oozie HA servers) > 21:38:56 submit job at oozie client > 21:41:42 shut down server1 > 21:46:52 shut down server2 > 21:47:30 start server1 > 22:15:05 start server2 > the last fork path end time is 21:52:53. > 22:36:48 the job is still RUNNING, not moving to join node.
> Digging into the logs, the locking part seems to work fine with forked action > processing distributed amongst the two servers when both running or when one > of them is down. The issue seems to be why even RecoveryService fails to pick > up the job after all the forks had completed -- This message was sent by Atlassian JIRA (v6.2#6252)
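The :sync: bookkeeping described in point 2) of the comment above can be pictured as a per-join counter. Below is a minimal sketch, assuming a single in-memory counter per join node (`ForkJoinCounter`, `onFork` and `onJoin` are hypothetical names, not Oozie classes; the real count is persisted with the workflow job). It shows why one lost decrement leaves the join waiting indefinitely, and that replaying the lost signal, as the pending-flag/recovery proposal would, lets the join fire.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of fork/join :sync: bookkeeping; names are illustrative only.
final class ForkJoinCounter {
    private final Map<String, Integer> pending = new HashMap<>();

    // One :sync: increment per forked path started by the fork.
    void onFork(String joinName) {
        pending.merge(joinName, 1, Integer::sum);
    }

    // One :sync: decrement per forked path arriving at the join.
    // Returns true only when the last outstanding path has arrived.
    boolean onJoin(String joinName) {
        int remaining = pending.merge(joinName, -1, Integer::sum);
        return remaining == 0;
    }

    // Number of paths the join is still waiting for.
    int remaining(String joinName) {
        return pending.getOrDefault(joinName, 0);
    }
}
```

One design caveat the sketch makes visible: the counter is not idempotent, so a recovery path that replays a :sync: that was in fact already processed would push the count below zero. A pending-flag approach would need to ensure each lost signal is replayed exactly once.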
[jira] [Updated] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1933: Attachment: OOZIE-1933-4-1.patch > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch, OOZIE-1933-4-1.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23524: Logging improvements (amendment to OOZIE-1911) + OOZIE-1933
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23524/
---

(Updated July 16, 2014, 10:47 p.m.)

Review request for oozie.

Changes
---
Addressed comments.

Bugs: OOZIE-1933
    https://issues.apache.org/jira/browse/OOZIE-1933

Repository: oozie-git

Description
---
See JIRA

Diffs (updated)
-
  core/src/main/java/org/apache/oozie/service/EventHandlerService.java 6c075ab
  core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 5349b33
  core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 5b30fc0
  core/src/main/java/org/apache/oozie/util/LogUtils.java 723ac36
  core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java ffb25e7
  core/src/test/java/org/apache/oozie/service/TestHASLAService.java 419e98b
  core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java 438f2c2
  core/src/test/java/org/apache/oozie/sla/TestSLAService.java 205bcd1
  core/src/test/java/org/apache/oozie/test/XTestCase.java 6bf0a8f

Diff: https://reviews.apache.org/r/23524/diff/

Testing
---
Added new tests and checked that existing ones pass.

Thanks,
Mona Chitnis
[jira] [Updated] (OOZIE-1933) SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
[ https://issues.apache.org/jira/browse/OOZIE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mona Chitnis updated OOZIE-1933: Attachment: OOZIE-1933-3.patch attaching patch reviewed and updated from ReviewBoard > SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs > - > > Key: OOZIE-1933 > URL: https://issues.apache.org/jira/browse/OOZIE-1933 > Project: Oozie > Issue Type: Bug >Affects Versions: trunk > Reporter: Mona Chitnis >Assignee: Mona Chitnis > Fix For: trunk > > Attachments: OOZIE-1933-3.patch > > > SLACalculatorMemory.addJobStatus() > {code} > else { > // jobid might not exist in slaMap in HA Setting > SLARegistrationBean slaRegBean = > SLARegistrationQueryExecutor.getInstance().get( > SLARegQuery.GET_SLA_REG_ALL, jobId); > SLASummaryBean slaSummaryBean = > SLASummaryQueryExecutor.getInstance().get(SLASummaryQuery.GET_SLA_SUMMARY, > jobId); > slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean); > {code} > Because of SLA Listener, job notification event triggers this even for jobs > with no SLA configured - leading to NPE in the SLACalcStatus constructor and > annoying exception stacktraces in logs > Patch to also include log prefix addition to some SLACalculator log line -- This message was sent by Atlassian JIRA (v6.2#6252)
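The OOZIE-1933 description boils down to constructing `SLACalcStatus` from a registration bean that is null for jobs without an SLA. Here is a minimal sketch of a guard, assuming simplified stand-in types: `RegBean`, `SummaryBean`, `CalcStatus` and `SlaStatusLoader.load` are hypothetical, and two maps stand in for the `SLARegistrationQueryExecutor` and `SLASummaryQueryExecutor` lookups. It is not the committed patch.

```java
import java.util.Map;

// Hypothetical stand-ins for SLARegistrationBean, SLASummaryBean and SLACalcStatus.
final class RegBean {
    final String jobId;
    RegBean(String jobId) { this.jobId = jobId; }
}

final class SummaryBean {
    final String jobId;
    SummaryBean(String jobId) { this.jobId = jobId; }
}

final class CalcStatus {
    final String jobId;

    CalcStatus(SummaryBean summary, RegBean reg) {
        // Dereferencing a null registration bean here is what throws the reported NPE.
        this.jobId = reg.jobId;
    }
}

final class SlaStatusLoader {
    // Looks up both beans for a job and only builds a status when the job
    // actually has SLA data; the maps stand in for the two query executors.
    static CalcStatus load(String jobId, Map<String, RegBean> registrations,
                           Map<String, SummaryBean> summaries) {
        RegBean reg = registrations.get(jobId);
        SummaryBean summary = summaries.get(jobId);
        if (reg == null || summary == null) {
            return null; // no SLA configured for this job: nothing to track
        }
        return new CalcStatus(summary, reg);
    }
}
```

Returning null here pushes the "no SLA" decision to the caller, which would then skip the job instead of logging a stack trace. Whether the real fix returns early, logs at a lower level, or filters events earlier in the SLA listener is not stated in the thread.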
Re: Review Request 23524: Logging improvements (amendment to OOZIE-1911) + OOZIE-1933
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23524/
---

(Updated July 16, 2014, 8:09 p.m.)

Review request for oozie.

Changes
---
Addressed review comments. Checked that all tests (new and existing) pass.

Bugs: OOZIE-1933
    https://issues.apache.org/jira/browse/OOZIE-1933

Repository: oozie-git

Description
---
See JIRA

Diffs (updated)
-
  core/src/main/java/org/apache/oozie/service/EventHandlerService.java 6c075ab
  core/src/main/java/org/apache/oozie/sla/SLACalcStatus.java 5349b33
  core/src/main/java/org/apache/oozie/sla/SLACalculatorMemory.java 5b30fc0
  core/src/main/java/org/apache/oozie/util/LogUtils.java 723ac36
  core/src/test/java/org/apache/oozie/service/TestEventHandlerService.java ffb25e7
  core/src/test/java/org/apache/oozie/sla/TestSLACalculatorMemory.java 438f2c2
  core/src/test/java/org/apache/oozie/sla/TestSLAService.java 205bcd1
  core/src/test/java/org/apache/oozie/test/XTestCase.java 6bf0a8f

Diff: https://reviews.apache.org/r/23524/diff/

Testing
---
Added new tests and checked that existing ones pass.

Thanks,
Mona Chitnis