Thanks Robert for the ideas. Some explanation: when a test case fails, we rerun all the tests in its test class, because surefire does not allow running individual test cases from multiple test classes (see bin/test-patch-20-tests). In other words: passing -Dtest=Test1#testMethod1,Test2#testMethod2 to mvn test does not work, while -Dtest=Test1,Test2 works at the cost of running all the other tests in Test1 and Test2.
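To illustrate the rerun behaviour described above, here is a minimal sketch of how a list of failed test cases could be collapsed into the class-level argument that does work. This is not what bin/test-patch-20-tests actually does; the class name SurefireTestArg and the method collapseToClasses are hypothetical, chosen only for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only: it is not part of Oozie or surefire.
public class SurefireTestArg {

    /**
     * Collapses comma-separated "Class#method" entries into a de-duplicated,
     * order-preserving, comma-separated list of class names.
     */
    public static String collapseToClasses(String failedTests) {
        Set<String> classes = new LinkedHashSet<>();
        for (String entry : failedTests.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            int hash = trimmed.indexOf('#');
            // Keep only the class part; entries without '#' are already class names.
            classes.add(hash >= 0 ? trimmed.substring(0, hash) : trimmed);
        }
        return String.join(",", classes);
    }

    public static void main(String[] args) {
        String failed = "Test1#testMethod1,Test2#testMethod2,Test1#testMethod3";
        // Prints the class-level form of the argument: -Dtest=Test1,Test2
        System.out.println("-Dtest=" + collapseToClasses(failed));
    }
}
```

The trade-off is the one mentioned above: every test in Test1 and Test2 runs again, not only the ones that failed.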
Cleaning up before and after tests is a good idea. In fact, I have seen multiple undeleted test resources (even after mvn clean):

core/activemq-data/
core/dist2.txt
core/dist3.txt
core/distcp-log4j.properties
core/distcp-oozie-1514391729048.log
core/dst1.txt
core/test-invalid-workflow-app.xml
core/test-workflow-app.xml

I filed some new JIRAs:
- OOZIE-3145 TestDistcpMain shall remove created files after test execution
- OOZIE-3148 Rerun Failing Tests through Maven surefire
(Right now I am running mkdistro.sh again with -Dsurefire.rerunFailingTestsCount=2. I will look into the mini cluster logs if there are failures.)

Unfortunately, there are some known flaky tests in Oozie (tracked by the OOZIE-3111 umbrella JIRA). I am not sure if we need to fix all of them before the first 5.0.0-beta1 release candidate; just out of curiosity, I tried to run bin/mkdistro.sh on release-4.3.0 and some tests failed there too.

- Attila

On Fri, Dec 22, 2017 at 1:20 AM, Robert Kanter <rkan...@cloudera.com> wrote:

> I took a look at the latest PreCommit job
> <https://builds.apache.org/job/PreCommit-OOZIE-Build/288/consoleFull> and
> it reported 55 rerun tests.
> Tests rerun: 55
> Tests failed at first run:
> org.apache.oozie.action.hadoop.TestJavaActionExecutor,
>
> However, looking through the actual output, I only see 1 test that failed
> (and was rerun): TestJavaActionExecutor.testCredentialsSkip.
>
> [INFO] Running org.apache.oozie.action.hadoop.TestJavaActionExecutor
> [ERROR] Tests run: 55, Failures: 0, Errors: 1, Skipped: 0, Time
> elapsed: 126.213 s <<< FAILURE! - in
> org.apache.oozie.action.hadoop.TestJavaActionExecutor
> [ERROR] testCredentialsSkip(org.apache.oozie.action.hadoop.
> TestJavaActionExecutor)
> Time elapsed: 0.532 s <<< ERROR!
> org.apache.oozie.action.ActionExecutorException: JA020: Could not load
> credentials of type [abc] with name [abcname]]; perhaps it was not
> defined in oozie-site.xml?
> at org.apache.oozie.action.hadoop.TestJavaActionExecutor.
> _testCredentialsSkip(TestJavaActionExecutor.java:1106)
> at org.apache.oozie.action.hadoop.TestJavaActionExecutor.testCredentialsSkip(TestJavaActionExecutor.java:1006)
>
> In fact, the report only lists the one test class, not 55 of them. So I
> think there's something wrong with our reporting here.
>
> Anyway, typically, when we see tests that succeed on their own but fail when
> run all together, that means that (likely some other) test is not cleaning
> up properly. This is unfortunately tricky to debug because it's hard to
> figure out what the other test is. A long time ago, a big example of this
> problem was not properly shutting down the Services singleton, so we'd have
> duplicates and other issues.
>
> For these specific issues, some hints:
> # RUNNINGWITHERROR: This problem means that a yarn job in the mini cluster
> has failed. To find out why, you should be able to dig out the app id from
> the test output, and then find its yarn logs somewhere (there's a
> minicluster logs dir, but I forget where). That'll hopefully make it
> obvious what's going on.
> # Credentials: There's probably an oozie-site or Configuration class
> leaking from somewhere or not properly cleaned up by a previous test or
> setup by this test. The Credentials class is missing.
> # TestJMSAccessorService: Sounds like something didn't get cleaned up.
>
> One way that might be easier to fix this is to have the setUp methods
> ensure that things are clean just in case. We actually have a number of
> tests that do things like this too.
>
>
> - Robert
>
>
> On Thu, Dec 21, 2017 at 12:40 PM, Attila Sasvari <asasv...@cloudera.com>
> wrote:
>
> > Update:
> > - bin/mkdistro.sh fails because there are test failures.
> > [ERROR] Failures:
> >
> > [ERROR]
> > TestCoordActionsKillXCommand.testActionKillCommandActionNumbers:96
> > expected:<RUNNING> but was:<RUNNINGWITHERROR>
> >
> > [ERROR] Errors:
> >
> > [ERROR]
> > TestJavaActionExecutor.testCredentialsSkip:1006->_testCredentialsSkip:1106
> > » ActionExecutor
> >
> > [ERROR] TestJMSAccessorService.testConnectionRetryExceptionListener:211
> > » InstanceAlreadyExists
> >
> >
> > - if I run the tests separately, they pass. Looking at the latest precommit
> > builds (https://builds.apache.org/job/PreCommit-OOZIE-Build), it turns out
> > that a lot of tests had to be re-executed to get a +1 for the TESTS part.
> > Problem is that the Oozie tests have impact on each other, and it looks
> > like the execution order matters too.
> > - uploaded work in progress (SNAPSHOT) artifacts here:
> > http://people.apache.org/~asasvari/oozie-5.0.0-beta1-SNAPSHOT/
> > - I plan to update the "How To Release" page as it contains some errors
> > (e.g. sftp shall be used to upload artifacts)
> > - agreed with Artem that OOZIE-2231 will slip to 5.0.0
> >
> > Regards,
> > - Attila
> >
> > On Mon, Dec 18, 2017 at 11:47 PM, Attila Sasvari <asasv...@cloudera.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > branch-5.0.0-beta1 has been created.
> > >
> > > Regards,
> > > Attila
> > >
> > > On Mon, Dec 18, 2017 at 7:25 PM, Attila Sasvari <asasv...@cloudera.com>
> > > wrote:
> > >
> > >> Thanks gp. I will follow the steps described on
> > >> https://cwiki.apache.org/confluence/display/OOZIE/How+To+Release during
> > >> the process.
> > >>
> > >> Next steps:
> > >> - A new branch is about to be created from master.
> > >>
> > >> Artem, many thanks. I will review and commit that patch if everything is
> > >> okay. I don't see any problem with including it in 5.0.0-beta1.
> > >>
> > >> On Mon, Dec 18, 2017 at 4:32 PM, Artem Ervits <artemerv...@gmail.com>
> > >> wrote:
> > >>
> > >>> just uploaded patch for OOZIE-2231.
> > >>>
> > >>> On Mon, Dec 18, 2017 at 10:04 AM, Peter Cseh <gezap...@cloudera.com>
> > >>> wrote:
> > >>>
> > >>> > Hey Attila,
> > >>> >
> > >>> > I won't be able to work on the release for a couple weeks now.
> > >>> > Thanks for getting the release rolling!
> > >>> >
> > >>> > Cheers,
> > >>> > gp
> > >>> >
> > >>> >
> > >>> > On Mon, Dec 18, 2017 at 4:00 PM, Attila Sasvari <asasv...@cloudera.com>
> > >>> > wrote:
> > >>> >
> > >>> > > Hi everyone,
> > >>> > >
> > >>> > > I would like to create the release branch, branch-5.0.0-beta1 (following
> > >>> > > Hadoop release versioning), earlier.
> > >>> > >
> > >>> > > Looking at https://issues.apache.org/jira/projects/OOZIE/versions/12342048
> > >>> > > there are 3 issues in progress (OOZIE-2231, OOZIE-2942, OOZIE-2974) and 3
> > >>> > > issues to do (OOZIE-2600, OOZIE-3093, OOZIE-1987). I will push those out to
> > >>> > > 5.0.0 if there are no objections.
> > >>> > >
> > >>> > > At the same time, I am volunteering to be the release manager if Peter Cseh
> > >>> > > does not mind.
> > >>> > >
> > >>> > > Regards,
> > >>> > > Attila
> > >>> > >
> > >>> > > On Wed, Dec 6, 2017 at 8:28 PM, Robert Kanter <rkan...@cloudera.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > > > Sounds good to me!
> > >>> > > >
> > >>> > > > On Wed, Dec 6, 2017 at 5:17 AM, Andras Piros <andras.pi...@cloudera.com>
> > >>> > > > wrote:
> > >>> > > >
> > >>> > > > > Good idea Gezapeti!
> > >>> > > > >
> > >>> > > > > Time to wrap things up towards a stable 5.0.0 - a release candidate on
> > >>> > > > > 5.0.0b1 is a good first step.
> > >>> > > > >
> > >>> > > > > Since other components that Oozie uses like Pig and Hive do not (fully)
> > >>> > > > > support Hadoop 3, we have to wait with OOZIE-2973
> > >>> > > > > <https://issues.apache.org/jira/browse/OOZIE-2973>.
> > >>> > > > >
> > >>> > > > > Thanks,
> > >>> > > > >
> > >>> > > > > Andras
> > >>> > > > >
> > >>> > > > > On Wed, Dec 6, 2017 at 1:50 PM, Peter Cseh <gezap...@cloudera.com>
> > >>> > > > > wrote:
> > >>> > > > >
> > >>> > > > > > Hi everyone!
> > >>> > > > > >
> > >>> > > > > > Now that OOZIE-2969 <https://issues.apache.org/jira/browse/OOZIE-2969> is
> > >>> > > > > > committed I'd like to start the process of creating the branch for 5.0.0b1
> > >>> > > > > > and building a release from there.
> > >>> > > > > > It's unfortunate that we won't be able to support Hadoop 3 in the beta for
> > >>> > > > > > reasons described in OOZIE-2973
> > >>> > > > > > <https://issues.apache.org/jira/browse/OOZIE-2973>
> > >>> > > > > > I don't see any more blockers for the beta1 and I hope we won't find
> > >>> > > > > > hard-to-fix major issues so we can release Oozie 5.0.0 in early 2018.
> > >>> > > > > > Please let me know if you have any suggestions.
> > >>> > > > > > Thanks
> > >>> > > > > > gp
> > >>> > > > > >
> > >>> > > > > > --
> > >>> > > > > > Peter Cseh
> > >>> > > > > > Software Engineer
> > >>> > > > > > <http://www.cloudera.com>
> > >>> > >
> > >>> > > --
> > >>> > > Attila Sasvari
> > >>> > > Software Engineer
> > >>> > > <http://www.cloudera.com/>

--
Attila Sasvari
Software Engineer
<http://www.cloudera.com/>