[ https://issues.apache.org/jira/browse/OOZIE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949467#comment-14949467 ]
Rohini Palaniswamy commented on OOZIE-2385: ------------------------------------------- +1 > org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming is > flakey > ------------------------------------------------------------------------------- > > Key: OOZIE-2385 > URL: https://issues.apache.org/jira/browse/OOZIE-2385 > Project: Oozie > Issue Type: Bug > Components: tests > Affects Versions: trunk > Reporter: Robert Kanter > Assignee: Robert Kanter > Fix For: trunk > > Attachments: OOZIE-2385.001.patch > > > {{org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming}} is > flakey. > It can fail with an error like this: > {noformat} > junit.framework.ComparisonFailure: expected:<Tue Oct 06 16:27:2[5] PDT 2015> > but was:<Tue Oct 06 16:27:2[4] PDT 2015> > at junit.framework.Assert.assertEquals(Assert.java:85) > at junit.framework.Assert.assertEquals(Assert.java:91) > at > org.apache.oozie.TestCoordinatorEngineStreamLog.__CLR4_0_38acukov0f(TestCoordinatorEngineStreamLog.java:202) > at > org.apache.oozie.TestCoordinatorEngineStreamLog.testCoordLogStreaming(TestCoordinatorEngineStreamLog.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) > at junit.framework.TestResult$1.protect(TestResult.java:110) > at junit.framework.TestResult.runProtected(TestResult.java:128) > at junit.framework.TestResult.run(TestResult.java:113) > at junit.framework.TestCase.run(TestCase.java:124) > at junit.framework.TestSuite.runTest(TestSuite.java:243) > at junit.framework.TestSuite.run(TestSuite.java:238) > at > org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:24) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > {noformat} > This was tricky to track down. The problem is essentially that the filtering > API uses minute granularity but the actions are seconds, or even milliseconds > apart, so there are some edge cases here. The code takes minute granularity, > goes back to milliseconds when querying the DB, and then uses seconds when > checking that the test worked correctly. > This can lead to a situation like this example: > \@1 created/nominal time: Oct 06 16:27:24 <and some milliseconds> > \@2 created/nominal time: Oct 06 16:27:25 <and some milliseconds> > The test passes \@2's created time as the starting time argument to the > filter, converted with {{DateUtill's}} UTC MASK (i.e. minutes granularity): > - Oct 06 16:27 > This gets parsed back to a Date object, which seconds/milliseconds assumed to > be 0's > - Oct 06 16:27:00 0000 > This gets turned into a TimeStamp object, and used in the DB filtering query > - gets all actions with created time > Oct 06 16:27:00, which includes \@1 > CoordinatorEngine sorts by action ID (i.e. \@#) and uses created time of > smallest, which is \@1 > - Oct 06 16:27:24 > The test checks that \@2’s created time is equal to previously determined > time from filter (i.e. \@1’s created time) > - Does a toString(), which uses seconds —-> not equal > Note: The filter querying is actually done against the nominal times, but the > example is simpler without all that and conveys the same idea. > The actions get created within 100s of milliseconds during the test (at least > on my machine), so most of them time, they have the same seconds, and the > toString() comparison is equal. When the timing works out that the first and > second action are created at different clock seconds (even if less than a > second apart), it fails. I couldn’t reproduce this on my machine, but is > more likely to happen on the generally slower Jenkins machines we run because > the time between action creations will be longer. > Basically the problem is that the filtering works at a minute granularity but > the actions are seconds, if not milliseconds, apart. In the real world this > isn’t a big deal because the API takes minutes and we simply use that, but to > verify the test is working we should simply make the actions minutes apart. > To fix this, we can just directly insert the job and actions into the DB > instead of actually running it so we can space the actions apart by minutes; > this should also make the test faster as well (the test takes about 6 minutes > on my machine as-is, but only 1 minute and 30 seconds with the fix). -- This message was sent by Atlassian JIRA (v6.3.4#6332)