On 1/13/15 2:33 PM, sebb wrote:
> On 13 January 2015 at 21:26, Thomas Neidhart <thomas.neidh...@gmail.com> wrote:
>> On 01/13/2015 09:01 PM, Phil Steitz wrote:
>>> On 1/12/15 3:21 PM, Thomas Neidhart wrote:
>>>> On 01/12/2015 11:17 PM, Phil Steitz wrote:
>>>>> On 1/12/15 2:30 PM, Thomas Neidhart wrote:
>>>>>> On 01/12/2015 10:26 PM, Thomas Neidhart wrote:
>>>>>>> On 01/12/2015 08:09 PM, Phil Steitz wrote:
>>>>>>>> On 1/12/15 11:37 AM, sebb wrote:
>>>>>>>>> On 12 January 2015 at 18:11, Phil Steitz <phil.ste...@gmail.com> wrote:
>>>>>>>>>> On 1/12/15 10:50 AM, sebb wrote:
>>>>>>>>>>> On 11 January 2015 at 22:10, Phil Steitz <phil.ste...@gmail.com> wrote:
>>>>>>>>>>>> On 1/11/15 11:19 AM, Phil Steitz wrote:
>>>>>>>>>>>>> On 1/10/15 10:49 PM, Phil Steitz wrote:
>>>>>>>>>>>>>> On 1/9/15 6:09 PM, sebb wrote:
>>>>>>>>>>>>>>> On 10 January 2015 at 01:01, Phil Steitz <phil.ste...@gmail.com> wrote:
>>>>>>>>>>>>>>>> On 1/9/15 5:32 PM, sebb wrote:
>>>>>>>>>>>>>>>>> On 9 January 2015 at 23:48, sebb <seb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Of the last 6 runs, only 1 had a problem with unit test failures.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> All the builds ran on ubuntu3, apart from the failure which ran on H10.
>>>>>>>>>>>>>>>>>> This may have some bearing on the result; I don't yet know.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had a quick look at 2 tests that failed:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> SimpleRegressionTest.testPerfect
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> SimpleRegressionTest.testPerfectNegative
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Although the test case has some instance data, these particular tests do not use any, so it does not look like a concurrency issue in the unit test itself.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The SimpleRegression class has mutable instance data, but the test cases create their own instance.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I don't know anything about the math functions involved, but it looks as though Infinity might result from getSignificance() if getSlopeStdErr() returns 0, as the latter is used as a divisor. Or if the field sumXX is 0 because that is also used as a divisor.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe the H10 host has different floating point hardware?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'll try running some more tests on H10.
>>>>>>>>>>>>>>>>> the build failed again on H10; exactly the same tests failed as before:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This test:
>>>>>>>>>>>>>>>>> https://builds.apache.org/job/Commons%20Math%20H10/1/console
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Previous failure:
>>>>>>>>>>>>>>>>> https://builds.apache.org/job/Commons%20Math/14/console
>>>>>>>>>>>>>>>> This is actually a bug. Thanks, sebb (and Jenkins)!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Has been here since 1.x. What is going on is that the data sets used in the test cases are set up to be perfect linear relationships, which should in fact lead to mean square error (and hence slope standard error) equal to 0. The Jenkins box must be getting exact 0. The funny thing is the test is there to validate correct performance for models like this. Its success unfortunately depends on poor precision.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will open a JIRA for this.
>>>>>>>>>>>>>>>> I don't think it is a release blocker for 3.4.1, as I am sure you would get the same thing in any earlier version of [math].
>>>>>>>>>>>>>>> OK good to know.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'll leave the H10 Jenkins job for now to make it easy to retest.
>>>>>>>>>>>>>> My first guess here was wrong. The infinities are being handled correctly for the JDKs I have. Something must be going awry in the t distribution cumulative probability computation for +INF on the box that is failing. Is there a way to find out exactly what JDK and OS version are being used?
>>>>>>>>>>>>> I just committed a test that tests the t distribution computations directly. It seems to have run clean; but the other test ran clean too. Is there any way to force the build to use the host that fails?
>>>>>>>>>>>> I can't make any sense of what is going on with the Jenkins builds. Clean runs and then lots of errors. This one explains the SimpleRegression "problem" (which is not a problem with that class at least)
>>>>>>>>>>>>
>>>>>>>>>>>> testCumulativeProbablilityExtremes(org.apache.commons.math3.distribution.TDistributionTest) Time elapsed: 0.001 sec <<< FAILURE!
>>>>>>>>>>>> java.lang.AssertionError: expected:<1.0> but was:<-Infinity>
>>>>>>>>>>>> at org.junit.Assert.fail(Assert.java:88)
>>>>>>>>>>>> at org.junit.Assert.failNotEquals(Assert.java:743)
>>>>>>>>>>>> at org.junit.Assert.assertEquals(Assert.java:494)
>>>>>>>>>>>> at org.junit.Assert.assertEquals(Assert.java:592)
>>>>>>>>>>>> at org.apache.commons.math3.distribution.TDistributionTest.testCumulativeProbablilityExtremes(TDistributionTest.java:109)
>>>>>>>>>>>>
>>>>>>>>>>>> Earlier runs this ran clean.
>>>>>>>>>>>> There is nothing non-deterministic about this test (or quite a few of the others that randomly seem to fail).
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if we have a bad cpu or something somewhere.
>>>>>>>>>>> AFAICS all the failed builds ran on H10.
>>>>>>>>>>>
>>>>>>>>>>> IMO it is consistent; the apparent randomness comes from the fact that there are several Ubuntu hosts, including H10.
>>>>>>>>>> Am I reading it / looking at the wrong one, or did this one succeed?
>>>>>>>>>>
>>>>>>>>>> https://builds.apache.org/view/All/job/Commons%20Math%20H10/6/
>>>>>>>>>>
>>>>>>>>>> That one was right after I added tests confirming that the t distribution cum prob handles INFs correctly.
>>>>>>>>> That did run on H10 and did succeed; I'd not noticed that one before.
>>>>>>>>>
>>>>>>>>> I think it is still true that the failures have only occurred on H10.
>>>>>>>>>
>>>>>>>>> However, the latest one is failing:
>>>>>>>>>
>>>>>>>>> https://builds.apache.org/job/Commons%20Math/24/console
>>>>>>>>>
>>>>>>>>> This is on H11 - I think that's the first time H11 has been used.
>>>>>>>>>
>>>>>>>>> I suppose it's possible that H10 and H11 have a common failing, but it seems less likely.
>>>>>>>>>
>>>>>>>>> I added a bit more debug - showing the value of sumXX - but that seems OK on H11.
>>>>>>>>>
>>>>>>>>> I just added a bit more debug.
>>>>>>>> I am pretty sure the SimpleRegressionTest failure is actually caused by the same thing causing the t-distribution test to fail (the reason I added that one).
>>>>>>>>
>>>>>>>> One that is more straightforward to chase is this one, which fails pretty consistently when "bad things happen"
>>>>>>>>
>>>>>>>> testExpInf(org.apache.commons.math3.complex.ComplexTest) Time elapsed: 0.001 sec <<< FAILURE!
>>>>>>>> java.lang.AssertionError: expected:<0.0> but was:<Infinity>
>>>>>>>> at org.junit.Assert.fail(Assert.java:88)
>>>>>>>> at org.junit.Assert.failNotEquals(Assert.java:743)
>>>>>>>> at org.junit.Assert.assertEquals(Assert.java:494)
>>>>>>>> at org.junit.Assert.assertEquals(Assert.java:592)
>>>>>>>> at org.apache.commons.math3.TestUtils.assertSame(TestUtils.java:76)
>>>>>>>> at org.apache.commons.math3.TestUtils.assertSame(TestUtils.java:84)
>>>>>>>> at org.apache.commons.math3.complex.ComplexTest.testExpInf(ComplexTest.java:788)
>>>>>>>>
>>>>>>>> I would wager that what is going on here is 0.0 * -INF = INF.
>>>>>>> The output returned by the debug statements added by sebb is:
>>>>>>>
>>>>>>> expReal=Infinity
>>>>>>> cosImag=0.5403023058681398
>>>>>>> sinImag=0.8414709848078965
>>>>>>> result=(Infinity, Infinity)
>>>>>>>
>>>>>>> while expReal should be -Infinity.
>>>>>>>
>>>>>>> of course, Math.exp(Infinity) = Infinity.
>>>>>> oh stupid mistake, please forget my last post.
>>>>>> I messed up expReal with the actual real value.
>>>>> But it should be 0, since expReal should be exp(-INF)
>>>> just added a few more debug output to the test and the result is:
>>>>
>>>> real=-Infinity
>>>> -real=2147483647
>>>> expReal=Infinity
>>>>
>>>> according to FastMath.exp(), with these values, the code path should be as follows:
>>>>
>>>>     if (x < 0.0) {
>>>>         intVal = (int) -x;
>>>>
>>>>         if (intVal > 746) {
>>>>             if (hiPrec != null) {
>>>>                 hiPrec[0] = 0.0;
>>>>                 hiPrec[1] = 0.0;
>>>>             }
>>>> -->         return 0.0;
>>>>         }
>>>>
>>>>
>>>> but obviously it doesn't do this. I guess we can only inspect the generated class files for a potential compiler bug.
>>> I did a little more poking about last night in the failed tests and the ones I spot-checked could all have had to do with incorrect computations of exp(-INF). What is strange is that the cast you show above is working correctly (compliant with JLS) and the code path should be as you have it there.
>>> It seems very strange that just this one code path is sporadically having problems.
>> You can see the result of various test builds here:
>>
>> https://builds.apache.org/job/Commons%20Math%20H10/
>>
>> Every time I added more debug output to FastMath.exp(), the tests succeeded.
>>
>> I also set up a Jenkins instance with the same Maven / JDK version to build commons-math, but could never reproduce an error so far.
>>
>> Without direct access to one of the failing servers, I doubt that we will be able to find / fix this problem.
> I wonder if it could be a runtime optimiser issue?
> That might explain why debug affected the outcome.
>
> Might be worth trying the non-debug code with Java 6 or 7.
>
> If the issue is related to Java 5, then it could explain why it was not seen by developer testing - I don't think many devs use Java 5 (or the incompatibility would not have crept into the release).
I looked at the history of this file and while it has been cosmetically munged a bit in the last couple of years, the method causing the problem hasn't been touched since lots of us were running 1.5. It's strange we never saw this failure. I also ran quite a few 1.5 builds under both Ant and Maven using Oracle JDKs and saw no failures.

Phil

>> Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
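
Editorial aside on the cast discussed in the quoted messages: the following is a minimal Java sketch, not the Commons Math source, of the guard that Thomas quoted from FastMath.exp(). The class name ExpGuardSketch and the methods negatedAsInt and expSketch are hypothetical names invented for illustration, and the fallback to java.lang.Math.exp is an assumption standing in for the rest of the real method. It only shows what the Java Language Specification requires of the (int) cast when x is -Infinity.

```java
// Sketch only: mirrors the guard quoted in the thread, NOT the real FastMath.exp().
final class ExpGuardSketch {

    /** Threshold copied from the quoted snippet (intVal > 746 returns 0.0). */
    static final int UNDERFLOW_LIMIT = 746;

    /**
     * The narrowing cast used by the quoted guard. Per JLS 5.1.3 a double that
     * is too large for int (including +Infinity) converts to Integer.MAX_VALUE.
     */
    static int negatedAsInt(double x) {
        return (int) -x;
    }

    /**
     * Hypothetical stand-in for the quoted branch. Anything that does not hit
     * the early return is delegated to Math.exp (an assumption for this sketch).
     */
    static double expSketch(double x) {
        if (x < 0.0) {
            int intVal = negatedAsInt(x);
            if (intVal > UNDERFLOW_LIMIT) {
                return 0.0; // the line marked "-->" in the quoted snippet
            }
        }
        return Math.exp(x);
    }

    public static void main(String[] args) {
        double real = Double.NEGATIVE_INFINITY;
        System.out.println("real=" + real);
        System.out.println("(int) -real=" + negatedAsInt(real));
        System.out.println("expSketch(real)=" + expSketch(real));
        System.out.println("Math.exp(real)=" + Math.exp(real));
    }
}
```

On a runtime that follows the JLS this prints 2147483647 for the cast, matching the -real=2147483647 debug line in the thread, and 0.0 for both exp calls. That is consistent with the suspicion raised above that the failing hosts misbehave at run time rather than in the source logic, though the sketch by itself cannot confirm the cause.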