On 13 January 2015 at 21:26, Thomas Neidhart <thomas.neidh...@gmail.com> wrote:
> On 01/13/2015 09:01 PM, Phil Steitz wrote:
>> On 1/12/15 3:21 PM, Thomas Neidhart wrote:
>>> On 01/12/2015 11:17 PM, Phil Steitz wrote:
>>>> On 1/12/15 2:30 PM, Thomas Neidhart wrote:
>>>>> On 01/12/2015 10:26 PM, Thomas Neidhart wrote:
>>>>>> On 01/12/2015 08:09 PM, Phil Steitz wrote:
>>>>>>> On 1/12/15 11:37 AM, sebb wrote:
>>>>>>>> On 12 January 2015 at 18:11, Phil Steitz <phil.ste...@gmail.com> wrote:
>>>>>>>>> On 1/12/15 10:50 AM, sebb wrote:
>>>>>>>>>> On 11 January 2015 at 22:10, Phil Steitz <phil.ste...@gmail.com> wrote:
>>>>>>>>>>> On 1/11/15 11:19 AM, Phil Steitz wrote:
>>>>>>>>>>>> On 1/10/15 10:49 PM, Phil Steitz wrote:
>>>>>>>>>>>>> On 1/9/15 6:09 PM, sebb wrote:
>>>>>>>>>>>>>> On 10 January 2015 at 01:01, Phil Steitz <phil.ste...@gmail.com> wrote:
>>>>>>>>>>>>>>> On 1/9/15 5:32 PM, sebb wrote:
>>>>>>>>>>>>>>>> On 9 January 2015 at 23:48, sebb <seb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Of the last 6 runs, only 1 had a problem with unit test failures.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> All the builds ran on ubuntu3, apart from the failure, which ran on H10.
>>>>>>>>>>>>>>>>> This may have some bearing on the result; I don't yet know.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I had a quick look at 2 tests that failed:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> SimpleRegressionTest.testPerfect
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> SimpleRegressionTest.testPerfectNegative
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Although the test case has some instance data, these particular tests
>>>>>>>>>>>>>>>>> do not use any, so it does not look like a concurrency issue in the
>>>>>>>>>>>>>>>>> unit test itself.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The SimpleRegression class has mutable instance data, but the test
>>>>>>>>>>>>>>>>> cases create their own instance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't know anything about the math functions involved, but it looks
>>>>>>>>>>>>>>>>> as though Infinity might result from getSignificance() if
>>>>>>>>>>>>>>>>> getSlopeStdErr() returns 0, as the latter is used as a divisor. Or if
>>>>>>>>>>>>>>>>> the field sumXX is 0 because that is also used as a divisor.
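
For illustration only, here is a minimal sketch of the zero-divisor idea. It is not the SimpleRegression source; the class name and the `tStatistic` helper with its `slope` and `slopeStdErr` parameters are hypothetical. In Java, dividing a non-zero double by 0.0 does not throw: it yields an infinity, and 0.0 / 0.0 yields NaN.

```java
// Hypothetical illustration only; this is not the SimpleRegression source.
class ZeroDivisorSketch {

    // A t-statistic style ratio: a slope divided by its standard error.
    static double tStatistic(double slope, double slopeStdErr) {
        return slope / slopeStdErr; // no exception is thrown for a zero divisor
    }

    public static void main(String[] args) {
        System.out.println(tStatistic(2.0, 0.0));   // Infinity
        System.out.println(tStatistic(-2.0, 0.0));  // -Infinity
        System.out.println(tStatistic(0.0, 0.0));   // NaN
    }
}
```

So a perfect fit with an exactly zero standard error could plausibly hand an infinite value to whatever consumes the ratio next.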
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe the H10 host has different floating point hardware?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'll try running some more tests on H10.
>>>>>>>>>>>>>>>> The build failed again on H10; exactly the same tests failed as before:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This test:
>>>>>>>>>>>>>>>> https://builds.apache.org/job/Commons%20Math%20H10/1/console
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Previous failure:
>>>>>>>>>>>>>>>> https://builds.apache.org/job/Commons%20Math/14/console
>>>>>>>>>>>>>>> This is actually a bug.  Thanks, sebb (and Jenkins)!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Has been here since 1.x.  What is going on is that the data sets
>>>>>>>>>>>>>>> used in the test cases are set up to be perfect linear
>>>>>>>>>>>>>>> relationships, which should in fact lead to mean square error (and
>>>>>>>>>>>>>>> hence slope standard error) equal to 0.  The Jenkins box must be
>>>>>>>>>>>>>>> getting exact 0.  The funny thing is the test is there to validate
>>>>>>>>>>>>>>> correct performance for models like this.  Its success unfortunately
>>>>>>>>>>>>>>> depends on poor precision.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will open a JIRA for this.  I don't think it is a release blocker
>>>>>>>>>>>>>>> for 3.4.1, as I am sure you would get the same thing in any earlier
>>>>>>>>>>>>>>> version of [math].
>>>>>>>>>>>>>> OK good to know.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll leave the H10 Jenkins job for now to make it easy to retest.
>>>>>>>>>>>>> My first guess here was wrong.  The infinities are being handled
>>>>>>>>>>>>> correctly for the JDKs I have.  Something must be going awry in the
>>>>>>>>>>>>> t distribution cumulative probability computation for +INF on the
>>>>>>>>>>>>> box that is failing.  Is there a way to find out exactly what JDK
>>>>>>>>>>>>> and OS version are being used?
>>>>>>>>>>>> I just committed a test that tests the t distribution computations
>>>>>>>>>>>> directly.  It seems to have run clean; but the other test ran clean
>>>>>>>>>>>> too.  Is there any way to force the build to use the host that fails?
>>>>>>>>>>> I can't make any sense of what is going on with the Jenkins builds.
>>>>>>>>>>> Clean runs and then lots of errors.  This one explains the
>>>>>>>>>>> SimpleRegression "problem" (which is not a problem with that class
>>>>>>>>>>> at least)
>>>>>>>>>>>
>>>>>>>>>>> testCumulativeProbablilityExtremes(org.apache.commons.math3.distribution.TDistributionTest)
>>>>>>>>>>>   Time elapsed: 0.001 sec  <<< FAILURE!
>>>>>>>>>>> java.lang.AssertionError: expected:<1.0> but was:<-Infinity>
>>>>>>>>>>>         at org.junit.Assert.fail(Assert.java:88)
>>>>>>>>>>>         at org.junit.Assert.failNotEquals(Assert.java:743)
>>>>>>>>>>>         at org.junit.Assert.assertEquals(Assert.java:494)
>>>>>>>>>>>         at org.junit.Assert.assertEquals(Assert.java:592)
>>>>>>>>>>>         at org.apache.commons.math3.distribution.TDistributionTest.testCumulativeProbablilityExtremes(TDistributionTest.java:109)
>>>>>>>>>>>
>>>>>>>>>>> In earlier runs this ran clean. There is nothing non-deterministic
>>>>>>>>>>> about this test (or quite a few of the others that randomly seem to
>>>>>>>>>>> fail).
>>>>>>>>>>>
>>>>>>>>>>> I wonder if we have a bad cpu or something somewhere.
>>>>>>>>>> AFAICS all the failed builds ran on H10.
>>>>>>>>>>
>>>>>>>>>> IMO it is consistent; the apparent randomness comes from the fact that
>>>>>>>>>> there are several Ubuntu hosts, including H10.
>>>>>>>>> Am I reading it / looking at the wrong one, or did this one succeed?
>>>>>>>>>
>>>>>>>>> https://builds.apache.org/view/All/job/Commons%20Math%20H10/6/
>>>>>>>>>
>>>>>>>>> That one was right after I added tests confirming that the t
>>>>>>>>> distribution cum prob handles INFs correctly.
>>>>>>>> That did run on H10 and did succeed; I'd not noticed that one before.
>>>>>>>>
>>>>>>>> I think it is still true that the failures have only occurred on H10.
>>>>>>>>
>>>>>>>> However, the latest one is failing:
>>>>>>>>
>>>>>>>> https://builds.apache.org/job/Commons%20Math/24/console
>>>>>>>>
>>>>>>>> This is on H11 - I think that's the first time H11 has been used.
>>>>>>>>
>>>>>>>> I suppose it's possible that H10 and H11 have a common failing, but it
>>>>>>>> seems less likely.
>>>>>>>>
>>>>>>>> I added a bit more debug - showing the value of sumXX - but that seems
>>>>>>>> OK on H11.
>>>>>>>>
>>>>>>>> I just added a bit more debug.
>>>>>>> I am pretty sure the SimpleRegressionTest failure is actually caused
>>>>>>> by the same thing causing the t-distribution test to fail (the
>>>>>>> reason I added that one).
>>>>>>>
>>>>>>> One that is more straightforward to chase is this one, which fails
>>>>>>> pretty consistently when "bad things happen"
>>>>>>>
>>>>>>> testExpInf(org.apache.commons.math3.complex.ComplexTest)  Time elapsed: 0.001 sec  <<< FAILURE!
>>>>>>> java.lang.AssertionError: expected:<0.0> but was:<Infinity>
>>>>>>>  at org.junit.Assert.fail(Assert.java:88)
>>>>>>>  at org.junit.Assert.failNotEquals(Assert.java:743)
>>>>>>>  at org.junit.Assert.assertEquals(Assert.java:494)
>>>>>>>  at org.junit.Assert.assertEquals(Assert.java:592)
>>>>>>>  at org.apache.commons.math3.TestUtils.assertSame(TestUtils.java:76)
>>>>>>>  at org.apache.commons.math3.TestUtils.assertSame(TestUtils.java:84)
>>>>>>>  at org.apache.commons.math3.complex.ComplexTest.testExpInf(ComplexTest.java:788)
>>>>>>>
>>>>>>> I would wager that what is going on here is 0.0 * -INF = INF.
>>>>>> The output returned by the debug statements added by sebb is:
>>>>>>
>>>>>> expReal=Infinity
>>>>>> cosImag=0.5403023058681398
>>>>>> sinImag=0.8414709848078965
>>>>>> result=(Infinity, Infinity)
>>>>>>
>>>>>> while expReal should be -Infinity.
>>>>>>
>>>>>> of course, Math.exp(Infinity) = Infinity.
>>>>> oh stupid mistake, please forget my last post.
>>>>> I messed up expReal with the actual real value.
>>>> But it should be 0, since expReal should be exp(-INF)
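
To make the debug values quoted above concrete, here is a small sketch. It assumes exp(a + bi) is computed as (exp(a) * cos(b), exp(a) * sin(b)), which is consistent with the debug output but has not been checked against the Complex source. The `expParts` helper is hypothetical, java.lang.Math stands in for FastMath, and the imaginary part 1.0 is chosen because its cosine and sine match the cosImag and sinImag values shown.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Complex.exp() implementation.
class ComplexExpSketch {

    // Returns {real, imaginary} from an already computed exp(real part).
    static double[] expParts(double expReal, double imag) {
        return new double[] { expReal * Math.cos(imag), expReal * Math.sin(imag) };
    }

    public static void main(String[] args) {
        // Math.exp is specified to return positive zero for negative infinity.
        double good = Math.exp(Double.NEGATIVE_INFINITY);
        System.out.println(Arrays.toString(expParts(good, 1.0)));  // [0.0, 0.0]

        // Feeding in the wrong value seen on the failing host.
        double bad = Double.POSITIVE_INFINITY;
        System.out.println(Arrays.toString(expParts(bad, 1.0)));   // [Infinity, Infinity]
    }
}
```

Under that assumption, a wrong exp(-INF) alone would be enough to produce the (Infinity, Infinity) result; the multiplication itself need not misbehave.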
>>> just added some more debug output to the test and the result is:
>>>
>>> real=-Infinity
>>> -real=2147483647
>>> expReal=Infinity
>>>
>>> according to FastMath.exp(), with these values, the code path should be
>>> as follows:
>>>
>>>         if (x < 0.0) {
>>>             intVal = (int) -x;
>>>
>>>             if (intVal > 746) {
>>>                 if (hiPrec != null) {
>>>                     hiPrec[0] = 0.0;
>>>                     hiPrec[1] = 0.0;
>>>                 }
>>> -->             return 0.0;
>>>             }
>>>
>>>
>>> but obviously it doesn't do this. I guess we can only inspect the
>>> generated class files for a potential compiler bug.
>>
>> I did a little more poking about last night in the failed tests and
>> the ones I spot-checked could all have had to do with incorrect
>> computations of exp(-INF).  What is strange is that the cast you
>> show above is working correctly (compliant with JLS) and the code
>> path should be as you have it there.  It seems very strange that
>> just this one code path is sporadically having problems.
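
As a sanity check of the cast, here is a rough sketch that copies only the early-exit branch quoted above into a hypothetical helper, `expNegativeBranch`. It is not the full FastMath.exp(), and it falls back to java.lang.Math for every other input. Per JLS 5.1.3, narrowing +Infinity to int gives Integer.MAX_VALUE, which matches the `-real=2147483647` debug line.

```java
// Hypothetical illustration only; not the FastMath.exp() source.
class ExpNegInfSketch {

    // Copy of the quoted early-exit branch, with a JDK fallback for the rest.
    static double expNegativeBranch(double x) {
        if (x < 0.0) {
            int intVal = (int) -x; // +Infinity narrows to Integer.MAX_VALUE
            if (intVal > 746) {
                return 0.0;
            }
        }
        return Math.exp(x);
    }

    public static void main(String[] args) {
        System.out.println((int) Double.POSITIVE_INFINITY);               // 2147483647
        System.out.println(expNegativeBranch(Double.NEGATIVE_INFINITY));  // 0.0
    }
}
```

On a JVM that behaves as specified, this path can only return 0.0 for -INF, which is why an Infinity result points away from the Java source itself.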
>
> You can see the result of various test builds here:
>
> https://builds.apache.org/job/Commons%20Math%20H10/
>
> Every time I added more debug output to FastMath.exp(), the tests succeeded.
>
> I also set up a Jenkins instance with the same Maven / JDK version to
> build commons-math, but could never reproduce an error so far.
>
> Without direct access to one of the failing servers, I doubt that we
> will be able to find / fix this problem.

I wonder if it could be a runtime optimiser issue?
That might explain why debug affected the outcome.
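
One way to probe that, as a sketch only: call the suspect branch often enough that the runtime compiler has a chance to kick in, and report the first call that returns something other than 0.0. Everything here is hypothetical (the class name, the `expNegativeBranch` copy of the quoted branch, the iteration count); it is not the real FastMath.exp() and may well not trigger whatever the failing hosts are hitting.

```java
// Hypothetical probe; not part of Commons Math.
class JitProbeSketch {

    // Copy of the early-exit branch quoted earlier in the thread.
    static double expNegativeBranch(double x) {
        if (x < 0.0) {
            int intVal = (int) -x;
            if (intVal > 746) {
                return 0.0;
            }
        }
        return Math.exp(x);
    }

    // Returns the index of the first wrong result, or -1 if every call gave 0.0.
    static int firstBadIteration(int iterations) {
        for (int i = 0; i < iterations; i++) {
            if (expNegativeBranch(Double.NEGATIVE_INFINITY) != 0.0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bad = firstBadIteration(1000000);
        System.out.println(bad < 0 ? "no deviation" : "first deviation at call " + bad);
    }
}
```

Running the same class with the HotSpot option -Xint (interpreter only) on the failing host would give a comparison point: a deviation that disappears under -Xint would support the optimiser theory.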

Might be worth trying the non-debug code with Java 6 or 7.

If the issue is related to Java 5, then it could explain why it was
not seen by developer testing - I don't think many devs use Java 5 (or
the incompatibility would not have crept into the release).


> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>

