On 1/13/15 2:33 PM, sebb wrote:
> On 13 January 2015 at 21:26, Thomas Neidhart <thomas.neidh...@gmail.com> 
> wrote:
>> On 01/13/2015 09:01 PM, Phil Steitz wrote:
>>> On 1/12/15 3:21 PM, Thomas Neidhart wrote:
>>>> On 01/12/2015 11:17 PM, Phil Steitz wrote:
>>>>> On 1/12/15 2:30 PM, Thomas Neidhart wrote:
>>>>>> On 01/12/2015 10:26 PM, Thomas Neidhart wrote:
>>>>>>> On 01/12/2015 08:09 PM, Phil Steitz wrote:
>>>>>>>> On 1/12/15 11:37 AM, sebb wrote:
>>>>>>>>> On 12 January 2015 at 18:11, Phil Steitz <phil.ste...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>> On 1/12/15 10:50 AM, sebb wrote:
>>>>>>>>>>> On 11 January 2015 at 22:10, Phil Steitz <phil.ste...@gmail.com> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 1/11/15 11:19 AM, Phil Steitz wrote:
>>>>>>>>>>>>> On 1/10/15 10:49 PM, Phil Steitz wrote:
>>>>>>>>>>>>>> On 1/9/15 6:09 PM, sebb wrote:
>>>>>>>>>>>>>>> On 10 January 2015 at 01:01, Phil Steitz 
>>>>>>>>>>>>>>> <phil.ste...@gmail.com> wrote:
>>>>>>>>>>>>>>>> On 1/9/15 5:32 PM, sebb wrote:
>>>>>>>>>>>>>>>>> On 9 January 2015 at 23:48, sebb <seb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Of the last 6 runs, only 1 had a problem with unit test 
>>>>>>>>>>>>>>>>>> failures.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> All the builds ran on ubuntu3, apart from the failure which 
>>>>>>>>>>>>>>>>>> ran on H10.
>>>>>>>>>>>>>>>>>> This may have some bearing on the result; I don't yet know.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I had a quick look at 2 tests that failed:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> SimpleRegressionTest.testPerfect
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> SimpleRegressionTest.testPerfectNegative
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Although the test case has some instance data, these 
>>>>>>>>>>>>>>>>>> particular tests
>>>>>>>>>>>>>>>>>> do not use any, so it does not look like a concurrency issue 
>>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>> unit test itself.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The SimpleRegression class has mutable instance data, but 
>>>>>>>>>>>>>>>>>> the test
>>>>>>>>>>>>>>>>>> cases create their own instance.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I don't know anything about the math functions involved, but 
>>>>>>>>>>>>>>>>>> it looks
>>>>>>>>>>>>>>>>>> as though Infinity might result from getSignificance() if
>>>>>>>>>>>>>>>>>> getSlopeStdErr() returns 0, as the latter is used as a 
>>>>>>>>>>>>>>>>>> divisor. Or if
>>>>>>>>>>>>>>>>>> the field sumXX is 0 because that is also used as a divisor.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe the H10 host has different floating point hardware?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'll try running some more tests on H10.
>>>>>>>>>>>>>>>>> the build failed again on H10; exactly the same tests failed 
>>>>>>>>>>>>>>>>> as before:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This test:
>>>>>>>>>>>>>>>>> https://builds.apache.org/job/Commons%20Math%20H10/1/console
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Previous failure:
>>>>>>>>>>>>>>>>> https://builds.apache.org/job/Commons%20Math/14/console
>>>>>>>>>>>>>>>> This is actually a bug.  Thanks, sebb (and Jenkins)!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Has been here since 1.x.  What is going on is that the data 
>>>>>>>>>>>>>>>> sets
>>>>>>>>>>>>>>>> used in the test cases are set up to be perfect linear
>>>>>>>>>>>>>>>> relationships, which should in fact lead to mean square error 
>>>>>>>>>>>>>>>> (and
>>>>>>>>>>>>>>>> hence slope standard error) equal to 0.  The Jenkins box must 
>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> getting exact 0.  The funny thing is the test is there to 
>>>>>>>>>>>>>>>> validate
>>>>>>>>>>>>>>>> correct performance for models like this.  Its success 
>>>>>>>>>>>>>>>> unfortunately
>>>>>>>>>>>>>>>> depends on poor precision.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will open a JIRA for this.  I don't think it is a release 
>>>>>>>>>>>>>>>> blocker
>>>>>>>>>>>>>>>> for 3.4.1, as I am sure you would get the same thing in any 
>>>>>>>>>>>>>>>> earlier
>>>>>>>>>>>>>>>> version of [math].
>>>>>>>>>>>>>>> OK good to know.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'll leave the H10 Jenkins job for now to make it easy to 
>>>>>>>>>>>>>>> retest.
>>>>>>>>>>>>>> My first guess here was wrong.  The infinities are being handled
>>>>>>>>>>>>>> correctly for the JDKs I have.  Something must be going awry in 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> t distribution cumulative probability computation for +INF on the
>>>>>>>>>>>>>> box that is failing.  Is there a way to find out exactly what JDK
>>>>>>>>>>>>>> and OS version are being used?
>>>>>>>>>>>>> I just committed a test that tests the t distribution computations
>>>>>>>>>>>>> directly.  It seems to have run clean; but the other test ran 
>>>>>>>>>>>>> clean
>>>>>>>>>>>>> too.  Is there any way to force the build to use the host that 
>>>>>>>>>>>>> fails?
>>>>>>>>>>>> I can't make any sense of what is going on with the Jenkins builds.
>>>>>>>>>>>> Clean runs and then lots of errors.  This one explains the
>>>>>>>>>>>> SimpleRegression "problem" (which is not a problem with that class
>>>>>>>>>>>> at least)
>>>>>>>>>>>>
>>>>>>>>>>>> testCumulativeProbablilityExtremes(org.apache.commons.math3.distribution.TDistributionTest)
>>>>>>>>>>>>   Time elapsed: 0.001 sec  <<< FAILURE!
>>>>>>>>>>>> java.lang.AssertionError: expected:<1.0> but was:<-Infinity>
>>>>>>>>>>>>         at org.junit.Assert.fail(Assert.java:88)
>>>>>>>>>>>>         at org.junit.Assert.failNotEquals(Assert.java:743)
>>>>>>>>>>>>         at org.junit.Assert.assertEquals(Assert.java:494)
>>>>>>>>>>>>         at org.junit.Assert.assertEquals(Assert.java:592)
>>>>>>>>>>>>         at org.apache.commons.math3.distribution.TDistributionTest.testCumulativeProbablilityExtremes(TDistributionTest.java:109)
>>>>>>>>>>>>
>>>>>>>>>>>> In earlier runs this test ran clean. There is nothing non-deterministic
>>>>>>>>>>>> about this test (or quite a few of the others that seem to fail at
>>>>>>>>>>>> random).
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder if we have a bad cpu or something somewhere.
>>>>>>>>>>> AFAICS all the failed builds ran on H10.
>>>>>>>>>>>
>>>>>>>>>>> IMO it is consistent; the apparent randomness comes from the fact
>>>>>>>>>>> that there are several Ubuntu hosts, including H10.
>>>>>>>>>> Am I reading it / looking at the wrong one, or did this one succeed?
>>>>>>>>>>
>>>>>>>>>> https://builds.apache.org/view/All/job/Commons%20Math%20H10/6/
>>>>>>>>>>
>>>>>>>>>> That one was right after I added tests confirming that the t
>>>>>>>>>> distribution cum prob handles INFs correctly.
>>>>>>>>> That did run on H10 and did succeed; I'd not noticed that one before.
>>>>>>>>>
>>>>>>>>> I think it is still true that the failures have only occurred on H10.
>>>>>>>>>
>>>>>>>>> However, the latest one is failing:
>>>>>>>>>
>>>>>>>>> https://builds.apache.org/job/Commons%20Math/24/console
>>>>>>>>>
>>>>>>>>> This is on H11 - I think that's the first time H11 has been used.
>>>>>>>>>
>>>>>>>>> I suppose it's possible that H10 and H11 have a common failing, but it
>>>>>>>>> seems less likely.
>>>>>>>>>
>>>>>>>>> I added a bit more debug - showing the value of sumXX - but that seems
>>>>>>>>> OK on H11.
>>>>>>>>>
>>>>>>>>> I just added a bit more debug.
>>>>>>>> I am pretty sure the SimpleRegressionTest failure is actually caused
>>>>>>>> by the same thing causing the t-distribution test to fail (the
>>>>>>>> reason I added that one).
>>>>>>>>
>>>>>>>> One that is more straightforward to chase is this one, which fails
>>>>>>>> pretty consistently when "bad things happen"
>>>>>>>>
>>>>>>>> testExpInf(org.apache.commons.math3.complex.ComplexTest)  Time 
>>>>>>>> elapsed: 0.001 sec  <<< FAILURE!
>>>>>>>> java.lang.AssertionError: expected:<0.0> but was:<Infinity>
>>>>>>>>  at org.junit.Assert.fail(Assert.java:88)
>>>>>>>>  at org.junit.Assert.failNotEquals(Assert.java:743)
>>>>>>>>  at org.junit.Assert.assertEquals(Assert.java:494)
>>>>>>>>  at org.junit.Assert.assertEquals(Assert.java:592)
>>>>>>>>  at org.apache.commons.math3.TestUtils.assertSame(TestUtils.java:76)
>>>>>>>>  at org.apache.commons.math3.TestUtils.assertSame(TestUtils.java:84)
>>>>>>>>  at org.apache.commons.math3.complex.ComplexTest.testExpInf(ComplexTest.java:788)
>>>>>>>>
>>>>>>>> I would wager that what is going on here is 0.0 * -INF = INF.
>>>>>>> The output returned by the debug statements added by sebb is:
>>>>>>>
>>>>>>> expReal=Infinity
>>>>>>> cosImag=0.5403023058681398
>>>>>>> sinImag=0.8414709848078965
>>>>>>> result=(Infinity, Infinity)
>>>>>>>
>>>>>>> while expReal should be -Infinity.
>>>>>>>
>>>>>>> of course, Math.exp(Infinity) = Infinity.
>>>>>> oh stupid mistake, please forget my last post.
>>>>>> I messed up expReal with the actual real value.
>>>>> But it should be 0, since expReal should be exp(-INF)
>>>> I just added some more debug output to the test and the result is:
>>>>
>>>> real=-Infinity
>>>> -real=2147483647
>>>> expReal=Infinity
>>>>
>>>> according to FastMath.exp(), with these values, the code path should be
>>>> as follows:
>>>>
>>>>         if (x < 0.0) {
>>>>             intVal = (int) -x;
>>>>
>>>>             if (intVal > 746) {
>>>>                 if (hiPrec != null) {
>>>>                     hiPrec[0] = 0.0;
>>>>                     hiPrec[1] = 0.0;
>>>>                 }
>>>> -->             return 0.0;
>>>>             }
>>>>
>>>>
>>>> but obviously it doesn't do this. I guess we can only inspect the
>>>> generated class files for a potential compiler bug.
>>> I did a little more poking about last night in the failed tests and
>>> the ones I spot-checked could all have had to do with incorrect
>>> computations of exp(-INF).  What is strange is that the cast you
>>> show above is working correctly (compliant with JLS) and the code
>>> path should be as you have it there.  It seems very strange that
>>> just this one code path is sporadically having problems.
>> You can see the result of various test builds here:
>>
>> https://builds.apache.org/job/Commons%20Math%20H10/
>>
>> Every time I added more debug output to FastMath.exp(), the tests succeeded.
>>
>> I also set up a Jenkins instance with the same Maven / JDK version to
>> build commons-math, but have not been able to reproduce an error so far.
>>
>> Without direct access to one of the failing servers, I doubt that we
>> will be able to find / fix this problem.
> I wonder if it could be a runtime optimiser issue?
> That might explain why debug affected the outcome.
>
> Might be worth trying the non-debug code with Java 6 or 7.
>
> If the issue is related to Java 5, then it could explain why it was
> not seen by developer testing - I don't think many devs use Java 5 (or
> the incompatibility would not have crept into the release).

I looked at the history of this file and while it has been
cosmetically munged a bit in the last couple of years, the method
causing the problem hasn't been touched since lots of us were
running 1.5.  It's strange we never saw this failure.

I also ran quite a few 1.5 builds under both Ant and Maven using
Oracle JDKs and saw no failures.
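
For anyone who wants to poke at this outside the full build, here is a
minimal sketch of a standalone check. It is not the FastMath code:
ExpNegInfCheck, expSketch and countBadResults are hypothetical names, and
expSketch only mirrors the guarded branch quoted further down in this
thread. It checks the cast of +Infinity to int and then calls the branch
in a hot loop, on the assumption that enough iterations will get the
method compiled by the runtime optimiser.

```java
public class ExpNegInfCheck {

    // Hypothetical stand-in for the guarded branch quoted from FastMath.exp().
    // This is an illustration only, not the real method.
    public static double expSketch(double x) {
        if (x < 0.0) {
            // Narrowing +Infinity to int saturates at Integer.MAX_VALUE (JLS 5.1.3).
            int intVal = (int) -x;
            if (intVal > 746) {
                return 0.0;
            }
        }
        // Fallback for every other input; the real method does its own computation here.
        return Math.exp(x);
    }

    // Counts how many calls fail to return exactly 0.0 for -Infinity.
    public static int countBadResults(int iterations) {
        int bad = 0;
        for (int i = 0; i < iterations; i++) {
            if (expSketch(Double.NEGATIVE_INFINITY) != 0.0) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("(int) +Infinity = " + (int) Double.POSITIVE_INFINITY);
        System.out.println("bad results = " + countBadResults(1000000));
    }
}
```

On a HotSpot JVM, running the same class once normally and once with
-Xint (interpreter only) would give a baseline to compare against. A
nonzero count only in the normal run would point at the optimiser, but
that is a guess until it is tried on one of the failing hosts.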

Phil
>
>
>> Thomas
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>
>

