Does the same problem occur with another JVM, Dmitriy? JRockit or
IBM's J9? Or even a different version of HotSpot? I remember we were
once trying to figure out a similar bug (lots of computations,
different results) in our code, and it turned out that the problem was
due to differences in native floating-point rounding and to code being
JIT-compiled at different points in time (this was very hard stuff to
debug).
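
To illustrate the kind of effect meant here (an illustration only, not
the code from that bug): with doubles, the same three numbers can sum to
different values depending on the order of the additions, so anything
that changes how an expression is evaluated can change the low bits of
the result. The class and method names below are made up for this
sketch.

```java
public class RoundingOrder {
    // Adds the first two values, then the third: (a + b) + c.
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    // Adds the last two values first: a + (b + c).
    static double rightToLeft(double a, double b, double c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        double x = leftToRight(0.1, 0.2, 0.3);
        double y = rightToLeft(0.1, 0.2, 0.3);
        // Floating-point addition is not associative, so x and y differ
        // in the last bit even though the inputs are identical.
        System.out.println(x + " vs " + y + ", equal: " + (x == y));
    }
}
```

A difference this small normally stays small; it only becomes visible
when later computations amplify it.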

The result in our case did not propagate to such large differences,
but I would still try to reproduce with a different JVM just to see if
this makes any difference.

My next step would then be to instrument the code with AspectJ
(statically), run it on a machine (JVM) where the test passes and on
one where it fails, and dump, dump, dump the execution progress (all
method entry/exit points, with arguments). These are usually very
large files, but you can easily diff them against each other and see
where _exactly_ the execution started to diverge. If you add
-XX:+PrintCompilation to HotSpot you may even be able to tell if this
is a JIT problem or something else.
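
A minimal sketch of the "diff the dumps" step, assuming each run wrote
one trace event per line to a text file. The class name TraceDiff and
the command-line arguments are hypothetical; a plain diff tool does the
same job, this just streams both files and stops at the first mismatch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TraceDiff {
    /**
     * Returns the 1-based number of the first line where the two traces
     * differ (including one trace ending early), or -1 if they are identical.
     */
    static long firstDivergence(Reader passing, Reader failing) {
        BufferedReader a = new BufferedReader(passing);
        BufferedReader b = new BufferedReader(failing);
        long lineNo = 0;
        try {
            while (true) {
                String la = a.readLine();
                String lb = b.readLine();
                lineNo++;
                if (la == null && lb == null) {
                    return -1; // both traces ended without a mismatch
                }
                if (la == null || !la.equals(lb)) {
                    return lineNo; // one trace ended early or the lines differ
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TraceDiff <passing-trace> <failing-trace>");
            return;
        }
        try (Reader p = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             Reader f = Files.newBufferedReader(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            long line = firstDivergence(p, f);
            System.out.println(line < 0
                    ? "traces are identical"
                    : "first divergence at line " + line);
        }
    }
}
```

For this to work the dumps should contain only deterministic data;
timestamps or identity hash codes in the output would show up as
differences on the first line that contains them.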

Shooting in the dark here, but maybe you'll find it helpful. I can
help you with the AspectJ instrumentation if you find this interesting.

Dawid

On Thu, Dec 29, 2011 at 9:42 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> yes. i would venture to say that U computation (or restoration) is
> somehow corrupted starting with 2nd block. at least it looks this way.
>
> On Thu, Dec 29, 2011 at 12:31 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> A-reconstructed difference looks good up to row 399 but starting at
>> row 400 differences do not add up to 0 anymore (although both inputs
>> are not 0).
>>
>> So it doesn't look like trivial case of something is not initialized
>> on top of it. It does seem something to do with blocking mechanism
>> though since apparently 400th row is a boundary of some blocking
>> somewhere, but it is hard for me to see where it fails at this point.
>>
>> On Thu, Dec 29, 2011 at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>> oh. it's because the synthetic input has only 4 singular values.
>>>
>>> On Wed, Dec 28, 2011 at 11:56 PM, Dmitriy Lyubimov <dlie...@gmail.com> 
>>> wrote:
>>>> But it is not a problem reading U or V files, that's indeed what U and
>>>> V contain.
>>>>
>>>> On Wed, Dec 28, 2011 at 11:49 PM, Dmitriy Lyubimov <dlie...@gmail.com> 
>>>> wrote:
>>>>> U and V look suspect, degenerate (only the first 4 columns are
>>>>> nonzero; the rest of the matrices are zeros).
>>>>>
>>>>> On Wed, Dec 28, 2011 at 11:44 PM, Dmitriy Lyubimov <dlie...@gmail.com> 
>>>>> wrote:
>>>>>> Yeah, fails for me on ubuntu without any special environment issues.
>>>>>> Which makes it easier, i can step thru.
>>>>>>
>>>>>> On Wed, Dec 28, 2011 at 9:01 PM, Ted Dunning <ted.dunn...@gmail.com> 
>>>>>> wrote:
>>>>>>> What do checksums look like?
>>>>>>>
>>>>>>> On Wed, Dec 28, 2011 at 6:33 PM, Grant Ingersoll 
>>>>>>> <gsing...@apache.org>wrote:
>>>>>>>
>>>>>>>> I commented out the deletion of the dir in the tearDown.  Not sure if
>>>>>>>> that looks reasonable or not, but on the surface they look equivalent.
>>>>>>>>
>>>>>>>> Here's the contents of the dir on Ubuntu:
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000000
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000200
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000400
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000600
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1387722 2011-12-28 21:17 A-000000800
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000000
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000210
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000420
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000630
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  144312 2011-12-28 21:17 B-000000840
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-0
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-200
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-400
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-600
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  136352 2011-12-28 21:17 U-800
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-0
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-1
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-2
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-3
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  144372 2011-12-28 21:17 V-4
>>>>>>>>
>>>>>>>> Here's what my Mac looks like:
>>>>>>>> total 20296
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   1.6M Dec 28 21:28 A-000000000
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   1.6M Dec 28 21:28 A-000000200
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   1.6M Dec 28 21:28 A-000000400
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   1.6M Dec 28 21:28 A-000000600
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   1.3M Dec 28 21:28 A-000000800
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 B-000000000
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 B-000000210
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 B-000000420
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 B-000000630
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   141K Dec 28 21:28 B-000000840
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   157K Dec 28 21:28 U-0
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   157K Dec 28 21:28 U-200
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   157K Dec 28 21:28 U-400
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   157K Dec 28 21:28 U-600
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   133K Dec 28 21:28 U-800
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 V-0
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 V-1
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 V-2
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   164K Dec 28 21:28 V-3
>>>>>>>> -rw-r--r--  1 XXXXXX  staff   141K Dec 28 21:28 V-4
>>>>>>>>
>>>>>>>> On Dec 28, 2011, at 7:15 PM, Ted Dunning wrote:
>>>>>>>>
>>>>>>>> > Yeah.. but this is a difference from the correct answer.  I am
>>>>>>>> > moderately sure that this is a problem writing to the temp directory.
>>>>>>>> >
>>>>>>>> > On Wed, Dec 28, 2011 at 3:45 PM, Grant Ingersoll <gsing...@apache.org
>>>>>>>> >wrote:
>>>>>>>> >
>>>>>>>> >> It's expecting the answer to be 0, but it's some really large value.
>>>>>>>> >>
>>>>>>>> >> testSingularValues(org.apache.mahout.math.ssvd.SequentialOutOfCoreSvdTest):
>>>>>>>> >> expected:<0.0> but was:<4131200.0000000037>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Dec 28, 2011, at 6:30 PM, Ted Dunning wrote:
>>>>>>>> >>
>>>>>>>> >>> I think that the answer is 0 because the model is not being read and
>>>>>>>> >>> we are swallowing an exception somewhere.  This is what an
>>>>>>>> >>> uninitialized matrix would give as a result.
>>>>>>>> >>>
>>>>>>>> >>> On Wed, Dec 28, 2011 at 3:21 PM, Grant Ingersoll 
>>>>>>>> >>> <gsing...@apache.org
>>>>>>>> >>> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> I can reproduce outside of Jenkins.  It really seems odd that the
>>>>>>>> >>>> answer is off by so much.
>>>>>>>> >>>>
>>>>>>>> >>>> On Dec 28, 2011, at 2:15 AM, Dmitriy Lyubimov wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> I vaguely remember Jenkins had problems with creating stuff in the
>>>>>>>> >>>>> Java tmp dir. E.g. I remember that was creating problems for MR
>>>>>>>> >>>>> tasks in local MR mode legitimately using boxed task temporary
>>>>>>>> >>>>> space.
>>>>>>>> >>>>>
>>>>>>>> >>>>> OK I'll try to scan for the problem tomorrow.
>>>>>>>> >>>>> On Dec 27, 2011 10:50 PM, "Ted Dunning" <ted.dunn...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>>> So I am like everybody else.  The test works for me.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> My suspicion is that there is something going on with the temporary
>>>>>>>> >>>>>> directory that I am trying to use and that the environment that
>>>>>>>> >>>>>> Jenkins is using is somehow strange.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> The only slightly surprising idiom I am using is to create a
>>>>>>>> >>>>>> temporary file, delete it and recreate it as a directory.  I even
>>>>>>>> >>>>>> check the return values from the delete and the mkdir.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> I will keep looking.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Tue, Dec 27, 2011 at 10:37 PM, Ted Dunning <
>>>>>>>> ted.dunn...@gmail.com>
>>>>>>>> >>>>>> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>> Indeed it does.  Thanks for pointing that out.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> This error is very strange.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Tue, Dec 27, 2011 at 10:06 PM, Dmitriy Lyubimov <
>>>>>>>> >> dlie...@gmail.com
>>>>>>>> >>>>>>> wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>> Ted,
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> do you have an idea why this test may be failing? I think this
>>>>>>>> >>>>>>>> test comes with M-792 commit.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> I can take a look at it, I suspect something in the environment
>>>>>>>> >>>>>>>> can be tripping it.
>>>>>>>> >>>>>>>> On Dec 27, 2011 8:54 PM, "Sean Owen" <sro...@gmail.com> wrote:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>> It's all errors in the Apache infrastructure, rather than a real
>>>>>>>> >>>>>>>>> test failure. At least, stuff passes for me locally, and that's
>>>>>>>> >>>>>>>>> what's important.
>>>>>>>> >>>>>>>>> So I'm ignoring these.
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> On Tue, Dec 27, 2011 at 9:34 PM, Jeff Eastman
>>>>>>>> >>>>>>>>> <jeast...@windwardsolutions.com> wrote:
>>>>>>>> >>>>>>>>>> I'm getting a lot of these emails yet all the tests run locally
>>>>>>>> >>>>>>>>>> for me. Does anybody have an idea what the problem is? This close
>>>>>>>> >>>>>>>>>> to a release it would be really nice to have Jenkins on our side.
>>>>>>>> >>>>>>>>>> Jeff
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>
>>>>>>>> >>>> --------------------------------------------
>>>>>>>> >>>> Grant Ingersoll
>>>>>>>> >>>> http://www.lucidimagination.com
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>
>>>>>>>> >> --------------------------------------------
>>>>>>>> >> Grant Ingersoll
>>>>>>>> >> http://www.lucidimagination.com
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>>
>>>>>>>> --------------------------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>> http://www.lucidimagination.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>
