Does the same problem occur with another JVM, Dmitriy? JRockit or IBM's J9? Or even a different version of HotSpot? I remember we were once trying to track down a similar bug (lots of computations, different results) in our code, and it turned out the problem was caused by differences in native floating-point rounding precision combined with code being JIT-compiled at different points in time (this was very hard to debug).
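To make the rounding point concrete, here is a minimal sketch in plain Java. It only shows that the order of floating-point additions can change the result in the last bits; it does not reproduce the JIT or native-precision effect itself, and the class and method names are made up for illustration, not taken from Mahout.

```java
// Hypothetical illustration; class and method names are not from Mahout.
public class FloatOrderDemo {

    // Sum the values from the first element to the last.
    public static double sumLeftToRight(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Sum the same values from the last element to the first.
    public static double sumRightToLeft(double[] values) {
        double sum = 0.0;
        for (int i = values.length - 1; i >= 0; i--) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3};
        double forward = sumLeftToRight(values);
        double backward = sumRightToLeft(values);
        System.out.println("forward  = " + forward);
        System.out.println("backward = " + backward);
        // Compare the raw bit patterns, so no tolerance is involved.
        System.out.println("bitwise equal: "
                + (Double.doubleToLongBits(forward) == Double.doubleToLongBits(backward)));
    }
}
```

The two sums differ only in the last bit here, but in a long chain of matrix computations such differences can accumulate, which is why comparing against an exact expected value can be fragile.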
The result in our case did not propagate to such large differences, but I would still try to reproduce with a different JVM just to see if it makes any difference.

My next step would then be to instrument the code with AspectJ (statically), run it on a machine (JVM) where the test passes and on one where it fails, and dump, dump, dump execution progress (all method entry/exit points, with arguments). These are usually very large files, but you can easily diff them against each other and see where _exactly_ the execution started to diverge. If you add -XX:+PrintCompilation to HotSpot you may even be able to tell whether this is a JIT problem or something else.

Shooting in the dark here, but maybe you'll find it helpful. I can help you with the AspectJ instrumentation if you find this interesting.

Dawid

On Thu, Dec 29, 2011 at 9:42 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> yes. i would venture to say that U computation (or restoration) is
> somehow corrupted starting with 2nd block. at least it looks this way.
>
> On Thu, Dec 29, 2011 at 12:31 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> A-reconstructed difference looks good up to row 399 but starting at
>> row 400 differences do not add up to 0 anymore (although both inputs
>> are not 0).
>>
>> So it doesn't look like trivial case of something is not initialized
>> on top of it. It does seem something to do with blocking mechanism
>> though since apparently 400th row is a boundary of some blocking
>> somewhere, but it is hard for me to see where it fails at this point.
>>
>> On Thu, Dec 29, 2011 at 12:13 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>> oh. it's because the synthetic input has only 4 singular values.
>>>
>>> On Wed, Dec 28, 2011 at 11:56 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> But it is not a problem reading U or V files, that's indeed what U and
>>>> V contain.
>>>>
>>>> On Wed, Dec 28, 2011 at 11:49 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>> U and V look suspect, degenerate (only 4 first columns are nonzero,
>>>>> the rest of matrices are zeros.
>>>>>
>>>>> On Wed, Dec 28, 2011 at 11:44 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>> Yeah, fails for me on ubuntu without any special environment issues.
>>>>>> Which makes it easier, i can step thru.
>>>>>>
>>>>>> On Wed, Dec 28, 2011 at 9:01 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>> What do checksums look like?
>>>>>>>
>>>>>>> On Wed, Dec 28, 2011 at 6:33 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>>
>>>>>>>> I commented out the deletion of the dir in the tearDown. Not sure if that
>>>>>>>> looks reasonable or not, but on the surface they look equivalent.
>>>>>>>>
>>>>>>>> Here's the contents of the dir on Ubuntu:
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000000
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000200
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000400
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000600
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1387722 2011-12-28 21:17 A-000000800
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000000
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000210
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000420
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000630
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  144312 2011-12-28 21:17 B-000000840
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-0
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-200
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-400
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-600
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  136352 2011-12-28 21:17 U-800
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-0
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-1
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-2
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-3
>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  144372 2011-12-28 21:17 V-4
>>>>>>>>
>>>>>>>> Here's what my Mac looks like:
>>>>>>>> total 20296
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000000
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000200
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000400
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000600
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.3M Dec 28 21:28 A-000000800
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000000
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000210
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000420
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000630
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 141K Dec 28 21:28 B-000000840
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-0
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-200
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-400
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-600
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 133K Dec 28 21:28 U-800
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-0
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-1
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-2
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-3
>>>>>>>> -rw-r--r-- 1 XXXXXX staff 141K Dec 28 21:28 V-4
>>>>>>>>
>>>>>>>> On Dec 28, 2011, at 7:15 PM, Ted Dunning wrote:
>>>>>>>>
>>>>>>>> > Yeah.. but this is a difference from the correct answer. I am moderately
>>>>>>>> > sure that this is a problem writing to the temp directory.
>>>>>>>> >
>>>>>>>> > On Wed, Dec 28, 2011 at 3:45 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>>> >
>>>>>>>> >> It's expecting the answer to be 0, but it's some really large value.
>>>>>>>> >>
>>>>>>>> >> testSingularValues(org.apache.mahout.math.ssvd.SequentialOutOfCoreSvdTest):
>>>>>>>> >> expected:<0.0> but was:<4131200.0000000037>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Dec 28, 2011, at 6:30 PM, Ted Dunning wrote:
>>>>>>>> >>
>>>>>>>> >>> I think that the answer is 0 because the model is not being read and we are
>>>>>>>> >>> swallowing an exception somewhere. This is what an uninitialized matrix
>>>>>>>> >>> would give as a result.
>>>>>>>> >>>
>>>>>>>> >>> On Wed, Dec 28, 2011 at 3:21 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> I can reproduce outside of Jenkins. It really seems odd that the answer
>>>>>>>> >>>> is off by so much.
>>>>>>>> >>>>
>>>>>>>> >>>> On Dec 28, 2011, at 2:15 AM, Dmitriy Lyubimov wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> I vaguely remember Jenkins had problems with creating stuff in Java tmp
>>>>>>>> >>>>> dir. E.g. I remember that was creating problems for Mr tasks in local mr
>>>>>>>> >>>>> mode legitimately using boxed task temporary space.
>>>>>>>> >>>>>
>>>>>>>> >>>>> OK I'll try to scan for the problem tomorrow.
>>>>>>>> >>>>> On Dec 27, 2011 10:50 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>>> So I am like everybody else. The test works for me.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> My suspicion is that there is something going on with the temporary
>>>>>>>> >>>>>> directory that I am trying to use and that the environment that Jenkins is
>>>>>>>> >>>>>> using is somehow strange.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> The only slightly surprising idiom I am using is to create a temporary
>>>>>>>> >>>>>> file, delete it and recreate it as a directory. I even check the return
>>>>>>>> >>>>>> values from the delete and the mkdir.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> I will keep looking.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Tue, Dec 27, 2011 at 10:37 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>> Indeed it does. Thanks for pointing that out.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> This error is very strange.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On Tue, Dec 27, 2011 at 10:06 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>> Ted,
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> do you have an idea why this test may be failing? I think this test comes
>>>>>>>> >>>>>>>> with M-792 commit.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> I can take a look at it, I suspect something in the environment can be
>>>>>>>> >>>>>>>> tripping it.
>>>>>>>> >>>>>>>> On Dec 27, 2011 8:54 PM, "Sean Owen" <sro...@gmail.com> wrote:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>> It's all errors in the Apache infrastructure, rather than a real test
>>>>>>>> >>>>>>>>> failure. At least, stuff passes for me locally, and that's what's
>>>>>>>> >>>>>>>>> important.
>>>>>>>> >>>>>>>>> So I'm ignoring these.
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> On Tue, Dec 27, 2011 at 9:34 PM, Jeff Eastman
>>>>>>>> >>>>>>>>> <jeast...@windwardsolutions.com> wrote:
>>>>>>>> >>>>>>>>>> I'm getting a lot of these emails yet all the tests run locally for me. Does
>>>>>>>> >>>>>>>>>> anybody have an idea what the problem is?
>>>>>>>> >>>>>>>>>> This close to a release it would
>>>>>>>> >>>>>>>>>> be really nice to have Jenkins on our side.
>>>>>>>> >>>>>>>>>> Jeff
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>
>>>>>>>> >>>> --------------------------------------------
>>>>>>>> >>>> Grant Ingersoll
>>>>>>>> >>>> http://www.lucidimagination.com
>>>>>>>> >>>>
>>>>>>>> >>
>>>>>>>> >> --------------------------------------------
>>>>>>>> >> Grant Ingersoll
>>>>>>>> >> http://www.lucidimagination.com
>>>>>>>> >>
>>>>>>>>
>>>>>>>> --------------------------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>> http://www.lucidimagination.com
>>>>>>>>