Any progress on this? It looks like a showstopper for the code freeze.

On Dec 29, 2011, at 3:58 AM, Dmitriy Lyubimov wrote:
> note identical md5 for u200 and u400.
>
> On Thu, Dec 29, 2011 at 12:57 AM, Dmitriy Lyubimov <[email protected]> wrote:
>> MD5 sums
>>
>> b8217318a29ef69c58b921013eb019e5 /tmp/matrix8554072597307396201/A-000000000
>> 41db088ff74c5efd5b766dba253efc03 /tmp/matrix8554072597307396201/A-000000200
>> b8217318a29ef69c58b921013eb019e5 /tmp/matrix8554072597307396201/A-000000400
>> 41db088ff74c5efd5b766dba253efc03 /tmp/matrix8554072597307396201/A-000000600
>> c8dc2a7df82065b5c1e8284ff23aecc6 /tmp/matrix8554072597307396201/A-000000800
>> 83bccdd2fa191e01d34646e2030f0e77 /tmp/matrix8554072597307396201/B-000000000
>> 9d6878fb789d61d5453b994ea1a5c6db /tmp/matrix8554072597307396201/B-000000210
>> cbdf720b17ce25feb686effd1aa0ebef /tmp/matrix8554072597307396201/B-000000420
>> 2f71d6ba6891b242575b5cc6ba1c4358 /tmp/matrix8554072597307396201/B-000000630
>> f50b76bb48c8f6a791a6d8206d980492 /tmp/matrix8554072597307396201/B-000000840
>> 6bb29ca304889a6c8effff6e5c062dc8 /tmp/matrix8554072597307396201/U-0
>> 019c881c1d7c5748a1cacb3e0b3e5899 /tmp/matrix8554072597307396201/U-200
>> 019c881c1d7c5748a1cacb3e0b3e5899 /tmp/matrix8554072597307396201/U-400
>> b84e4b01ffb9d691c87b496f8b4d84ec /tmp/matrix8554072597307396201/U-600
>> 6bb29ca304889a6c8effff6e5c062dc8 /tmp/matrix8554072597307396201/U-770
>>
>>
>> On Thu, Dec 29, 2011 at 12:52 AM, Ted Dunning <[email protected]> wrote:
>>> Thanks. Good hints.
>>>
>>> I will take a look on a linux machine in the next few days.
>>>
>>> On Thu, Dec 29, 2011 at 12:42 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>> yes. i would venture to say that U computation (or restoration) is
>>>> somehow corrupted starting with 2nd block. at least it looks this way.
>>>>
>>>> On Thu, Dec 29, 2011 at 12:31 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>> A-reconstructed difference looks good up to row 399 but starting at
>>>>> row 400 differences do not add up to 0 anymore (although both inputs
>>>>> are not 0).
>>>>>
>>>>> So it doesn't look like trivial case of something is not initialized
>>>>> on top of it. It does seem something to do with blocking mechanism
>>>>> though since apparently 400th row is a boundary of some blocking
>>>>> somewhere, but it is hard for me to see where it fails at this point.
>>>>>
>>>>> On Thu, Dec 29, 2011 at 12:13 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>> oh. it's because the synthetic input has only 4 singular values.
>>>>>>
>>>>>> On Wed, Dec 28, 2011 at 11:56 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>> But it is not a problem reading U or V files, that's indeed what U and
>>>>>>> V contain.
>>>>>>>
>>>>>>> On Wed, Dec 28, 2011 at 11:49 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>>> U and V look suspect, degenerate (only 4 first columns are nonzero,
>>>>>>>> the rest of matrices are zeros.
>>>>>>>>
>>>>>>>> On Wed, Dec 28, 2011 at 11:44 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>>>> Yeah, fails for me on ubuntu without any special environment issues.
>>>>>>>>> Which makes it easier, i can step thru.
>>>>>>>>>
>>>>>>>>> On Wed, Dec 28, 2011 at 9:01 PM, Ted Dunning <[email protected]> wrote:
>>>>>>>>>> What do checksums look like?
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 28, 2011 at 6:33 PM, Grant Ingersoll <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I commented out the deletion of the dir in the tearDown. Not sure if that
>>>>>>>>>>> looks reasonable or not, but on the surface they look equivalent.
>>>>>>>>>>>
>>>>>>>>>>> Here's the contents of the dir on Ubuntu:
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000000
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000200
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000400
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1632612 2011-12-28 21:17 A-000000600
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX 1387722 2011-12-28 21:17 A-000000800
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000000
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000210
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000420
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168312 2011-12-28 21:17 B-000000630
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  144312 2011-12-28 21:17 B-000000840
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-0
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-200
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-400
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  160412 2011-12-28 21:17 U-600
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  136352 2011-12-28 21:17 U-800
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-0
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-1
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-2
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  168432 2011-12-28 21:17 V-3
>>>>>>>>>>> -rw-rw-r-- 1 XXXXXX XXXXXX  144372 2011-12-28 21:17 V-4
>>>>>>>>>>>
>>>>>>>>>>> Here's what my Mac looks like:
>>>>>>>>>>> total 20296
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000000
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000200
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000400
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.6M Dec 28 21:28 A-000000600
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 1.3M Dec 28 21:28 A-000000800
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000000
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000210
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000420
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 B-000000630
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 141K Dec 28 21:28 B-000000840
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-0
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-200
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-400
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 157K Dec 28 21:28 U-600
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 133K Dec 28 21:28 U-800
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-0
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-1
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-2
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 164K Dec 28 21:28 V-3
>>>>>>>>>>> -rw-r--r-- 1 XXXXXX staff 141K Dec 28 21:28 V-4
>>>>>>>>>>>
>>>>>>>>>>> On Dec 28, 2011, at 7:15 PM, Ted Dunning wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yeah.. but this is a difference from the correct answer. I am moderately
>>>>>>>>>>>> sure that this is a problem writing to the temp directory.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Dec 28, 2011 at 3:45 PM, Grant Ingersoll <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It's expecting the answer to be 0, but it's some really large value.
>>>>>>>>>>>>>
>>>>>>>>>>>>> testSingularValues(org.apache.mahout.math.ssvd.SequentialOutOfCoreSvdTest):
>>>>>>>>>>>>> expected:<0.0> but was:<4131200.0000000037>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Dec 28, 2011, at 6:30 PM, Ted Dunning wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think that the answer is 0 because the model is not being read and we are
>>>>>>>>>>>>>> swallowing an exception somewhere. This is what an uninitialized matrix
>>>>>>>>>>>>>> would give as a result.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Dec 28, 2011 at 3:21 PM, Grant Ingersoll <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can reproduce outside of Jenkins. It really seems odd that the answer
>>>>>>>>>>>>>>> is off by so much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Dec 28, 2011, at 2:15 AM, Dmitriy Lyubimov wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I vaguely remember Jenkins had problems with creating stuff in Java tmp
>>>>>>>>>>>>>>>> dir. E.g. I remember that was creating problems for Mr tasks in local mr
>>>>>>>>>>>>>>>> mode legitimately using boxed task temporary space.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> OK I'll try to scan for the problem tomorrow.
>>>>>>>>>>>>>>>> On Dec 27, 2011 10:50 PM, "Ted Dunning" <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So I am like everybody else. The test works for me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My suspicion is that there is something going on with the temporary
>>>>>>>>>>>>>>>>> directory that I am trying to use and that the environment that Jenkins is
>>>>>>>>>>>>>>>>> using is somehow strange.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The only slightly surprising idiom I am using is to create a temporary
>>>>>>>>>>>>>>>>> file, delete it and recreate it as a directory. I even check the return
>>>>>>>>>>>>>>>>> values from the delete and the mkdir.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I will keep looking.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Dec 27, 2011 at 10:37 PM, Ted Dunning <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Indeed it does. Thanks for pointing that out.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This error is very strange.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Dec 27, 2011 at 10:06 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ted,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> do you have an idea why this test may be failing? I think this test comes
>>>>>>>>>>>>>>>>>>> with M-792 commit.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I can take a look at it, I suspect something in the environment can be
>>>>>>>>>>>>>>>>>>> tripping it.
>>>>>>>>>>>>>>>>>>> On Dec 27, 2011 8:54 PM, "Sean Owen" <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It's all errors in the Apache infrastructure, rather than a real test
>>>>>>>>>>>>>>>>>>>> failure. At least, stuff passes for me locally, and that's what's
>>>>>>>>>>>>>>>>>>>> important.
>>>>>>>>>>>>>>>>>>>> So I'm ignoring these.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Dec 27, 2011 at 9:34 PM, Jeff Eastman
>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>> I'm getting a lot of these emails yet all the tests run locally for me. Does
>>>>>>>>>>>>>>>>>>>>> anybody have an idea what the problem is? This close to a release it would
>>>>>>>>>>>>>>>>>>>>> be really nice to have Jenkins on our side.
>>>>>>>>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>>>> http://www.lucidimagination.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>> http://www.lucidimagination.com
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>> http://www.lucidimagination.com

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
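
For anyone repeating the checksum comparison from this thread, spotting identical blocks by eye is error-prone. Below is a minimal sketch of how it might be automated: it groups md5sum-style lines by digest and reports digests shared by more than one file. The class name DuplicateChecksums and the method duplicates are hypothetical helpers made up for illustration, not part of Mahout or of the failing test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Mahout.
public class DuplicateChecksums {

  /**
   * Groups file names by checksum. Each input line is assumed to look like
   * md5sum output: "<hex digest> <path>". Only digests that occur for more
   * than one file are returned, in order of first appearance.
   */
  public static Map<String, List<String>> duplicates(List<String> md5sumLines) {
    Map<String, List<String>> byDigest = new LinkedHashMap<>();
    for (String line : md5sumLines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      String[] parts = trimmed.split("\\s+", 2);
      if (parts.length < 2) {
        continue; // not a "<digest> <path>" line
      }
      String path = parts[1];
      // Keep only the file name so the report stays short.
      String name = path.substring(path.lastIndexOf('/') + 1);
      byDigest.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(name);
    }
    Map<String, List<String>> shared = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> e : byDigest.entrySet()) {
      if (e.getValue().size() > 1) {
        shared.put(e.getKey(), e.getValue());
      }
    }
    return shared;
  }

  public static void main(String[] args) {
    // The U block checksums as posted earlier in the thread.
    List<String> uBlocks = Arrays.asList(
        "6bb29ca304889a6c8effff6e5c062dc8 /tmp/matrix8554072597307396201/U-0",
        "019c881c1d7c5748a1cacb3e0b3e5899 /tmp/matrix8554072597307396201/U-200",
        "019c881c1d7c5748a1cacb3e0b3e5899 /tmp/matrix8554072597307396201/U-400",
        "b84e4b01ffb9d691c87b496f8b4d84ec /tmp/matrix8554072597307396201/U-600",
        "6bb29ca304889a6c8effff6e5c062dc8 /tmp/matrix8554072597307396201/U-770");
    for (Map.Entry<String, List<String>> e : duplicates(uBlocks).entrySet()) {
      System.out.println(e.getKey() + " " + e.getValue());
    }
  }
}
```

Run on the U lines above, this groups U-200 with U-400, as noted in the thread, and also U-0 with U-770. Fed the full listing, it would additionally pair A-000000000 with A-000000400 and A-000000200 with A-000000600, which share digests in the posted sums.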
