I don't recall.
------
Robin Anil

On Thu, Jun 21, 2012 at 2:54 PM, Dan Brickley <[email protected]> wrote:
> Robin,
>
> Do you remember if this test ran successfully to completion? If not,
> I'll submit a JIRA when I've a complete log of a failed run...
>
> Dan
>
> ---------- Forwarded message ----------
> From: Grant Ingersoll <[email protected]>
> Date: 21 June 2012 21:33
> Subject: Re: Spectral Kmeans wiki category data test - can you confirm
> if you ran it to completion?
> To: Dan Brickley <[email protected]>
> Cc: Shannon Quinn <[email protected]>
>
>
> I'd ask on dev@, as Robin was actually the one who ran it.
>
> On Jun 21, 2012, at 3:15 PM, Dan Brickley wrote:
>
> Hi
>
> With the patch https://issues.apache.org/jira/browse/MAHOUT-986 in
> 0.7, this doesn't die so quickly ... but I'm still not seeing it run
> to completion.
>
> Using the template commandline you suggested, 'bin/mahout
> spectralkmeans -k 20 -d 4192499 -x 7 -i path/to/csv/file/ -o
> your/output/path/
>
> I've seen it fail with -k 20, and -k 10
>
> Unfortunately I was running this in a screen session without proper
> logging and I want to double-check everything before reporting so I'm
> re-running with -k 10 now and will file a bug if it fails, ... but
> meanwhile I wanted to check in with you to see if you'd had a
> successful run. I'm testing with the 0.7 distro.
>
> The failure was an IndexException, here's the -k 20 version,
>
> mahout spectralkmeans -k 20 -d 4192499 -x 7 -i spectral/input/ -o spectral/output/
>
> 12/06/19 19:33:11 INFO lanczos.LanczosSolver: 20 passes through the
> corpus so far...
> Exception in thread "main" org.apache.mahout.math.IndexException:
> Index 20 is outside allowable range of [0,20)
>     at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)
>     at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:132)
>     at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:73)
>     at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:148)
>     at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
>
> It's barfing out here,
>
> // Next step: perform eigen-decomposition using LanczosSolver
> // since some of the eigen-output is spurious and will be eliminated
> // upon verification, we have to aim to overshoot and then discard
> // unnecessary vectors later
> int overshoot = (int) ((double) clusters * OVERSHOOT_MULTIPLIER);
> DistributedLanczosSolver solver = new DistributedLanczosSolver();
> LanczosState state = new LanczosState(L, overshoot,
>     solver.getInitialVector(L));
> Path lanczosSeqFiles = new Path(outputCalc, "eigenvectors-" +
>     (System.nanoTime() & 0xFF));
> solver.runJob(conf,
>     state,
>     overshoot,
>     true,
>     lanczosSeqFiles.toString());
>
> With -k 10 I got "12/06/20 20:51:15 INFO lanczos.LanczosSolver: 10
> passes through the corpus so far...
> Exception in thread "main" org.apache.mahout.math.IndexException:
> Index 10 is outside allowable range of [0,10)
>     at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)".
>
> ...although the logs also showed "12/06/20 20:40:18 INFO
> lanczos.LanczosSolver: Finding 20 singular vectors of matrix with
> 4192499 rows, via Lanczos" which confused me until Shannon reminded me
> of the overshoot.
>
> I'm happy to +cc the mailing lists but for starters thought I'd check
> to see if the test run had succeeded for you; if so, maybe I've some
> local problem.
>
> Dan
>
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
