Ok, here's another interesting bit that I somehow only just uncovered: my verification job isn't returning any eigenvectors. When I hand the ${mapred.output.dir}/largestCleanEigen path to the KMeans buildRandomSeed() method, it fails with an IndexOutOfBoundsException when it tries to access index 0. The sequence file that is supposed to contain the results has a header and nothing else in it.

I started with minEigenValue = 0.01 and maxError = 0.05, then relaxed these to 0.0 and 1, respectively, with no effect.
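
To make the expectation behind those two thresholds concrete, here is a minimal sketch of a threshold filter. It is not Mahout's verification code: the Candidate class, the keep() and firstOrFail() helpers, and the inclusive comparisons are all hypothetical, chosen only to illustrate why relaxing the limits to 0.0 and 1 would be expected to let nearly every candidate through, and why an empty result surfaces later as a failure at index 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is NOT Mahout's verification job. */
final class EigenFilterSketch {

  /** Verification result for one candidate eigenpair (hypothetical type). */
  static final class Candidate {
    final int index;
    final double eigenValue;
    final double error;

    Candidate(int index, double eigenValue, double error) {
      this.index = index;
      this.eigenValue = eigenValue;
      this.error = error;
    }
  }

  /**
   * Keeps candidates whose eigenvalue is at least minEigenValue and whose
   * error is at most maxError. Inclusive bounds are an assumption.
   */
  static List<Candidate> keep(List<Candidate> candidates,
                              double minEigenValue,
                              double maxError) {
    List<Candidate> kept = new ArrayList<>();
    for (Candidate c : candidates) {
      if (c.eigenValue >= minEigenValue && c.error <= maxError) {
        kept.add(c);
      }
    }
    return kept;
  }

  /**
   * Fails with a descriptive message instead of letting a downstream
   * consumer hit an IndexOutOfBoundsException on an empty result.
   */
  static Candidate firstOrFail(List<Candidate> kept) {
    if (kept.isEmpty()) {
      throw new IllegalStateException(
          "no eigenvectors passed verification; nothing to seed clustering with");
    }
    return kept.get(0);
  }
}
```

Under these assumptions, an empty output with the relaxed thresholds would point at the candidates never reaching the filter (or never being written), rather than at the thresholds themselves.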

Also, why is the computePairwiseInnerProducts() method called in the verification job's run(), but the return value (a VectorIterable) is never used?

Thanks!

Shannon

On 6/24/2010 2:25 PM, Jake Mannix wrote:
On Thu, Jun 24, 2010 at 6:21 PM, Ted Dunning <[email protected]> wrote:

I think that the normal nomenclature is to assume that the eigen-vectors are column vectors (hence the V' in the singular decomposition) and thus most references would refer to clustering *rows* of the eigenvector matrix (which has one row per column of the original matrix and one column per eigenvalue).

Everything in Distributed-Mahout matrix land is a row.
We have no columns here, sorry to break convention. :)

   -jake
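
The row/column point above can be sketched with plain arrays. This assumes the eigenvectors arrive as k rows of length n (the row-oriented storage Jake describes) and that the clustering step wants the conventional layout Ted describes, with one row per column of the original matrix and one column per eigenvalue. The class and method names are hypothetical and nothing here uses Mahout's matrix types.

```java
/** Hypothetical illustration of the row-vs-column convention; not Mahout code. */
final class RowEigenvectorSketch {

  /**
   * Transposes a rectangular k x n array (one eigenvector per row) into an
   * n x k array (one eigenvector per column), so that each output row is one
   * point to cluster. Assumes every input row has the same length.
   */
  static double[][] transpose(double[][] eigenvectorsAsRows) {
    if (eigenvectorsAsRows.length == 0) {
      return new double[0][0];
    }
    int k = eigenvectorsAsRows.length;
    int n = eigenvectorsAsRows[0].length;
    double[][] points = new double[n][k];
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < n; j++) {
        points[j][i] = eigenvectorsAsRows[i][j];
      }
    }
    return points;
  }
}
```

With two eigenvectors of length three stored as rows, the result has three rows of two entries each, and those three rows are what "clustering rows of the eigenvector matrix" refers to in the conventional layout.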

