Ok, here's another interesting bit that I somehow only just now
uncovered: my verification job isn't returning any eigenvectors. When I
hand the ${mapred.output.dir}/largestCleanEigen path off to the KMeans
buildRandomSeed() method, it fails with an IndexOutOfBoundsException
when it tries to access index 0. The sequence file that should contain
the results has the header and nothing else in it.
I started with the parameters of minEigenValue = 0.01, maxError = 0.05,
but relaxed these to 0.0 and 1, respectively, with no effect.
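
As a rough illustration of why the relaxed values should let nearly
everything through, here is a minimal sketch. It assumes a candidate is
kept only when its eigenvalue is at least minEigenValue and its error is
at most maxError. The class, the method, and the sample numbers are
hypothetical; this is not Mahout's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThresholdSketch {
    /** Hypothetical stand-in for the metadata of one candidate eigenvector. */
    static final class Candidate {
        final double eigenValue;
        final double error;

        Candidate(double eigenValue, double error) {
            this.eigenValue = eigenValue;
            this.error = error;
        }
    }

    /** Keeps candidates whose eigenvalue and error pass both thresholds (assumed rule). */
    static List<Candidate> prune(List<Candidate> in, double minEigenValue, double maxError) {
        List<Candidate> kept = new ArrayList<Candidate>();
        for (Candidate c : in) {
            if (c.eigenValue >= minEigenValue && c.error <= maxError) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<Candidate> candidates = Arrays.asList(
            new Candidate(3.2, 0.01),
            new Candidate(0.005, 0.2),
            new Candidate(1.1, 0.6));

        System.out.println("strict:  " + prune(candidates, 0.01, 0.05).size());
        System.out.println("relaxed: " + prune(candidates, 0.0, 1.0).size());
    }
}
```

Under that assumed rule, an output that stays empty even at 0.0 and 1
suggests the thresholds are not what is dropping the vectors.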
Also, why is the computePairwiseInnerProducts() method called in the
verification job's run(), but the return value (a VectorIterable) never
used?
Thanks!
Shannon
On 6/24/2010 2:25 PM, Jake Mannix wrote:
On Thu, Jun 24, 2010 at 6:21 PM, Ted Dunning wrote:
I think that the normal nomenclature is to assume that the eigenvectors
are column vectors (hence the V' in the singular value decomposition),
and thus most references would refer to clustering *rows* of the
eigenvector matrix (which has one row per column of the original matrix
and one column per eigenvalue).
Everything in Distributed-Mahout matrix land is a row.
We have no columns here, sorry to break convention. :)
-jake
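
To make the two conventions concrete, here is a small plain-Java sketch.
It assumes the eigenvectors are stored one per row, as described above.
Transposing that layout gives the textbook one, with one row per column
of the original matrix and one column per eigenvalue. The class name and
the numbers are made up for illustration, and nothing here uses Mahout's
API.

```java
public class RowConventionSketch {
    /** Returns the transpose of a rectangular matrix. */
    static double[][] transpose(double[][] m) {
        if (m.length == 0) {
            return new double[0][0];
        }
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Two eigenvectors of length 3, stored one per row (row convention).
        double[][] eigenRows = {
            {0.1, 0.2, 0.3},
            {0.4, 0.5, 0.6}
        };

        // Textbook layout: 3 rows (one per original column), 2 columns (one per eigenvalue).
        double[][] v = transpose(eigenRows);

        // Each row of v is the point that would be clustered for one original column.
        for (double[] point : v) {
            System.out.println(java.util.Arrays.toString(point));
        }
    }
}
```

In other words, clustering the rows of the textbook matrix corresponds
to clustering the columns of the row-per-eigenvector layout.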