Hi Jake and Ted,
Let me make sure I understand this: you take the matrix of eigenvectors,
which has desiredRank rows, of originalSize columns each, and take the
*columns* of this matrix (all originalSize of them, each of which has
desiredRank entries) and cluster them with KMeans, right?
I think that the normal nomenclature is to assume that the eigenvectors are
column vectors (hence the V' in the singular value decomposition) and thus most
references would refer to clustering *rows* of the eigenvector matrix (which
has one row per column of the original matrix and one column per
eigenvalue).
This is precisely it; I will have to load the cleaned eigenvectors into
a DistributedRowMatrix and take its transpose in order to get the
arrangement I'm looking for (unless KMeans can interpret the row vectors
on its own?).
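To make the arrangement concrete, here is a minimal sketch in plain Python on a made-up 2 x 4 matrix. It is not Mahout code; the names eigenvectors, transpose and points, and the values themselves, are assumptions for illustration only.

```python
# Toy stand-in for the eigenvector matrix as it is stored now:
# desiredRank rows (one per eigenvector), originalSize columns.
# The values are invented for illustration.
desired_rank = 2
original_size = 4
eigenvectors = [
    [0.5, 0.5, -0.5, -0.5],   # first eigenvector
    [0.1, 0.2, 0.9, 0.8],     # second eigenvector
]

def transpose(matrix):
    """Return the transpose of a list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]

# After the transpose there is one row per column of the original
# matrix, each with desiredRank entries. These rows are the points
# that would be handed to the clustering step.
points = transpose(eigenvectors)

for index, point in enumerate(points):
    print(index, point)
```

Each of the originalSize rows in points is one item to cluster, which matches the "rows of the eigenvector matrix" convention described above.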
Still working on this patch...
Shannon