Hi all,
Hopefully these two questions will be my last, at least until my next
sprint... :)
I've run the EigenVerification task, and from what I can tell it
modifies, in place, the SequenceFiles that contain the results of the
LanczosSolver. My first question is fairly straightforward: since I need
to do as Jake suggested earlier - set my desiredRank for the
LanczosSolver as 1.2-1.5 times what I actually want, then discard the
highest-order eigenvectors down to exactly desiredRank - how do I
actually perform the discard of the extra rows in the SequenceFiles? I
tried making a DistributedRowMatrix out of the results and hard-setting
the number of rows, but all the rows written by the LanczosSolver showed up.
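
To make the intended discard step concrete, here is a minimal sketch in plain Python. It is not Mahout or Hadoop API: `truncate_rows` is a hypothetical helper, the rows are assumed to be (row_index, vector) pairs as they might be read out of the SequenceFile, and it assumes the rows to keep are the ones with the lowest row indices. If the solver orders its output the other way, the slice would need to be taken from the other end.

```python
def truncate_rows(rows, desired_rank):
    """Keep only the first `desired_rank` rows, ordered by row index.

    rows: iterable of (row_index, vector) pairs, in any order.
    Assumption: the rows worth keeping have the lowest indices.
    """
    ordered = sorted(rows, key=lambda pair: pair[0])
    return ordered[:desired_rank]


# Hypothetical data: four eigenvector rows, of which two are wanted.
rows = [(2, [0.3, 0.7]), (0, [0.1, 0.9]), (1, [0.2, 0.8]), (3, [0.4, 0.6])]
kept = truncate_rows(rows, 2)
```

In a real job the kept pairs would presumably be written out to a new SequenceFile, and the matrix built from that file rather than from the original one.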
My second question: part of this spectral clustering is to use the
components of the eigenvectors as proxies for the real data, so after
I've performed
k-means clustering, I need to be able to read the cluster assignments
programmatically, and transfer those assignments back to the original
data. I know of the clusterdump tool, but to be honest I'm having
trouble interpreting its output, plus I'm unsure of how I would output
the cluster assignments from my program. It would seem, for
compatibility purposes, that the format of clusterdump would be ideal,
but I'm not sure how to do this when I'm proxying the cluster
assignments. Any thoughts on this would be wonderful.
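
As an illustration of the mapping I have in mind, here is a minimal sketch in plain Python. It does not use clusterdump or any Mahout API: `assign_back` is a hypothetical helper, and it assumes that proxy row i (the i-th point's eigenvector components) corresponds to original data point i, and that the k-means step yields a row_index -> cluster_id mapping.

```python
def assign_back(proxy_assignments, original_points):
    """Group original data points by the cluster of their proxy row.

    proxy_assignments: dict mapping row_index -> cluster_id
                       (assumed output of clustering the proxy rows).
    original_points:   list where position i holds original point i.
    """
    clusters = {}
    for row_index, cluster_id in sorted(proxy_assignments.items()):
        clusters.setdefault(cluster_id, []).append(original_points[row_index])
    return clusters


# Hypothetical data: four original points, clustered via their proxies.
original_points = ["a", "b", "c", "d"]
proxy_assignments = {0: 1, 1: 0, 2: 1, 3: 0}
clusters = assign_back(proxy_assignments, original_points)
```

The open part of the question is how to obtain such a row_index -> cluster_id mapping from the k-means output programmatically, and whether it can be written in a clusterdump-compatible format.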
Thank you!
Shannon