Selectively discarding EigenVerification results and clustering assignments

Shannon Quinn Thu, 24 Jun 2010 10:03:38 -0700

Hi all,

Hopefully these two questions will be my last, at least until my next sprint... :)

I've run the EigenVerification task, and from what I can tell it modifies the very SequenceFiles that contain the results of the LanczosSolver. My first question is fairly straightforward. I need to do what Jake suggested earlier: set desiredRank for the LanczosSolver to 1.2 to 1.5 times what I actually want, then discard the highest-order eigenvectors until exactly desiredRank remain. How do I actually discard the extra rows in the SequenceFiles? I tried making a DistributedRowMatrix out of the results and hard-setting the number of rows, but all the rows written by the LanczosSolver still showed up.
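
To make the first question concrete, here is a minimal in-memory sketch of the discard step I have in mind. It assumes, possibly wrongly, that the vectors to keep are the desiredRank ones with the largest eigenvalues. EigenTruncation, EigenPair and keepTop are hypothetical names of my own, not Mahout APIs. Reading the vectors out of the SequenceFile and writing the survivors back is exactly the part I don't know how to do, so it is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not a Mahout API): keeps only the desiredRank
// eigenvectors with the largest eigenvalues and drops the rest.
// ASSUMPTION: "largest eigenvalue" is the right criterion for what to keep.
class EigenTruncation {

    // One verified eigenvector together with its eigenvalue (hypothetical type).
    static final class EigenPair {
        final double eigenvalue;
        final double[] vector;

        EigenPair(double eigenvalue, double[] vector) {
            this.eigenvalue = eigenvalue;
            this.vector = vector;
        }
    }

    // Returns a new list of at most desiredRank pairs, sorted by descending
    // eigenvalue; the input list itself is left unmodified.
    static List<EigenPair> keepTop(List<EigenPair> pairs, int desiredRank) {
        if (desiredRank < 0) {
            throw new IllegalArgumentException("desiredRank must be >= 0");
        }
        List<EigenPair> sorted = new ArrayList<>(pairs);
        sorted.sort(Comparator.comparingDouble((EigenPair p) -> p.eigenvalue).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(desiredRank, sorted.size())));
    }

    public static void main(String[] args) {
        // Three "found" eigenpairs, of which only two are wanted.
        List<EigenPair> found = new ArrayList<>();
        found.add(new EigenPair(0.2, new double[] {1, 0}));
        found.add(new EigenPair(0.9, new double[] {0, 1}));
        found.add(new EigenPair(0.5, new double[] {1, 1}));
        for (EigenPair p : keepTop(found, 2)) {
            System.out.println(p.eigenvalue);
        }
    }
}
```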

Part of this spectral clustering is to use the components of the eigenvectors as proxies for the real data, so after I've performed k-means clustering, I need to be able to read the cluster assignments programmatically and transfer them back to the original data. I know of the clusterdump tool, but to be honest I'm having trouble interpreting its output, and I'm unsure how I would output the cluster assignments from my program. For compatibility purposes the clusterdump format would seem ideal, but I'm not sure how to produce it when I'm proxying the cluster assignments. Any thoughts on this would be wonderful.
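
Similarly, here is a rough sketch of the transfer I'm after. It assumes that row i of the eigenvector-component matrix corresponds to the i-th original data point, and that the k-means output can somehow be reduced to one cluster ID per row. AssignmentTransfer, transfer and the tab-separated output are hypothetical placeholders of mine; the output is not the clusterdump format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Mahout API): transfers cluster assignments made on
// the spectral embedding back to the original data.
// ASSUMPTION: embedding row i corresponds to originalIds.get(i).
class AssignmentTransfer {

    // rowToCluster[i] is the cluster ID that k-means gave embedding row i.
    // Returns cluster ID -> original item IDs, with clusters in ascending order.
    static Map<Integer, List<String>> transfer(int[] rowToCluster, List<String> originalIds) {
        if (rowToCluster.length != originalIds.size()) {
            throw new IllegalArgumentException("expected one assignment per original item");
        }
        Map<Integer, List<String>> clusters = new TreeMap<>();
        for (int row = 0; row < rowToCluster.length; row++) {
            clusters.computeIfAbsent(rowToCluster[row], k -> new ArrayList<>())
                    .add(originalIds.get(row));
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] assignments = {1, 0, 1, 2};
        List<String> ids = Arrays.asList("docA", "docB", "docC", "docD");
        for (Map.Entry<Integer, List<String>> e : transfer(assignments, ids).entrySet()) {
            // One line per cluster: clusterId <TAB> comma-separated original IDs.
            System.out.println(e.getKey() + "\t" + String.join(",", e.getValue()));
        }
    }
}
```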


Thank you!

Shannon
