That's an excellent analogy! Employing that strategy, would it be possible (and not too expensive) to do the QAQ^-1 operation to get the original data matrix, after we've clustered the points in eigenspace?
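A rough sketch of what that inverse step would involve, under assumptions not stated in the thread: the decomposed matrix is real and symmetric, so its eigenvectors can be chosen orthonormal and Q^-1 is simply Q^T (no explicit inversion is needed). The matrix values and the helper names `transpose`, `matmul`, and `reconstruct` are hypothetical, chosen only for illustration. Note that the product gives back whichever matrix was decomposed, and that keeping only the top k eigenvectors gives an approximation rather than the exact matrix.

```python
import math


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def reconstruct(q, lam):
    """Compute Q * Lambda * Q^T, assuming the columns of Q are orthonormal."""
    return matmul(matmul(q, lam), transpose(q))


# Toy example (hypothetical values): eigen-decomposition of [[2, 1], [1, 2]].
s = 1.0 / math.sqrt(2.0)
Q = [[s, s], [s, -s]]            # columns are the orthonormal eigenvectors
LAM = [[3.0, 0.0], [0.0, 1.0]]   # eigenvalues on the diagonal

full = reconstruct(Q, LAM)       # all eigenvectors: recovers the matrix

# Keeping only the leading eigenvector (k = 1) yields a rank-1 approximation.
Q_K = [[s], [s]]
LAM_K = [[3.0]]
approx = reconstruct(Q_K, LAM_K)
```

For an n x n matrix with k retained eigenvectors, the product costs on the order of n^2 * k multiplications, so the expense is driven mainly by n.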

On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <jeast...@narus.com> wrote:

> For the display example, it is not necessary to cluster the original
> points. The other clustering display examples only train the clusters and
> do not classify the points. They are drawn first and the cluster centers &
> radii are superimposed afterwards. Thus I think it is only necessary to
> back-transform the clusters.
>
> My EE gut tells me this is like Fourier transforms between time- and
> frequency-domains. If this is true then what we need is the inverse
> transform. Is this a correct analogy?
>
> -----Original Message-----
> From: squinn.squ...@gmail.com [mailto:squinn.squ...@gmail.com] On Behalf Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 11:39 AM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
>
> This is actually something I could use a little expert Hadoop assistance
> on. The general idea is that the points that are clustered in eigenspace
> have a 1-to-1 correspondence with the original points (which is how you
> get your cluster assignments), but this back-mapping after clustering
> isn't explicitly implemented yet, since that's the core of the IO issue.
>
> My block on this is my lack of understanding in how the actual ordering of
> the points change (or not?) from when they are projected into eigenspace
> (the Lanczos solver) and when K-means makes its cluster assignments. On a
> one-node setup the original ordering appears to be preserved through all
> the operations, so the labels of the original points can be assigned by
> giving original_point[i] the label of projected_point[i], hence the
> cluster assignments are easy to determine. For multi-node setups, however,
> I simply don't know if this heuristic holds.
>
> But I believe the immediate issue here is that we're feeding the projected
> points to the display, when it should be the original points *annotated*
> with the cluster assignments from the corresponding projected points. The
> question is how to shift those assignments over robustly; right now it's
> just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
> just the version I have locally :o)
>
> On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <jeast...@narus.com> wrote:
>
> > Yes, I expect it is pilot error on my part. The original implementation
> > was failing in this manner because I was requesting 5 eigenvectors
> > (clusters). I changed it to 2 and now it displays something but it is
> > not even close to correct. I think this is because I have not
> > transformed back from eigen space to vector space. This all relates to
> > the IO issue for the spectral clustering code which I don't grok.
> >
> > The display driver begins with the sample points and generates the
> > affinity matrix using a distance measure. Not clear this is even a
> > correct interpretation of that matrix. Then spectral kmeans runs and
> > produces 2 clusters which I display directly. Seems like this number
> > should be more like the k in kmeans, and 5 was more realistic given the
> > data. I believe there is a missing output transformation to recover the
> > clusters from the eigenvectors but I don't know how to do that.
> >
> > I bet you do :)
> >
> > -----Original Message-----
> > From: Shannon Quinn (JIRA) [mailto:j...@apache.org]
> > Sent: Tuesday, May 24, 2011 8:07 AM
> > To: dev@mahout.apache.org
> > Subject: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
> >
> > [ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608 ]
> >
> > Shannon Quinn commented on MAHOUT-524:
> > --------------------------------------
> >
> > +1, I'm on it.
> >
> > I'm a little unclear as to the context of the initial Hudson comment: the
> > display method is expecting 2D vectors, but getting 5D ones?
> >
> > > DisplaySpectralKMeans example fails
> > > -----------------------------------
> > >
> > > Key: MAHOUT-524
> > > URL: https://issues.apache.org/jira/browse/MAHOUT-524
> > > Project: Mahout
> > > Issue Type: Bug
> > > Components: Clustering
> > > Affects Versions: 0.4, 0.5
> > > Reporter: Jeff Eastman
> > > Assignee: Jeff Eastman
> > > Labels: clustering, k-means, visualization
> > > Fix For: 0.6
> > >
> > > Attachments: aff.txt, raw.txt, spectralkmeans.png
> > >
> > > I've committed a new display example that attempts to push the standard
> > > mixture of models data set through spectral k-means. After some
> > > tweaking of configuration arguments and a bug fix in EigenCleanupJob it
> > > runs spectral k-means to completion. The display example is expecting
> > > 2-d clustered points and the example is producing 5-d points.
> > > Additional I/O work is needed before this will play with the rest of
> > > the clustering algorithms.
> >
> > --
> > This message is automatically generated by JIRA.
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
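
The thread's open question is whether original_point[i] can safely take the label of projected_point[i] when row ordering is not guaranteed. One way around relying on position is to carry an explicit row id through every stage and join on it afterwards. The following is a minimal sketch of that idea in plain Python, not the SpectralKMeansDriver code; the function name `transfer_labels`, the id scheme, and the sample data are all hypothetical.

```python
def transfer_labels(original_points, projected_assignments):
    """Annotate original points with cluster ids, joining on row id.

    original_points: dict mapping row id -> original (e.g. 2-d) point.
    projected_assignments: iterable of (row id, cluster id) pairs, in any
        order, as produced by clustering the projected points.
    Returns a dict mapping row id -> (original point, cluster id).
    """
    labels = dict(projected_assignments)
    # Fail loudly if any original point never received an assignment.
    missing = [rid for rid in original_points if rid not in labels]
    if missing:
        raise KeyError("no cluster assignment for row ids: %r" % missing)
    return {rid: (pt, labels[rid]) for rid, pt in original_points.items()}


# Hypothetical data: three original 2-d points keyed by row id.
points = {0: (1.0, 1.2), 1: (5.0, 4.8), 2: (0.9, 1.1)}

# Assignments arrive in a different order than the original rows, as they
# might if a distributed job does not preserve row ordering.
assignments = [(2, 0), (0, 0), (1, 1)]

annotated = transfer_labels(points, assignments)
```

Because the join is keyed by row id, the result is the same whatever order the assignments arrive in, which is the property the positional heuristic cannot guarantee on its own.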