For the display example, it is not necessary to cluster the original points. The other clustering display examples only train the clusters and do not classify the points. The points are drawn first, and the cluster centers and radii are superimposed afterwards. Thus I think it is only necessary to back-transform the clusters.
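One possible reading of "back-transform the clusters" is sketched below. It does not invert the eigen decomposition. Instead it assumes each projected point carries the id of the original point it came from, copies the cluster label across by that id, and recomputes each cluster's center and radius from the original 2-d points. The function name, the dict-based inputs, and the choice of per-axis standard deviation as the radius are all assumptions for illustration, not Mahout APIs.

```python
from collections import defaultdict
from math import sqrt


def clusters_in_original_space(original_points, projected_labels):
    """Recompute cluster centers and radii from the original points.

    original_points:  {point_id: (x, y)} in the original 2-d space (hypothetical input)
    projected_labels: {point_id: cluster_id} as assigned by k-means in eigenspace
    Returns {cluster_id: (center, radius)}, using per-axis mean and standard deviation.
    """
    members = defaultdict(list)
    for point_id, cluster_id in projected_labels.items():
        # Join on the explicit id rather than on position, so the result
        # does not depend on the order in which the points were emitted.
        members[cluster_id].append(original_points[point_id])

    summary = {}
    for cluster_id, pts in members.items():
        n = len(pts)
        center = tuple(sum(p[d] for p in pts) / n for d in range(2))
        radius = tuple(
            sqrt(sum((p[d] - center[d]) ** 2 for p in pts) / n) for d in range(2)
        )
        summary[cluster_id] = (center, radius)
    return summary


# Toy data: two well-separated groups, labels listed deliberately out of order.
original = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (10.0, 10.0), 3: (10.0, 12.0)}
labels = {3: 1, 0: 0, 2: 1, 1: 0}
result = clusters_in_original_space(original, labels)
```

If the ids survive the Lanczos and k-means steps (which this sketch assumes rather than verifies), the display could draw the original points and superimpose these centers and radii without relying on original_point[i] lining up with projected_point[i].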
My EE gut tells me this is like Fourier transforms between time- and frequency-domains. If this is true then what we need is the inverse transform. Is this a correct analogy?

-----Original Message-----
From: squinn.squ...@gmail.com [mailto:squinn.squ...@gmail.com] On Behalf Of Shannon Quinn
Sent: Tuesday, May 24, 2011 11:39 AM
To: dev@mahout.apache.org
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

This is actually something I could use a little expert Hadoop assistance on. The general idea is that the points that are clustered in eigenspace have a 1-to-1 correspondence with the original points (which is how you get your cluster assignments), but this back-mapping after clustering isn't explicitly implemented yet, since that's the core of the IO issue.

My block on this is my lack of understanding in how the actual ordering of the points change (or not?) from when they are projected into eigenspace (the Lanczos solver) and when K-means makes its cluster assignments. On a one-node setup the original ordering appears to be preserved through all the operations, so the labels of the original points can be assigned by giving original_point[i] the label of projected_point[i], hence the cluster assignments are easy to determine. For multi-node setups, however, I simply don't know if this heuristic holds.

But I believe the immediate issue here is that we're feeding the projected points to the display, when it should be the original points *annotated* with the cluster assignments from the corresponding projected points. The question is how to shift those assignments over robustly; right now it's just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's just the version I have locally :o)

On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <jeast...@narus.com> wrote:

> Yes, I expect it is pilot error on my part. The original implementation was
> failing in this manner because I was requesting 5 eigenvectors (clusters).
> I changed it to 2 and now it displays something but it is not even close to
> correct. I think this is because I have not transformed back from eigen
> space to vector space. This all relates to the IO issue for the spectral
> clustering code which I don't grok.
>
> The display driver begins with the sample points and generates the affinity
> matrix using a distance measure. Not clear this is even a correct
> interpretation of that matrix. Then spectral kmeans runs and produces 2
> clusters which I display directly. Seems like this number should be more
> like the k in kmeans, and 5 was more realistic given the data. I believe
> there is a missing output transformation to recover the clusters from the
> eigenvectors but I don't know how to do that.
>
> I bet you do :)
>
> -----Original Message-----
> From: Shannon Quinn (JIRA) [mailto:j...@apache.org]
> Sent: Tuesday, May 24, 2011 8:07 AM
> To: dev@mahout.apache.org
> Subject: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
>
> [https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608]
>
> Shannon Quinn commented on MAHOUT-524:
> --------------------------------------
>
> +1, I'm on it.
>
> I'm a little unclear as to the context of the initial Hudson comment: the
> display method is expecting 2D vectors, but getting 5D ones?
>
> > DisplaySpectralKMeans example fails
> > -----------------------------------
> >
> > Key: MAHOUT-524
> > URL: https://issues.apache.org/jira/browse/MAHOUT-524
> > Project: Mahout
> > Issue Type: Bug
> > Components: Clustering
> > Affects Versions: 0.4, 0.5
> > Reporter: Jeff Eastman
> > Assignee: Jeff Eastman
> > Labels: clustering, k-means, visualization
> > Fix For: 0.6
> >
> > Attachments: aff.txt, raw.txt, spectralkmeans.png
> >
> >
> > I've committed a new display example that attempts to push the standard
> > mixture of models data set through spectral k-means. After some tweaking of
> > configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> > k-means to completion. The display example is expecting 2-d clustered points
> > and the example is producing 5-d points. Additional I/O work is needed
> > before this will play with the rest of the clustering algorithms.
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira