Would that give you the original data matrix, the clustered data matrix, or the clustered affinity matrix? Even with the analogy in mind I'm having trouble connecting the dots. Seems like I lost the original data matrix in step 1 when I used a distance measure to produce A from it. If the returned eigenvectors define Q, then what is the significance of QAQ^-1? And, more importantly, if the Q eigenvectors define the clusters in eigenspace, what is the inverse transformation?
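For concreteness, here is a small numpy sketch (illustrative only, not Mahout code) of the decomposition under discussion. For a symmetric affinity matrix A, the eigenvector matrix Q is orthogonal (Q^-1 = Q^T), and Q diag(w) Q^-1 reconstructs A itself, not the original data matrix: the coordinates were already lost when the distance measure produced A in step 1. The points, kernel, and sizes below are made-up assumptions for the sketch.

```python
import numpy as np

# Made-up 2-d points; in the display example these would be the samples.
rng = np.random.default_rng(0)
pts = rng.standard_normal((6, 2))

# Step 1 of the pipeline: a symmetric affinity matrix from a distance
# measure (a Gaussian kernel here, as an illustrative choice).
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
A = np.exp(-d2)

# A is symmetric, so eigh gives an orthogonal Q with A = Q diag(w) Q^T.
w, Q = np.linalg.eigh(A)
A_rec = Q @ np.diag(w) @ Q.T

# The "inverse transform" recovers only A, not pts: building A from the
# points is lossy, so no Q-based route leads back to the data matrix.
assert np.allclose(A, A_rec)
assert np.allclose(Q.T @ Q, np.eye(6))
```

What does survive the projection is the row indexing: row i of Q (and of any matrix of selected eigenvectors) still corresponds to point i, which is why the back-mapping discussed further down the thread is positional rather than a matrix inverse.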
-----Original Message-----
From: squinn.squ...@gmail.com [mailto:squinn.squ...@gmail.com] On Behalf Of Shannon Quinn
Sent: Tuesday, May 24, 2011 12:07 PM
To: dev@mahout.apache.org
Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

That's an excellent analogy! Employing that strategy, would it be possible (and not too expensive) to do the QAQ^-1 operation to get the original data matrix, after we've clustered the points in eigenspace?

On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <jeast...@narus.com> wrote:
> For the display example, it is not necessary to cluster the original points. The other clustering display examples only train the clusters and do not classify the points. They are drawn first and the cluster centers & radii are superimposed afterwards. Thus I think it is only necessary to back-transform the clusters.
>
> My EE gut tells me this is like Fourier transforms between the time and frequency domains. If this is true, then what we need is the inverse transform. Is this a correct analogy?
>
> -----Original Message-----
> From: squinn.squ...@gmail.com [mailto:squinn.squ...@gmail.com] On Behalf Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 11:39 AM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
>
> This is actually something I could use a little expert Hadoop assistance on. The general idea is that the points clustered in eigenspace have a 1-to-1 correspondence with the original points (which is how you get your cluster assignments), but this back-mapping after clustering isn't explicitly implemented yet, since that's the core of the IO issue.
>
> My block on this is my lack of understanding of how the actual ordering of the points changes (or not?) from when they are projected into eigenspace (the Lanczos solver) to when K-means makes its cluster assignments.
> On a one-node setup the original ordering appears to be preserved through all the operations, so the labels of the original points can be assigned by giving original_point[i] the label of projected_point[i]; hence the cluster assignments are easy to determine. For multi-node setups, however, I simply don't know whether this heuristic holds.
>
> But I believe the immediate issue here is that we're feeding the projected points to the display, when it should be the original points *annotated* with the cluster assignments from the corresponding projected points. The question is how to shift those assignments over robustly; right now it's just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's just the version I have locally :o)
>
> On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <jeast...@narus.com> wrote:
>
> > Yes, I expect it is pilot error on my part. The original implementation was failing in this manner because I was requesting 5 eigenvectors (clusters). I changed it to 2, and now it displays something, but it is not even close to correct. I think this is because I have not transformed back from eigenspace to vector space. This all relates to the IO issue for the spectral clustering code, which I don't grok.
> >
> > The display driver begins with the sample points and generates the affinity matrix using a distance measure. It's not clear this is even a correct interpretation of that matrix. Then spectral kmeans runs and produces 2 clusters, which I display directly. It seems like this number should be more like the k in kmeans, and 5 was more realistic given the data. I believe there is a missing output transformation to recover the clusters from the eigenvectors, but I don't know how to do that.
> > I bet you do :)
> >
> > -----Original Message-----
> > From: Shannon Quinn (JIRA) [mailto:j...@apache.org]
> > Sent: Tuesday, May 24, 2011 8:07 AM
> > To: dev@mahout.apache.org
> > Subject: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails
> >
> > [ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608 ]
> >
> > Shannon Quinn commented on MAHOUT-524:
> > --------------------------------------
> >
> > +1, I'm on it.
> >
> > I'm a little unclear as to the context of the initial Hudson comment: the display method is expecting 2D vectors, but getting 5D ones?
> >
> > > DisplaySpectralKMeans example fails
> > > -----------------------------------
> > >
> > >                 Key: MAHOUT-524
> > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
> > >             Project: Mahout
> > >          Issue Type: Bug
> > >          Components: Clustering
> > >    Affects Versions: 0.4, 0.5
> > >            Reporter: Jeff Eastman
> > >            Assignee: Jeff Eastman
> > >              Labels: clustering, k-means, visualization
> > >             Fix For: 0.6
> > >
> > >         Attachments: aff.txt, raw.txt, spectralkmeans.png
> > >
> > > I've committed a new display example that attempts to push the standard mixture of models data set through spectral k-means. After some tweaking of configuration arguments and a bug fix in EigenCleanupJob, it runs spectral k-means to completion. The display example is expecting 2-d clustered points and the example is producing 5-d points. Additional I/O work is needed before this will play with the rest of the clustering algorithms.
> >
> > --
> > This message is automatically generated by JIRA.
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
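Tying the thread together, here is a toy single-node Python sketch of the pipeline being debated (affinity matrix, normalized Laplacian, smallest eigenvectors, k-means on their rows, labels carried back to the original points by row index). This is an independent illustration, not Mahout's SpectralKMeansDriver: the Gaussian kernel, the sigma value, the farthest-point initialization, and the two-blob test data are all assumptions made for the sketch. The index-based back-mapping at the end is the one-node heuristic described earlier in the thread, where original_point[i] receives the label of projected_point[i].

```python
import numpy as np

def spectral_kmeans(points, k, sigma=1.0, iters=50):
    """Toy single-node spectral k-means; labels map back by row index."""
    n = len(points)
    # Affinity matrix from a distance measure (Gaussian kernel, assumed).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}.
    D = A.sum(axis=1)
    L = np.eye(n) - A / np.sqrt(np.outer(D, D))
    # Embed point i as the (normalized) i-th row of the k smallest eigenvectors.
    _, Q = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    Y = Q[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Plain k-means on the rows of Y, with deterministic farthest-point init.
    centers = [Y[0]]
    for _ in range(1, k):
        gaps = np.min(((Y[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(Y[np.argmax(gaps)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    # Row i of Y came from row i of A, which came from points[i], so
    # labels[i] is the cluster assignment of the original point i.
    return labels

# Two well-separated blobs of original points; the back-mapped labels
# should split them cleanly without any inverse transform.
rng = np.random.default_rng(1)
blobs = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
pts = blobs + rng.normal(scale=0.1, size=blobs.shape)
labels = spectral_kmeans(pts, k=2)
```

On a single node this ordering argument is straightforward; as noted above, the open question for the multi-node case is whether the row order survives the distributed Lanczos and k-means stages, which this sketch does not address.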