That's an excellent analogy! Using that strategy, would it be possible
(and not too expensive) to apply the QAQ^-1 operation to recover the original
data matrix after we've clustered the points in eigenspace?
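
For reference, the label transfer described in the quoted messages below (giving each original point the label of its projected counterpart) can be sketched without relying on row order. This is only an illustrative sketch: the function name, the row ids, and the data layout are hypothetical and are not part of Mahout or of SpectralKMeansDriver. It assumes each projected point carries the id of the original row it came from.

```python
# Hypothetical sketch: none of these names come from Mahout.
def transfer_labels(original_points, projected_labels):
    """Annotate each original point with the cluster label of its projection.

    original_points: dict mapping row id -> original point (e.g. a 2-D tuple)
    projected_labels: iterable of (row id, cluster label) pairs, in any order
    """
    labels = dict(projected_labels)
    # Fail loudly if any original point has no projected counterpart.
    missing = [rid for rid in original_points if rid not in labels]
    if missing:
        raise KeyError(f"no cluster label for row ids: {missing}")
    return {rid: (point, labels[rid]) for rid, point in original_points.items()}


original = {0: (0.1, 0.2), 1: (0.2, 0.1), 2: (5.0, 5.1)}
# Labels arrive in a different order than the original points.
shuffled = [(2, "B"), (0, "A"), (1, "A")]
annotated = transfer_labels(original, shuffled)
```

Joining on an explicit id rather than on position means the result does not depend on whether the ordering survives the intermediate jobs, which is the open question for multi-node runs.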

On Tue, May 24, 2011 at 2:59 PM, Jeff Eastman <jeast...@narus.com> wrote:

> For the display example, it is not necessary to cluster the original
> points. The other clustering display examples only train the clusters and do
> not classify the points. They are drawn first and the cluster centers &
> radii are superimposed afterwards. Thus I think it is only necessary to
> back-transform the clusters.
>
> My EE gut tells me this is like Fourier transforms between time- and
> frequency-domains. If this is true then what we need is the inverse
> transform. Is this a correct analogy?
>
> -----Original Message-----
> From: squinn.squ...@gmail.com [mailto:squinn.squ...@gmail.com] On Behalf
> Of Shannon Quinn
> Sent: Tuesday, May 24, 2011 11:39 AM
> To: dev@mahout.apache.org
> Subject: Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> fails
>
> This is actually something I could use a little expert Hadoop assistance
> on.
> The general idea is that the points that are clustered in eigenspace have a
> 1-to-1 correspondence with the original points (which is how you get your
> cluster assignments), but this back-mapping after clustering isn't
> explicitly implemented yet, since that's the core of the IO issue.
>
> My block on this is that I don't understand how (or whether) the ordering of
> the points changes between their projection into eigenspace (the Lanczos
> solver) and the point where K-means makes its cluster assignments. On a
> one-node setup the original ordering appears to be preserved through all the
> operations, so the labels of the original points can be assigned by giving
> original_point[i] the label of projected_point[i], and the cluster
> assignments are easy to determine. For multi-node setups, however, I simply
> don't know whether this heuristic holds.
>
> But I believe the immediate issue here is that we're feeding the projected
> points to the display, when it should be the original points *annotated*
> with the cluster assignments from the corresponding projected points. The
> question is how to shift those assignments over robustly; right now it's
> just a hack job in the SpectralKMeansDriver...or maybe (hopefully!) it's
> just the version I have locally :o)
>
> On Tue, May 24, 2011 at 2:13 PM, Jeff Eastman <jeast...@narus.com> wrote:
>
> > Yes, I expect it is pilot error on my part. The original implementation was
> > failing in this manner because I was requesting 5 eigenvectors (clusters). I
> > changed it to 2 and now it displays something but it is not even close to
> > correct. I think this is because I have not transformed back from eigen
> > space to vector space. This all relates to the IO issue for the spectral
> > clustering code which I don't grok.
> >
> > The display driver begins with the sample points and generates the affinity
> > matrix using a distance measure. It is not clear that this is even a correct
> > interpretation of that matrix. Then spectral kmeans runs and produces 2
> > clusters, which I display directly. It seems this number should be more
> > like the k in kmeans, and 5 was more realistic given the data. I believe
> > there is a missing output transformation to recover the clusters from the
> > eigenvectors, but I don't know how to do that.
> >
> > I bet you do :)
> >
> > -----Original Message-----
> > From: Shannon Quinn (JIRA) [mailto:j...@apache.org]
> > Sent: Tuesday, May 24, 2011 8:07 AM
> > To: dev@mahout.apache.org
> > Subject: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example
> > fails
> >
> >
> >    [ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038608#comment-13038608 ]
> >
> > Shannon Quinn commented on MAHOUT-524:
> > --------------------------------------
> >
> > +1, I'm on it.
> >
> > I'm a little unclear as to the context of the initial Hudson comment: the
> > display method is expecting 2D vectors, but getting 5D ones?
> >
> > > DisplaySpectralKMeans example fails
> > > -----------------------------------
> > >
> > >                 Key: MAHOUT-524
> > >                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
> > >             Project: Mahout
> > >          Issue Type: Bug
> > >          Components: Clustering
> > >    Affects Versions: 0.4, 0.5
> > >            Reporter: Jeff Eastman
> > >            Assignee: Jeff Eastman
> > >              Labels: clustering, k-means, visualization
> > >             Fix For: 0.6
> > >
> > >         Attachments: aff.txt, raw.txt, spectralkmeans.png
> > >
> > >
> > > I've committed a new display example that attempts to push the standard
> > > mixture of models data set through spectral k-means. After some tweaking of
> > > configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> > > k-means to completion. The display example is expecting 2-d clustered points
> > > and the example is producing 5-d points. Additional I/O work is needed
> > > before this will play with the rest of the clustering algorithms.
> >
> > --
> > This message is automatically generated by JIRA.
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
> >
>
