[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143510#comment-13143510 ]
Shannon Quinn edited comment on MAHOUT-524 at 11/3/11 8:33 PM:
---------------------------------------------------------------

After implementing the same code in Python, my suspicion is actually that the clusters from the K-means step at the conclusion of the spectral algorithm are throwing off the final results shown in DisplaySKM. Regular K-means is running on the spectral data (the top k eigenvectors of the affinities) rather than on the original data. I don't know K-means well enough to know for sure, but my guess is that all the distance measurements that come back in its output format are relative to the spectral data, rather than the original data. So what you see in the end-result graph are circles around where the spectral data are. That'd be my first guess, anyway.

I'm working on a couple of things to help with this: a sequential version of spectral k-means, and a job to read raw data (text format: whitespace- or comma-separated n-dimensional points) and convert it to affinities (a la issue 518, finally!). Hopefully these will help diagnose spectral k-means.

But if it is a data issue, I'm not sure how we can translate the distance measurements on the spectral data back onto the original data for the DisplaySKM code. I would argue, though, that since spectral k-means doesn't operate on the same GMM-type basis that regular K-means does, overlaying K Gaussians isn't really what we want here, anyway. If at all possible, my suggestion would be colored dots to indicate the clusters.

was (Author: magsol):

After implementing the same code in Python, my suspicions are actually that the results of the K-means at the conclusion of the spectral algorithm is throwing off the results. Regular K-means is running on the spectral data: the top k-eigenvectors of the affinities, rather than the original data.
I don't know K-means well enough to know for sure, but my guess is that all the distance measurements that come back in its output format are relative to the spectral data, rather than the original data. So what you see in the end-result graph are circles around where the spectral data are. That'd be my first guess, anyway.

I'm working on a couple things to help with this: a sequential version of spectral k-means, and a job to read raw data (text format: whitespace or comma-separated n-dimensional points) and convert it to affinities (a la issue 518, finally!). Hopefully these will help diagnose spectral k-means.

But if it is a data issue, I'm not sure how we can translate the distance measurements on the spectral data back onto the original data for the DisplaySKM code. I would argue, though, that since spectral k-means doesn't operate on the same GMM-type basis that regular K-means does, overlaying K gaussians isn't really what we want here, anyway. If at all possible, my suggestion would be colored dots to indicate the clusters.

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>                 Key: MAHOUT-524
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.4, 0.5
>            Reporter: Jeff Eastman
>            Assignee: Shannon Quinn
>              Labels: clustering, k-means, visualization
>             Fix For: 0.6
>
>         Attachments: EclipseLog_20110918.txt, MAHOUT-524.patch, MAHOUT-524.patch, SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, screenshot-1.jpg, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard mixture of models data set through spectral k-means. After some tweaking of configuration arguments and a bug fix in EigenCleanupJob it runs spectral k-means to completion. The display example is expecting 2-d clustered points and the example is producing 5-d points.
> Additional I/O work is needed before this will play with the rest of the clustering algorithms.
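
The raw-data-to-affinities job mentioned in the comment could look roughly like the following. This is only a minimal sequential sketch in Python, not the Mahout job itself. The function names (`parse_points`, `gaussian_affinities`), the Gaussian kernel with its `sigma` parameter, the choice to skip the diagonal, and the `i,j,value` output layout are all assumptions made for illustration; the comment only says that the input is whitespace- or comma-separated n-dimensional points.

```python
import math
import re


def parse_points(text):
    """Parse one point per line; coordinates may be separated by commas and/or whitespace."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        points.append([float(tok) for tok in re.split(r"[,\s]+", line)])
    return points


def gaussian_affinities(points, sigma=1.0):
    """Return (i, j, affinity) triples for every ordered pair with i != j.

    Assumption: affinity = exp(-d^2 / (2 * sigma^2)), with d the Euclidean
    distance between the two points. Other kernels would work the same way.
    """
    triples = []
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i == j:
                continue  # assumed: no self-affinity entries
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            triples.append((i, j, math.exp(-sq_dist / (2.0 * sigma ** 2))))
    return triples


# Hypothetical three-point input mixing both separator styles.
raw = "0.0 0.0\n0.0,1.0\n3.0, 4.0\n"
points = parse_points(raw)
for i, j, a in gaussian_affinities(points):
    print(f"{i},{j},{a:.6f}")
```

Because each affinity triple carries the row indices of the original points, cluster labels computed later on the spectral data can be matched back to the raw points by index, which is what a colored-dots display would need.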