[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206256#comment-13206256 ]
Dan Brickley commented on MAHOUT-524:
-------------------------------------

I just tried spectral k-means with some Wikipedia/DBpedia data (1.0 affinities for every page and topic category URL pair in the wiki). The data came from http://downloads.dbpedia.org/3.7/en/article_categories_en.nt.bz2 and is posted on the Web at http://danbri.org/2012/spectral/dbpedia/ (a .csv file plus an int-to-URL dictionary file).

My best guess at the command line (running this with today's trunk and a fresh 0.20.203.0 Hadoop pseudo-cluster) was this:

mahout spectralkmeans -i wiki/ -o output1 -k 20 -d 4192499 --maxIter 10

(where the HDFS wiki/ subdir contains the .csv data file)

Unfortunately I'm hitting one of the various problems discussed above. If anyone else can reproduce this, perhaps a fresh JIRA is needed. It gets stuck after the first job, with an essentially empty seqfile (checked with "mahout seqdumper --seqFile output1/calculations/diagonal/part-r-00000"). Full transcript here: https://gist.github.com/1804016

This is essentially the same experience I had back in Sept (see above) running a similar test.

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>                 Key: MAHOUT-524
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.4, 0.5
>            Reporter: Jeff Eastman
>            Assignee: Shannon Quinn
>              Labels: clustering, k-means, visualization
>             Fix For: 0.6
>
>         Attachments: EclipseLog_20110918.txt, MAHOUT-524.patch, MAHOUT-524.patch, MAHOUT-524.patch, SpectralKMeans_fail_20110919.txt, aff.txt, raw.txt, screenshot-1.jpg, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard mixture of models data set through spectral k-means. After some tweaking of configuration arguments and a bug fix in EigenCleanupJob it runs spectral k-means to completion. The display example is expecting 2-d clustered points and the example is producing 5-d points.
> Additional I/O work is needed before this will play with the rest of the clustering algorithms.
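
For anyone trying to reproduce the input described in the comment, here is a minimal Python sketch of how page/category pairs from an N-Triples dump could be turned into 1.0 affinities plus an int-to-URL dictionary. It is not the script that produced the posted files. The function name `build_affinities`, the "i,j,value" row layout, and the choice to number pages and categories in a single id space are all assumptions made for illustration.

```python
import re

# Matches one N-Triples statement whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def build_affinities(lines):
    """Return (rows, ids) for the page/category pairs found in N-Triples lines.

    Hypothetical helper: the "i,j,1.0" row layout is an assumption about the
    affinity input format, not something taken from the posted files.
    """
    ids = {}   # URL -> integer id, assigned in order of first appearance
    rows = []  # one "i,j,1.0" string per page/category pair
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # skip comments, blank lines and literal-valued triples
        page, _predicate, category = match.groups()
        i = ids.setdefault(page, len(ids))
        j = ids.setdefault(category, len(ids))
        rows.append(f"{i},{j},1.0")
    return rows, ids


if __name__ == "__main__":
    # Two made-up triples in the shape of article_categories_en.nt.
    sample = [
        "<http://dbpedia.org/resource/A> <http://purl.org/dc/terms/subject> "
        "<http://dbpedia.org/resource/Category:X> .",
        "<http://dbpedia.org/resource/B> <http://purl.org/dc/terms/subject> "
        "<http://dbpedia.org/resource/Category:X> .",
    ]
    rows, ids = build_affinities(sample)
    print("\n".join(rows))
    print(len(ids), "distinct URLs")
```

Under these assumptions, the number of distinct URLs (`len(ids)`) is the value that would be passed as the matrix dimension via -d in the command above. Note that the sketch emits each pair in one direction only; whether the job needs the symmetric entries as well is not something the comment states.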