[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086595#comment-13086595 ]
Jeff Eastman commented on MAHOUT-524:
-------------------------------------

The original example was extracting 5 eigenvectors and thus returned 5-d results. I changed it to extract 2 vectors; it used to run, but displayed incorrect results. I am now getting a FileNotFoundException in the bowels of DRM.times (DistributedRowMatrix.times) while running this in local Hadoop mode, and IIRC I have been seeing it since pre-0.5 testing. I wonder if it is possible to add a --method sequential implementation for SpectralKMeans to help separate the algorithmic issues from the file bookkeeping ones? We have a sequential Lanczos implementation...

Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:222)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:72)
    at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:155)
    at org.apache.mahout.clustering.display.DisplaySpectralKMeans.main(DisplaySpectralKMeans.java:72)
Caused by: java.io.FileNotFoundException: File file:/home/dev/workspace/mahout/examples/output/calculations/laplacian-33/tmp/data does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:51)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:211)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:765)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1200)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.times(DistributedRowMatrix.java:214)
    ... 4 more

> DisplaySpectralKMeans example fails
> -----------------------------------
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.4, 0.5
> Reporter: Jeff Eastman
> Assignee: Jeff Eastman
> Labels: clustering, k-means, visualization
> Fix For: 0.6
>
> Attachments: aff.txt, raw.txt, spectralkmeans.png
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points.
> Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira