I ran the K-means clustering algorithm against a set of sequence files. However, the generated result looks like this:

    0 belongs to cluster 1.0: []
    0 belongs to cluster 1.0: []
    0 belongs to cluster 1.0: []
    0 belongs to cluster 1.0: []
    0 belongs to cluster 1.0: []
    0 belongs to cluster 1.0: []

Could someone explain why I get this kind of result? Is it caused by a specific parameter setting, or by something else?

The program I use is borrowed from NewsKMeansClustering.java, an example given in chapter 9 of Mahout in Action. The core clustering code in this program is:

    CanopyDriver.run(vectorsFolder, canopyCentroids, new EuclideanDistanceMeasure(), 250, 120, false, false);
    KMeansDriver.run(conf, vectorsFolder, new Path(canopyCentroids, "clusters-0"), clusterOutput, new TanimotoDistanceMeasure(), 0.01, 20, true, false);
