(I've been looking into the examples, starting with clustering. The kmeans clustering example is broken in trunk, but I hope to be able to track down the problem sometime soon.)

Currently org.apache.mahout.clustering.kmeans.Cluster#decodeCluster [1] silently returns null when it encounters an unknown format. I find that this approach makes it difficult to debug format issues. I'd like to contribute a patch, but I'm unsure about the conventional error handling strategy used in Mahout...

- robert

[1]
  /**
   * Decodes and returns a Cluster from the formattedString
   *
   * @param formattedString a String produced by formatCluster
   * @return a new Canopy
   */
  public static Cluster decodeCluster(String formattedString) {
    int beginIndex = formattedString.indexOf('[');
    String id = formattedString.substring(0, beginIndex);
    String center = formattedString.substring(beginIndex);
    char firstChar = id.charAt(0);
    boolean startsWithV = firstChar == 'V';
    if (firstChar == 'C' || startsWithV) {
      int clusterId = Integer.parseInt(formattedString.substring(1, beginIndex - 2));
      Vector clusterCenter = AbstractVector.decodeVector(center);
      Cluster cluster = new Cluster(clusterCenter, clusterId);
      cluster.converged = startsWithV;
      return cluster;
    }
    return null;
  }
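
To make the question concrete, here is a rough sketch of one possible fail-fast variant. It is not Mahout code and not necessarily the project's preferred convention: ClusterDecoder and ParsedCluster are hypothetical stand-ins so the example compiles on its own, the center is left as raw text instead of going through AbstractVector.decodeVector, and the choice of IllegalArgumentException is an assumption. It also assumes the id is separated from '[' by exactly two characters, as the substring(1, beginIndex - 2) call in [1] implies.

```java
/** Stand-alone sketch; ClusterDecoder and ParsedCluster are hypothetical names, not Mahout classes. */
final class ClusterDecoder {

  /** Minimal stand-in for Mahout's Cluster: id, convergence flag and the undecoded center text. */
  static final class ParsedCluster {
    final int id;
    final boolean converged;
    final String center; // raw "[...]" text; the real code would decode this into a Vector

    ParsedCluster(int id, boolean converged, String center) {
      this.id = id;
      this.converged = converged;
      this.center = center;
    }
  }

  /** Decodes a formatted cluster string, throwing instead of returning null on bad input. */
  static ParsedCluster decodeCluster(String formattedString) {
    if (formattedString == null) {
      throw new IllegalArgumentException("formattedString must not be null");
    }
    int beginIndex = formattedString.indexOf('[');
    // Shortest accepted prefix is a type letter, one digit and two separator characters.
    if (beginIndex < 4) {
      throw new IllegalArgumentException(
          "Unknown cluster format (missing or misplaced '['): " + formattedString);
    }
    char firstChar = formattedString.charAt(0);
    if (firstChar != 'C' && firstChar != 'V') {
      throw new IllegalArgumentException(
          "Unknown cluster format (expected 'C' or 'V' prefix): " + formattedString);
    }
    int clusterId;
    try {
      clusterId = Integer.parseInt(formattedString.substring(1, beginIndex - 2));
    } catch (NumberFormatException e) {
      // Keep the original exception as the cause so the stack trace stays useful.
      throw new IllegalArgumentException("Unparsable cluster id in: " + formattedString, e);
    }
    return new ParsedCluster(clusterId, firstChar == 'V', formattedString.substring(beginIndex));
  }
}
```

With this shape, a malformed record fails at the point of decoding with the offending string in the message, instead of surfacing later as a NullPointerException in the caller. As a side note, the code in [1] already throws a StringIndexOutOfBoundsException when the input contains no '[', since indexOf returns -1 in that case, so returning null only covers the unknown-prefix case.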