(i've been looking into the examples, starting with clustering.
the kmeans clustering example is broken in trunk, but i hope to
track down the problem soonish.)

currently org.apache.mahout.clustering.kmeans.Cluster#decodeCluster[1]
silently returns null when it encounters an unknown format. i find
that this makes format issues difficult to debug. i'd like to
contribute a patch, but i'm unsure what the conventional error
handling strategy in Mahout is...

- robert

[1]
  /**
   * Decodes and returns a Cluster from the formattedString
   *
   * @param formattedString a String produced by formatCluster
   * @return a new Cluster
   */
  public static Cluster decodeCluster(String formattedString) {
    int beginIndex = formattedString.indexOf('[');
    String id = formattedString.substring(0, beginIndex);
    String center = formattedString.substring(beginIndex);
    char firstChar = id.charAt(0);
    boolean startsWithV = firstChar == 'V';
    if (firstChar == 'C' || startsWithV) {
      int clusterId = Integer.parseInt(formattedString.substring(1,
          beginIndex - 2));
      Vector clusterCenter = AbstractVector.decodeVector(center);
      Cluster cluster = new Cluster(clusterCenter, clusterId);
      cluster.converged = startsWithV;
      return cluster;
    }
    return null;
  }
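
For comparison, here is a minimal sketch of a fail-fast variant, not a statement of Mahout's actual convention. It assumes that throwing an IllegalArgumentException is acceptable for malformed input, and that formatted strings look like "C12: [1.0, 2.0]", which is inferred only from the substring offsets in [1]. The DecodeClusterSketch and Decoded names are hypothetical stand-ins so the example runs without Mahout; the center is kept as a raw string instead of being passed to AbstractVector.decodeVector.

```java
// Hypothetical, self-contained sketch; none of these types are Mahout classes.
class DecodeClusterSketch {

  /** Stand-in for the decoded cluster (id, converged flag, raw center text). */
  static final class Decoded {
    final int clusterId;
    final boolean converged;
    final String center;

    Decoded(int clusterId, boolean converged, String center) {
      this.clusterId = clusterId;
      this.converged = converged;
      this.center = center;
    }
  }

  /**
   * Decodes a formatted cluster string, throwing instead of returning null.
   *
   * @throws IllegalArgumentException if the string is not in the assumed format
   */
  static Decoded decode(String formattedString) {
    if (formattedString == null) {
      throw new IllegalArgumentException("formattedString must not be null");
    }
    int beginIndex = formattedString.indexOf('[');
    // Need a prefix char, at least one id digit, and two separator chars before '['.
    if (beginIndex < 4) {
      throw new IllegalArgumentException(
          "Unrecognized cluster format: " + formattedString);
    }
    char firstChar = formattedString.charAt(0);
    boolean startsWithV = firstChar == 'V';
    if (firstChar != 'C' && !startsWithV) {
      throw new IllegalArgumentException(
          "Unknown cluster prefix '" + firstChar + "' in: " + formattedString);
    }
    int clusterId;
    try {
      clusterId = Integer.parseInt(formattedString.substring(1, beginIndex - 2));
    } catch (NumberFormatException e) {
      // Rewrap so the message shows the whole offending input.
      throw new IllegalArgumentException(
          "Unparsable cluster id in: " + formattedString, e);
    }
    return new Decoded(clusterId, startsWithV, formattedString.substring(beginIndex));
  }

  public static void main(String[] args) {
    Decoded ok = decode("V7: [1.0, 2.0]");
    System.out.println(ok.clusterId + " " + ok.converged + " " + ok.center);
    try {
      decode("X7: [1.0, 2.0]");
    } catch (IllegalArgumentException e) {
      // The caller now sees which input was rejected and why.
      System.out.println(e.getMessage());
    }
  }
}
```

With this shape, a caller that hits an unknown format gets an exception naming the offending input at the point of decoding, rather than a NullPointerException somewhere later. Whether an unchecked exception or a checked, format-specific one fits better depends on the project's conventions, which is exactly the open question above.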
