Hi Jerry,
DirichletClusters are not similar enough to ClusterBase to make that
workable, so you are correct that the utility won't dump them. Writing a
dump utility that can is a great idea, though it does tend to be rather
Model-specific. Maybe Models should have some printable representation
a la asFormatString().
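As a rough illustration of that idea, here is a minimal sketch of a model that can print itself. PrintableModel, NormalModelSketch and ModelDumpSketch are hypothetical names made up for this example and are not Mahout classes; only the method name asFormatString() comes from the suggestion above, and the output format is an assumption.

```java
import java.util.Locale;

// Hypothetical interface (not part of Mahout): anything that can
// describe itself as a single printable string.
interface PrintableModel {
    String asFormatString();
}

// Hypothetical stand-in for a model: a spherical normal distribution
// described by its mean vector and a standard deviation.
final class NormalModelSketch implements PrintableModel {
    private final double[] mean;
    private final double stdDev;

    NormalModelSketch(double[] mean, double stdDev) {
        this.mean = mean.clone();
        this.stdDev = stdDev;
    }

    @Override
    public String asFormatString() {
        // Assumed format: nm{m=[m0, m1, ...] sd=s}
        StringBuilder sb = new StringBuilder("nm{m=[");
        for (int i = 0; i < mean.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format(Locale.ROOT, "%.2f", mean[i]));
        }
        sb.append("] sd=").append(String.format(Locale.ROOT, "%.2f", stdDev)).append('}');
        return sb.toString();
    }
}

class ModelDumpSketch {
    public static void main(String[] args) {
        PrintableModel model = new NormalModelSketch(new double[] {1.0, 2.5}, 0.5);
        // Prints: nm{m=[1.00, 2.50] sd=0.50}
        System.out.println(model.asFormatString());
    }
}
```

With something like this in place, a dump utility would only need to call asFormatString() on each model and would not have to know which kind of model it is looking at.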
Look at the code in
/MahoutTrunk/utils/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java
/MahoutTrunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/DisplayOutputState.java
for ideas on how you might be able to dump out your DirichletClusters
and their Models.
I've actually considered making ClusterBase into a Model and
generalizing DirichletCluster to be the root of all clusters. I think
the distance measures used by canopy and k-means could be cast as Model
pdfs, but the whole idea is still only half-baked.
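To show roughly what casting a distance measure as a pdf could look like, here is a minimal sketch. All of the types below (DistanceSketch, ModelSketch, DistanceModelSketch, DistanceAsPdfSketch) are hypothetical and are not existing Mahout code. The choice of exp(-distance) is only one assumed mapping; it yields an unnormalized score that is highest at the cluster center, not a properly normalized density.

```java
// Hypothetical stand-in for a distance measure (not a Mahout interface).
interface DistanceSketch {
    double distance(double[] a, double[] b);
}

// Hypothetical stand-in for a model that scores how well a point fits.
interface ModelSketch {
    double pdf(double[] x);
}

// Wraps a cluster center and a distance measure so that they behave
// like a model: the closer a point is to the center, the higher the score.
final class DistanceModelSketch implements ModelSketch {
    private final double[] center;
    private final DistanceSketch measure;

    DistanceModelSketch(double[] center, DistanceSketch measure) {
        this.center = center.clone();
        this.measure = measure;
    }

    @Override
    public double pdf(double[] x) {
        // Assumed mapping: unnormalized score that decays with distance.
        return Math.exp(-measure.distance(center, x));
    }
}

class DistanceAsPdfSketch {
    // Plain Euclidean distance between two points of equal dimension.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        ModelSketch model =
            new DistanceModelSketch(new double[] {0.0, 0.0}, DistanceAsPdfSketch::euclidean);
        System.out.println(model.pdf(new double[] {0.0, 0.0})); // point at the center
        System.out.println(model.pdf(new double[] {3.0, 4.0})); // point 5 units away
    }
}
```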
Jeff
Jerry Ye wrote:
I'm trying to view the output of my experiment using Dirichlet Process
Clustering. When attempting to use the ClusterDumper utility on the output
directory, an exception is thrown. Upon looking closer, DirichletCluster does
not extend ClusterBase. The error is below.
Is there some other way that I can view the cluster labels?
Thanks!
- jerry
-bash-3.1$ java -cp mahout-core-0.3-SNAPSHOT.jar:mahout-utils-0.3-SNAPSHOT.jar:$( echo dependency/*.jar . | sed 's/ /:/g') org.apache.mahout.utils.clustering.ClusterDumper -s mahoutout/state-0
Input Path: /homes/jerryye/mahout/mahoutout/state-0/part-0
Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.clustering.dirichlet.DirichletCluster cannot be cast to org.apache.mahout.clustering.ClusterBase
    at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:119)
    at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:251)