I agree, but this will require an API extension to Model, as I suggested
below, because each model type has its own parameters that need to be
represented. I'll open a JIRA issue for it.
Jeff
Grant Ingersoll wrote:
We probably should have ClusterDumper still handle Dirichlet jobs, so that users don't need to deal with more than one interface.
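One way a single dumper could accept both kinds of output is to check the runtime type before casting, instead of casting every value to ClusterBase. This is a minimal sketch under assumptions: SimpleCluster and DirichletLikeCluster are hypothetical stand-ins with invented fields, not Mahout's actual ClusterBase and DirichletCluster, and the printed format is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: SimpleCluster and DirichletLikeCluster are stand-ins
// for ClusterBase and DirichletCluster, not the real Mahout classes.
public class DumperSketch {

  // Stand-in for a canopy/k-means style cluster (assumed fields).
  static class SimpleCluster {
    final int id;
    final double[] center;

    SimpleCluster(int id, double[] center) {
      this.id = id;
      this.center = center;
    }
  }

  // Stand-in for a Dirichlet cluster: a model description plus its
  // total point count (assumed fields).
  static class DirichletLikeCluster {
    final String modelDescription;
    final double totalCount;

    DirichletLikeCluster(String modelDescription, double totalCount) {
      this.modelDescription = modelDescription;
      this.totalCount = totalCount;
    }
  }

  // Check the runtime type before casting, so an unexpected cluster type
  // produces a readable error instead of a ClassCastException.
  static String describe(Object value) {
    if (value instanceof SimpleCluster) {
      SimpleCluster c = (SimpleCluster) value;
      return "C-" + c.id + " center=" + Arrays.toString(c.center);
    }
    if (value instanceof DirichletLikeCluster) {
      DirichletLikeCluster d = (DirichletLikeCluster) value;
      return "DC count=" + d.totalCount + " model=" + d.modelDescription;
    }
    throw new IllegalArgumentException(
        "Unsupported cluster type: " + value.getClass().getName());
  }

  public static void main(String[] args) {
    List<Object> values = Arrays.asList(
        new SimpleCluster(0, new double[] {1.0, 2.0}),
        new DirichletLikeCluster("normal(mean=[0.5, 0.5], sd=1.0)", 42.0));
    for (Object v : values) {
      System.out.println(describe(v));
    }
  }
}
```

Running main prints one line per cluster, e.g. "C-0 center=[1.0, 2.0]" for the first value. The design choice is simply that type dispatch lives in one place, so users keep a single entry point.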
On Jan 26, 2010, at 11:25 PM, Jeff Eastman wrote:
Hi Jerry,
DirichletClusters are not similar enough to ClusterBase to make that workable,
so you are correct that the utility won't dump them. Writing a dump utility
that can dump them is a great idea, though it does tend to be rather
Model-specific. Maybe Models should have some printable representation à la
asFormatString().
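To illustrate the idea of each Model printing its own parameters, here is a minimal sketch. FormattableModel, NormalModel and AsymmetricNormalModel are hypothetical names, not Mahout's Model API; the point is only that a generic dumper can rely on the interface while each model type decides which parameters (one shared sd versus one sd per dimension) to render.

```java
import java.util.Arrays;

// Hypothetical sketch: FormattableModel is not Mahout's Model interface. It
// only illustrates a model type that knows how to print its own parameters.
public class ModelFormatSketch {

  interface FormattableModel {
    // Each model type decides how to render its own parameters.
    String asFormatString();
  }

  // Illustrative model with one shared standard deviation.
  static class NormalModel implements FormattableModel {
    private final double[] mean;
    private final double sd;
    private final int count;

    NormalModel(double[] mean, double sd, int count) {
      this.mean = mean;
      this.sd = sd;
      this.count = count;
    }

    @Override
    public String asFormatString() {
      return "nm{n=" + count + " m=" + Arrays.toString(mean) + " sd=" + sd + "}";
    }
  }

  // Illustrative model with a separate standard deviation per dimension.
  static class AsymmetricNormalModel implements FormattableModel {
    private final double[] mean;
    private final double[] sd;
    private final int count;

    AsymmetricNormalModel(double[] mean, double[] sd, int count) {
      this.mean = mean;
      this.sd = sd;
      this.count = count;
    }

    @Override
    public String asFormatString() {
      return "asnm{n=" + count + " m=" + Arrays.toString(mean)
          + " sd=" + Arrays.toString(sd) + "}";
    }
  }

  // A generic dumper needs only the interface, not the concrete model types.
  static String dump(FormattableModel[] models) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < models.length; i++) {
      sb.append("model ").append(i).append(": ")
          .append(models[i].asFormatString()).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    FormattableModel[] models = {
      new NormalModel(new double[] {0.0, 1.0}, 0.5, 10),
      new AsymmetricNormalModel(new double[] {3.0, 3.0}, new double[] {0.2, 0.8}, 7)
    };
    System.out.print(dump(models));
  }
}
```

Here main prints "model 0: nm{n=10 m=[0.0, 1.0] sd=0.5}" followed by "model 1: asnm{n=7 m=[3.0, 3.0] sd=[0.2, 0.8]}"; adding a new model type would not require touching dump().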
Look at the code in
/MahoutTrunk/utils/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java
/MahoutTrunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/DisplayOutputState.java
for ideas on how you might be able to dump out your DirichletClusters and their
Models.
I've actually considered making ClusterBase into a Model and generalizing
DirichletCluster to be the root of all clusters. I think the distance measures
used by canopy and k-means could be cast as Model pdfs, but the whole idea is
still only half-baked.
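A minimal sketch of what "distance measures as Model pdfs" might look like, under stated assumptions: Distance, DistanceModel, nearest and mostLikely are hypothetical names, and the score exp(-d(x, c) / (2 sigma^2)) is one possible, unnormalized choice. With squared Euclidean distance and a shared sigma it is proportional to an isotropic Gaussian, and the highest-scoring model is exactly the nearest center, so the k-means assignment step falls out as a special case. For other distance measures the score is only a kernel, not a normalized density.

```java
// Hypothetical sketch: DistanceModel is not a Mahout class. It turns a
// distance measure into an unnormalized, pdf-like score around a center.
public class DistancePdfSketch {

  // Any distance measure from a point to a center.
  interface Distance {
    double between(double[] a, double[] b);
  }

  // Squared Euclidean distance, as used by k-means-style assignment.
  static final Distance SQUARED_EUCLIDEAN = (a, b) -> {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  };

  static class DistanceModel {
    final double[] center;
    final Distance distance;
    final double sigma;

    DistanceModel(double[] center, Distance distance, double sigma) {
      this.center = center;
      this.distance = distance;
      this.sigma = sigma;
    }

    // Log of the unnormalized score; avoids exp() underflow for far points.
    double logPdf(double[] x) {
      return -distance.between(x, center) / (2.0 * sigma * sigma);
    }

    // Unnormalized score exp(-d(x, c) / (2 sigma^2)).
    double pdf(double[] x) {
      return Math.exp(logPdf(x));
    }
  }

  // Index of the model with the largest score for x (exp is monotonic,
  // so comparing log scores gives the same ordering as comparing pdfs).
  static int mostLikely(DistanceModel[] models, double[] x) {
    int best = 0;
    for (int i = 1; i < models.length; i++) {
      if (models[i].logPdf(x) > models[best].logPdf(x)) {
        best = i;
      }
    }
    return best;
  }

  // Index of the nearest center, as a k-means assignment step would pick.
  static int nearest(DistanceModel[] models, double[] x) {
    int best = 0;
    for (int i = 1; i < models.length; i++) {
      if (models[i].distance.between(x, models[i].center)
          < models[best].distance.between(x, models[best].center)) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    DistanceModel[] models = {
      new DistanceModel(new double[] {0.0, 0.0}, SQUARED_EUCLIDEAN, 1.0),
      new DistanceModel(new double[] {5.0, 5.0}, SQUARED_EUCLIDEAN, 1.0)
    };
    double[][] points = {{1.0, 2.0}, {4.0, 4.0}};
    for (double[] p : points) {
      System.out.println(nearest(models, p) + " " + mostLikely(models, p));
    }
  }
}
```

For the two sample points, main prints "0 0" and then "1 1": the nearest-center choice and the most-likely-model choice agree.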
Jeff
Jerry Ye wrote:
I'm trying to view the output of my experiment using Dirichlet Process
Clustering. When I run the ClusterDumper utility on the output directory,
it throws an exception. On closer inspection, DirichletCluster does not
extend ClusterBase. The error is below.
Is there some other way that I can view the cluster labels?
Thanks!
- jerry
-bash-3.1$ java -cp \
    mahout-core-0.3-SNAPSHOT.jar:mahout-utils-0.3-SNAPSHOT.jar:$(echo dependency/*.jar . | sed 's/ /:/g') \
    org.apache.mahout.utils.clustering.ClusterDumper -s mahoutout/state-0
Input Path: /homes/jerryye/mahout/mahoutout/state-0/part-0
Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.clustering.dirichlet.DirichletCluster cannot be cast to org.apache.mahout.clustering.ClusterBase
    at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:119)
    at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:251)
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/