Make cluster top terms code more reusable
-----------------------------------------
Key: MAHOUT-845
URL: https://issues.apache.org/jira/browse/MAHOUT-845
Project: Mahout
Issue Type: Improvement
Components: Clustering
Reporter: Frank Scholten
Priority: Minor
When working with Mahout text clustering, I find that I keep writing code
similar to the contents of
public static String getTopFeatures(Cluster cluster, String[] dictionary, int numTerms)
in ClusterDumper in order to determine cluster labels.
I think it would be useful if (parts of) this code were added to the cluster or
vector API so that you could do something like
Cluster cluster = ... // get the cluster from seq file iterable
String clusterLabel = cluster.getTopTerms(1, dictionary);
// Do something with the label
I think this would make it easier to export and post-process clustering
results, for example to index them or store them elsewhere.
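To make the idea concrete, here is a rough, standalone sketch of the kind of helper I have in mind. It is not Mahout code and does not use the Cluster or Vector classes: the class name TopTerms, the method topTerms and the plain double[] of centroid weights are assumptions made only for illustration. It picks the dictionary entries with the largest weights, which is roughly what a cluster label needs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the Mahout API.
final class TopTerms {
    private TopTerms() {}

    /**
     * Returns up to numTerms dictionary entries whose weights are largest,
     * highest weight first. weights[i] is assumed to be the centroid weight
     * of the term dictionary[i]; zero weights are skipped.
     */
    static List<String> topTerms(double[] weights, String[] dictionary, int numTerms) {
        // Collect the indices of all non-zero weights that have a dictionary entry.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < weights.length && i < dictionary.length; i++) {
            if (weights[i] != 0.0) {
                indices.add(i);
            }
        }
        // Sort the indices by descending weight.
        indices.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());

        // Map the first numTerms indices back to their terms.
        List<String> terms = new ArrayList<>();
        for (int k = 0; k < Math.min(numTerms, indices.size()); k++) {
            terms.add(dictionary[indices.get(k)]);
        }
        return terms;
    }
}
```

With something like this available on the cluster or vector side, getting a label would be a single call such as TopTerms.topTerms(weights, dictionary, 1), without copying the sorting logic out of ClusterDumper each time.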
Thoughts?