Make cluster top terms code more reusable
-----------------------------------------

                 Key: MAHOUT-845
                 URL: https://issues.apache.org/jira/browse/MAHOUT-845
             Project: Mahout
          Issue Type: Improvement
          Components: Clustering
            Reporter: Frank Scholten
            Priority: Minor


When working with Mahout text clustering I find that I keep writing code 
similar to the contents of

public static String getTopFeatures(Cluster cluster, String[] dictionary, int numTerms)

in ClusterDumper in order to determine cluster labels.

I think it would be useful if (parts of) this code were added to the cluster or 
vector API so that you could do something like

Cluster cluster = ... // get the cluster from seq file iterable
String clusterLabel = cluster.getTopTerms(1, dictionary);
// Do something with the label

I think this would make it easier to export and post-process clustering 
results, for example to index them or store them elsewhere.
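
To make the idea concrete, here is a rough, library-independent sketch of the 
kind of helper I have in mind. It is not Mahout code: the class name 
ClusterLabels, the method topTerms, and the use of a plain double[] for the 
centroid weights are all assumptions made for illustration. A real version 
would presumably work on the cluster's center Vector instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Mahout API. */
final class ClusterLabels {

    private ClusterLabels() {}

    /**
     * Returns up to numTerms dictionary entries with the largest weights,
     * heaviest first. Assumes weights[i] is the centroid weight of dictionary[i].
     */
    static List<String> topTerms(double[] weights, String[] dictionary, int numTerms) {
        if (weights.length != dictionary.length) {
            throw new IllegalArgumentException("weights and dictionary must have the same length");
        }
        if (numTerms < 0) {
            throw new IllegalArgumentException("numTerms must not be negative");
        }

        // Sort term indices by descending weight; the sort is stable, so ties
        // keep their dictionary order.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());

        // Map the first numTerms indices back to their terms.
        List<String> terms = new ArrayList<>();
        int limit = Math.min(numTerms, indices.size());
        for (int k = 0; k < limit; k++) {
            terms.add(dictionary[indices.get(k)]);
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry", "date"};
        double[] weights = {0.1, 0.9, 0.5, 0.0};
        System.out.println(topTerms(weights, dictionary, 2));
    }
}
```

Returning a List<String> rather than a preformatted String is a deliberate 
choice in this sketch: the caller decides whether to join the terms into a 
label, index them, or store them as they are.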

Thoughts?

