[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859040#action_12859040 ]
Ted Dunning commented on MAHOUT-236:
------------------------------------

Typically, wherever you have an algorithm that assumes hard membership but your clustering algorithm produces soft memberships, you can simply pick the cluster with the strongest membership signal. You don't need a threshold.

Conversely, in applications where you need soft membership but have only hard membership, you should assign (1 - epsilon) to the one cluster the document is in and epsilon/(k-1) to each of the other k-1 clusters. Epsilon should be tuned for best results on a given corpus, but it should generally not be zero.

> Cluster Evaluation Tools
> ------------------------
>
>                  Key: MAHOUT-236
>                  URL: https://issues.apache.org/jira/browse/MAHOUT-236
>              Project: Mahout
>           Issue Type: New Feature
>           Components: Clustering
>             Reporter: Grant Ingersoll
>          Attachments: MAHOUT-236.patch
>
>
> Per
> http://www.lucidimagination.com/search/document/10b562f10288993c/validating_clustering_output#9d3f6a55f4a91cb6,
> it would be great to have some utilities to help evaluate the effectiveness
> of clustering.
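A minimal sketch of the two conversions described in the comment, written in Python for illustration only. The function names `soft_to_hard` and `hard_to_soft` are hypothetical and are not part of Mahout's API; memberships are assumed to be a plain list of k per-cluster scores for one document. Because epsilon/(k-1) is given to each of the k-1 other clusters, the smoothed vector still sums to 1.

```python
from typing import List, Sequence


def soft_to_hard(memberships: Sequence[float]) -> int:
    """Soft -> hard: return the index of the cluster with the strongest
    membership signal. No threshold is applied; ties go to the lowest index."""
    if not memberships:
        raise ValueError("need at least one cluster")
    return max(range(len(memberships)), key=lambda j: memberships[j])


def hard_to_soft(cluster: int, k: int, epsilon: float) -> List[float]:
    """Hard -> soft: give (1 - epsilon) to the assigned cluster and
    epsilon / (k - 1) to each of the other k - 1 clusters.

    epsilon is assumed to be tuned per corpus; zero is accepted here
    but is generally discouraged."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if not 0 <= cluster < k:
        raise ValueError("cluster index out of range")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    other = epsilon / (k - 1)
    return [1.0 - epsilon if j == cluster else other for j in range(k)]


if __name__ == "__main__":
    print(soft_to_hard([0.1, 0.7, 0.2]))  # 1
    print(hard_to_soft(2, 3, 0.25))       # [0.125, 0.125, 0.75]
```

Converting a hard assignment to soft memberships and back recovers the original cluster as long as (1 - epsilon) exceeds epsilon/(k-1), which holds for any small epsilon.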