[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859040#action_12859040 ]

Ted Dunning commented on MAHOUT-236:
------------------------------------


Typically, anywhere you have an algorithm that assumes hard membership but your 
clustering algorithm produces soft membership, you can simply pick the cluster 
with the strongest membership signal.  You don't need a threshold.
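A minimal sketch of that rule in Python, assuming each document's soft memberships arrive as a plain list of per-cluster scores. The function name `hard_assignment` is hypothetical and is not a Mahout API:

```python
from typing import Sequence


def hard_assignment(memberships: Sequence[float]) -> int:
    """Return the index of the cluster with the strongest membership signal.

    Hypothetical helper for illustration. No threshold is applied: the
    largest score wins even if it is small. Python's max() keeps the first
    maximal element it sees, so ties go to the lowest cluster index.
    """
    if not memberships:
        raise ValueError("memberships must be non-empty")
    return max(range(len(memberships)), key=lambda i: memberships[i])


# A document that is 20% / 50% / 30% in three clusters is assigned to cluster 1.
print(hard_assignment([0.2, 0.5, 0.3]))
```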

Conversely, in applications where you need soft membership but only have hard 
membership, assign (1-epsilon) to the one cluster the document is in and 
epsilon/(k-1) to each of the other k-1 clusters.  Epsilon should be tuned for 
best results on a given corpus but should generally not be zero.
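A minimal sketch of this smoothing in Python, assuming clusters are numbered 0..k-1. The function name `soft_from_hard` and the example epsilon of 0.2 are hypothetical; epsilon would still need tuning per corpus:

```python
from typing import List


def soft_from_hard(cluster: int, k: int, epsilon: float) -> List[float]:
    """Turn a hard cluster assignment into a soft membership vector.

    Hypothetical helper for illustration. The assigned cluster gets
    1 - epsilon; the other k - 1 clusters each get epsilon / (k - 1),
    so the weights sum to 1.
    """
    if k < 2:
        raise ValueError("need at least two clusters")
    if not 0 <= cluster < k:
        raise ValueError("cluster index out of range")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    other = epsilon / (k - 1)      # share spread over the non-assigned clusters
    weights = [other] * k
    weights[cluster] = 1.0 - epsilon
    return weights


# Document in cluster 2 of 5, epsilon = 0.2 (illustrative value only).
print(soft_from_hard(2, 5, 0.2))
```

Keeping epsilon below (k-1)/k ensures the assigned cluster remains the strongest entry, since 1-epsilon > epsilon/(k-1) exactly when epsilon < (k-1)/k.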

 

> Cluster Evaluation Tools
> ------------------------
>
>                 Key: MAHOUT-236
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-236
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Grant Ingersoll
>         Attachments: MAHOUT-236.patch
>
>
> Per 
> http://www.lucidimagination.com/search/document/10b562f10288993c/validating_clustering_output#9d3f6a55f4a91cb6,
>  it would be great to have some utilities to help evaluate the effectiveness 
> of clustering.

-- 
This message is automatically generated by JIRA.