[ 
https://issues.apache.org/jira/browse/MAHOUT-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414751#comment-13414751
 ] 

Jeff Eastman commented on MAHOUT-1045:
--------------------------------------

- It's easy enough to create a method that accumulates a map of intra-densities 
and another that takes their average. I will do this and post a patch so you 
can try it.
- It is certainly possible to write the densities to a file too.
- The invalidClusters() pruning is supposed to catch all clusters where the 
representative points are all identical to the cluster center. This can be 
caused by the RepresentativePointsDriver if there are no points assigned to the 
cluster in the clustering step (empty clusters can occur in kmeans and others). 
- Thanks Sean. Changing to Double.NEGATIVE_INFINITY and 
Double.POSITIVE_INFINITY causes the NaN problem to recur. I'm going to dig 
into why those clusters are not getting pruned.
- "The Book" is "Mahout in Action", p. 146. BTW, it uses 0 and Double.MAX_VALUE 
(-;
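
A minimal sketch of the two helper methods and the pruning step described in the points above. It is not the Mahout code: the class and method names are hypothetical, points are plain double[] arrays rather than Mahout vectors, and the density formula (mean pairwise distance, normalized between the min and max pairwise distance) is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the density formula are assumptions, not Mahout's API.
public class IntraDensitySketch {

  /** Euclidean distance between two equal-length points. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** True when every representative point coincides with the cluster center. */
  static boolean allPointsEqualCenter(double[] center, List<double[]> points) {
    for (double[] p : points) {
      if (distance(center, p) > 0.0) {
        return false;
      }
    }
    return true;
  }

  /**
   * Assumed density: (mean - min) / (max - min) over all pairwise distances.
   * Yields NaN when max == min (for example, all points identical) or when
   * there are fewer than two points.
   */
  static double density(List<double[]> points) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum = 0.0;
    int count = 0;
    for (int i = 0; i < points.size(); i++) {
      for (int j = i + 1; j < points.size(); j++) {
        double d = distance(points.get(i), points.get(j));
        min = Math.min(min, d);
        max = Math.max(max, d);
        sum += d;
        count++;
      }
    }
    return (sum / count - min) / (max - min);
  }

  /** Accumulates a map of cluster id to intra-cluster density, skipping pruned clusters. */
  static Map<Integer, Double> intraDensities(Map<Integer, double[]> centers,
                                             Map<Integer, List<double[]>> repPoints) {
    Map<Integer, Double> result = new LinkedHashMap<>();
    for (Map.Entry<Integer, double[]> e : centers.entrySet()) {
      List<double[]> points = repPoints.get(e.getKey());
      if (points == null || allPointsEqualCenter(e.getValue(), points)) {
        continue; // prune: no usable representative points for this cluster
      }
      result.put(e.getKey(), density(points));
    }
    return result;
  }

  /** Average of the accumulated densities; NaN if the map is empty. */
  static double average(Map<Integer, Double> densities) {
    double sum = 0.0;
    for (double v : densities.values()) {
      sum += v;
    }
    return sum / densities.size();
  }
}
```

In this sketch, a cluster whose representative points are identical to each other but differ from the center passes the prune and still yields NaN, so the prune alone does not guarantee a finite average.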
                
> Cluster evaluators returning bad results
> ----------------------------------------
>
>                 Key: MAHOUT-1045
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1045
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6, 0.7, 0.8
>         Environment: Several environments and data sets
>            Reporter: Pat Ferrel
>             Fix For: 0.8
>
>
> With real world crawl data the Intra-cluster density from ClusterEvaluator is 
> almost always NaN. The CDbw inter-cluster density is almost always 0. I have 
> also seen several cases where CDbw fails to return any results but have not 
> tracked down why yet.
> I have sent a link to an 8G data set that reproduces these errors to Jeff 
> Eastman.


        
