[ 
https://issues.apache.org/jira/browse/MAHOUT-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419936#comment-13419936
 ] 

Jeff Eastman commented on MAHOUT-1045:
--------------------------------------

I would imagine that is a consequence of ignoring NaN values. However, I also 
changed the inter-cluster density to be an average of the individual densities 
(as the text in the book describes it), even though equation (1) does not 
normalize it by the count. This seems more consistent with the other metrics 
we compute, but it might be wrong. The inter-cluster density is also used in 
the separation. Of course, zero / count is still zero, so the question is moot 
right now. I'm going to try this on the other test cases and see how that looks.
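
To make the sum-versus-average distinction concrete, here is a minimal sketch. It is not the Mahout evaluator code: the class and method names are hypothetical, and dividing by the number of non-NaN entries (rather than by the total count) is an assumption made for illustration only.

```java
// Hypothetical illustration only; not taken from Mahout's CDbw evaluator.
final class InterClusterDensitySketch {

    /** Sums the individual densities, skipping NaN entries (un-normalized form). */
    static double sumIgnoringNaN(double[] densities) {
        double sum = 0.0;
        for (double d : densities) {
            if (!Double.isNaN(d)) {
                sum += d;
            }
        }
        return sum;
    }

    /**
     * Averages the individual densities, skipping NaN entries.
     * Assumption: the divisor is the number of non-NaN entries,
     * and the result is 0.0 when no valid entry exists.
     */
    static double meanIgnoringNaN(double[] densities) {
        double sum = 0.0;
        int count = 0;
        for (double d : densities) {
            if (!Double.isNaN(d)) {
                sum += d;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        double[] densities = {0.5, Double.NaN, 1.5};
        System.out.println("sum  = " + sumIgnoringNaN(densities));
        System.out.println("mean = " + meanIgnoringNaN(densities));
    }
}
```

When every individual density is zero, both forms return zero, which is why the choice between them makes no difference to the current results.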
                
> Cluster evaluators returning bad results
> ----------------------------------------
>
>                 Key: MAHOUT-1045
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1045
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6, 0.7, 0.8
>         Environment: Several environments and data sets
>            Reporter: Pat Ferrel
>             Fix For: 0.8
>
>         Attachments: MAHOUT-1045.patch, MAHOUT-1045.patch, MAHOUT-1045.patch, 
> MAHOUT-1045.patch, first-time-density-nan.txt
>
>
> With real-world crawl data, the intra-cluster density from ClusterEvaluator is 
> almost always NaN. The CDbw inter-cluster density is almost always 0. I have 
> also seen several cases where CDbw fails to return any results, but I have not 
> tracked down why yet.
> I have sent a link to an 8G data set that reproduces these errors to Jeff 
> Eastman.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
