[ 
https://issues.apache.org/jira/browse/MAHOUT-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414664#comment-13414664
 ] 

Jeff Eastman commented on MAHOUT-1045:
--------------------------------------

Initial results indicate that some of your clusters are extremely dense: both 
the max and the min distance from the cluster center are == 0. That makes the 
intra-cluster density NaN for each such cluster and, since the overall density 
is an average of the per-cluster intra-cluster densities, it is NaN as well. 
I checked one of those clusters, and the invalidCluster() calculation indicates 
that at least one of its representative points is not identical to the cluster 
center. That should yield a non-zero distance, even with CosineDistanceMeasure, 
but it does not seem to. Continuing...
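
To illustrate the NaN propagation described above, here is a minimal sketch. 
It assumes the per-cluster density is a min-max normalized mean distance, 
(mean - min) / (max - min); that formula, the class name DensityNaNSketch, and 
the helpers normalizedDensity() and average() are hypothetical stand-ins for 
illustration, not the actual ClusterEvaluator code. Under that assumption, a 
cluster whose distances are all 0 produces 0.0 / 0.0, which is NaN in Java, and 
a single NaN is enough to make the overall average NaN.

```java
import java.util.Arrays;

// Hypothetical sketch, not Mahout code: shows how one degenerate cluster
// can turn an averaged density into NaN.
class DensityNaNSketch {

    // Assumed formula: min-max normalized mean of the distances from the
    // representative points to the cluster center.
    static double normalizedDensity(double[] distances) {
        double min = Arrays.stream(distances).min().orElse(Double.NaN);
        double max = Arrays.stream(distances).max().orElse(Double.NaN);
        double mean = Arrays.stream(distances).average().orElse(Double.NaN);
        // When min == max == 0 this evaluates 0.0 / 0.0, which is NaN.
        return (mean - min) / (max - min);
    }

    // Plain arithmetic mean; any NaN input makes the result NaN.
    static double average(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        double[] degenerate = {0.0, 0.0, 0.0}; // max and min distance both 0
        double[] ordinary = {0.1, 0.2, 0.4};   // made-up distances

        double d1 = normalizedDensity(degenerate);
        double d2 = normalizedDensity(ordinary);

        System.out.println("degenerate cluster density: " + d1);
        System.out.println("ordinary cluster density:   " + d2);
        System.out.println("overall average:            " + average(new double[] {d1, d2}));
    }
}
```

If the real calculation has this shape, checking for max == min before 
dividing would be one place to look; whether that is the right fix for this 
issue is not established here.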
                
> Cluster evaluators returning bad results
> ----------------------------------------
>
>                 Key: MAHOUT-1045
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1045
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6, 0.7, 0.8
>         Environment: Several environments and data sets
>            Reporter: Pat Ferrel
>             Fix For: 0.8
>
>
> With real world crawl data the Intra-cluster density from ClusterEvaluator is 
> almost always NaN. The CDbw inter-cluster density is almost always 0. I have 
> also seen several cases where CDbw fails to return any results but have not 
> tracked down why yet.
> I have sent a link to an 8G data set that reproduces these errors to Jeff 
> Eastman.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
