[
https://issues.apache.org/jira/browse/MAHOUT-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419937#comment-13419937
]
Pat Ferrel commented on MAHOUT-1045:
------------------------------------
As to the CDbw Inter-Cluster Density: 0.0
When I was reading the paper it came from, I was struck by a couple of
observations. First, the data seemed somewhat contrived, and the clustering
algorithms were quirky in that they seemed to have been designed or chosen to
solve problems in the data set examined. Since I'm planning to put some weight
on the values, I'll do a search to see how often the paper has been cited,
unless you have already done that.
In any case, I see what you mean about its calculation, and since it does not
even enter the validity index calculation, I will consider the issue closed.
I'll run this on the same data set with the same values for k and compare the
new results with the previous ones. Back in a bit...
> Cluster evaluators returning bad results
> ----------------------------------------
>
> Key: MAHOUT-1045
> URL: https://issues.apache.org/jira/browse/MAHOUT-1045
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6, 0.7, 0.8
> Environment: Several environments and data sets
> Reporter: Pat Ferrel
> Fix For: 0.8
>
> Attachments: MAHOUT-1045.patch, MAHOUT-1045.patch, MAHOUT-1045.patch,
> MAHOUT-1045.patch, first-time-density-nan.txt
>
>
> With real-world crawl data, the intra-cluster density from ClusterEvaluator is
> almost always NaN. The CDbw inter-cluster density is almost always 0. I have
> also seen several cases where CDbw fails to return any results, but I have not
> yet tracked down why.
> I have sent a link to an 8G data set that reproduces these errors to Jeff
> Eastman.