[
https://issues.apache.org/jira/browse/MAHOUT-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414757#comment-13414757
]
Pat Ferrel edited comment on MAHOUT-1045 at 7/15/12 8:06 PM:
-------------------------------------------------------------
I changed the initial values in the inter- and intra-cluster methods and stepped
through the intra-cluster calculation until the first time the density comes out
as NaN. The breakpoint is on this line:
log.info("Intra-Cluster Density[{}] = {}", cluster.getId(), density);
I've attached the debug output. It will mean more to you than to me, but one odd
thing is that the cluster radius vector has NaN weights on some keys.
Because of the way the average is calculated, once a NaN comes along the average
stays NaN. It looks like once the radius is corrupted the distance measure
returns 0, and so density = NaN. All of which seems to indicate that a NaN for
any one cluster will mess up the whole calculation.
Is something wrong in the radius calculation, or is that a red herring?
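To illustrate the propagation I mean, here is a minimal standalone sketch. It is
not Mahout code: the class and method names are made up, and the 0.0 / 0.0 term
only assumes that a density can end up as a zero-over-zero ratio. It shows that
one NaN term is enough to turn a plain average into NaN, and one possible guard
that skips NaN terms.

```java
// Hypothetical sketch, not Mahout code. All names here are illustrative.
class NanPropagationSketch {

    // Plain arithmetic mean: a single NaN input makes the whole result NaN.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v; // NaN + anything is NaN, so the sum never recovers
        }
        return sum / values.length;
    }

    // One possible guard (an assumption, not the project's fix): skip NaN terms.
    static double meanSkippingNaN(double[] values) {
        double sum = 0.0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        // With no usable terms there is nothing to average, so report NaN.
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // Assumed scenario: the middle term is a zero-over-zero ratio.
        double[] densities = {0.5, 0.0 / 0.0, 1.5};
        System.out.println(mean(densities));            // prints NaN
        System.out.println(meanSkippingNaN(densities)); // prints 1.0
    }
}
```

Skipping NaN terms only hides the symptom. If the radius vector itself contains
NaN weights, that is still the thing to track down.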
> Cluster evaluators returning bad results
> ----------------------------------------
>
> Key: MAHOUT-1045
> URL: https://issues.apache.org/jira/browse/MAHOUT-1045
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6, 0.7, 0.8
> Environment: Several environments and data sets
> Reporter: Pat Ferrel
> Fix For: 0.8
>
> Attachments: MAHOUT-1045.patch, first-time-density-nan.txt
>
>
> With real-world crawl data, the intra-cluster density from ClusterEvaluator is
> almost always NaN. The CDbw inter-cluster density is almost always 0. I have
> also seen several cases where CDbw fails to return any results, but I have not
> yet tracked down why.
> I have sent Jeff Eastman a link to an 8 GB data set that reproduces these
> errors.