[
https://issues.apache.org/jira/browse/MAHOUT-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414675#comment-13414675
]
Jeff Eastman commented on MAHOUT-1045:
--------------------------------------
I tried changing the initial value of the max variable in the loop from 0 to
Double.MIN_VALUE (see below), and it now produces a legal value (0.656) for the
average Intra-Cluster Density. Is this useful? Does it look like a reasonable
change?
{code}
public double intraClusterDensity() {
  pruneInvalidClusters();
  double avgDensity = 0;
  for (Cluster cluster : clusters) {
    int count = 0;
    double max = Double.MIN_VALUE;
    double min = Double.MAX_VALUE;
    double sum = 0;
    List<VectorWritable> repPoints =
        representativePoints.get(cluster.getId());
    for (int i = 0; i < repPoints.size(); i++) {
      for (int j = i + 1; j < repPoints.size(); j++) {
        double d = measure.distance(repPoints.get(i).get(),
            repPoints.get(j).get());
        min = Math.min(d, min);
        max = Math.max(d, max);
        sum += d;
        count++;
      }
    }
    double denom = max - min;
    double density = (sum / count - min) / denom;
    avgDensity += density;
    log.info("Intra-Cluster Density[{}] = {}", cluster.getId(), density);
  }
  avgDensity = clusters.isEmpty() ? 0 : avgDensity / clusters.size();
  log.info("Intra-Cluster Density = {}", avgDensity);
  return avgDensity;
}
{code}
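For comparison, here is a minimal, self-contained sketch of the same
per-cluster calculation with explicit guards. It is not the Mahout code: the
class name DensitySketch, the helper euclidean, and the use of plain double[]
points in place of VectorWritable and the configured distance measure are all
assumptions made for illustration. Note that Double.MIN_VALUE is the smallest
positive double (about 4.9e-324), not the most negative one, so the change
above seems to help only when every pairwise distance is 0. A cluster with
fewer than two representative points (count == 0, so sum / count is NaN) or
with all pairwise distances equal and nonzero (denom == 0) would presumably
still produce NaN or an infinite value. Returning 0 in those cases is one
possible choice, not necessarily the right one for the evaluator.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Mahout.
class DensitySketch {

  // Normalized mean pairwise distance for one cluster's points.
  // Assumption: return 0 when the value is undefined (no pairs, or max == min).
  static double density(List<double[]> points) {
    int count = 0;
    double max = Double.NEGATIVE_INFINITY;
    double min = Double.POSITIVE_INFINITY;
    double sum = 0;
    for (int i = 0; i < points.size(); i++) {
      for (int j = i + 1; j < points.size(); j++) {
        double d = euclidean(points.get(i), points.get(j));
        min = Math.min(d, min);
        max = Math.max(d, max);
        sum += d;
        count++;
      }
    }
    if (count == 0) {
      return 0; // fewer than two points: no pairwise distances exist
    }
    double denom = max - min;
    if (denom == 0) {
      return 0; // all pairwise distances identical: normalization undefined
    }
    return (sum / count - min) / denom;
  }

  // Plain Euclidean distance, standing in for the configured distance measure.
  static double euclidean(double[] a, double[] b) {
    double s = 0;
    for (int k = 0; k < a.length; k++) {
      double diff = a[k] - b[k];
      s += diff * diff;
    }
    return Math.sqrt(s);
  }

  public static void main(String[] args) {
    List<double[]> line = Arrays.asList(
        new double[] {0}, new double[] {1}, new double[] {3});
    System.out.println(density(line));
    System.out.println(density(Arrays.asList(new double[] {5})));
  }
}
```

Whether an undefined cluster should contribute 0 to the average or be skipped
entirely is a separate question that this sketch does not settle.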
> Cluster evaluators returning bad results
> ----------------------------------------
>
> Key: MAHOUT-1045
> URL: https://issues.apache.org/jira/browse/MAHOUT-1045
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6, 0.7, 0.8
> Environment: Several environments and data sets
> Reporter: Pat Ferrel
> Fix For: 0.8
>
>
> With real-world crawl data, the intra-cluster density from ClusterEvaluator is
> almost always NaN, and the CDbw inter-cluster density is almost always 0. I have
> also seen several cases where CDbw fails to return any results, but I have not
> yet tracked down why.
> I have sent Jeff Eastman a link to an 8G data set that reproduces these
> errors.