liutengfei created MAHOUT-1083:
----------------------------------
Summary: CIReducer in kmeans doesn't work well
Key: MAHOUT-1083
URL: https://issues.apache.org/jira/browse/MAHOUT-1083
Project: Mahout
Issue Type: Bug
Environment: hadoop-2.0.0-alpha: pseudo cluster and single node cluster
             hadoop-1.0.3: pseudo cluster
             hadoop-0.20.2: pseudo cluster
             mahout: mahout-0.7
             os: ubuntu 11.04
             jdk: jdk1.6.0_27
Reporter: liutengfei
The "reduce" function in CIReducer.java (k-means, mahout-0.7) does not behave
the way the code reads.
  protected void reduce(IntWritable key, Iterable<ClusterWritable> values,
      Context context) throws IOException, InterruptedException {
    Iterator<ClusterWritable> iter = values.iterator();
    ClusterWritable first = null;
    while (iter.hasNext()) {
      ClusterWritable cw = iter.next();
      if (first == null) {
        first = cw;
      } else {
        first.getValue().observe(cw.getValue());
      }
    }
    List<Cluster> models = new ArrayList<Cluster>();
    models.add(first.getValue());
    classifier = new ClusterClassifier(models, policy);
    classifier.close();
    context.write(key, first);
  }
Reading the code, the variable "first" should accumulate the output of all the
maps. In practice, however, the value of "first" changes every time the
statement "ClusterWritable cw = iter.next();" runs, and it is always the same as
the new variable "cw". I don't know why, but the results show the code behaving
as if it were written "ClusterWritable cw = first = iter.next();".
Is "cw" a reference to an object held by "iter"?
Does "iter.next()" just change the value held by "iter" itself to the next one?
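
The symptom described above matches what the Hadoop Reducer documentation says
about the framework reusing the key and value objects passed into reduce. If
that is what happens here, "first" and "cw" point at the same instance after
the first iteration. The following is only a minimal sketch of that effect
without Hadoop or Mahout on the classpath. ReuseDemo, Box, ReusingIterator and
both sum methods are hypothetical names made up for illustration, and Box is a
stand-in for ClusterWritable, not the real class.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ReuseDemo {

    /** Hypothetical stand-in for a mutable value object such as ClusterWritable. */
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
        Box copy() { return new Box(value); }
    }

    /** Iterator that hands back the SAME Box instance on every call to next(). */
    static final class ReusingIterator implements Iterator<Box> {
        private final int[] data;
        private final Box shared = new Box(0);
        private int pos = 0;

        ReusingIterator(int... data) { this.data = data; }

        @Override
        public boolean hasNext() { return pos < data.length; }

        @Override
        public Box next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            shared.value = data[pos++]; // overwrite the shared object in place
            return shared;
        }

        @Override
        public void remove() { throw new UnsupportedOperationException(); }
    }

    /** Same shape as the reduce loop in the report: keeps a reference to the first value. */
    static int sumKeepingReference(Iterator<Box> iter) {
        Box first = null;
        while (iter.hasNext()) {
            Box cw = iter.next();
            if (first == null) {
                first = cw;              // first now aliases the shared object
            } else {
                first.value += cw.value; // adds the current value to itself
            }
        }
        return first == null ? 0 : first.value;
    }

    /** Variant that copies the first value, so later next() calls cannot overwrite it. */
    static int sumKeepingCopy(Iterator<Box> iter) {
        Box first = null;
        while (iter.hasNext()) {
            Box cw = iter.next();
            if (first == null) {
                first = cw.copy();
            } else {
                first.value += cw.value;
            }
        }
        return first == null ? 0 : first.value;
    }

    public static void main(String[] args) {
        System.out.println(sumKeepingReference(new ReusingIterator(5, 1, 1))); // prints 2
        System.out.println(sumKeepingCopy(new ReusingIterator(5, 1, 1)));      // prints 7
    }
}
```

In this sketch the aliased version only ever sees the last value, which is the
same kind of result reported for CIReducer. In a real reducer the copy would
have to be a deep copy of the Writable. One option that may fit is
WritableUtils.clone(value, conf) from Hadoop, though whether it suits
ClusterWritable in mahout-0.7 has not been checked here.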