That's the way it has always been done. K-Means was one of the first Mahout 
algorithms, and that driver code has been around for maybe three years. Is 
there a better way?
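
One likely reason the explicit calls are needed: as far as I know, Hadoop cannot infer the mapper's output types from its generic parameters at runtime. If the map output classes are not set, it falls back to the job's final output key and value classes. The sketch below is a simplified model of that fallback rule, not Hadoop or Mahout code. JobTypes, mapperWriteAccepted, kmeansLikeJob and the empty Text, ClusterObservations and Cluster classes are hypothetical stand-ins, and it assumes the reducer's output value type is Cluster.

```java
// Simplified model of the map-output-type fallback rule.
// All types here are stand-ins; this is NOT Hadoop or Mahout code.
final class MapOutputTypeSketch {

    // Empty stand-ins for the Writable types named in the thread.
    static final class Text {}
    static final class ClusterObservations {}
    static final class Cluster {}

    // Hypothetical job configuration holding only the four type settings.
    static final class JobTypes {
        private Class<?> outputKeyClass;
        private Class<?> outputValueClass;
        private Class<?> mapOutputKeyClass;   // null means "not set"
        private Class<?> mapOutputValueClass; // null means "not set"

        void setOutputKeyClass(Class<?> c) { outputKeyClass = c; }
        void setOutputValueClass(Class<?> c) { outputValueClass = c; }
        void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }
        void setMapOutputValueClass(Class<?> c) { mapOutputValueClass = c; }

        // Unset map output types fall back to the job's final output types.
        Class<?> getMapOutputKeyClass() {
            return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
        }

        Class<?> getMapOutputValueClass() {
            return mapOutputValueClass != null ? mapOutputValueClass : outputValueClass;
        }
    }

    // Mimics a runtime check that each emitted record matches the configured types.
    static boolean mapperWriteAccepted(JobTypes job, Object key, Object value) {
        return job.getMapOutputKeyClass() == key.getClass()
            && job.getMapOutputValueClass() == value.getClass();
    }

    // Builds a job whose final output is (Text, Cluster); Cluster is an assumption.
    static JobTypes kmeansLikeJob(boolean setMapTypesExplicitly) {
        JobTypes job = new JobTypes();
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Cluster.class);
        if (setMapTypesExplicitly) {
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(ClusterObservations.class);
        }
        return job;
    }

    public static void main(String[] args) {
        Object key = new Text();
        Object value = new ClusterObservations();
        System.out.println("without explicit map types: "
            + mapperWriteAccepted(kmeansLikeJob(false), key, value));
        System.out.println("with explicit map types: "
            + mapperWriteAccepted(kmeansLikeJob(true), key, value));
    }
}
```

Under these assumptions, the explicit setters matter only when the mapper's output types differ from the reducer's. A job whose map and reduce outputs share the same key and value classes could skip them.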

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dhruv Kumar
Sent: Monday, May 23, 2011 1:39 PM
To: [email protected]
Subject: Why does KMeansDriver set the map output key and value types 
explicitly?

To get ideas for my BaumWelch driver class for MAHOUT-627, I have been
studying the K-Means implementation carefully.

In KMeansDriver.java, the runIteration function is responsible for
dispatching a single MapReduce job. It contains the following calls, which
set the mapper's output key and value types:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(ClusterObservations.class);

However, in the core mapping operation, performed by the
emitPointToNearestCluster function in KMeansClusterer.java, I find that the
output key is already of type Text and the output values are already of type
ClusterObservations, which implements Writable:

context.write(new Text(nearestCluster.getIdentifier()),
    new ClusterObservations(1, point, point.times(point)));

Why is KMeansDriver setting the mapper's output key and value types
explicitly?
