That's the way it has always been done. K-Means was one of the first Mahout algorithms, and that driver code has been around for maybe three years. Is there a better way?
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dhruv Kumar
Sent: Monday, May 23, 2011 1:39 PM
To: [email protected]
Subject: Why does KMeansDriver set the map output key and value types explicitly?

To get ideas for my BaumWelch Driver class for Mahout-627, I have been studying the K-Means implementation carefully.

In KMeansDriver.java, the function runIteration is responsible for dispatching a single MapReduce job. It contains the following constructs for setting the output key and value types from the mapper:

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(ClusterObservations.class);

However, in the core mapping operation performed by the emitPointToNearestCluster function in KMeansClusterer.java, I find that the output key is of type Text and that the output values are of type ClusterObservations, which implements Writable:

    context.write(new Text(nearestCluster.getIdentifier()),
                  new ClusterObservations(1, point, point.times(point)));

Why is the KMeansDriver setting the mapper's output key and value types explicitly?
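
One possible reason, offered as a sketch rather than a statement of why the driver was written this way: in Hadoop, if the map output classes are not set on the job, they default to the job's final output classes. The types passed to context.write() are not inspected when the job is configured, so when the reducer emits a different value type than the mapper, the map output type has to be declared. The code below models only that fallback rule in plain Java. JobTypes, mapValueAccepted, and the empty ClusterObservations and Cluster classes are hypothetical stand-ins, not Hadoop or Mahout classes, and it assumes the K-Means job's final output value type is Cluster.

```java
public class MapOutputTypeSketch {

    // Hypothetical stand-ins for the Mahout value types; empty on purpose.
    public static final class ClusterObservations { }
    public static final class Cluster { }

    // Hypothetical model of a job's type settings; NOT Hadoop's Job class.
    public static final class JobTypes {
        private Class<?> outputValueClass = Object.class;
        private Class<?> mapOutputValueClass; // null means "never set"

        public void setOutputValueClass(Class<?> c) { outputValueClass = c; }
        public void setMapOutputValueClass(Class<?> c) { mapOutputValueClass = c; }

        // Fallback rule: an unset map output class resolves to the final output class.
        public Class<?> getMapOutputValueClass() {
            return mapOutputValueClass != null ? mapOutputValueClass : outputValueClass;
        }
    }

    // Models the runtime check on mapper output: the class must match exactly.
    public static boolean mapValueAccepted(JobTypes job, Object value) {
        return value.getClass() == job.getMapOutputValueClass();
    }

    public static void main(String[] args) {
        JobTypes job = new JobTypes();
        job.setOutputValueClass(Cluster.class); // assumed reducer output type

        // Without the explicit setting, the mapper's value is checked against Cluster.
        System.out.println(mapValueAccepted(job, new ClusterObservations()));

        // With the explicit setting, the mapper's value type is the expected one.
        job.setMapOutputValueClass(ClusterObservations.class);
        System.out.println(mapValueAccepted(job, new ClusterObservations()));
    }
}
```

If the mapper and reducer did emit identical key and value types, the explicit setMapOutput* calls would be redundant under this rule, though harmless.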
