To get ideas for my BaumWelch Driver class for Mahout-627, I have been studying the K-Means implementation carefully.
In KMeansDriver.java, the function runIteration is responsible for dispatching a single MapReduce job. It sets the mapper's output key and value types as follows:

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(ClusterObservations.class);

However, in the core mapping operation, performed by the emitPointToNearestCluster function in KMeansClusterer.java, the output key is already of type Text and the output value is already of type ClusterObservations, which implements Writable:

    context.write(new Text(nearestCluster.getIdentifier()), new ClusterObservations(1, point, point.times(point)));

Since the mapper already emits exactly these types, why does KMeansDriver set the mapper's output key and value types explicitly?
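
One way to reason about this, offered as a sketch rather than a definitive answer: Java generics are erased at runtime, so the framework does not learn the mapper's output types from the arguments passed to context.write. It relies on the job configuration instead, and as far as I understand Hadoop, the map output classes fall back to the job's final output classes when they are not set explicitly. If the reducer's output types differ from the mapper's, that fallback would produce a type mismatch. The toy model below uses plain Java only. ToyJob, collect and accepts are hypothetical names invented for illustration, and Long and String stand in for Writable types; none of this is Hadoop's or Mahout's actual code.

```java
/** Toy model of map output type resolution; NOT Hadoop's real classes. */
public class MapOutputTypeSketch {

    /** Hypothetical, simplified stand-in for a job configuration. */
    public static class ToyJob {
        // Assumption: stand-in for the job's final (reducer) output key class.
        private Class<?> outputKeyClass = Long.class;
        // Stays null until set explicitly, like an unset configuration entry.
        private Class<?> mapOutputKeyClass;

        public void setOutputKeyClass(Class<?> c) { outputKeyClass = c; }

        public void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }

        /** Falls back to the final output key class when nothing was set. */
        public Class<?> getMapOutputKeyClass() {
            return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
        }

        /** Runtime check of an emitted key against the configured class. */
        public void collect(Object key) {
            Class<?> expected = getMapOutputKeyClass();
            if (key.getClass() != expected) {
                throw new IllegalStateException(
                    "Type mismatch in key from map: expected " + expected.getName()
                        + ", received " + key.getClass().getName());
            }
        }
    }

    /** Returns true if the job accepts the key, false on a type mismatch. */
    public static boolean accepts(ToyJob job, Object key) {
        try {
            job.collect(key);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Map output key class not set: falls back to Long, so a String key is rejected.
        ToyJob implicit = new ToyJob();
        System.out.println(accepts(implicit, "cluster-1")); // prints false

        // Map output key class set explicitly: the String key is accepted.
        ToyJob explicit = new ToyJob();
        explicit.setMapOutputKeyClass(String.class);
        System.out.println(accepts(explicit, "cluster-1")); // prints true
    }
}
```

Under this reading, the two setMapOutput...Class calls in runIteration would be the only place where the framework is told what the mapper emits, independent of what the reducer writes.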
