To get ideas for my BaumWelch Driver class for Mahout-627, I have been
studying the K-Means implementation carefully.

In KMeansDriver.java, the method runIteration is responsible for
dispatching a single MapReduce job. It contains the following calls, which
set the mapper's output key and value types:

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(ClusterObservations.class);

In the core mapping operation, performed by the emitPointToNearestCluster
method in KMeansClusterer.java, the output key is already of type Text and
the output value is of type ClusterObservations, which implements Writable:

context.write(new Text(nearestCluster.getIdentifier()),
    new ClusterObservations(1, point, point.times(point)));

Given that, why does KMeansDriver set the mapper's output key and value
types explicitly?
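
One plausible reason, sketched below under stated assumptions: in Hadoop,
when the map output classes are not set, the job falls back to the final
output key and value classes. If the reducer emits a different value type
than the mapper, the driver would then have to name the mapper's types
itself. ToyJobConfig and the two stub types are hypothetical stand-ins
written for this illustration; they are not Hadoop or Mahout classes, and
only the fallback rule is modelled.

```java
class MapOutputTypeSketch {

    // Hypothetical stand-ins for the mapper's and the reducer's value types.
    static final class ObservationsStub {}
    static final class ClusterStub {}

    // Toy model of a job configuration; NOT Hadoop's actual implementation.
    static final class ToyJobConfig {
        private Class<?> outputValueClass = Object.class;
        private Class<?> mapOutputValueClass; // stays null until set explicitly

        void setOutputValueClass(Class<?> c) { outputValueClass = c; }

        void setMapOutputValueClass(Class<?> c) { mapOutputValueClass = c; }

        Class<?> getOutputValueClass() { return outputValueClass; }

        // Assumed rule: use the final output class when no map output class is set.
        Class<?> getMapOutputValueClass() {
            return mapOutputValueClass != null ? mapOutputValueClass : outputValueClass;
        }
    }

    public static void main(String[] args) {
        ToyJobConfig job = new ToyJobConfig();
        job.setOutputValueClass(ClusterStub.class);

        // Without an explicit setting, the map output type follows the job output type.
        System.out.println(job.getMapOutputValueClass().getSimpleName());

        // With an explicit setting, the mapper's own type is used instead.
        job.setMapOutputValueClass(ObservationsStub.class);
        System.out.println(job.getMapOutputValueClass().getSimpleName());
    }
}
```

Under this model, leaving out the explicit call would make the framework
expect the reducer's value type from the mapper, so the two setter calls in
runIteration would not be redundant even though the mapper already writes
Text and ClusterObservations.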
