The issue is not here in the Mapper declaration. The problem really arises when you have to tell the Job what the class of the Mapper key/value is: it needs something concrete.
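To make the "something concrete" point tangible, here is a minimal, self-contained sketch of an exact-class check of the kind Jeff describes (== rather than instanceof). It does not use Hadoop at all: `StrictCollector`, `Cluster`, and `KMeansCluster` are hypothetical stand-ins written for this illustration, and the error message is made up.

```java
import java.io.IOException;

// Small demo: the same object is rejected or accepted depending on the declared class.
final class StrictCollectorDemo {
  public static void main(String[] args) {
    Cluster cluster = new KMeansCluster();
    // Declared as the interface: the exact-class check fails.
    System.out.println(StrictCollector.accepts(Cluster.class, cluster));
    // Declared as the concrete class: the check passes.
    System.out.println(StrictCollector.accepts(KMeansCluster.class, cluster));
  }
}

// Hypothetical stand-ins for an interface and one concrete implementation.
interface Cluster {}

final class KMeansCluster implements Cluster {}

// Hypothetical collector that imitates an exact-class check on emitted values.
final class StrictCollector {
  private final Class<?> declaredValueClass;

  StrictCollector(Class<?> declaredValueClass) {
    this.declaredValueClass = declaredValueClass;
  }

  void collect(Object value) throws IOException {
    // Exact comparison: an implementation of the declared interface is still rejected.
    if (value.getClass() != declaredValueClass) {
      throw new IOException("Type mismatch in value: expected "
          + declaredValueClass.getName() + ", received " + value.getClass().getName());
    }
  }

  // Convenience wrapper so callers can probe the check without handling the exception.
  static boolean accepts(Class<?> declared, Object value) {
    try {
      new StrictCollector(declared).collect(value);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

Under a check like this, a type parameter such as `T extends Cluster` on the Mapper does not help by itself, because the value's runtime class is still compared against one declared class.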
The general answer is no: the reader has to somehow know what it's reading before it reads it. You can accomplish this by, say, writing the class name into the output, which is how Java serialization works by default. That is unworkable for many purposes here because the class name is so much overhead. VectorWritable splits the difference: it writes a tiny header in which a few bits indicate "sparse", "named", etc., and that is enough to know which representation was written and so how to read it.

On Sun, Feb 12, 2012 at 3:00 PM, Paritosh Ranjan <pran...@xebia.com> wrote:
> Can something like this help?
>
> public class CIMapper<T extends Cluster> extends
> Mapper<WritableComparable<?>,VectorWritable,IntWritable,T> {
> ...
>
> }
>
> On 12-02-2012 06:48, Jeff Eastman wrote:
>>
>> I'm wondering how to tease the elephant into accepting any concrete
>> instance of the interface o.a.m.clustering.Cluster when writing trained
>> clusters in the cleanup() method of CIMapper. I've gotten the MR version of
>> the ClusterIterator to get to that point in testing but it blows chunks with
>> an IOException when I try to pass a o.a.m.clustering.kmeans.Cluster (I will
>> rename the latter for 0.7). Seems the MapTask.collect() wants == and not
>> instanceof.
>>
>> I've talked with Ted about passing Clusters rather than the current
>> ClusterObservations but don't see how at this point. Any ideas?
>>
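For reference, the header idea from the reply above can be sketched in plain Java as follows. This is only an illustration of the technique under stated assumptions, not VectorWritable's actual format: it spends a whole byte on the tag rather than a few bits, and `DemoModel`, `CentroidModel`, `RadiusModel`, and `TaggedModelCodec` are hypothetical names, not Mahout classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Small demo: round-trip one model through the tagged encoding.
final class TaggedModelDemo {
  public static void main(String[] args) {
    byte[] bytes = TaggedModelCodec.toBytes(new RadiusModel(1.5, 0.25));
    DemoModel back = TaggedModelCodec.fromBytes(bytes);
    System.out.println(bytes.length + " bytes, read back as " + back.getClass().getSimpleName());
  }
}

// Hypothetical model types for illustration only.
interface DemoModel {
  void writeFields(DataOutput out) throws IOException;
}

final class CentroidModel implements DemoModel {
  final double center;
  CentroidModel(double center) { this.center = center; }
  @Override public void writeFields(DataOutput out) throws IOException {
    out.writeDouble(center);
  }
}

final class RadiusModel implements DemoModel {
  final double center;
  final double radius;
  RadiusModel(double center, double radius) { this.center = center; this.radius = radius; }
  @Override public void writeFields(DataOutput out) throws IOException {
    out.writeDouble(center);
    out.writeDouble(radius);
  }
}

// A single codec that prefixes each record with a one-byte tag instead of a class name.
final class TaggedModelCodec {
  private static final byte TAG_CENTROID = 0;
  private static final byte TAG_RADIUS = 1;

  static void write(DemoModel model, DataOutput out) throws IOException {
    // Header first: the tag tells the reader which representation follows.
    if (model instanceof CentroidModel) {
      out.writeByte(TAG_CENTROID);
    } else if (model instanceof RadiusModel) {
      out.writeByte(TAG_RADIUS);
    } else {
      throw new IOException("No tag registered for " + model.getClass().getName());
    }
    model.writeFields(out);
  }

  static DemoModel read(DataInput in) throws IOException {
    // Read the header, then decode the fields that this representation wrote.
    byte tag = in.readByte();
    switch (tag) {
      case TAG_CENTROID:
        return new CentroidModel(in.readDouble());
      case TAG_RADIUS: {
        double center = in.readDouble();
        double radius = in.readDouble();
        return new RadiusModel(center, radius);
      }
      default:
        throw new IOException("Unknown tag: " + tag);
    }
  }

  static byte[] toBytes(DemoModel model) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      write(model, new DataOutputStream(buffer));
      return buffer.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static DemoModel fromBytes(byte[] bytes) {
    try {
      return read(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The design point is that the reader and writer agree on one concrete wrapper, and the tag inside the record selects the representation. The overhead is one byte per record instead of a full class name; whether a similar wrapper suits the Cluster case is an assumption to check against the actual classes involved.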