Re: CIMapper Question

Sean Owen Sun, 12 Feb 2012 07:28:23 -0800

The problem really arises when you have to tell the Job what the class
of the Mapper's output key/value is. It needs something concrete. The
issue is not here in the Mapper declaration.
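
To make the "something concrete" point tangible, here is a minimal
sketch in plain Java (no Hadoop dependency). It assumes, as the quoted
message below suggests, that the collector compares the runtime class
of each value against the single class it was configured with. The
names ExactClassCheckSketch, KMeansCluster, acceptsExact and
acceptsInstanceOf are hypothetical and are not Hadoop or Mahout APIs.

```java
final class ExactClassCheckSketch {
    /** Hypothetical stand-in for an interface such as o.a.m.clustering.Cluster. */
    interface Cluster {}

    /** Hypothetical concrete implementation of that interface. */
    static class KMeansCluster implements Cluster {}

    /** Strict check: the runtime class must be exactly the declared class. */
    static boolean acceptsExact(Class<?> declared, Object value) {
        return value.getClass() == declared;
    }

    /** Lenient check: any subtype of the declared class is accepted. */
    static boolean acceptsInstanceOf(Class<?> declared, Object value) {
        return declared.isInstance(value);
    }
}
```

Under the strict check, declaring the interface as the output value
class rejects every concrete instance, which is why a generic bound
such as T extends Cluster on the Mapper does not help by itself.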

The general answer is no: the reader has to somehow know what it's
reading before it reads it. You can accomplish this by, say, writing
the class name in the output. This is how Java serialization works by
default. It doesn't work for many purposes here, because that class
name is so much overhead.
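
A rough sketch of the "write the class name in the output" approach,
in plain Java. Record and Center are hypothetical stand-ins for a
Writable-style type, not Mahout or Hadoop classes. The reader learns
the concrete type from the stream itself, at the cost of a class-name
string in front of the actual payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassNameHeaderSketch {
    /** Hypothetical minimal stand-in for a Writable-style record. */
    public interface Record {
        void write(DataOutputStream out) throws IOException;
        void readFields(DataInputStream in) throws IOException;
    }

    /** Hypothetical concrete record with a single 8-byte field. */
    public static class Center implements Record {
        public double x;
        public Center() {}
        public Center(double x) { this.x = x; }
        @Override public void write(DataOutputStream out) throws IOException { out.writeDouble(x); }
        @Override public void readFields(DataInputStream in) throws IOException { x = in.readDouble(); }
    }

    /** Writes the concrete class name first, then the record's own fields. */
    static byte[] writeTagged(Record r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(r.getClass().getName());
            r.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the class name, instantiates that class, then lets it read its fields. */
    static Record readTagged(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String className = in.readUTF();
            Record r = (Record) Class.forName(className).getDeclaredConstructor().newInstance();
            r.readFields(in);
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here the payload is 8 bytes, so the class name is most of what gets
written. That is the overhead in question.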

VectorWritable does something that splits the difference: it has a tiny
header where a few bits indicate "sparse" or "named", etc., and this is
enough to know what representation was written and so how to read it.
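
The flag-header idea can be sketched as follows, again in plain Java.
This is only an illustration under assumptions: the flag values, the
field layout and the names FlagHeaderSketch, FLAG_SPARSE, FLAG_NAMED
and Decoded are made up for the example and are not VectorWritable's
actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class FlagHeaderSketch {
    static final int FLAG_SPARSE = 0x01; // hypothetical bit: payload is (index, value) pairs
    static final int FLAG_NAMED = 0x02;  // hypothetical bit: a name string is present

    /** Result of decoding: the values, plus the name if one was written. */
    static final class Decoded {
        double[] values;
        String name; // null when FLAG_NAMED was not set
    }

    /** Writes one flag byte, the size, the optional name, then the payload. */
    static byte[] write(double[] values, boolean sparse, String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            int flags = (sparse ? FLAG_SPARSE : 0) | (name != null ? FLAG_NAMED : 0);
            out.writeByte(flags);
            out.writeInt(values.length);
            if (name != null) {
                out.writeUTF(name);
            }
            if (sparse) {
                int nonZero = 0;
                for (double v : values) {
                    if (v != 0.0) nonZero++;
                }
                out.writeInt(nonZero);
                for (int i = 0; i < values.length; i++) {
                    if (values[i] != 0.0) {
                        out.writeInt(i);
                        out.writeDouble(values[i]);
                    }
                }
            } else {
                for (double v : values) out.writeDouble(v);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the flag byte first and uses it to decide how to read the rest. */
    static Decoded read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int flags = in.readByte();
            Decoded d = new Decoded();
            d.values = new double[in.readInt()];
            if ((flags & FLAG_NAMED) != 0) {
                d.name = in.readUTF();
            }
            if ((flags & FLAG_SPARSE) != 0) {
                int nonZero = in.readInt();
                for (int k = 0; k < nonZero; k++) {
                    int index = in.readInt();
                    d.values[index] = in.readDouble();
                }
            } else {
                for (int i = 0; i < d.values.length; i++) d.values[i] = in.readDouble();
            }
            return d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The declared class stays the same for every record, so the Job can be
told one concrete type, while a single byte carries what the class
name would otherwise have to.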

On Sun, Feb 12, 2012 at 3:00 PM, Paritosh Ranjan <pran...@xebia.com> wrote:
> Can something like this help?
>
> public class CIMapper<T extends Cluster> extends
> Mapper<WritableComparable<?>,VectorWritable,IntWritable,T> {
> ...
>
> }
>
> On 12-02-2012 06:48, Jeff Eastman wrote:
>>
>> I'm wondering how to tease the elephant into accepting any concrete
>> instance of the interface o.a.m.clustering.Cluster when writing trained
>> clusters in the cleanup() method of CIMapper. I've gotten the MR version of
>> the ClusterIterator to get to that point in testing but it blows chunks with
>> an IOException when I try to pass a o.a.m.clustering.kmeans.Cluster (I will
>> rename the latter for 0.7). Seems the MapTask.collect() wants == and not
>> instanceof.
>>
>> I've talked with Ted about passing Clusters rather than the current
>> ClusterObservations but don't see how at this point. Any ideas?
>>
>>
>
