Re: CIMapper Question

Sean Owen Sun, 12 Feb 2012 08:27:36 -0800

Exactly right, and that's the answer in some form.
PolymorphicWritable isn't suitable if you're writing a lot of records,
as the overhead of writing a 40-byte class-name string with every record
is too much at scale.
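
To put a rough number on that overhead, here is a minimal stdlib-only sketch. It assumes the per-record cost is a fully qualified class name written with DataOutput.writeUTF ahead of the payload; the class ClassNameOverhead, the helper taggedBytes and the record count are hypothetical and are not Mahout code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ClassNameOverhead {

    // Hypothetical helper: number of bytes writeUTF emits for a class name
    // (a 2-byte length prefix plus the modified UTF-8 encoding of the name).
    public static int taggedBytes(String className) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(className);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the tag is the concrete class name from the thread.
        int perRecord = taggedBytes("org.apache.mahout.clustering.kmeans.Cluster");
        long records = 100_000_000L; // illustrative record count, not a measurement
        System.out.println(perRecord + " tag bytes per record");
        System.out.println((perRecord * records) + " tag bytes over " + records + " records");
    }
}
```

The tag is paid once per record regardless of payload size, so it matters most when the records themselves are small.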


On Sun, Feb 12, 2012 at 4:01 PM, Ted Dunning <[email protected]> wrote:
> But this sounds like a runtime problem, not a type checking problem.
>
> Polymorphism is generally a problem in the Hadoop API. That is why we
> have VectorWritable and why I added PolymorphicWritable.
>
> Jeff,
>
> Two questions:
>
> 1) would PolymorphicWritable<Cluster> help?
>
> 2) can you say more about what the IOException is?  Does it give any hints?
>
> On Sun, Feb 12, 2012 at 7:00 AM, Paritosh Ranjan <[email protected]> wrote:
>
>> Can something like this help?
>>
>> public class CIMapper<T extends Cluster> extends
>>     Mapper<WritableComparable<?>, VectorWritable, IntWritable, T> {
>> ...
>> }
>>
>> On 12-02-2012 06:48, Jeff Eastman wrote:
>>
>>> I'm wondering how to tease the elephant into accepting any concrete
>>> instance of the interface o.a.m.clustering.Cluster when writing trained
>>> clusters in the cleanup() method of CIMapper. I've gotten the MR version of
>>> the ClusterIterator to get to that point in testing but it blows chunks
>>> with an IOException when I try to pass an o.a.m.clustering.kmeans.Cluster
>>> (I will rename the latter for 0.7). Seems the MapTask.collect() wants ==
>>> and not instanceof.
>>>
>>> I've talked with Ted about passing Clusters rather than the current
>>> ClusterObservations but don't see how at this point. Any ideas?
>>>
>>>
>>>
>>
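
The "== and not instanceof" behaviour Jeff describes can be imitated without Hadoop. The sketch below is only an illustration under that assumption: ExactClassCheck, its nested Cluster and KMeansCluster types, and the collect and accepts methods are hypothetical stand-ins, not the MapTask source.

```java
import java.io.IOException;

public class ExactClassCheck {

    // Hypothetical stand-ins for the Cluster interface and one concrete implementation.
    public interface Cluster {}
    public static class KMeansCluster implements Cluster {}

    // Imitates a collector that compares the runtime class with the declared
    // output class by identity, so subtypes of the declared class are rejected.
    public static void collect(Object value, Class<?> declared) throws IOException {
        if (value.getClass() != declared) {
            throw new IOException("Type mismatch: expected " + declared.getName()
                + ", received " + value.getClass().getName());
        }
    }

    // Convenience wrapper: true if collect() accepts the value.
    public static boolean accepts(Object value, Class<?> declared) {
        try {
            collect(value, declared);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        KMeansCluster cluster = new KMeansCluster();
        // instanceof-style check passes for the interface...
        System.out.println("instanceof Cluster: " + Cluster.class.isInstance(cluster));
        // ...but the identity check only passes for the exact concrete class.
        System.out.println("declared as Cluster: " + accepts(cluster, Cluster.class));
        System.out.println("declared as KMeansCluster: " + accepts(cluster, KMeansCluster.class));
    }
}
```

Under a check like this, declaring the interface as the output value class fails at runtime even though the generic signature compiles, which is why a single concrete wrapper type that carries the real class inside it sidesteps the problem.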
