Hi Jeff,

It's great to see some discussion on this. I ran into a similar problem when 
trying to make the SplitInput job work for any arbitrary key and value classes. 
In the end I was able to sidestep the issue by just reading the key and value 
classes from the SequenceFileInput, but I never found a way to deal with this 
head on.
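
For what it's worth, here is a rough, Hadoop-free sketch of the two behaviours discussed further down the thread: the exact-class check that Jeff ran into in MapTask.collect(), and the class-name-on-the-wire idea behind PolymorphicWritable. Everything in it is a made-up stand-in for illustration (SimpleWritable, the toy Cluster and KMeansCluster, ClusterHolder, collect); none of it is Mahout's or Hadoop's actual code, and the error message only imitates the real one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class PolymorphicSketch {

  /** Hypothetical stand-in for Hadoop's Writable contract. */
  interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Toy version of the Cluster interface; the real one has many more methods. */
  interface Cluster extends SimpleWritable {
    int getId();
  }

  /** Toy concrete cluster that serializes only its id. */
  static class KMeansCluster implements Cluster {
    private int id;
    public KMeansCluster() {}                 // no-arg constructor needed for reflection
    KMeansCluster(int id) { this.id = id; }
    public int getId() { return id; }
    public void write(DataOutput out) throws IOException { out.writeInt(id); }
    public void readFields(DataInput in) throws IOException { id = in.readInt(); }
  }

  /** Imitates an exact-class check (== on classes, not instanceof). */
  static void collect(Class<?> declared, Object value) throws IOException {
    if (value.getClass() != declared) {
      throw new IOException("Type mismatch in value from map: expected "
          + declared.getName() + ", received " + value.getClass().getName());
    }
  }

  /** One concrete wrapper class that writes the payload's class name, then the payload. */
  static class ClusterHolder implements SimpleWritable {
    private Cluster value;
    public ClusterHolder() {}
    ClusterHolder(Cluster value) { this.value = value; }
    Cluster get() { return value; }

    public void write(DataOutput out) throws IOException {
      out.writeUTF(value.getClass().getName());   // the per-record overhead mentioned in the thread
      value.write(out);
    }

    public void readFields(DataInput in) throws IOException {
      String className = in.readUTF();
      try {
        value = Class.forName(className).asSubclass(Cluster.class)
            .getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
      value.readFields(in);
    }
  }

  /** Serializes a cluster through the holder and reads it back. */
  static Cluster roundTrip(Cluster c) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new ClusterHolder(c).write(new DataOutputStream(bytes));
    ClusterHolder copy = new ClusterHolder();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy.get();
  }

  public static void main(String[] args) throws IOException {
    try {
      // Declaring the interface as the value class rejects any concrete instance.
      collect(Cluster.class, new KMeansCluster(1));
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
    // Declaring the wrapper as the value class passes the exact-class check.
    collect(ClusterHolder.class, new ClusterHolder(new KMeansCluster(1)));
    System.out.println(roundTrip(new KMeansCluster(7)).getClass().getSimpleName());
  }
}
```

The point of the wrapper is that the job only ever declares one concrete value class, so an exact-class check is satisfied, while the payload can be any implementation with a no-arg constructor. The cost is the class name written with every record, which should matter little when only k clusters are emitted per mapper.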

On 12 Feb, 2012, at 8:35 AM, Jeff Eastman wrote:

> Thanks Sean & Ted. That is what I've observed experimentally. I was going to 
> pursue a ClusterWritable along the lines of VectorWritable but will try 
> PolymorphicWritable<Cluster> first. Looking at it, I see it does send the 
> class name which might be onerous as Sean observed except for the fact that I 
> am only sending (k) clusters between each mapper and the reducer. I will 
> report on this in an hour or so.
> 
> On 2/12/12 9:01 AM, Ted Dunning wrote:
>> But this sounds like a runtime problem, not a type checking problem.
>> 
>> Polymorphism is generally a problem in the Hadoop API.   That is why we
>> have VectorWritable and why I added PolymorphicWritable.
>> 
>> Jeff,
>> 
>> Two questions:
>> 
>> 1) would PolymorphicWritable<Cluster>  help?
>> 
>> 2) can you say more about what the IOException is?  Does it give any hints?
>> 
>> On Sun, Feb 12, 2012 at 7:00 AM, Paritosh Ranjan<pran...@xebia.com>  wrote:
>> 
>>> Can something like this help?
>>> 
>>> public class CIMapper<T extends Cluster> extends
>>> Mapper<WritableComparable<?>,VectorWritable,IntWritable,T> {
>>> ...
>>> }
>>> 
>>> On 12-02-2012 06:48, Jeff Eastman wrote:
>>> 
>>>> I'm wondering how to tease the elephant into accepting any concrete
>>>> instance of the interface o.a.m.clustering.Cluster when writing trained
>>>> clusters in the cleanup() method of CIMapper. I've gotten the MR version of
>>>> the ClusterIterator to get to that point in testing but it blows chunks
>>>> with an IOException when I try to pass an o.a.m.clustering.kmeans.Cluster
>>>> (I will rename the latter for 0.7). Seems the MapTask.collect() wants ==
>>>> and not instanceof.
>>>> 
>>>> I've talked with Ted about passing Clusters rather than the current
>>>> ClusterObservations but don't see how at this point. Any ideas?
>>>> 
>>>> 
>>>> 
> 
