Hi Jeff,

It's great to see some discussion on this. I ran into a similar problem when trying to make the SplitInput job work for arbitrary key and value classes. In the end, I was able to sidestep the issue by reading the key and value classes from the SequenceFileInput, but I never found a way to deal with it head on.
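
For what it's worth, the behaviour described in the quoted thread (an exact-class check rather than instanceof, and a wrapper that sends the class name) can be reproduced without Hadoop. The following is only a sketch under stated assumptions: SimpleWritable, Cluster, KMeansCluster, ClusterHolder and strictCollectAccepts are hypothetical stand-ins written for illustration, not the Hadoop or Mahout classes, and the real PolymorphicWritable may differ in its details.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ExactClassDemo {

  // Hypothetical stand-in for Hadoop's Writable contract.
  interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  // Hypothetical stand-in for the Cluster interface.
  interface Cluster extends SimpleWritable {
    int getId();
  }

  // Hypothetical concrete cluster; needs a no-arg constructor for reflection.
  static class KMeansCluster implements Cluster {
    private int id;
    public KMeansCluster() {}
    KMeansCluster(int id) { this.id = id; }
    public int getId() { return id; }
    public void write(DataOutput out) throws IOException { out.writeInt(id); }
    public void readFields(DataInput in) throws IOException { id = in.readInt(); }
  }

  // Mimics a strict collector: the runtime class must EQUAL the declared
  // class, so an implementation of a declared interface is rejected.
  static boolean strictCollectAccepts(Class<?> declared, Object value) {
    return value.getClass() == declared;
  }

  // Wrapper with a single concrete class. It writes the payload's class
  // name first, then the payload, so any Cluster implementation can travel.
  static class ClusterHolder implements SimpleWritable {
    private Cluster value;
    public ClusterHolder() {}
    ClusterHolder(Cluster value) { this.value = value; }
    Cluster get() { return value; }

    public void write(DataOutput out) throws IOException {
      out.writeUTF(value.getClass().getName());
      value.write(out);
    }

    public void readFields(DataInput in) throws IOException {
      String className = in.readUTF();
      try {
        // Re-create the concrete class named in the stream.
        value = Class.forName(className).asSubclass(Cluster.class)
            .getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IOException("Cannot instantiate " + className, e);
      }
      value.readFields(in);
    }
  }

  // Serializes a cluster through the wrapper and reads it back in memory.
  static Cluster roundTrip(Cluster cluster) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new ClusterHolder(cluster).write(new DataOutputStream(bytes));
      ClusterHolder copy = new ClusterHolder();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy.get();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Cluster cluster = new KMeansCluster(7);
    // Declared as the interface: the exact-class check fails.
    System.out.println(strictCollectAccepts(Cluster.class, cluster));
    // Declared as the wrapper: the exact-class check passes.
    System.out.println(strictCollectAccepts(ClusterHolder.class, new ClusterHolder(cluster)));
    // The concrete type survives the round trip.
    System.out.println(roundTrip(cluster).getClass().getSimpleName());
  }
}
```

The cost is the class name written with every record, which is the overhead Sean pointed out. With only (k) clusters passing from each mapper to the reducer, that may well be acceptable.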

On 12 Feb, 2012, at 8:35 AM, Jeff Eastman wrote:

> Thanks Sean & Ted. That is what I've observed experimentally. I was going to
> pursue a ClusterWriteable along the lines of VectorWritable but will try
> PolymorphicWritable<Cluster> first. Looking at it, I see it does send the
> class name which might be onerous as Sean observed except for the fact that I
> am only sending (k) clusters between each mapper and the reducer. I will
> report on this in an hour or so.
>
> On 2/12/12 9:01 AM, Ted Dunning wrote:
>> But this sounds like a runtime problem, not a type checking problem.
>>
>> Polymorphism is generally a problem in the Hadoop API. That is why we
>> have VectorWritable and why I added PolymorphicWritable.
>>
>> Jeff,
>>
>> Two questions:
>>
>> 1) would PolymorphicWritable<Cluster> help?
>>
>> 2) can you say more about what the IOException is? Does it give any hints?
>>
>> On Sun, Feb 12, 2012 at 7:00 AM, Paritosh Ranjan<pran...@xebia.com> wrote:
>>
>>> Can something like this help?
>>>
>>> public class CIMapper<T extends Cluster> extends
>>> Mapper<WritableComparable<?>,VectorWritable,IntWritable,T> {
>>> ...
>>> }
>>>
>>> On 12-02-2012 06:48, Jeff Eastman wrote:
>>>
>>>> I'm wondering how to tease the elephant into accepting any concrete
>>>> instance of the interface o.a.m.clustering.Cluster when writing trained
>>>> clusters in the cleanup() method of CIMapper. I've gotten the MR version of
>>>> the ClusterIterator to get to that point in testing but it blows chunks
>>>> with an IOException when I try to pass a o.a.m.clustering.kmeans.Cluster
>>>> (I will rename the latter for 0.7). Seems the MapTask.collect() wants ==
>>>> and not instanceof.
>>>>
>>>> I've talked with Ted about passing Clusters rather than the current
>>>> ClusterObservations but don't see how at this point. Any ideas?