Exactly right, and that's the answer in some form. PolymorphicWritable isn't suitable if you're writing a lot of records, though: the overhead of writing a 40-byte string with each record is too much at scale.
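One way around that per-record cost is to write a small integer tag instead of the class name. The sketch below is plain Java and rests on a few assumptions:

- `SketchCluster`, `KMeansLike`, `CanopyLike` and the `TYPES` registry are hypothetical names, not Mahout classes.
- It uses `java.io.DataOutput`/`DataInput` directly, which are the types Hadoop's `Writable.write`/`readFields` take, so it needs no Hadoop dependency.

Hadoop's `GenericWritable` follows roughly this idea. It stores a one-byte index into the array returned by `getTypes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

public class TaggedClusterSketch {

  // Hypothetical stand-in for o.a.m.clustering.Cluster, mirroring the
  // Writable contract (write/readFields) without depending on Hadoop.
  public interface SketchCluster {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  // Hypothetical concrete cluster: a 1-D center plus an observation count.
  public static final class KMeansLike implements SketchCluster {
    public double center;
    public long count;
    public KMeansLike() {}
    public KMeansLike(double center, long count) { this.center = center; this.count = count; }
    @Override public void write(DataOutput out) throws IOException {
      out.writeDouble(center);
      out.writeLong(count);
    }
    @Override public void readFields(DataInput in) throws IOException {
      center = in.readDouble();
      count = in.readLong();
    }
  }

  // Second hypothetical type, so the tag really has to discriminate.
  public static final class CanopyLike implements SketchCluster {
    public double center;
    public CanopyLike() {}
    public CanopyLike(double center) { this.center = center; }
    @Override public void write(DataOutput out) throws IOException { out.writeDouble(center); }
    @Override public void readFields(DataInput in) throws IOException { center = in.readDouble(); }
  }

  // Fixed registry: a type's position in this list is its one-byte tag.
  // Writer and reader must agree on this order.
  static final List<Class<? extends SketchCluster>> TYPES =
      List.of(KMeansLike.class, CanopyLike.class);

  // Compact encoding: one tag byte, then the payload.
  static void writeTagged(DataOutput out, SketchCluster c) throws IOException {
    int tag = TYPES.indexOf(c.getClass());
    if (tag < 0) {
      throw new IOException("unregistered type: " + c.getClass().getName());
    }
    out.writeByte(tag);
    c.write(out);
  }

  // Reads the tag, instantiates the registered class, then reads its payload.
  static SketchCluster readTagged(DataInput in) throws IOException {
    int tag = in.readUnsignedByte();
    if (tag >= TYPES.size()) {
      throw new IOException("unknown tag: " + tag);
    }
    try {
      SketchCluster c = TYPES.get(tag).getDeclaredConstructor().newInstance();
      c.readFields(in);
      return c;
    } catch (ReflectiveOperationException e) {
      throw new IOException(e);
    }
  }

  // Class-name encoding (the costly variant): UTF class name, then the payload.
  static void writeWithClassName(DataOutput out, SketchCluster c) throws IOException {
    out.writeUTF(c.getClass().getName());
    c.write(out);
  }

  interface Encoder { void encode(DataOutput out, SketchCluster c) throws IOException; }

  // Serializes one record with the given encoder and returns the raw bytes.
  static byte[] toBytes(Encoder enc, SketchCluster c) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      enc.encode(out, c);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    SketchCluster c = new KMeansLike(1.5, 3);
    byte[] tagged = toBytes(TaggedClusterSketch::writeTagged, c);
    byte[] named = toBytes(TaggedClusterSketch::writeWithClassName, c);
    System.out.println("tagged bytes: " + tagged.length + ", class-name bytes: " + named.length);
    SketchCluster back = readTagged(new DataInputStream(new ByteArrayInputStream(tagged)));
    System.out.println("round trip type: " + back.getClass().getSimpleName());
  }
}
```

The tagged form adds one byte per record. The class-name form adds a two-byte length plus the full class name. The trade-off is that the registry order becomes part of the serialized format, so adding a type must not reorder existing entries.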
On Sun, Feb 12, 2012 at 4:01 PM, Ted Dunning wrote:
> But this sounds like a runtime problem, not a type-checking problem.
>
> Polymorphism is generally a problem in the Hadoop API. That is why we
> have VectorWritable and why I added PolymorphicWritable.
>
> Jeff,
>
> Two questions:
>
> 1) Would PolymorphicWritable<Cluster> help?
>
> 2) Can you say more about what the IOException is? Does it give any hints?
>
> On Sun, Feb 12, 2012 at 7:00 AM, Paritosh Ranjan wrote:
>
>> Can something like this help?
>>
>> public class CIMapper<T extends Cluster> extends
>>     Mapper<WritableComparable<?>, VectorWritable, IntWritable, T> {
>>   ...
>> }
>>
>> On 12-02-2012 06:48, Jeff Eastman wrote:
>>
>>> I'm wondering how to tease the elephant into accepting any concrete
>>> instance of the interface o.a.m.clustering.Cluster when writing trained
>>> clusters in the cleanup() method of CIMapper. I've gotten the MR version of
>>> the ClusterIterator to that point in testing, but it blows chunks
>>> with an IOException when I try to pass an o.a.m.clustering.kmeans.Cluster
>>> (I will rename the latter for 0.7). It seems MapTask.collect() wants ==
>>> and not instanceof.
>>>
>>> I've talked with Ted about passing Clusters rather than the current
>>> ClusterObservations, but I don't see how at this point. Any ideas?
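Ted is right that this is a runtime check. Java generics are erased, so a signature like `CIMapper<T extends Cluster>` does not change the concrete class the job declares for map output values. The check Jeff describes compares runtime classes with `==`, so a subtype of the declared class is rejected.

The sketch below is plain Java that mimics that comparison. It does not call Hadoop. `collect`, `Cluster`, `KMeansCluster` and `ClusterWrapper` are hypothetical stand-ins. It shows why declaring one concrete wrapper class, which is the idea behind PolymorphicWritable, gets past the check.

```java
import java.io.IOException;

public class ExactClassCheckSketch {

  // Hypothetical stand-ins: an interface and one concrete implementation.
  interface Cluster {}
  static final class KMeansCluster implements Cluster {}

  // Hypothetical wrapper: a single concrete class that can carry any Cluster.
  static final class ClusterWrapper {
    final Cluster inner;
    ClusterWrapper(Cluster inner) { this.inner = inner; }
  }

  // Mimics the exact-class comparison described in the thread: the runtime
  // class of the emitted value must equal the declared class; subtypes fail.
  static void collect(Class<?> declaredValueClass, Object value) throws IOException {
    if (value.getClass() != declaredValueClass) {
      throw new IOException("Type mismatch in value: expected "
          + declaredValueClass.getName() + ", received " + value.getClass().getName());
    }
  }

  // Returns true if collect() accepts the value, false if it throws.
  static boolean accepted(Class<?> declaredValueClass, Object value) {
    try {
      collect(declaredValueClass, value);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Cluster c = new KMeansCluster();
    // Declaring the interface fails even though (c instanceof Cluster) is true.
    System.out.println("interface declared: " + accepted(Cluster.class, c));
    // Declaring the wrapper and emitting a wrapper works for any Cluster.
    System.out.println("wrapper declared:   " + accepted(ClusterWrapper.class, new ClusterWrapper(c)));
  }
}
```

The wrapper still has to record which concrete type it carries so the reducer can rebuild it. That is where the class-name versus compact-tag cost discussed above comes in.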
