Re: CIMapper Question

Jeff Eastman Sun, 12 Feb 2012 21:08:29 -0800

PolymorphicWritable actually works great in the two applications of it I committed today. They are low-volume, of course, so the overhead of writing the class name is not onerous.


On 2/12/12 9:57 PM, Lance Norskog wrote:

Another option is TupleWritable. But pull the source and make sure it
works; I had problems.


On Sun, Feb 12, 2012 at 9:22 AM, Jeff Eastman
<[email protected]> wrote:

This approach worked out, not exactly as below, but I was able to create a
ClusterWritable which used PolymorphicWritable to read and write its Cluster
value field. This makes it through the mapper and reducer but I'm still
working on getting it all to fly in the ClusterIterator.
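A minimal sketch of the wrapper idea described above, for readers who have not seen it. It is not the Mahout code: SimpleWritable, DemoCluster and Wrapper are hypothetical stand-ins built only on java.io so the example runs without Hadoop. The wrapper is the single concrete class the framework sees; it writes the payload's class name first and then lets the payload serialize itself, and on read it instantiates that class reflectively (assuming a public no-arg constructor).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PolymorphicSketch {

    /** Hypothetical stand-in for Hadoop's Writable contract. */
    public interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical concrete payload; any implementation with a no-arg constructor works. */
    public static class DemoCluster implements SimpleWritable {
        public int id;
        public double weight;

        public DemoCluster() { }

        public DemoCluster(int id, double weight) {
            this.id = id;
            this.weight = weight;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeDouble(weight);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            id = in.readInt();
            weight = in.readDouble();
        }
    }

    /** Wrapper that records the payload's class name ahead of the payload bytes. */
    public static class Wrapper implements SimpleWritable {
        private SimpleWritable value;

        public Wrapper() { }

        public Wrapper(SimpleWritable value) {
            this.value = value;
        }

        public SimpleWritable get() {
            return value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(value.getClass().getName()); // class name travels with every record
            value.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            String className = in.readUTF();
            try {
                value = Class.forName(className)
                        .asSubclass(SimpleWritable.class)
                        .getDeclaredConstructor()
                        .newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException("Cannot instantiate " + className, e);
            }
            value.readFields(in);
        }
    }

    /** Serializes a payload through the wrapper and reads it back. */
    public static SimpleWritable roundTrip(SimpleWritable payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new Wrapper(payload).write(new DataOutputStream(bytes));
            Wrapper copy = new Wrapper();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleWritable restored = roundTrip(new DemoCluster(7, 2.5));
        System.out.println(restored.getClass().getSimpleName());
    }
}
```

Because the declared output value class is always the wrapper, the mapper and reducer signatures stay fixed while the payload type can vary at runtime.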


On 2/12/12 9:43 AM, Raphael Cendrillon wrote:

Hi Jeff,

It's great to see some discussion on this. I ran into a similar problem
when trying to make the SplitInput job work for any arbitrary key and value
classes. In the end I was able to side step the issue by just reading the
key and value classes from the SequenceFileInput, but I never found a way to
deal with this head on.

On 12 Feb, 2012, at 8:35 AM, Jeff Eastman wrote:

Thanks Sean & Ted. That is what I've observed experimentally. I was going
to pursue a ClusterWritable along the lines of VectorWritable but will try
PolymorphicWritable<Cluster> first. Looking at it, I see it does send the
class name, which might be onerous, as Sean observed, except that I am only
sending (k) clusters between each mapper and the reducer. I will report on
this in an hour or so.
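A rough way to size the class-name overhead mentioned above. This is only a sketch under assumptions: it supposes the name is written with DataOutput.writeUTF (a 2-byte length prefix plus the encoded characters), and the values of k and the mapper count in main() are made-up illustration numbers, not figures from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ClassNameOverhead {

    /** Bytes that DataOutput.writeUTF emits for the given class name. */
    public static int bytesForName(String className) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(className);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Total extra bytes per iteration if each mapper emits k wrapped clusters. */
    public static long totalOverhead(String className, int k, int mappers) {
        return (long) bytesForName(className) * k * mappers;
    }

    public static void main(String[] args) {
        String name = "org.apache.mahout.clustering.kmeans.Cluster";
        int k = 20;        // hypothetical number of clusters
        int mappers = 50;  // hypothetical number of map tasks
        System.out.println("per record: " + bytesForName(name) + " bytes");
        System.out.println("per iteration: " + totalOverhead(name, k, mappers) + " bytes");
    }
}
```

Since only k records leave each mapper, the total grows with k times the number of mappers rather than with the number of input vectors, which is why the cost stays small here.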


On 2/12/12 9:01 AM, Ted Dunning wrote:

But this sounds like a runtime problem, not a type checking problem.

Polymorphism is generally a problem in the Hadoop API. That is why we
have VectorWritable and why I added PolymorphicWritable.

Jeff,

Two questions:

1) Would PolymorphicWritable<Cluster> help?

2) Can you say more about what the IOException is? Does it give any
hints?

On Sun, Feb 12, 2012 at 7:00 AM, Paritosh Ranjan <[email protected]>
wrote:

Can something like this help?

public class CIMapper<T extends Cluster> extends
    Mapper<WritableComparable<?>, VectorWritable, IntWritable, T> {
...
}

On 12-02-2012 06:48, Jeff Eastman wrote:

I'm wondering how to tease the elephant into accepting any concrete
instance of the interface o.a.m.clustering.Cluster when writing trained
clusters in the cleanup() method of CIMapper. I've gotten the MR version of
the ClusterIterator to get to that point in testing but it blows chunks
with an IOException when I try to pass a o.a.m.clustering.kmeans.Cluster
(I will rename the latter for 0.7). Seems the MapTask.collect() wants ==
and not instanceof.

I've talked with Ted about passing Clusters rather than the current
ClusterObservations but don't see how at this point. Any ideas?
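The "== and not instanceof" behaviour suspected above can be illustrated without Hadoop. The sketch below is not Hadoop source: Cluster, KMeansCluster and collect() are hypothetical stand-ins, and the check simply assumes the collector compares the value's runtime class to the declared output class by identity rather than testing assignability.

```java
import java.io.IOException;

public class ExactClassCheck {

    /** Hypothetical stand-in for the Cluster interface. */
    public interface Cluster { }

    /** Hypothetical concrete implementation. */
    public static class KMeansCluster implements Cluster { }

    /** Rejects any value whose runtime class is not exactly the declared class. */
    public static void collect(Object value, Class<?> declaredValueClass) throws IOException {
        if (value.getClass() != declaredValueClass) {
            throw new IOException("Type mismatch: expected " + declaredValueClass.getName()
                    + ", received " + value.getClass().getName());
        }
    }

    /** Convenience wrapper reporting whether collect() accepted the value. */
    public static boolean accepted(Object value, Class<?> declaredValueClass) {
        try {
            collect(value, declaredValueClass);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Cluster c = new KMeansCluster();
        // An instanceof-style test would pass for the interface type.
        System.out.println("isInstance: " + Cluster.class.isInstance(c));
        // The identity comparison rejects the interface but accepts the exact class.
        System.out.println("declared as Cluster: " + accepted(c, Cluster.class));
        System.out.println("declared as KMeansCluster: " + accepted(c, KMeansCluster.class));
    }
}
```

Under that assumption, declaring the map output value class as an interface can never succeed, because no object's runtime class is the interface itself.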
