I guess I'm suggesting the polymorphism pain need not be very painful.
(No doubt it's all nicer with Avro, but that much can be separate.)

VectorWritable is the one Writable used in all cases.
We have *Writable decorators, corresponding to *Vector, in a similar hierarchy.
We have NamedVector decorating Vector.
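
Roughly what I have in mind, as a self-contained sketch rather than actual Mahout code. The class names mirror the ones in this thread, but the flag byte, the wire layout and the roundTrip helper are my own assumptions for illustration. It uses plain java.io DataInput/DataOutput instead of Hadoop's Writable interface so that it compiles on its own, and it collapses the *Writable decorators into a single flag check to stay short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class VectorWritableSketch {

  /** Pure math layer: no labels here. */
  interface Vector {
    int size();
    double get(int index);
  }

  static final class DenseVector implements Vector {
    private final double[] values;
    DenseVector(double... values) { this.values = values.clone(); }
    public int size() { return values.length; }
    public double get(int index) { return values[index]; }
  }

  /** Decorator: carries a name and delegates all math to the wrapped vector. */
  static final class NamedVector implements Vector {
    private final Vector delegate;
    private final String name;
    NamedVector(Vector delegate, String name) {
      this.delegate = delegate;
      this.name = name;
    }
    public int size() { return delegate.size(); }
    public double get(int index) { return delegate.get(index); }
    String getName() { return name; }
  }

  /** The one serialized type. A leading flag byte (assumed layout) says whether a name follows. */
  static final class VectorWritable {
    private static final int FLAG_NAMED = 1;
    private Vector vector;

    VectorWritable() { }
    VectorWritable(Vector vector) { this.vector = vector; }
    Vector get() { return vector; }

    void write(DataOutput out) throws IOException {
      boolean named = vector instanceof NamedVector;
      out.writeByte(named ? FLAG_NAMED : 0);
      out.writeInt(vector.size());
      for (int i = 0; i < vector.size(); i++) {
        out.writeDouble(vector.get(i));
      }
      if (named) {
        out.writeUTF(((NamedVector) vector).getName());
      }
    }

    void readFields(DataInput in) throws IOException {
      int flags = in.readByte();
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
      Vector plain = new DenseVector(values);
      // Re-apply the decorator only if the flag says a name was written.
      vector = (flags & FLAG_NAMED) != 0 ? new NamedVector(plain, in.readUTF()) : plain;
    }
  }

  /** Serializes and deserializes through the single VectorWritable type. */
  static Vector roundTrip(Vector v) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new VectorWritable(v).write(new DataOutputStream(bytes));
    VectorWritable copy = new VectorWritable();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy.get();
  }

  public static void main(String[] args) throws IOException {
    Vector named = roundTrip(new NamedVector(new DenseVector(1.0, 2.0), "doc-1"));
    Vector plain = roundTrip(new DenseVector(3.0));
    System.out.println(named instanceof NamedVector);  // true
    System.out.println(plain instanceof NamedVector);  // false
  }
}
```

A consumer that only wants the math just calls size() and get() on whatever comes back, so a file written with names should be readable as-is, without a separate pass to peel them off.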

I submit that this solves all known issues here pretty well. Is it good
enough that I should try it, or is that giving it too much momentum?

On Sun, Apr 18, 2010 at 8:13 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> This is a fundamental problem with Hadoop's type dispatch for writables.
> Building polymorphic writables is always a pain.
>
> On the other hand, with an Avro input format, polymorphism is pretty much
> assumed and comes for nearly free.
>
> My own preference for a naming layer is to allow us to have a pure vector
> layer that is just math, not labels.  That does begin to make things
> complex, though.
>
> This polymorphism pain makes putting the name into the vector and accepting
> whatever strange semantics result (missing == "" instead of null, for
> instance) more attractive as a temporary measure.
>
> On Sun, Apr 18, 2010 at 11:44 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>
>> It's not just that it is complicated, it's that say you want to do
>> clustering.  You make a SequenceFile of any old key type, and
>> NamedVectorWritable as the value.  Now you can't use that file as input for
>> any DistributedRowMatrix operation, you have to do a full pass over the
>> data to peel off the names and spit out regular VectorWritables...
>>
>
