This is a fundamental problem with Hadoop's type dispatch for writables. Building polymorphic writables is always a pain.
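A common way around the dispatch problem is to write a type tag ahead of each record, so the reader knows which concrete class to build. Hadoop's own `GenericWritable` works along these lines: it writes a byte index into a list of allowed classes before the wrapped value. The sketch below is hypothetical. `TaggedVectorDemo`, `Vec`, `NamedVec` and the byte layout are illustrative names, not Mahout or Hadoop classes, and it uses plain `java.io` instead of the `Writable` interface so that it runs on its own. It also shows where the pain comes from: every new subtype needs a new tag, and both the writer and the reader have to be updated in step.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical sketch of tag-based polymorphic serialization; not Mahout's or Hadoop's API.
public class TaggedVectorDemo {
  static final byte PLAIN = 0;
  static final byte NAMED = 1;

  // Plain vector: just math, no label.
  static class Vec {
    final double[] values;
    Vec(double[] values) { this.values = values; }
  }

  // Named vector: the same math plus a label.
  static class NamedVec extends Vec {
    final String name;
    NamedVec(String name, double[] values) { super(values); this.name = name; }
  }

  // Writer: emit a type tag first, then the subtype-specific fields, then the values.
  static void write(DataOutput out, Vec v) throws IOException {
    boolean named = v instanceof NamedVec;
    out.writeByte(named ? NAMED : PLAIN);
    if (named) out.writeUTF(((NamedVec) v).name);
    out.writeInt(v.values.length);
    for (double d : v.values) out.writeDouble(d);
  }

  // Reader: the tag decides which concrete class gets built.
  static Vec read(DataInput in) throws IOException {
    byte tag = in.readByte();
    String name = null;
    if (tag == NAMED) name = in.readUTF();
    else if (tag != PLAIN) throw new IOException("unknown type tag " + tag);
    int n = in.readInt();
    double[] values = new double[n];
    for (int i = 0; i < n; i++) values[i] = in.readDouble();
    return name == null ? new Vec(values) : new NamedVec(name, values);
  }

  // Serialize to bytes and read back, as a reader of the file would.
  static Vec roundTrip(Vec v) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    write(new DataOutputStream(bytes), v);
    return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
  }

  public static void main(String[] args) throws IOException {
    Vec a = roundTrip(new Vec(new double[] {1, 2}));
    Vec b = roundTrip(new NamedVec("doc-17", new double[] {3, 4}));
    System.out.println(a.getClass().getSimpleName() + " " + Arrays.toString(a.values));
    System.out.println(b.getClass().getSimpleName() + " " + ((NamedVec) b).name);
  }
}
```

Running `main` prints `Vec [1.0, 2.0]` and then `NamedVec doc-17`: the tag lets one stream carry both kinds of record, at the cost of a closed, hand-maintained list of subtypes.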
On the other hand, with an Avro input format, polymorphism is pretty much assumed and comes nearly for free.

My own preference is for a separate naming layer, since that lets us keep a pure vector layer that is just math, not labels. That does begin to make things complex, though. This polymorphism pain makes it more attractive, as a temporary measure, to put the name into the vector and accept whatever strange semantics result (missing == "" instead of null, for instance).

On Sun, Apr 18, 2010 at 11:44 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> It's not just that it is complicated, it's that say you want to do
> clustering. You make a SequenceFile of any old key type, and
> NamedVectorWritable as the value. Now you can't use that file as input for
> any DistributedRowMatrix operation, you have to do a full pass over the
> data to peel off the names and spit out regular VectorWritables...
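Both costs discussed in this thread can be sketched together. This is a hypothetical illustration, not the actual serialization used by Mahout's named-vector classes: `NameInVectorDemo`, `writeNamed`, `readName` and `peelName` are made-up names, and the layout (name, then length, then values) is an assumption. Because `DataOutput.writeUTF` cannot encode `null`, a missing name has to be written as `""` and therefore comes back as `""`. `peelName` stands in for the extra pass Jake describes, re-emitting only the numeric payload in a plain-vector layout that a name-unaware reader could consume.

```java
import java.io.*;

// Hypothetical sketch of "name inside the vector"; not Mahout's actual format.
public class NameInVectorDemo {
  // The name is always written; writeUTF cannot encode null, so a missing name becomes "".
  static byte[] writeNamed(String name, double[] values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeUTF(name == null ? "" : name);
    out.writeInt(values.length);
    for (double d : values) out.writeDouble(d);
    out.flush();
    return bytes.toByteArray();
  }

  // Reading the name back yields "" for a missing name, never null.
  static String readName(byte[] record) throws IOException {
    return new DataInputStream(new ByteArrayInputStream(record)).readUTF();
  }

  // The extra pass: skip the label and re-emit only length + values (plain-vector layout).
  static byte[] peelName(byte[] record) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
    in.readUTF(); // discard the label
    int n = in.readInt();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(n);
    for (int i = 0; i < n; i++) out.writeDouble(in.readDouble());
    out.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] rec = writeNamed(null, new double[] {1.0, 2.0});
    System.out.println("name read back: \"" + readName(rec) + "\"");
    System.out.println(rec.length + " bytes named, " + peelName(rec).length + " bytes plain");
  }
}
```

Running `main` prints `name read back: ""` and `22 bytes named, 20 bytes plain` (2 bytes for the empty UTF string, 4 for the length, 16 for the two doubles). In a real job the peel step is a full pass over the data, which is exactly the cost Jake objects to.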