I can think of situations where I need a clusterId as the key and a Vector as the value. If the Vector is to keep a consistent identity as it moves through jobs, then that identity would need to live inside the Vector itself.
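
To make that concrete, here is a rough sketch in plain Java. It is not Mahout or Hadoop code: LabeledPoint, ClusterGrouping and groupByCluster are made-up names, and the in-memory map only stands in for what a shuffle would do. The point is that once the key slot is taken by the clusterId, the name has to travel with the value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical value type: the name is stored with the data,
// so it survives when the record is re-keyed by a later job.
final class LabeledPoint {
    final String name;
    final double[] values;

    LabeledPoint(String name, double... values) {
        this.name = name;
        this.values = values.clone();
    }
}

final class ClusterGrouping {
    // Pairs each point with its clusterId as the key and groups by that key,
    // roughly what a shuffle would do between map and reduce.
    static Map<Integer, List<LabeledPoint>> groupByCluster(
            List<LabeledPoint> points, int[] clusterIds) {
        if (points.size() != clusterIds.length) {
            throw new IllegalArgumentException("one clusterId per point is required");
        }
        Map<Integer, List<LabeledPoint>> grouped = new TreeMap<>();
        for (int i = 0; i < points.size(); i++) {
            grouped.computeIfAbsent(clusterIds[i], k -> new ArrayList<>())
                   .add(points.get(i));
        }
        return grouped;
    }
}
```

Had the name been held only in the key, it would have been lost at the moment the key became the clusterId.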

On 4/18/10 8:41 AM, Jake Mannix wrote:
Which one is "this"?  Wrapping Vector impls into a NamedVector/LabeledVector, or seeing if we even need the label *inside* of the Vector itself, and instead just having those live in the "key" part of the key-value pair in hadoop, like DistributedRowMatrix has it?

   -jake

On Sun, Apr 18, 2010 at 3:44 AM, Sean Owen <sro...@gmail.com> wrote:

Yeah why don't I have a crack at this. The change as it stands is
already too big for what it is (though I believe they're good
changes.) Then we look at more changes, and sounds like there are
several ideas for streamlining vectors, which is a great thing to
think about at this early stage.

On Sun, Apr 18, 2010 at 12:54 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
How about this alternative:

NamedVector: {Vector: wrapped, String: name}
Vector: AbstractVector
AbstractVector: DenseVector | SequentialSparseVector | HashSparseVector

This avoids the multiplicative explosion of vector types.
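
The structure above could be sketched roughly as follows. This is a simplified illustration in plain Java, not the project's actual classes: the two-method Vector interface, the dot helper, and the choice to have NamedVector implement Vector itself are all assumptions made for the example, and only two of the three concrete types are shown.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the Vector contract (assumed, not the real interface).
interface Vector {
    int size();
    double get(int index);
}

// Behaviour shared by all concrete storage layouts lives here, written once.
abstract class AbstractVector implements Vector {
    public double dot(Vector other) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < size(); i++) {
            sum += get(i) * other.get(i);
        }
        return sum;
    }
}

final class DenseVector extends AbstractVector {
    private final double[] values;

    DenseVector(double... values) { this.values = values.clone(); }

    public int size() { return values.length; }
    public double get(int index) { return values[index]; }
}

final class HashSparseVector extends AbstractVector {
    private final int size;
    private final Map<Integer, Double> nonZeros = new HashMap<>();

    HashSparseVector(int size) { this.size = size; }

    void set(int index, double value) { nonZeros.put(index, value); }

    public int size() { return size; }
    public double get(int index) { return nonZeros.getOrDefault(index, 0.0); }
}

// The name is added by wrapping, so no NamedDenseVector or
// NamedHashSparseVector subclasses are needed.
final class NamedVector implements Vector {
    private final Vector wrapped;
    private final String name;

    NamedVector(Vector wrapped, String name) {
        this.wrapped = wrapped;
        this.name = name;
    }

    String getName() { return name; }
    Vector getDelegate() { return wrapped; }

    public int size() { return wrapped.size(); }
    public double get(int index) { return wrapped.get(index); }
}
```

With N storage layouts and one optional name, this needs N + 1 classes instead of 2N, and any further decoration added the same way grows the count by one rather than doubling it.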

