I can think of situations where I need to use a clusterId as the
key-part and a Vector as the value-part. If the Vector is going to keep
a consistent identity as it moves through jobs, that identity would
need to live inside the Vector.
On 4/18/10 8:41 AM, Jake Mannix wrote:
Which one is "this"? Wrapping Vector impls into a NamedVector/LabeledVector,
or seeing if we even need the label *inside* of the Vector itself, and instead
just having those live in the "key" part of the key-value pair in Hadoop, like
DistributedRowMatrix has it?
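A minimal sketch of the trade-off in plain Java, with no Hadoop or Mahout dependencies. The types and methods here (LabeledVec, assignCluster, rekeyBare, rekeyLabeled) are hypothetical illustrations, and the "clustering" step is a toy stand-in. It assumes rows start out keyed by a document label (the label-in-key layout); once a job re-keys them by clusterId, the label survives only if it travels inside the value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LabelPlacement {

    /** Hypothetical value type: a vector that carries its own label. */
    record LabeledVec(String label, double[] values) {}

    /** Hypothetical toy clustering step: pick the index of the largest component. */
    static int assignCluster(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Label-in-key layout: re-keying by clusterId overwrites the label held in the key. */
    static List<Map.Entry<Integer, double[]>> rekeyBare(List<Map.Entry<String, double[]>> rows) {
        List<Map.Entry<Integer, double[]>> out = new ArrayList<>();
        for (Map.Entry<String, double[]> row : rows) {
            // row.getKey() (the label) is not carried forward; the key slot now holds the clusterId.
            out.add(Map.entry(assignCluster(row.getValue()), row.getValue()));
        }
        return out;
    }

    /** Label-in-value layout: re-keying by clusterId keeps the label inside the value. */
    static List<Map.Entry<Integer, LabeledVec>> rekeyLabeled(List<LabeledVec> vecs) {
        List<Map.Entry<Integer, LabeledVec>> out = new ArrayList<>();
        for (LabeledVec v : vecs) {
            out.add(Map.entry(assignCluster(v.values()), v));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, double[]>> rows = List.of(
            Map.entry("doc-1", new double[] {0.9, 0.1}),
            Map.entry("doc-2", new double[] {0.2, 0.8}));
        for (Map.Entry<Integer, double[]> e : rekeyBare(rows)) {
            System.out.println("cluster " + e.getKey() + " -> (label lost)");
        }

        List<LabeledVec> named = List.of(
            new LabeledVec("doc-1", new double[] {0.9, 0.1}),
            new LabeledVec("doc-2", new double[] {0.2, 0.8}));
        for (Map.Entry<Integer, LabeledVec> e : rekeyLabeled(named)) {
            System.out.println("cluster " + e.getKey() + " -> " + e.getValue().label());
        }
    }
}
```

Both layouts produce the same cluster assignments; only the second can still say which document landed in which cluster, which is the case Jeff raises above.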
-jake
On Sun, Apr 18, 2010 at 3:44 AM, Sean Owen <sro...@gmail.com> wrote:
Yeah, why don't I have a crack at this. The change as it stands is
already too big for what it is (though I believe they're good
changes). Then we can look at more changes; it sounds like there are
several ideas for streamlining vectors, which is a great thing to
think about at this early stage.
On Sun, Apr 18, 2010 at 12:54 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
How about this alternative:
NamedVector: {Vector: wrapped, String: name}
Vector: AbstractVector
AbstractVector: DenseVector | SequentialSparseVector | HashSparseVector
This avoids the multiplicative explosion of vector types.
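Ted's hierarchy can be read as a decorator: one wrapper adds a name to any Vector implementation, so N storage formats need N classes plus one wrapper rather than a Named variant of each. The sketch below is a simplified, hypothetical stand-in, not Mahout's actual API: the three-method Vector interface is an assumption, the class bodies are illustrative, and SequentialSparseVector is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class NamedVectorSketch {

    /** Hypothetical minimal Vector interface (a real one would have many more operations). */
    interface Vector {
        int size();
        double get(int index);
        void set(int index, double value);
    }

    /** Shared behaviour for concrete storage formats; here just the size. */
    abstract static class AbstractVector implements Vector {
        private final int size;
        AbstractVector(int size) { this.size = size; }
        @Override public int size() { return size; }
    }

    /** Dense storage: one double per slot. */
    static final class DenseVector extends AbstractVector {
        private final double[] values;
        DenseVector(int size) { super(size); values = new double[size]; }
        @Override public double get(int i) { return values[i]; }
        @Override public void set(int i, double v) { values[i] = v; }
    }

    /** Hash-based sparse storage: only non-zero entries are kept. */
    static final class HashSparseVector extends AbstractVector {
        private final Map<Integer, Double> values = new HashMap<>();
        HashSparseVector(int size) { super(size); }
        @Override public double get(int i) { return values.getOrDefault(i, 0.0); }
        @Override public void set(int i, double v) {
            if (v == 0.0) { values.remove(i); } else { values.put(i, v); }
        }
    }

    /** Wraps any Vector and adds a name; all vector operations are forwarded to the delegate. */
    static final class NamedVector implements Vector {
        private final Vector delegate;
        private final String name;
        NamedVector(Vector delegate, String name) { this.delegate = delegate; this.name = name; }
        String getName() { return name; }
        Vector getDelegate() { return delegate; }
        @Override public int size() { return delegate.size(); }
        @Override public double get(int i) { return delegate.get(i); }
        @Override public void set(int i, double v) { delegate.set(i, v); }
    }

    /** Works on any Vector, named or not, dense or sparse. */
    static double dot(Vector a, Vector b) {
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            sum += a.get(i) * b.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        Vector dense = new DenseVector(3);
        dense.set(0, 1.0);
        dense.set(2, 2.0);
        Vector sparse = new HashSparseVector(3);
        sparse.set(2, 4.0);
        NamedVector a = new NamedVector(dense, "doc-1");
        NamedVector b = new NamedVector(sparse, "doc-2");
        System.out.println(a.getName() + " . " + b.getName() + " = " + dot(a, b));  // doc-1 . doc-2 = 8.0
    }
}
```

The cost of the wrapper is one forwarding method per Vector operation; the benefit is that any new storage format gets naming for free, without a matching Named subclass.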