+1 NamedVector seems a lot like VectorView. I'm comfortable enough with
this proposal for Sean to go forward with it <grin>. I agree with
separating the naming/identifying/labeling into a separate wrapper class
so that vectors themselves can be pure mathematical entities. Unifying
as many file representations as possible into [vector_label ::
pure_vector] sequence files also seems desirable. Situations where I
need to retain both the vector_label and, e.g., a cluster_label would be
addressed by a NamedVector wrapper. If a vector name were provided to
identify the sample, it would then propagate painlessly through the
system. As I look at cluster evaluation semantics I will try to follow
these patterns.
On 4/18/10 3:37 PM, Jake Mannix wrote:
On Sun, Apr 18, 2010 at 3:23 PM, Sean Owen <sro...@gmail.com> wrote:
On Sun, Apr 18, 2010 at 11:16 PM, Jake Mannix <jake.man...@gmail.com> wrote:
VectorWritable currently is a proper decorator, right? It doesn't even
implement Vector at all.
Yeah, the other *Writable classes should be as well. NamedVector
should both be a Vector and decorate a Vector too. Its Writable also
decorates a NamedVector.
What exactly are you suggesting the hierarchy to be?
1) Vector is an interface, NamedVector extends it
(I just have NamedVector as a concrete subclass, a decorator)
So NamedVector (I think LabeledVector is probably better, actually)
takes in its constructor another Vector which it delegates to, and then
also has the name/label, sure.
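
For concreteness, here is a rough sketch of the decorator arrangement described above: a wrapper that is itself a Vector, takes another Vector in its constructor, delegates to it, and carries a name. The Vector interface here is a cut-down, hypothetical stand-in with three methods, and SimpleDenseVector is a throwaway implementation, so none of this is the actual Mahout API.

```java
import java.util.Objects;

// Hypothetical, minimal stand-in for the Vector interface (assumption).
interface Vector {
    int size();
    double get(int index);
    void set(int index, double value);
}

// Throwaway dense implementation, only here to have something to wrap.
final class SimpleDenseVector implements Vector {
    private final double[] values;

    SimpleDenseVector(int size) {
        this.values = new double[size];
    }

    @Override public int size() { return values.length; }
    @Override public double get(int index) { return values[index]; }
    @Override public void set(int index, double value) { values[index] = value; }
}

// Decorator: is a Vector, wraps a Vector, and adds a name/label.
final class NamedVector implements Vector {
    private final Vector delegate;
    private final String name;

    NamedVector(Vector delegate, String name) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
        this.name = Objects.requireNonNull(name, "name");
    }

    String getName() { return name; }

    // Exposes the wrapped "pure" vector for code that does not care about names.
    Vector getDelegate() { return delegate; }

    // All mathematical operations pass straight through to the wrapped vector.
    @Override public int size() { return delegate.size(); }
    @Override public double get(int index) { return delegate.get(index); }
    @Override public void set(int index, double value) { delegate.set(index, value); }
}
```

Because the wrapper only delegates, it works with any Vector implementation, and the name rides along wherever a Vector is accepted.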
4) VectorWritable still acts just as it does now, basically
Yes, I made it more general so we don't have to modify it to handle each
new Vector impl.
The trick is to make the writing part efficient without knowing the
internals of the vector impl. I guess there's no really easy way to
read/write a hash-based vector more efficiently than just making
sure the size is right, and then stuffing the read-from-disk values
into the hash, so internals aren't critical. And as mentioned below,
the constructors for SeqAcc and OrderedIntDoubleMapping both
allow for an efficient read/write impl, as does DenseVector.
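
A minimal sketch of that idea, assuming a simple on-disk layout of my own choosing (cardinality, non-zero count, then index/value pairs): the writer only uses the public interface, and the reader sizes the hash up front and stuffs the values in. SimpleVector, HashVector and GenericVectorIO are hypothetical names for illustration, not Mahout classes, and the format is not the one VectorWritable uses.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal vector abstraction (assumption, not the Mahout interface).
interface SimpleVector {
    int size();
    double get(int index);
    void set(int index, double value);
    int nonZeroCount();
    Iterable<Integer> nonZeroIndices();
}

// Hash-backed sparse vector; the map is pre-sized from the expected entry count.
final class HashVector implements SimpleVector {
    private final int size;
    private final Map<Integer, Double> values;

    HashVector(int size, int expectedNonZeros) {
        this.size = size;
        // Capacity chosen so that expectedNonZeros entries fit without a rehash.
        this.values = new HashMap<>((int) (expectedNonZeros / 0.75f) + 1);
    }

    @Override public int size() { return size; }
    @Override public double get(int index) { return values.getOrDefault(index, 0.0); }
    @Override public void set(int index, double value) {
        if (value == 0.0) {
            values.remove(index);
        } else {
            values.put(index, value);
        }
    }
    @Override public int nonZeroCount() { return values.size(); }
    @Override public Iterable<Integer> nonZeroIndices() { return values.keySet(); }
}

// Reads and writes any SimpleVector without looking at its internals.
final class GenericVectorIO {
    private GenericVectorIO() { }

    static void write(SimpleVector v, DataOutput out) throws IOException {
        out.writeInt(v.size());
        out.writeInt(v.nonZeroCount());
        for (int index : v.nonZeroIndices()) {
            out.writeInt(index);
            out.writeDouble(v.get(index));
        }
    }

    static SimpleVector read(DataInput in) throws IOException {
        int size = in.readInt();
        int nonZeros = in.readInt();
        // Make sure the size is right first, then stuff the values into the hash.
        SimpleVector v = new HashVector(size, nonZeros);
        for (int i = 0; i < nonZeros; i++) {
            int index = in.readInt();
            double value = in.readDouble();
            v.set(index, value);
        }
        return v;
    }
}
```

The same write path would work for a dense or sequential-access vector; only the choice of what to construct on read would differ.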
Ok, I'm convinced that this should be good for now, until we
get to the happy Avro-future.
-jake