PS: let's see a patch to keep the discussion going. I'm seeing ideas
on lots of good topics here and want to strike while the iron is hot
and continue overhauling this.

But making everything a named vector is sort of a step backwards, to
where we just agreed to move away from -- making name a default part
of all vectors.

I am also not sure it is practical to use only VectorWritable, because
of the storage overhead, though it does in fact seem to offer the very
facility alluded to in the talk of a 'facade' class. I think doing
things like writing optional data in Hadoop's basic serialization
format is not really possible. I saw attempts in the previous code
which felt fragile: read a string; if it's a class name, assume it is
the name of the vector class to deserialize; otherwise assume it's a
vector name... hmm.
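
To make the fragility concrete, here is a rough sketch of that kind of
guessing reader. It is my own reconstruction for illustration, not the
actual earlier code: the class SniffingHeaderReader and its methods are
hypothetical, and JDK class names stand in for the vector classes so
that it runs without Mahout on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reconstruction for illustration; not the earlier Mahout code.
final class SniffingHeaderReader {

  // True if the string resolves to a loadable class.
  static boolean looksLikeClassName(String s) {
    try {
      Class.forName(s);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Writes an optional name, then the class name, with no marker between them.
  static byte[] writeHeader(String name, String className) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      if (name != null) {
        out.writeUTF(name);
      }
      out.writeUTF(className);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the header back by guessing; returns {name or null, className}.
  static String[] readHeader(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      String first = in.readUTF();
      if (looksLikeClassName(first)) {
        // Assume there was no name and this string is the class to load.
        return new String[] {null, first};
      }
      // Otherwise assume it was a name and the class name follows.
      return new String[] {first, in.readUTF()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A vector named "doc-42" round-trips fine, but a vector that happens to
be named "java.lang.String" is read back as an unnamed vector of that
class, and the rest of the stream is then misinterpreted.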

So are we on the same page about how this works now? In fact I would
expect to see implementations start to specialize to one particular
representation, if possible, to be more efficient.


On this topic, sort of:

- How about moving label bindings out to NamedVector?
- How about similar restructuring of matrices?
- And how about not writing
"org.apache.mahout.math.RandomAccessSparseVectorWritable" whenever
VectorWritable does its wrapping? I think making the package name and
the "Writable" suffix implicit is perhaps worth the loss of generality.

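For the last point, something like the following is what I have in
mind. It is only a sketch: VectorTypeNames and its two methods are
hypothetical names, and it assumes every wrapped class lives in
org.apache.mahout.math and ends in "Writable", which is exactly the
generality being given up.

```java
// Hypothetical helper for illustration; not part of Mahout.
final class VectorTypeNames {

  // Assumed fixed package and suffix that would no longer be serialized.
  static final String PACKAGE_PREFIX = "org.apache.mahout.math.";
  static final String SUFFIX = "Writable";

  // Strips the implicit package and suffix, leaving only the short type name.
  static String toShortName(String writableClassName) {
    if (!writableClassName.startsWith(PACKAGE_PREFIX)
        || !writableClassName.endsWith(SUFFIX)) {
      throw new IllegalArgumentException(
          "Not in the implicit package/suffix form: " + writableClassName);
    }
    return writableClassName.substring(
        PACKAGE_PREFIX.length(), writableClassName.length() - SUFFIX.length());
  }

  // Rebuilds the full class name from the short name read off the stream.
  static String toWritableClassName(String shortName) {
    return PACKAGE_PREFIX + shortName + SUFFIX;
  }
}
```

With this, the stream would carry "RandomAccessSparseVector" rather
than the fully qualified name, and a class outside that package simply
could not be wrapped.
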