How about this alternative:

    NamedVector: {Vector: wrapped, String: name}
    Vector: AbstractVector
    AbstractVector: DenseVector | SequentialSparseVector | HashSparseVector

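To make the composition concrete, here is a rough Java sketch of the wrapper idea. The `Vector` interface below is a deliberately tiny stand-in with only `size()` and `get()` (an assumption for illustration, not the real interface), the `AbstractVector` layer is left out for brevity, and `getName()`, `getDelegate()`, and `set()` are hypothetical method names.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the Vector abstraction (assumption: two methods only).
interface Vector {
    int size();
    double get(int index);
}

// Dense storage: every element is held in an array.
final class DenseVector implements Vector {
    private final double[] values;
    DenseVector(double[] values) { this.values = values.clone(); }
    public int size() { return values.length; }
    public double get(int index) { return values[index]; }
}

// Sparse storage: only explicitly set elements are kept; the rest read as 0.0.
final class HashSparseVector implements Vector {
    private final int size;
    private final Map<Integer, Double> nonZero = new HashMap<>();
    HashSparseVector(int size) { this.size = size; }
    void set(int index, double value) { nonZero.put(index, value); }
    public int size() { return size; }
    public double get(int index) { return nonZero.getOrDefault(index, 0.0); }
}

// The name lives in exactly one wrapper class, which delegates to any Vector.
final class NamedVector implements Vector {
    private final Vector wrapped;
    private final String name;
    NamedVector(Vector wrapped, String name) {
        this.wrapped = wrapped;
        this.name = name;
    }
    String getName() { return name; }          // hypothetical accessor
    Vector getDelegate() { return wrapped; }   // hypothetical accessor
    public int size() { return wrapped.size(); }
    public double get(int index) { return wrapped.get(index); }
}

class NamedVectorDemo {
    public static void main(String[] args) {
        HashSparseVector sparse = new HashSparseVector(1000);
        sparse.set(7, 3.5);
        // The same wrapper names a dense or a sparse vector.
        NamedVector a = new NamedVector(new DenseVector(new double[] {1.0, 2.0}), "doc-1");
        NamedVector b = new NamedVector(sparse, "doc-2");
        System.out.println(a.getName() + " size=" + a.size());
        System.out.println(b.getName() + " size=" + b.size() + " [7]=" + b.get(7));
    }
}
```
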
This avoids the multiplicative explosion of vector types: naming becomes one
wrapper class rather than a Named, Labelled, or MultiLabelled variant of every
implementation.

On Sat, Apr 17, 2010 at 4:17 PM, Robin Anil <robin.a...@gmail.com> wrote:

> Agreed. That's the correct way to go. But like I said, it warrants a
> complete overhaul and a separate JIRA issue. The quick fix I indicated
> (i.e. putting the ID back in but removing it from the compare/equals
> functions) was just for this bug.
>
> How does this structuring sound?
>
> Vector(Interface) -> AbstractVector -> Dense|SparseVector
>     -> NamedDense|SparseVector OR LabelledDense|SparseVector OR
>        MultiLabelledDense|SparseVector
>
> Robin
>
> On Sun, Apr 18, 2010 at 4:21 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > That would be a very, very good thing (uniform data usage).
> >
> > On Sat, Apr 17, 2010 at 2:52 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> >
> > > Currently, FuzzyKMeansClusterMapper has WritableComparable<?>
> > > keys which are ignored. Could we instead have the identifier for the
> > > vector live there, where it makes sense? Then that same key could
> > > be the mapper output key, instead of the name of the Vector.
> > >
> > > This kind of change could get the clustering code to effectively be
> > > able to run sensibly on the same SequenceFile<IntWritable,VectorWritable>
> > > that DistributedRowMatrix is running on, and that would be very nice,
> > > I think.