Agreed. That's the correct way to go. But like I said, it warrants a complete overhaul and a separate JIRA issue. The quick fix I suggested (i.e. putting the ID back in but removing it from the compare/equals functions) was just for this bug.
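For illustration, here is a minimal sketch of that quick fix. It uses a hypothetical `IdentifiedVector` class, not Mahout's real `Vector` API. The ID is stored and exposed, but `equals`, `hashCode` and `compareTo` look only at the numeric values, so two vectors with different IDs and the same contents compare as equal.

```java
import java.util.Arrays;

// Hypothetical simplified vector: the ID is carried as metadata only.
final class IdentifiedVector implements Comparable<IdentifiedVector> {
    private final String id;        // kept so it can still be reported/output
    private final double[] values;  // the actual vector contents

    IdentifiedVector(String id, double[] values) {
        this.id = id;
        this.values = values.clone();
    }

    String getId() { return id; }

    double get(int i) { return values[i]; }

    int size() { return values.length; }

    // Equality depends only on the contents; the ID is deliberately ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdentifiedVector)) return false;
        return Arrays.equals(values, ((IdentifiedVector) o).values);
    }

    // Must stay consistent with equals, so the ID is left out here too.
    @Override
    public int hashCode() { return Arrays.hashCode(values); }

    // Lexicographic comparison on contents only; the ID never breaks ties.
    @Override
    public int compareTo(IdentifiedVector other) {
        int n = Math.min(values.length, other.values.length);
        for (int i = 0; i < n; i++) {
            int c = Double.compare(values[i], other.values[i]);
            if (c != 0) return c;
        }
        return Integer.compare(values.length, other.values.length);
    }
}
```

Leaving the ID out of `hashCode` as well as `equals` matters: otherwise hash-based collections would treat "equal" vectors as distinct.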
How does this structuring sound?

Vector (interface) -> AbstractVector -> Dense|SparseVector -> one of:
  - NamedDense|SparseVector
  - LabelledDense|SparseVector
  - MultiLabelledDense|SparseVector

Robin

On Sun, Apr 18, 2010 at 4:21 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> That would be a very, very good thing (uniform data usage).
>
> On Sat, Apr 17, 2010 at 2:52 PM, Jake Mannix <jake.man...@gmail.com> wrote:
>
> > Currently, FuzzyKMeansClusterMapper has WritableComparable<?>
> > keys which are ignored. Could we instead have the identifier for the
> > vector live there, where it makes sense? Then that same key could
> > be mapper output key, instead of the name of the Vector.
> >
> > This kind of change could get the clustering code to effectively be
> > able to run sensibly on the same SequenceFile<IntWritable,VectorWritable>
> > that DistributedRowMatrix is running on, and that would be very nice,
> > I think.
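A rough Java sketch of the structure Robin proposes is below. Every class here is a hypothetical simplification, not Mahout's actual code: the `Named…`/`MultiLabelled…` names come from this thread, and the `dot` method is only an illustration of logic written once in `AbstractVector` and reused by every subclass. Only two of the six proposed leaf classes are shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal Vector interface.
interface Vector {
    int size();
    double get(int index);
    void set(int index, double value);
}

// Shared behaviour lives here; subclasses only choose a storage layout.
abstract class AbstractVector implements Vector {
    protected final int size;

    protected AbstractVector(int size) { this.size = size; }

    @Override
    public int size() { return size; }

    // Written once against get(); assumes both vectors have the same size.
    public double dot(Vector other) {
        double sum = 0.0;
        for (int i = 0; i < size; i++) {
            sum += get(i) * other.get(i);
        }
        return sum;
    }
}

// Dense storage: one slot per index.
class DenseVector extends AbstractVector {
    private final double[] values;

    DenseVector(int size) {
        super(size);
        values = new double[size];
    }

    @Override
    public double get(int index) { return values[index]; }

    @Override
    public void set(int index, double value) { values[index] = value; }
}

// Sparse storage: only non-zero entries are kept.
class SparseVector extends AbstractVector {
    private final Map<Integer, Double> values = new HashMap<>();

    SparseVector(int size) { super(size); }

    @Override
    public double get(int index) { return values.getOrDefault(index, 0.0); }

    @Override
    public void set(int index, double value) {
        if (value == 0.0) values.remove(index); else values.put(index, value);
    }
}

// Proposed "named" variant on top of dense storage.
class NamedDenseVector extends DenseVector {
    private final String name;

    NamedDenseVector(String name, int size) {
        super(size);
        this.name = name;
    }

    String getName() { return name; }
}

// Proposed "multi-labelled" variant on top of sparse storage.
class MultiLabelledSparseVector extends SparseVector {
    private final List<String> labels;

    MultiLabelledSparseVector(List<String> labels, int size) {
        super(size);
        this.labels = Collections.unmodifiableList(new ArrayList<>(labels));
    }

    List<String> getLabels() { return labels; }
}
```

One consequence of this layout: each metadata flavour needs both a dense and a sparse subclass, so the leaf level grows as (number of flavours) × (number of storage types). Wrapping an arbitrary `Vector` together with its metadata would avoid that multiplication, at the cost of an extra delegation layer.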