That would be a very, very good thing (uniform data usage).

On Sat, Apr 17, 2010 at 2:52 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> Currently, FuzzyKMeansClusterMapper has WritableComparable<?>
> keys which are ignored. Could we instead have the identifier for the
> vector live there, where it makes sense? Then that same key could
> be mapper output key, instead of the name of the Vector.
>
> This kind of change could get the clustering code to effectively be
> able to run sensibly on the same SequenceFile<IntWritable,VectorWritable>
> that DistributedRowMatrix is running on, and that would be very nice,
> I think.