Agreed. That's the correct way to go. But like I said, it warrants a complete
overhaul and a separate JIRA issue. The quick fix I indicated (i.e. putting
the ID back in but removing it from the compare/equals function) was just for
this bug.
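
To make the quick fix concrete, here is a minimal sketch of the idea, not the actual patch. The class name IdentifiedVector and its fields are hypothetical stand-ins: the ID is stored and readable, but equals/hashCode look only at the values.

```java
import java.util.Arrays;

// Hypothetical stand-in for a vector that carries an identifier.
// Not Mahout's actual class; names and fields are assumptions.
final class IdentifiedVector {
    private final String id;        // carried along, but NOT part of equality
    private final double[] values;  // the only state that equality looks at

    IdentifiedVector(String id, double[] values) {
        this.id = id;
        this.values = values.clone();
    }

    String getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof IdentifiedVector)) {
            return false;
        }
        // The id is deliberately ignored here.
        return Arrays.equals(values, ((IdentifiedVector) o).values);
    }

    @Override
    public int hashCode() {
        // Must stay consistent with equals, so the id is left out here too.
        return Arrays.hashCode(values);
    }
}
```

With this, two vectors holding the same values but different IDs compare as equal, which is the behaviour the quick fix is after.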

How does this structuring sound?

Vector (interface) -> AbstractVector -> Dense|SparseVector
-> NamedDense|SparseVector OR LabelledDense|SparseVector OR
MultiLabelledDense|SparseVector
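
A rough sketch of what that layering could look like, dense branch only. Everything here is illustrative: the method set (size, get, dot) and the name field are assumptions for the sake of the example, not the existing Mahout API.

```java
// Hypothetical sketch of the proposed layering; not Mahout's actual API.
interface Vector {
    int size();
    double get(int index);
}

// Shared behaviour lives here, written only against the interface methods.
abstract class AbstractVector implements Vector {
    public double dot(Vector other) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < size(); i++) {
            sum += get(i) * other.get(i);
        }
        return sum;
    }
}

// Storage-specific layer: a plain array-backed vector.
class DenseVector extends AbstractVector {
    private final double[] values;

    DenseVector(double[] values) {
        this.values = values.clone();
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

// Metadata layer: adds a name on top of the storage layer.
// A Labelled or MultiLabelled variant would add its own field the same way.
class NamedDenseVector extends DenseVector {
    private final String name;

    NamedDenseVector(String name, double[] values) {
        super(values);
        this.name = name;
    }

    String getName() {
        return name;
    }
}
```

The point of the split is that code needing only the numbers can take a plain Vector, while the name or labels sit in the outermost layer.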



Robin

On Sun, Apr 18, 2010 at 4:21 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> That would be a very, very good thing (uniform data usage).
>
> On Sat, Apr 17, 2010 at 2:52 PM, Jake Mannix <jake.man...@gmail.com>
> wrote:
>
> > Currently, FuzzyKMeansClusterMapper has WritableComparable<?>
> > keys which are ignored.  Could we instead have the identifier for the
> > vector live there, where it makes sense?  Then that same key could
> > be mapper output key, instead of the name of the Vector.
> >
> > This kind of change could get the clustering code to effectively be
> > able to run sensibly on the same SequenceFile<IntWritable,VectorWritable>
> > that DistributedRowMatrix is running on, and that would be very nice,
> > I think.
> >
>
