Why not just keep the identifier and simply not compare it in equals()? Let it act like a tag on the vector.
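To make the idea concrete, here is a minimal sketch, assuming a simplified dense vector. The class `TaggedVector` and its fields are hypothetical and are not Mahout's actual Vector API; the point is only that the name is carried along but left out of equals() and hashCode().

```java
import java.util.Arrays;

// Hypothetical class for illustration only; not Mahout's Vector interface.
final class TaggedVector {
    private final String name;      // optional tag, deliberately ignored by equals/hashCode
    private final double[] values;  // the data that defines equality

    TaggedVector(String name, double[] values) {
        this.name = name;
        this.values = values.clone(); // defensive copy so equality cannot change later
    }

    String getName() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TaggedVector)) {
            return false;
        }
        // Only the values are compared; the name plays no part.
        return Arrays.equals(values, ((TaggedVector) o).values);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(), so the name is excluded here as well.
        return Arrays.hashCode(values);
    }
}
```

One consequence to keep in mind: two vectors with the same values but different names would collapse to a single key in a HashMap or HashSet, so any code that relies on the name to tell vectors apart would need a separate key.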

On Sat, Apr 17, 2010 at 11:53 PM, Sean Owen <sro...@gmail.com> wrote:

> At the moment I'm already overreaching on the way to fix MAHOUT-379
> with this patch, as I've expanded to address some mildly related
> issues (equals, iterators).
>
> So I personally am not trying to change serialization formats in
> MAHOUT-379 / my current patch, no. The issue uncovered by removing
> name relates to serialization format (since that becomes a vector's
> new 'name') but is not a problem with the GSON format per se.
>
> I also don't really want to rip up Writable too much, no. I have other
> pet issues to foist on the project first.
>
> At the moment I want to understand how to patch up the fuzzy k-means
> code in this regard -- will probably switch to something slightly less
> state-dependent than asFormatString() as a key and be done with it for
> the moment.
>
>
> On Sat, Apr 17, 2010 at 6:39 PM, Drew Farris <drew.far...@gmail.com>
> wrote:
> > it is worth some investigation to determine if there is merit to
> > adapting Mahout's MR jobs to use avro. Doug has recently committed a
> > patch to avro (https://issues.apache.org/jira/browse/AVRO-493) that
> > involves considerably less complexity than what I had originally
> > proposed in https://issues.apache.org/jira/browse/MAHOUT-274, based on
> > the initial proposed avro/mapreduce integration in MAPREDUCE-815.
> >
> > I'm half waiting for avro 1.4 to be released (which will include
> > AVRO-493) before I dig into further proofs-of-concept of avro usage in
> > Mahout, but I think there is something there worth seriously
> > exploring. (half procrastinating otherwise)
> >
> > Drew
> >
> > On Sat, Apr 17, 2010 at 12:43 PM, Jeff Eastman
> > <j...@windwardsolutions.com> wrote:
> >> Seems like a major rewrite to replace Writable within our MR jobs.
> >
>