At the moment I'm already overreaching on the way to fixing MAHOUT-379 with this patch, since I've expanded it to address some mildly related issues (equals, iterators).
So no, I personally am not trying to change serialization formats in MAHOUT-379 / my current patch. The issue uncovered by removing the name does relate to serialization format (since the format string effectively becomes a vector's new 'name'), but it is not a problem with the GSON format per se. I also don't really want to rip up Writable too much. I have other pet issues to foist on the project first. At the moment I want to understand how to patch up the fuzzy k-means code in this regard; I will probably switch to something slightly less state-dependent than asFormatString() as a key and be done with it for the moment.

On Sat, Apr 17, 2010 at 6:39 PM, Drew Farris <drew.far...@gmail.com> wrote:
> it is worth some investigation to determine if there is merit to
> adapting Mahout's MR jobs to use avro. Doug has recently committed a
> patch to avro (https://issues.apache.org/jira/browse/AVRO-493) that
> involves considerably less complexity than what I had originally
> proposed in https://issues.apache.org/jira/browse/MAHOUT-274, based on
> the initial proposed avro/mapreduce integration in MAPREDUCE-815.
>
> I'm half waiting for avro 1.4 to be released (which will include
> AVRO-493) before I dig into further proofs-of-concept of avro usage in
> Mahout, but I think there is something there worth seriously
> exploring. (half procrastinating otherwise)
>
> Drew
>
> On Sat, Apr 17, 2010 at 12:43 PM, Jeff Eastman
> <j...@windwardsolutions.com> wrote:
>> Seems like a major rewrite to replace Writable within our MR jobs.