Are you thinking of replacing our Writable or JSON (asFormatString)
encodings? Certainly, using Avro as an I/O format for clustering would
make the clustering output easier to use from other languages. Replacing
Writable within our MR jobs seems like a major rewrite, though.
On 4/17/10 9:10 AM, Ted Dunning wrote:
If the format is about to change, should we look at Avro to encode it? Drew
seemed to like Avro in his document representation work.
On Sat, Apr 17, 2010 at 9:05 AM, Jeff Eastman <j...@windwardsolutions.com> wrote:
Seems to me we need to rethink this step anyway if we are going to
implement the CDbw cluster evaluation algorithm. For that we need a job step
that outputs [clusterId:Vector_as_Writable] sequence files so that we can
iterate over them to find representative points. Is anybody using the
current format who would be impacted by such a change?
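
For illustration only, here is a rough sketch of the [clusterId:vector] record idea in plain Java. It uses java.io streams as a stand-in for Hadoop sequence files and Writable, so every name in it (ClusterPointRecords, writeRecord, readByCluster, farthestFromCentroid) is hypothetical and not part of Mahout or Hadoop. The "farthest from the centroid" rule is only a placeholder for picking a representative point; it is not claimed to be the CDbw selection rule.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a [clusterId : vector] record stream.
// Not the Mahout or Hadoop API; plain java.io is used so the sketch is self-contained.
final class ClusterPointRecords {

  // Write one record: clusterId, vector length, then the vector elements.
  static void writeRecord(DataOutputStream out, int clusterId, double[] vector)
      throws IOException {
    out.writeInt(clusterId);
    out.writeInt(vector.length);
    for (double x : vector) {
      out.writeDouble(x);
    }
  }

  // Read records until end of stream and group the vectors by clusterId.
  static Map<Integer, List<double[]>> readByCluster(DataInputStream in)
      throws IOException {
    Map<Integer, List<double[]>> byCluster = new TreeMap<>();
    while (true) {
      int clusterId;
      try {
        clusterId = in.readInt();
      } catch (EOFException endOfRecords) {
        break; // no more records
      }
      double[] vector = new double[in.readInt()];
      for (int i = 0; i < vector.length; i++) {
        vector[i] = in.readDouble();
      }
      byCluster.computeIfAbsent(clusterId, k -> new ArrayList<>()).add(vector);
    }
    return byCluster;
  }

  // Placeholder selection rule (an assumption, not CDbw itself):
  // return the point with the largest squared distance from the cluster mean.
  // Expects a non-empty list of equal-length vectors.
  static double[] farthestFromCentroid(List<double[]> points) {
    int dim = points.get(0).length;
    double[] centroid = new double[dim];
    for (double[] p : points) {
      for (int i = 0; i < dim; i++) {
        centroid[i] += p[i] / points.size();
      }
    }
    double[] best = points.get(0);
    double bestDistance = -1.0;
    for (double[] p : points) {
      double distance = 0.0;
      for (int i = 0; i < dim; i++) {
        double diff = p[i] - centroid[i];
        distance += diff * diff;
      }
      if (distance > bestDistance) {
        bestDistance = distance;
        best = p;
      }
    }
    return best;
  }
}
```

In a real job the records would presumably be written by a reducer into a sequence file and read back with the corresponding reader; the grouping and per-cluster iteration would look much the same.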