That surprises me. Crunch has its own AvroOutputFormat so that it can use the mapreduce.* APIs, but it delegates most of the actual work to DatumWriters, encoders, etc. from Avro's core libraries.
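A quick way to check what actually landed on disk: per the Avro specification, Avro object container files begin with the four magic bytes 'O', 'b', 'j', 1, whereas a JSON text file would typically start with a character like '{'. The minimal sketch below uses only the JDK to read a file's first four bytes and compare them against that magic. The class name AvroMagicCheck is hypothetical, and the sketch assumes Java 11+ (for InputStream.readNBytes and Path.of).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: reports whether a file starts with the Avro
// object container file magic ('O', 'b', 'j', 1).
public class AvroMagicCheck {
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};

    /** True if the given header bytes begin with the Avro container-file magic. */
    public static boolean hasAvroMagic(byte[] header) {
        return header.length >= AVRO_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, AVRO_MAGIC.length), AVRO_MAGIC);
    }

    /** Reads up to the first four bytes of a file and checks them against the magic. */
    public static boolean isAvroContainerFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return hasAvroMagic(in.readNBytes(AVRO_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Path.of(arg);
            String verdict = isAvroContainerFile(p)
                ? "Avro binary container file"
                : "not an Avro container file (e.g. JSON text)";
            System.out.println(p + ": " + verdict);
        }
    }
}
```

Running it as `java AvroMagicCheck.java <file>...` against the files written under the output path should show whether the pipeline produced binary Avro or plain text.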
Could I get some detail on the Hadoop and Avro versions? Is it just Hadoop 1.0.x and Avro 1.7.0?

J

On Thu, Dec 13, 2012 at 10:35 AM, Jonathan Natkins <[email protected]> wrote:

> Out of curiosity, is there a way to write output from a Crunch pipeline
> into an Avro-format file? It seems that if you do the
> collection.write(To.avroFile(path)), you end up just writing JSON. It can
> certainly be read into an Avro object, but it seems like it would be more
> efficient to write binary data to the file, so no parsing has to happen.
>
> Have I missed an API, or is this a missing feature?
>
> Thanks,
> Natty
>

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
