Ah-- that is interesting, and almost certainly the reason why we're writing JSON instead of binary Avro.

On Thu, Dec 13, 2012 at 11:08 AM, Jonathan Natkins <[email protected]> wrote:

> It's 2.0.0 and 1.7.0. I've actually only been running MemPipelines thus
> far, to make sure that I've built the job correctly, so it's possible that
> that's the issue.
>
>
> On Thu, Dec 13, 2012 at 10:56 AM, Josh Wills <[email protected]> wrote:
>
>> That surprises me-- Crunch has its own AvroOutputFormat in order to use
>> the mapreduce.* APIs, but they delegate much of the work to things like
>> DatumWriters/encoders/etc. from Avro's core libraries.
>>
>> Could I get some detail on hadoop/avro version? Is it just 1.0.x and Avro
>> 1.7.0?
>>
>> J
>>
>>
>> On Thu, Dec 13, 2012 at 10:35 AM, Jonathan Natkins <[email protected]> wrote:
>>
>>> Out of curiosity, is there a way to write output from a Crunch pipeline
>>> into an Avro-format file? It seems that if you do the
>>> collection.write(To.avroFile(path)), you end up just writing JSON. It can
>>> certainly be read into an Avro object, but it seems like it would be more
>>> efficient to write binary data to the file, so no parsing has to happen.
>>>
>>> Have I missed an API, or is this a missing feature?
>>>
>>> Thanks,
>>> Natty
>>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
