Gotcha. Alright, I'll try a true MR pipeline, and see if that improves the
situation. Thanks!


On Thu, Dec 13, 2012 at 11:12 AM, Josh Wills <[email protected]> wrote:

> Ah-- that is interesting, and almost certainly the reason why we're
> writing JSON instead of binary Avro.
>
>
> On Thu, Dec 13, 2012 at 11:08 AM, Jonathan Natkins <[email protected]> wrote:
>
>> It's Hadoop 2.0.0 and Avro 1.7.0. I've actually only been running MemPipelines thus
>> far, to make sure that I've built the job correctly, so it's possible that
>> that's the issue.
>>
>>
>> On Thu, Dec 13, 2012 at 10:56 AM, Josh Wills <[email protected]> wrote:
>>
>>> That surprises me-- Crunch has its own AvroOutputFormat in order to use
>>> the mapreduce.* APIs, but they delegate much of the work to things like
>>> DatumWriters/encoders/etc. from Avro's core libraries.
>>>
>>> Could I get some detail on the Hadoop/Avro versions? Is it just Hadoop 1.0.x and
>>> Avro 1.7.0?
>>>
>>> J
>>>
>>>
>>> On Thu, Dec 13, 2012 at 10:35 AM, Jonathan Natkins <[email protected]> wrote:
>>>
>>>> Out of curiosity, is there a way to write output from a Crunch pipeline
>>>> into an Avro-format file? It seems that if you call
>>>> collection.write(To.avroFile(path)), you end up just writing JSON. It can
>>>> certainly be read into an Avro object, but it seems like it would be more
>>>> efficient to write binary data to the file, so no parsing has to happen.
>>>>
>>>> Have I missed an API, or is this a missing feature?
>>>>
>>>> Thanks,
>>>> Natty
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>
>>>
>>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
>
