[ https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-493:
------------------------------

    Attachment: AVRO-493.patch

Scott, thanks for the careful review!

> The above looks odd.

Yes, you're right, it was a buggy equals() implementation.  I replaced it with 
the one you provided.  hashCode() is required to support hash-based MapReduce 
partitioning (the default), and I only provide an equals() implementation to be 
consistent with it: equals() is not otherwise required here.  Good catch.
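
To illustrate why hashCode() matters for partitioning, here is a minimal 
sketch.  Wrapper and HashPartitionSketch are hypothetical names for 
illustration, not the classes in the patch; the partition arithmetic follows 
what Hadoop's default HashPartitioner does.

```java
import java.util.Objects;

// Hypothetical stand-in for a wrapper around a datum; not the class from the patch.
final class Wrapper<T> {
    private final T datum;

    Wrapper(T datum) {
        this.datum = datum;
    }

    T datum() {
        return datum;
    }

    // Two wrappers are equal exactly when their wrapped data are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Wrapper)) return false;
        return Objects.equals(datum, ((Wrapper<?>) o).datum);
    }

    // Delegates to the datum so that equal wrappers hash identically.
    @Override
    public int hashCode() {
        return Objects.hashCode(datum);
    }
}

final class HashPartitionSketch {
    // Clear the sign bit, then take the remainder by the number of partitions.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        Wrapper<String> a = new Wrapper<>("user-1");
        Wrapper<String> b = new Wrapper<>("user-1");
        // Equal keys must land in the same partition, or a reducer would
        // only see part of the values for that key.
        System.out.println(a.equals(b));
        System.out.println(partitionFor(a, 4) == partitionFor(b, 4));
    }
}
```

If hashCode() fell back to object identity, two wrappers holding the same 
datum could be sent to different reducers.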

> It would be nice if the compression level was configurable.

Yes, I meant to get to that but forgot.  I've now added it.  Thanks.
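
As a rough sketch of what a configurable level can look like, the example 
below reads a level from a property and passes it to java.util.zip.Deflater.  
The property name "example.deflate.level" and the class DeflateLevelSketch are 
made up for illustration; the patch's actual configuration key is not shown in 
this message.

```java
import java.io.ByteArrayOutputStream;
import java.util.Properties;
import java.util.zip.Deflater;

final class DeflateLevelSketch {
    // Hypothetical property name, chosen only for this example.
    static final String LEVEL_KEY = "example.deflate.level";

    // Returns the configured level, or Deflater.DEFAULT_COMPRESSION if unset.
    static int levelFrom(Properties conf) {
        String raw = conf.getProperty(LEVEL_KEY);
        if (raw == null) {
            return Deflater.DEFAULT_COMPRESSION;
        }
        int level = Integer.parseInt(raw.trim());
        if (level < Deflater.NO_COMPRESSION || level > Deflater.BEST_COMPRESSION) {
            throw new IllegalArgumentException("deflate level must be 0-9: " + level);
        }
        return level;
    }

    // Compresses the input with the given level and returns the deflated bytes.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(LEVEL_KEY, "9");
        byte[] data = new byte[10000]; // highly compressible: all zeros
        System.out.println(compress(data, levelFrom(conf)).length);
    }
}
```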

> This creates a new AvroWrapper for each output.collect().

Oops.  I originally wrote it to reuse a single wrapper, but changed that while 
debugging to rule out a possible cause and then forgot to change it back.  
I've now restored the reuse.
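
The reuse pattern can be sketched roughly as follows.  MutableWrapper, 
KeyCollector and ReuseSketch are hypothetical names, not the patch's classes, 
and the sketch assumes the collector consumes (for example, serializes) the 
key before collect() returns, since the same object is overwritten on the 
next iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mutable wrapper whose datum can be swapped between emits.
final class MutableWrapper<T> {
    private T datum;

    T datum() {
        return datum;
    }

    void datum(T datum) {
        this.datum = datum;
    }
}

// Hypothetical stand-in for an output collector.
interface KeyCollector<K> {
    void collect(K key);
}

final class ReuseSketch {
    // Emits every datum through one wrapper instead of allocating one per call.
    static <T> void emitAll(List<T> data, KeyCollector<MutableWrapper<T>> out) {
        MutableWrapper<T> wrapper = new MutableWrapper<>(); // single allocation
        for (T d : data) {
            wrapper.datum(d);      // overwrite the previous datum
            out.collect(wrapper);  // collector must copy or serialize now
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        // The collector copies the datum out immediately, so reuse is safe.
        emitAll(Arrays.asList("a", "b", "c"), w -> seen.add(w.datum()));
        System.out.println(seen);
    }
}
```

A collector that kept references to the wrapper itself, rather than copying 
its contents, would see only the last datum, so the pattern depends on that 
assumption holding.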

> AvroKeySerialization: I am a bit confused about this class.

It's used to serialize map outputs and deserialize reduce inputs.  The 
MapReduce framework uses the job's specified map output key class to find the 
serialization implementation it uses to read and write intermediate keys and 
values.
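
The lookup described above can be sketched in a simplified form.  All names 
below (DemoSerialization, DemoSerializationFactory, DemoKey, 
DemoKeySerialization) are hypothetical; the accept(Class) check is loosely 
modeled on the one in Hadoop's serialization interface, and the real 
framework also hands back serializer and deserializer objects, which are 
omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced version of a pluggable serialization.
interface DemoSerialization {
    // True if this serialization can handle instances of the given class.
    boolean accept(Class<?> c);

    String name();
}

// Hypothetical key class that a job would declare as its map output key class.
class DemoKey {
}

// Accepts DemoKey and any subclass of it.
final class DemoKeySerialization implements DemoSerialization {
    @Override
    public boolean accept(Class<?> c) {
        return DemoKey.class.isAssignableFrom(c);
    }

    @Override
    public String name() {
        return "demo-key";
    }
}

final class DemoSerializationFactory {
    private final List<DemoSerialization> registered = new ArrayList<>();

    void register(DemoSerialization s) {
        registered.add(s);
    }

    // Returns the first registered serialization that accepts the key class,
    // or null if none does.
    DemoSerialization find(Class<?> keyClass) {
        for (DemoSerialization s : registered) {
            if (s.accept(keyClass)) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DemoSerializationFactory factory = new DemoSerializationFactory();
        factory.register(new DemoKeySerialization());
        // The declared key class alone selects the serialization.
        System.out.println(factory.find(DemoKey.class).name());
    }
}
```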

> Deprecated APIs are used - are the replacements not appropriate or 
> insufficient?

Good question.  Hadoop 0.20 deprecated the "old" org.apache.hadoop.mapred APIs 
to encourage folks to try the new org.apache.hadoop.mapreduce APIs.  However, 
the org.apache.hadoop.mapreduce APIs are not fully functional in 0.20, and 
folks primarily continue to use the org.apache.hadoop.mapred APIs.  0.20 is 
used here since it's in Maven repos, but this code should also work against 
0.19 and perhaps even 0.18, and I'd compile against one of those instead if it 
were in a Maven repo.


> hadoop mapreduce support for avro data
> --------------------------------------
>
>                 Key: AVRO-493
>                 URL: https://issues.apache.org/jira/browse/AVRO-493
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-493.patch, AVRO-493.patch
>
>
> Avro should provide support for using Hadoop MapReduce over Avro data files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
