[ 
https://issues.apache.org/jira/browse/MAPREDUCE-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-815:
------------------------------------

    Attachment: MAPREDUCE-815.patch

Attaching a patch that provides AvroInputFormat/AvroOutputFormat.

AvroInputFormat lets you set its input schema in the job configuration, and 
provides static methods for doing so. Depending on the input serialization 
metadata, it deserializes to generic, reflect-based, or specific classes.
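
As a rough illustration only (this is not code from the attached patch), the 
configuration pattern described above might look like the sketch below. A 
plain Map stands in for the Hadoop job configuration, and every name in it 
(AvroInputConfigSketch, the two keys, the Style enum) is hypothetical.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for static schema helpers; not the patch's API. */
final class AvroInputConfigSketch {
    // Assumed configuration key for the input schema (JSON text).
    static final String INPUT_SCHEMA_KEY = "sketch.avro.input.schema";
    // Assumed metadata key naming the deserialization style.
    static final String STYLE_KEY = "sketch.avro.style";

    /** The three deserialization styles mentioned in the description. */
    enum Style { GENERIC, REFLECT, SPECIFIC }

    private AvroInputConfigSketch() {}

    /** Stores the input schema in the (stand-in) job configuration. */
    static void setInputSchema(Map<String, String> conf, String schemaJson) {
        conf.put(INPUT_SCHEMA_KEY, schemaJson);
    }

    /** Returns the stored input schema, or null if none was set. */
    static String getInputSchema(Map<String, String> conf) {
        return conf.get(INPUT_SCHEMA_KEY);
    }

    /** Picks a style from the metadata; GENERIC is an assumed default. */
    static Style chooseStyle(Map<String, String> metadata) {
        String style = metadata.get(STYLE_KEY);
        if (style == null) {
            return Style.GENERIC;
        }
        return Style.valueOf(style.toUpperCase(Locale.ROOT));
    }
}
```

A caller would invoke setInputSchema() while building the job, and the record 
reader would later call getInputSchema() and chooseStyle() to decide how to 
deserialize each record.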

This patch includes unit tests for both AvroInputFormat and AvroOutputFormat.

I have also extended the jobdata API to allow you to set output serialization 
metadata (rather than simple class-name-only metadata), in the same fashion as 
MAPREDUCE-1126 allowed you to set intermediate serialization metadata. This 
deprecates older methods such as {{JobConf.setOutputKeyClass()}}. Note that 
PipesMapRunner/PipesReducer, MapFileOutputFormat, and SequenceFileOutputFormat 
still rely on these now-deprecated APIs. MAPREDUCE-1360 will require a 
Hadoop-core-project JIRA that allows SequenceFile to handle non-class-based 
serialization; that change will update at least the SequenceFile 
InputFormat/OutputFormat APIs. Handling Pipes is a separate issue.
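
To make the distinction between class-name-only metadata and general 
serialization metadata concrete, here is a minimal sketch of one plausible 
arrangement. It is not taken from the patch: a plain Map again stands in for 
the job configuration, and OutputMetadataSketch, its key prefix, and its 
method names are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of metadata-map vs. class-name-only settings. */
final class OutputMetadataSketch {
    // Assumed prefix under which output-key metadata entries are stored.
    static final String PREFIX = "sketch.output.key.metadata.";
    // Assumed metadata entry that carries a class name.
    static final String CLASS_KEY = "class";

    private OutputMetadataSketch() {}

    /** Replaces any existing output-key metadata with the given map. */
    static void setOutputKeyMetadata(Map<String, String> conf,
                                     Map<String, String> metadata) {
        conf.keySet().removeIf(k -> k.startsWith(PREFIX));
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            conf.put(PREFIX + e.getKey(), e.getValue());
        }
    }

    /** Old-style setter: expresses a class name as a one-entry metadata map. */
    @Deprecated
    static void setOutputKeyClass(Map<String, String> conf, String className) {
        Map<String, String> metadata = new TreeMap<>();
        metadata.put(CLASS_KEY, className);
        setOutputKeyMetadata(conf, metadata);
    }

    /** Collects the stored output-key metadata back into a map. */
    static Map<String, String> getOutputKeyMetadata(Map<String, String> conf) {
        Map<String, String> metadata = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                metadata.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return metadata;
    }
}
```

In this arrangement a class name is just one possible metadata entry, so a 
schema-based serialization can carry other entries (a schema, for example) 
without needing a Java class at all.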

This cannot be submitted to the patch queue until a small change is made to the 
Hadoop-core API (issue is linked), and Hadoop is upgraded across the board to 
Avro 1.3. I'll mark this patch-available when that happens.

> Add AvroInputFormat and AvroOutputFormat so that hadoop can use Avro 
> Serialization
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-815
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Ravi Gummadi
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-815.patch
>
>
> MapReduce needs AvroInputFormat similar to other InputFormats like 
> TextInputFormat to be able to use avro serialization in hadoop. Similarly 
> AvroOutputFormat is needed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
