[jira] Commented: (AVRO-534) AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given schema

Harsh J Chouraria (JIRA) Fri, 30 Apr 2010 09:34:16 -0700

[ https://issues.apache.org/jira/browse/AVRO-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862738#action_12862738 ]


Harsh J Chouraria commented on AVRO-534:
----------------------------------------

Hello Doug,

Could you tell me, in a few simple steps, how to go about doing that? I haven't
been doing Java development for long, but I'm willing to do this :)

I see a WordCount test for Avro in trunk; shall I extend that or write a
custom one?

> AvroRecordReader (org.apache.avro.mapred) should support a JobConf-given 
> schema
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-534
>                 URL: https://issues.apache.org/jira/browse/AVRO-534
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.0
>         Environment: ArchLinux, Java 1.6, Apache Hadoop (0.20.2), Apache Avro 
> (trunk -- 1.4.0 SNAPSHOT), Using Avro Generic API (Java)
>            Reporter: Harsh J Chouraria
>            Priority: Trivial
>             Fix For: 1.4.0
>
>         Attachments: avro.mapreduce.r1.diff
>
>
> Consider an Avro file of a single record type with about 70 fields in the 
> order (str, str, str, long, str, double, ...); let's take only the first 6 
> into consideration.
> To pass this into a simple MapReduce job I call 
> AvroInputFormat.addInputPath(...), and it works well with an IdentityMapper.
> Now I'd like to read only three fields, say fields 0, 1 and 3, so I give a 
> special schema with just those 3 fields, (str (0), str (1), long (2)), using 
> AvroJob.setInputGeneric(..., mySchema). This makes the MapReduce job fail, 
> because the Avro record reader reads the file with its entire schema (of 70 
> fields) and tries to treat my 'long' field (index 2 in my schema) as the 
> 'str' found at index 2 of the actual schema. In other words, it is using the 
> schema embedded in the file, not the one I supplied!
> The AvroRecordReader must support reading with the schema specified by the 
> user via AvroJob.setInputGeneric.
> I've written a patch that does this, but am not sure if it's actually the 
> right solution (MAP_OUTPUT_SCHEMA use?)
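To illustrate the difference (a toy sketch only, not Avro's API and not the attached patch): the failure above is what happens when a reader schema is lined up against the file's schema by position. Avro's schema resolution instead matches reader fields to writer fields by name; as far as I understand, GenericDatumReader does this when it is given both the writer schema and a reader ("expected") schema. The Java below models each schema as an ordered map of field name to type name. The field names f0..f5 and the class/method names are hypothetical, since the issue does not name the fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (NOT the Avro API) of why a reader schema must be resolved
 * against the writer schema by field name rather than by position.
 * A "schema" here is just an ordered map of field name -> type name.
 */
public class ProjectionSketch {

    /** Buggy approach: pairs the i-th reader field with the i-th writer field. */
    public static List<String> resolveByPosition(Map<String, String> writer,
                                                 Map<String, String> reader) {
        List<String> writerTypes = new ArrayList<>(writer.values());
        List<String> problems = new ArrayList<>();
        int i = 0;
        for (Map.Entry<String, String> field : reader.entrySet()) {
            String actual = writerTypes.get(i++);
            if (!actual.equals(field.getValue())) {
                problems.add(field.getKey() + ": expected " + field.getValue()
                        + " but file has " + actual);
            }
        }
        return problems;
    }

    /** Name-based approach: finds each reader field's position in the writer schema. */
    public static List<Integer> resolveByName(Map<String, String> writer,
                                              Map<String, String> reader) {
        List<String> writerNames = new ArrayList<>(writer.keySet());
        List<Integer> positions = new ArrayList<>();
        for (Map.Entry<String, String> field : reader.entrySet()) {
            int pos = writerNames.indexOf(field.getKey());
            // Reject fields missing from the file or declared with a different type.
            if (pos < 0 || !writer.get(field.getKey()).equals(field.getValue())) {
                throw new IllegalArgumentException("cannot resolve field " + field.getKey());
            }
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Writer schema: the first six fields of the file described in the issue
        // (hypothetical names f0..f5).
        Map<String, String> writer = new LinkedHashMap<>();
        writer.put("f0", "string");
        writer.put("f1", "string");
        writer.put("f2", "string");
        writer.put("f3", "long");
        writer.put("f4", "string");
        writer.put("f5", "double");

        // Reader schema: only fields 0, 1 and 3.
        Map<String, String> reader = new LinkedHashMap<>();
        reader.put("f0", "string");
        reader.put("f1", "string");
        reader.put("f3", "long");

        System.out.println("by position: " + resolveByPosition(writer, reader));
        System.out.println("by name:     " + resolveByName(writer, reader));
    }
}
```

Running main reports a type clash for f3 under positional matching (the file has a string at index 2), while name-based matching resolves the three reader fields to file positions 0, 1 and 3.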

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
