[ https://issues.apache.org/jira/browse/AVRO-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-986:
------------------------------

    Attachment: AVRO-986-java.patch

Here's a version of the java changes that includes a test.
                
> Avro files generated from avro-c don't work with the Java mapred implementation.
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-986
>                 URL: https://issues.apache.org/jira/browse/AVRO-986
>             Project: Avro
>          Issue Type: Bug
>          Components: c, java
>         Environment: avro-c 1.6.2-SNAPSHOT
> avro-java 1.6.2-SNAPSHOT
> hadoop 0.20.2
>            Reporter: Michael Cooper
>            Priority: Critical
>              Labels: c, hadoop, java, mapreduce
>         Attachments: 0001-Remove-sync-marker-from-metadata-in-header.patch, 
> 0001-avromod-utility.patch, AVRO-986-java.patch, AVRO-986-java.patch, 
> quickstop.db
>
>
> When a file generated by the Avro-C implementation is fed into Hadoop, the job
> fails with "Block size invalid or too large for this implementation: -49".
> The cause is the sync marker, specifically the copy that Avro-C writes into
> the header metadata.
> org.apache.avro.mapred.AvroRecordReader uses a FileSplit object to work out
> where it should start reading, but FileSplit knows nothing about Avro's block
> layout: it just divides the file into equal-sized chunks, the first of which
> starts at position 0.
> So the org.apache.avro.mapred.AvroRecordReader handling the first split gets 0
> as the start of its chunk and calls
> {code:title=AvroRecordReader.java}
> reader.sync(split.getStart());   // sync to start
> {code}
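> org.apache.avro.file.DataFileReader then seeks to position 0 and scans forward
> for a sync marker. It finds one at position 32: the copy stored in the header
> metadata map under the key "avro.sync". Reading resumes just past that copy,
> still inside the header, so the remaining header bytes are decoded as a block
> header, which yields the bogus block size.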
> No other implementation writes the sync marker into the metadata map, and none
> reads it from there, not even the C implementation itself.
> I suggest we remove it from the header as the simplest solution.
> Another solution would be to create an AvroFileSplit class in mapred that
> knows where the blocks begin and so provides correct split boundaries in the
> first place.
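The failure can be reproduced in miniature. The sketch below is a toy model, not Avro's DataFileReader: the file layout, the offsets, and the names SyncScanDemo, scanForSync, syncGuarded and HEADER_END are all hypothetical. It scans a byte array for a 16-byte marker the way a reader syncing to a split start would, and shows how a header carrying a second copy of the marker under "avro.sync" makes a scan from offset 0 stop inside the header. syncGuarded illustrates one possible reader-side workaround, assuming the reader already knows where the header ends (it has parsed the header on open): when a split starts at 0, skip the scan and begin at the first block.

```java
/**
 * Toy model of the AVRO-986 failure: scanning for the sync marker from
 * offset 0 finds the copy stored in the header metadata before the real one.
 * Layout and offsets are invented for illustration; this is not Avro's code.
 */
public class SyncScanDemo {
    static final int SYNC_SIZE = 16;   // Avro sync markers are 16 bytes
    static final int HEADER_END = 80;  // hypothetical: first block starts here

    /** Returns the offset just past the first sync marker at or after start, or -1. */
    static int scanForSync(byte[] file, byte[] sync, int start) {
        for (int i = start; i + SYNC_SIZE <= file.length; i++) {
            int j = 0;
            while (j < SYNC_SIZE && file[i + j] == sync[j]) {
                j++;
            }
            if (j == SYNC_SIZE) {
                return i + SYNC_SIZE; // reading would resume here
            }
        }
        return -1;
    }

    /** Hypothetical reader-side guard: a split starting at 0 begins at the first block. */
    static int syncGuarded(byte[] file, byte[] sync, int start, int headerEnd) {
        return start == 0 ? headerEnd : scanForSync(file, sync, start);
    }

    /** Builds an arbitrary 16-byte marker for the demo. */
    static byte[] demoSync() {
        byte[] sync = new byte[SYNC_SIZE];
        for (int i = 0; i < SYNC_SIZE; i++) {
            sync[i] = (byte) (0xA0 + i);
        }
        return sync;
    }

    /** Builds a fake file: header with a metadata copy of the marker, then one block. */
    static byte[] demoFile(byte[] sync) {
        byte[] file = new byte[96];
        file[0] = 'O'; file[1] = 'b'; file[2] = 'j'; file[3] = 1; // container magic
        System.arraycopy(sync, 0, file, 32, SYNC_SIZE); // copy stored under "avro.sync"
        System.arraycopy(sync, 0, file, 64, SYNC_SIZE); // real marker ending the header
        return file; // bytes 80..95 stand in for the first data block
    }

    public static void main(String[] args) {
        byte[] sync = demoSync();
        byte[] file = demoFile(sync);
        System.out.println("naive:   " + scanForSync(file, sync, 0));
        System.out.println("guarded: " + syncGuarded(file, sync, 0, HEADER_END));
    }
}
```

Running main prints naive: 48 (just past the metadata copy, still inside the header) and guarded: 80 (the start of the first block). Dropping the metadata copy, as the issue proposes, fixes the same case from the writer side instead: the first match from offset 0 would then be the real marker at 64, so reading would also resume at 80.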

--
This message is automatically generated by JIRA.
