[ 
https://issues.apache.org/jira/browse/AVRO-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Cutting resolved AVRO-1093.
--------------------------------

    Resolution: Invalid
    
> DataFileWriter, appendEncoded causes AvroRuntimeException when read back
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1093
>                 URL: https://issues.apache.org/jira/browse/AVRO-1093
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Catalin Alexandru Zamfir
>
> We're doing this:
> {code}
> // One buffered stream of encoded records per shard path
> if (!objRecordsBuffer.containsKey (objShardPath)) {
>     objRecordsBuffer.put (objShardPath, new ByteBufferOutputStream ());
> }
> // Encode the record into the shard's buffer
> Encoder objEncoder = EncoderFactory.get ()
>     .binaryEncoder (objRecordsBuffer.get (objShardPath), null);
> objGenericDatumWriter.write (objRecordConstructor.build (), objEncoder);
> objEncoder.flush ();
> // Append each buffered ByteBuffer to the data file
> for (ByteBuffer objRecord : objRecordsBuffer.get (objKey).getBufferList ()) {
>     objRecordWriter.appendEncoded (objRecord);
> }
> // Flush and close the writer
> objRecordWriter.flush ();
> objRecordWriter.close ();
> {code}
> It writes the data to HDFS. Reading it back produces the following exception:
> {code}
> Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
>         at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
>         at net.RnD.FileUtils.TimestampedReader.hasNext(TimestampedReader.java:113)
>         at net.RnD.Hadoop.App.read1BAvros(App.java:131)
>         at net.RnD.Hadoop.App.executeCode(App.java:534)
>         at net.RnD.Hadoop.App.main(App.java:453)
>         ... 5 more
> Caused by: java.io.IOException: Block read partially, the data may be corrupt
>         at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
>         ... 9 more
> {code}
> The objRecordWriter is a DataFileWriter instance obtained via DataFileWriter.create 
> or DataFileWriter.appendTo (SeekableInput). Related to the AVRO-1090 ticket.
> Instead of keeping big hash maps in memory, we've decided to serialize the 
> data into in-memory byte buffers, because it's faster. Although "appendEncoded" 
> seems to write something to HDFS, reading the data back exposes this error.
> Help would be appreciated. I've looked at appendEncoded in DataFileWriter but 
> could not figure out whether it's our job to add a sync marker, or whether 
> appendEncoded does that for us.
> Must the ByteBuffer we pass contain exactly one encoded record?
> Examples and documentation for this method would be welcome.
> Files are getting created:
> {code}
> -rw-r--r--   3 root supergroup  124901360 2012-05-17 10:09 /Streams/Timestamped/Threads/2012/05/17/10/09/Shard.avro
> -rw-r--r--   3 root supergroup  124845625 2012-05-17 10:10 /Streams/Timestamped/Threads/2012/05/17/10/10/Shard.avro
> -rw-r--r--   3 root supergroup   62378307 2012-05-17 10:11 /Streams/Timestamped/Threads/2012/05/17/10/11/Shard.avro
> {code}
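A likely cause of the "Block read partially" error: the loop above passes appendEncoded the fixed-size (8 KB) internal chunks returned by ByteBufferOutputStream.getBufferList(), but each appendEncoded call is counted as exactly one datum, so every ByteBuffer must contain exactly one encoded record. A minimal sketch of keeping one record per buffer (a hypothetical helper, assuming the Avro 1.7 generic API and the avro jar on the classpath; appendOneRecord and the parameter names are illustrative, not from the original code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

// Sketch: encode each record into its own buffer so every ByteBuffer
// handed to appendEncoded holds exactly one datum.
public class OneRecordPerBuffer {
    public static void appendOneRecord (DataFileWriter<GenericRecord> objRecordWriter,
                                        GenericDatumWriter<GenericRecord> objDatumWriter,
                                        GenericRecord objRecord) throws IOException {
        ByteArrayOutputStream objOut = new ByteArrayOutputStream ();
        BinaryEncoder objEncoder = EncoderFactory.get ().binaryEncoder (objOut, null);
        objDatumWriter.write (objRecord, objEncoder); // exactly one record
        objEncoder.flush ();
        objRecordWriter.appendEncoded (ByteBuffer.wrap (objOut.toByteArray ()));
    }
}
```

DataFileWriter itself writes the block headers and sync markers, so the caller must not append sync markers manually; appendEncoded only expects the raw encoding of a single datum.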

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
