[ https://issues.apache.org/jira/browse/AVRO-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting resolved AVRO-1093.
--------------------------------
    Resolution: Invalid

> DataFileWriter, appendEncoded causes AvroRuntimeException when read back
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1093
>                 URL: https://issues.apache.org/jira/browse/AVRO-1093
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Catalin Alexandru Zamfir
>
> We're doing this:
> {code}
> // Buffer encoded records per shard path
> if (!objRecordsBuffer.containsKey (objShardPath)) {
>     objRecordsBuffer.put (objShardPath, new ByteBufferOutputStream ());
> }
> // Encode the record into the shard's buffer
> Encoder objEncoder = EncoderFactory.get ()
>     .binaryEncoder (objRecordsBuffer.get (objShardPath), null);
> objGenericDatumWriter.write (objRecordConstructor.build (), objEncoder);
> objEncoder.flush ();
> // Append each buffered ByteBuffer as a pre-encoded datum
> for (ByteBuffer objRecord : objRecordsBuffer.get (objKey).getBufferList ()) {
>     objRecordWriter.appendEncoded (objRecord);
> }
> objRecordWriter.flush ();
> objRecordWriter.close ();
> {code}
> It writes the data to HDFS. Reading it back produces the following exception:
> {code}
> Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
>         at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
>         at net.RnD.FileUtils.TimestampedReader.hasNext(TimestampedReader.java:113)
>         at net.RnD.Hadoop.App.read1BAvros(App.java:131)
>         at net.RnD.Hadoop.App.executeCode(App.java:534)
>         at net.RnD.Hadoop.App.main(App.java:453)
>         ... 5 more
> Caused by: java.io.IOException: Block read partially, the data may be corrupt
>         at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
>         ... 9 more
> {code}
> The objRecordWriter is an instance from DataFileWriter.create or DataFileWriter.appendTo (SeekableInput). Related to the AVRO-1090 ticket.
> Instead of keeping big hashmaps in memory, we've decided to serialize the data into byte buffers in memory, because it's faster. Using appendEncoded appears to write something to HDFS, but reading the data back raises this error.
> Help would be appreciated. I've looked at appendEncoded in DataFileWriter but could not figure out whether it's our job to add a sync marker, or whether appendEncoded does that for us.
> Must the ByteBuffer we pass be the length of exactly one record?
> Examples and documentation on this method would be welcome.
> Files are getting created:
> {code}
> -rw-r--r--   3 root supergroup  124901360 2012-05-17 10:09 /Streams/Timestamped/Threads/2012/05/17/10/09/Shard.avro
> -rw-r--r--   3 root supergroup  124845625 2012-05-17 10:10 /Streams/Timestamped/Threads/2012/05/17/10/10/Shard.avro
> -rw-r--r--   3 root supergroup   62378307 2012-05-17 10:11 /Streams/Timestamped/Threads/2012/05/17/10/11/Shard.avro
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
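To answer the reporter's questions as I understand the API: DataFileWriter manages blocks and sync markers itself, and appendEncoded expects each ByteBuffer to contain exactly one binary-encoded datum conforming to the file's schema. The likely failure in the snippet above is that ByteBufferOutputStream chunks its output at fixed buffer-size boundaries, not at record boundaries, so its getBufferList() is generally not one-record-per-buffer, and the block's record count no longer matches its contents ("Block read partially"). Below is a minimal sketch of the one-buffer-per-record pattern; the class name AppendEncodedSketch and the Event schema are invented for illustration, and this is an assumption about correct usage, not code from the ticket:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AppendEncodedSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical one-field schema, stand-in for the reporter's records.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

        File file = File.createTempFile("shard", ".avro");
        DataFileWriter<GenericRecord> fileWriter =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
        fileWriter.create(schema, file);

        GenericDatumWriter<GenericRecord> datumWriter =
            new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = null;
        for (long i = 0; i < 3; i++) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", i);
            // One fresh stream per record, so each ByteBuffer holds exactly
            // one encoded datum; appendEncoded counts each buffer as one record.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            encoder = EncoderFactory.get().binaryEncoder(out, encoder);
            datumWriter.write(record, encoder);
            encoder.flush();
            fileWriter.appendEncoded(ByteBuffer.wrap(out.toByteArray()));
        }
        // close() writes the final block and sync marker; no manual sync needed.
        fileWriter.close();

        // Read the file back to confirm it is a valid container file.
        long count = 0;
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>(schema))) {
            while (reader.hasNext()) {
                reader.next();
                count++;
            }
        }
        System.out.println(count);
    }
}
```

If the per-record copy defeats the purpose of buffering, plain append(record) does the encoding into the writer's own block buffer and avoids the intermediate buffers entirely.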