[ https://issues.apache.org/jira/browse/AVRO-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Catalin Alexandru Zamfir updated AVRO-1093:
-------------------------------------------
    Description: 
We're doing this:
{code}
// Check: create a buffer for this shard on first use
if (!(objRecordsBuffer.containsKey (objShardPath))) {
    // Set
    objRecordsBuffer.put (objShardPath, new ByteBufferOutputStream ());
}

// Set: binary encoder that writes into the shard's buffer
Encoder objEncoder = EncoderFactory.get ()
    .binaryEncoder (objRecordsBuffer.get (objShardPath), null);

// Write: encode one record into the buffer and flush the encoder
objGenericDatumWriter.write (objRecordConstructor.build (), objEncoder);
objEncoder.flush ();

// For: append every buffer in the buffer list to the file writer
for (ByteBuffer objRecord : objRecordsBuffer.get (objKey).getBufferList ()) {
    // Append
    objRecordWriter.appendEncoded (objRecord);
}

// Erase: flush and close the file writer
objRecordWriter.flush ();
objRecordWriter.close ();
{code}
It writes the data to HDFS. Reading it back throws the following exception:
{code}
Caused by: org.apache.avro.AvroRuntimeException: java.io.IOException: Block read partially, the data may be corrupt
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:210)
    at net.RnD.FileUtils.TimestampedReader.hasNext(TimestampedReader.java:113)
    at net.RnD.Hadoop.App.read1BAvros(App.java:131)
    at net.RnD.Hadoop.App.executeCode(App.java:534)
    at net.RnD.Hadoop.App.main(App.java:453)
    ... 5 more
Caused by: java.io.IOException: Block read partially, the data may be corrupt
    at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
    ... 9 more
{code}
The objRecordWriter is a DataFileWriter obtained from DataFileWriter.create or DataFileWriter.appendTo (SeekableInput). This relates to ticket AVRO-1090: instead of keeping big "hashmaps" in memory, we've decided to serialize the data into in-memory "byte buffers", because it's faster. Although "appendEncoded" seems to write something to HDFS, reading the data back exposes this error.

Help would be appreciated. I've looked at appendEncoded in DataFileWriter but could not figure out whether it's our job to add a sync marker or whether appendEncoded does that for us. Must the "ByteBuffer" we pass be exactly one record long?
Examples and documentation on this method are welcome.
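On the last two questions, a toy model may help frame what to check. As far as I can tell, DataFileWriter.appendEncoded treats each ByteBuffer it receives as one already-encoded datum and raises the block's record count by one per call, while the sync marker is written by the writer itself when a block is flushed. ByteBufferOutputStream.getBufferList () appears to return fixed-size chunks of the stream rather than one buffer per record. If that reading is right, the record count stored in a block would not match the records actually in it. The sketch below uses only the JDK to show the mismatch; CountingBlock, encode, chunk and the one-byte length prefix are hypothetical stand-ins, not Avro classes and not Avro's binary encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AppendEncodedSketch {

    /** Hypothetical stand-in for a block writer: one append call counts as one record. */
    static final class CountingBlock {
        private final ByteArrayOutputStream body = new ByteArrayOutputStream();
        private int declaredCount = 0;

        void appendEncoded(ByteBuffer datum) {
            byte[] bytes = new byte[datum.remaining()];
            datum.duplicate().get(bytes);   // copy without moving the caller's position
            body.write(bytes, 0, bytes.length);
            declaredCount++;                // assumption: exactly one record per call
        }

        int declaredCount() {
            return declaredCount;
        }
    }

    /** Hypothetical encoding (not Avro's): one length byte followed by UTF-8 bytes. */
    static byte[] encode(String record) {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[payload.length + 1];
        out[0] = (byte) payload.length;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    /** Cuts a byte stream into fixed-size chunks that ignore record boundaries. */
    static List<ByteBuffer> chunk(byte[] data, int chunkSize) {
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            chunks.add(ByteBuffer.wrap(data, off, len));
        }
        return chunks;
    }

    /** All records share one stream, then the stream's chunks are appended. */
    static int appendChunked(List<String> records, int chunkSize) {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        for (String record : records) {
            byte[] encoded = encode(record);
            shared.write(encoded, 0, encoded.length);
        }
        CountingBlock block = new CountingBlock();
        for (ByteBuffer piece : chunk(shared.toByteArray(), chunkSize)) {
            block.appendEncoded(piece);
        }
        return block.declaredCount();
    }

    /** Each record gets its own buffer, so one append call is one record. */
    static int appendPerRecord(List<String> records) {
        CountingBlock block = new CountingBlock();
        for (String record : records) {
            block.appendEncoded(ByteBuffer.wrap(encode(record)));
        }
        return block.declaredCount();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("aaa", "bbb", "ccc", "ddd", "eee");
        System.out.println("per record: " + appendPerRecord(records));
        System.out.println("chunked:    " + appendChunked(records, 8));
    }
}
```

With five 4-byte records and 8-byte chunks, the chunked path declares 3 records where 5 were written, while the per-record path declares 5. Carried over to the code in this report, that would suggest encoding each record into its own buffer before calling appendEncoded, though this has not been verified against 1.6.3.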
> DataFileWriter, appendEncoded causes AvroRuntimeException when read back
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1093
>                 URL: https://issues.apache.org/jira/browse/AVRO-1093
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.6.3
>            Reporter: Catalin Alexandru Zamfir