Michael Coon created AVRO-1917:
----------------------------------
Summary: DataFileStream Skips Blocks with hasNext and nextBlock calls
Key: AVRO-1917
URL: https://issues.apache.org/jira/browse/AVRO-1917
Project: Avro
Issue Type: Bug
Components: java
Reporter: Michael Coon
We have a situation where potentially large segments of data are embedded in an
Avro data item. Sometimes an upstream system becomes corrupted and adds
hundreds of thousands of array items to the structure. When I try to read such
an item as a datum record, it blows the heap immediately.
To catch this situation, I needed to create a custom DatumReader that checks
the size of arrays and byte[] values and, if a threshold is exceeded, throws a
custom exception that I detect so I can skip the corrupted item in the file.
However, to accomplish this try-catch-skip functionality, I had to call hasNext
and then nextBlock to get the block's ByteBuffer and pass it to my reader to
catch the situation.
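Roughly, the size-checking reader looks something like this (a simplified
sketch; the class name, threshold, and the choice of newArray as the hook are
placeholders, and the byte[] check is omitted):
{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

// Sketch of a size-checking reader: reject any array block whose declared
// element count exceeds a threshold so a corrupted item fails fast instead
// of exhausting the heap.
public class BoundedDatumReader extends GenericDatumReader<GenericRecord> {
  private static final int MAX_ELEMENTS = 100_000;  // placeholder threshold

  public BoundedDatumReader(Schema schema) {
    super(schema);
  }

  @Override
  protected Object newArray(Object old, int size, Schema schema) {
    if (size > MAX_ELEMENTS) {
      // custom unchecked exception that the caller catches to skip the item
      throw new OversizedDatumException("array block with " + size + " elements");
    }
    return super.newArray(old, size, schema);
  }

  /** Placeholder for the custom exception mentioned above. */
  public static class OversizedDatumException extends RuntimeException {
    public OversizedDatumException(String msg) { super(msg); }
  }
}
{code}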
Unfortunately, calling "hasNext" and then "nextBlock" actually skips the first
block in the underlying data stream. This is because "nextBlock" itself calls
"hasNext", which reads the next block. So my code called hasNext, then
nextBlock called it again, causing a block's bytes to be skipped. My workaround
is a do...while loop that catches "NoSuchElementException", but this is not
intuitive and required reviewing the DataFileStream source to discover. The fix
is to keep hasNext and nextBlock in agreement, so that a block already read by
hasNext is returned by nextBlock rather than being skipped by another hasNext
call.
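For reference, the workaround pattern looks roughly like this (a sketch only,
using the BoundedDatumReader placeholder from above; it relies on nextBlock
alone and treats NoSuchElementException as end-of-stream instead of pairing it
with hasNext):
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.NoSuchElementException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class BlockSkippingScan {
  public static void scan(InputStream in) throws IOException {
    try (DataFileStream<GenericRecord> stream =
             new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
      Schema schema = stream.getSchema();
      BoundedDatumReader reader = new BoundedDatumReader(schema);

      while (true) {
        ByteBuffer block;
        try {
          // Call nextBlock() directly; a prior hasNext() would advance the
          // stream a second time and drop a block.
          block = stream.nextBlock();
        } catch (NoSuchElementException e) {
          break;  // end of file
        }
        long count = stream.getBlockCount();

        byte[] data = new byte[block.remaining()];
        block.get(data);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        try {
          for (long i = 0; i < count; i++) {
            GenericRecord record = reader.read(null, decoder);
            // ... process the record ...
          }
        } catch (BoundedDatumReader.OversizedDatumException e) {
          // Corrupted/oversized block: skip it and move on to the next one.
        }
      }
    }
  }
}
{code}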