Michael Coon created AVRO-1917:
----------------------------------

             Summary: DataFileStream Skips Blocks with hasNext and nextBlock calls
                 Key: AVRO-1917
                 URL: https://issues.apache.org/jira/browse/AVRO-1917
             Project: Avro
          Issue Type: Bug
          Components: java
            Reporter: Michael Coon


We have a situation where potentially large segments of data are embedded
in an Avro data item. Sometimes a corrupted upstream system adds hundreds of
thousands of array items to the structure. When I try to read such an item as
a datum record, it exhausts the heap immediately.
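For context on where a size check like the one described below can hook in: in Avro's binary encoding, an array is written as a series of blocks, each prefixed by an item count encoded as a zig-zag variable-length long (a negative count means the count's absolute value, followed by the block's size in bytes). The following is a minimal standalone sketch, not Avro's DatumReader API; the class ArrayCountGuard, the exception OversizedItemException, and the maxItems limit are hypothetical names. It decodes one block count and rejects it before any items are allocated.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical standalone guard; not part of the Avro API. */
final class ArrayCountGuard {

  /** Hypothetical exception signalling an item too large to materialize. */
  static final class OversizedItemException extends IOException {
    OversizedItemException(long items, long maxItems) {
      super("array block holds " + items + " items, limit is " + maxItems);
    }
  }

  /** Decodes one Avro binary long: a zig-zag encoded, variable-length integer. */
  static long readAvroLong(InputStream in) throws IOException {
    long raw = 0;
    int shift = 0;
    int b;
    do {
      b = in.read();
      if (b < 0) throw new EOFException("truncated long");
      if (shift > 63) throw new IOException("varint longer than 10 bytes");
      raw |= (long) (b & 0x7F) << shift;  // 7 payload bits per byte, low bits first
      shift += 7;
    } while ((b & 0x80) != 0);            // high bit set means another byte follows
    // Undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (raw >>> 1) ^ -(raw & 1);
  }

  /**
   * Reads an array block count and throws before any items are allocated
   * if the block claims more than maxItems items.
   */
  static long readCheckedBlockCount(InputStream in, long maxItems) throws IOException {
    long count = readAvroLong(in);
    if (count == Long.MIN_VALUE) throw new IOException("invalid block count");
    // A negative count means |count| items, followed by the block's byte size.
    long items = Math.abs(count);
    if (items > maxItems) throw new OversizedItemException(items, maxItems);
    return count;
  }
}
```

A custom DatumReader could apply a check like this wherever it reads an array or bytes length, so a corrupted item is rejected before allocation rather than after the heap is exhausted.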

To catch this situation, I created a custom DatumReader that checks the size
of arrays and byte[] values and, if a size exceeds a threshold, throws a custom
exception that I catch so I can skip the corrupted item in the file. However,
to implement this try-catch-skip logic, I had to call hasNext and nextBlock to
get each block's ByteBuffer and pass it to my reader. Unfortunately, calling
"hasNext" and then "nextBlock" actually skips the first block in the
underlying data stream. This is because "nextBlock" itself calls "hasNext",
which reads the next block: I called hasNext, then nextBlock called it again,
causing bytes to be skipped. My workaround is a do...while loop that catches
"NoSuchElementException", but this is not intuitive, and I had to review the
code to figure it out. The fix would be to give hasNext and nextBlock a shared
condition that both agree on, so that a call to hasNext does not advance the
stream by reading the next block.
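The interaction described above can be reproduced with a small toy model. This is not Avro's DataFileStream: BlockStream, its pending field, and the idempotentHasNext flag are hypothetical, and the buggy mode simply assumes, as this report describes, that hasNext always reads the next block. With the flag off, the natural hasNext()/nextBlock() loop loses blocks, while a loop that calls only nextBlock and stops on NoSuchElementException recovers them. With the flag on, hasNext reads ahead only when no block is already buffered, which is one way to give both methods the shared condition proposed here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy block source modelling the hasNext/nextBlock contract; not Avro code. */
final class BlockStream {
  private final Deque<String> source;      // stands in for the underlying file
  private final boolean idempotentHasNext;  // false = behaviour reported in this issue
  private String pending;                   // block read by hasNext() but not yet returned

  BlockStream(List<String> blocks, boolean idempotentHasNext) {
    this.source = new ArrayDeque<>(blocks);
    this.idempotentHasNext = idempotentHasNext;
  }

  boolean hasNext() {
    if (idempotentHasNext && pending != null) {
      return true;                 // fixed: a block is already buffered, read nothing
    }
    pending = source.pollFirst();  // reported: always reads ahead, dropping any buffered block
    return pending != null;
  }

  String nextBlock() {
    if (!hasNext()) throw new NoSuchElementException();
    String block = pending;
    pending = null;                // hand the buffered block to the caller exactly once
    return block;
  }

  /** The natural loop: hasNext() then nextBlock(). */
  static List<String> readAll(BlockStream s) {
    List<String> out = new ArrayList<>();
    while (s.hasNext()) out.add(s.nextBlock());
    return out;
  }

  /** The workaround: call only nextBlock() and stop on NoSuchElementException. */
  static List<String> readAllWorkaround(BlockStream s) {
    List<String> out = new ArrayList<>();
    try {
      for (;;) out.add(s.nextBlock());
    } catch (NoSuchElementException endOfStream) {
      // no more blocks
    }
    return out;
  }
}
```

For example, with four blocks "a" to "d", readAll on the buggy stream returns only [b, d], while both the workaround on the buggy stream and readAll on the fixed stream return all four blocks.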



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)