[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-3015:
-------------------------------
    Attachment: TestInput.java
                Test.java

OK, I made two changes to my test program:
* Set the sync interval to 32 bytes (32 bytes seems to be the minimum possible interval, unless I misunderstood the Avro source code).
* Increased the file size to ~10 MB.

I can see that sync points are added after every 32 bytes or so. I also see that tell() returns increasing values with the good file.

I am mimicking a bad file by deleting a random byte in a sync point. Running {{avro-tool tojson}} gives me an invalid sync exception after reading up to that corrupted sync point, so I guess that the bad file is created correctly.

However, I still cannot recover from a bad read. I catch the exception from next() and call sync(tell() + 1). The next tell() seems to correctly return the next valid sync point, but next() still fails. In fact, it continues to fail until it hits the end of the file.
{code}
next(): 9999
tell(): 82133
hasNext() or next() failed
tell(): 82196
hasNext() or next() failed
tell(): 82250
...
hasNext() or next() failed
tell(): 10424205
hasNext() or next() failed
tell(): 10424258
end of the file
tell(): 10424259
past the end of the file
{code}

I am uploading my test program. {{TestInput.java}} generates input files, and {{Test.java}} runs the test.

Does anyone have an idea what I am doing wrong?

> Rewrite of AvroStorage
> ----------------------
>
>                 Key: PIG-3015
>                 URL: https://issues.apache.org/jira/browse/PIG-3015
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Joseph Adler
>            Assignee: Joseph Adler
>         Attachments: PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, PIG-3015-5.patch, TestInput.java, Test.java
>
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
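
For reference, the recovery idea discussed in the comment above (on a failed read, skip forward to the next sync point and resume) can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: it does not use Avro or reproduce its container format. The block layout (1-byte length, payload, trailing marker), the 4-byte marker, and the names {{SyncRecoverySketch}}, {{write}}, {{deleteByte}}, {{sync}}, and {{readSkippingCorruptBlocks}} are all hypothetical, so it cannot explain why next() keeps failing in the real test program.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: this is NOT Avro's file format or API. */
class SyncRecoverySketch {
    // Hypothetical 4-byte marker chosen for this sketch.
    static final byte[] MARKER = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    /** Encodes each payload (at most 255 bytes) as: 1-byte length, payload, marker. */
    static byte[] write(List<byte[]> payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : payloads) {
            out.write(p.length);
            out.write(p, 0, p.length);
            out.write(MARKER, 0, MARKER.length);
        }
        return out.toByteArray();
    }

    /** Mimics the corruption from the comment by deleting a single byte. */
    static byte[] deleteByte(byte[] data, int index) {
        byte[] out = new byte[data.length - 1];
        System.arraycopy(data, 0, out, 0, index);
        System.arraycopy(data, index + 1, out, index, data.length - index - 1);
        return out;
    }

    /** Returns the offset just past the first marker starting at or after 'from', or data.length if none. */
    static int sync(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + MARKER.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + MARKER.length), MARKER)) {
                return i + MARKER.length;
            }
        }
        return data.length;
    }

    /** Reads every block whose trailing marker is intact and resynchronizes after a damaged one. */
    static List<String> readSkippingCorruptBlocks(byte[] data) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int len = data[pos] & 0xFF;
            int payloadStart = pos + 1;
            int markerStart = payloadStart + len;
            int blockEnd = markerStart + MARKER.length;
            boolean intact = blockEnd <= data.length
                    && Arrays.equals(Arrays.copyOfRange(data, markerStart, blockEnd), MARKER);
            if (intact) {
                records.add(new String(data, payloadStart, len, StandardCharsets.US_ASCII));
                pos = blockEnd;
            } else {
                // Toy analogue of sync(tell() + 1): scan forward for the next marker.
                pos = sync(data, pos + 1);
            }
        }
        return records;
    }
}
```

In this toy layout, writing the payloads "a1", "b2", "c3", "d4" and then deleting the first byte of the second marker makes {{readSkippingCorruptBlocks}} return only "a1" and "d4". The block with the damaged marker is rejected, and the resync lands just past the following block's marker, so that block is skipped as well.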