[ https://issues.apache.org/jira/browse/PDFBOX-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Nichols resolved PDFBOX-802.
---------------------------------

    Resolution: Fixed

Patch committed in revision 988694

> Better handle corrupt/missing %%EOF flags at the end of a file
> --------------------------------------------------------------
>
>                 Key: PDFBOX-802
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-802
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Adam Nichols
>            Assignee: Adam Nichols
>             Fix For: 1.3.0
>
>
> Currently, when the %%EOF flag at the end of the file is missing, an
> IOException is thrown which produces a stacktrace something like this:
> java.io.IOException: Error: Expected to read '%%EOF' instead started reading '%%E^@'
>     at org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(BaseParser.java:1090)
>     at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:463)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:859)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:826)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:751)
> While these PDFs are non-conforming, it'd be an improvement to allow them to
> be read and processed since we're only a few bytes from the end of file
> anyway.
> There's existing code which checks whether what was read was %%EOF and
> throws an exception if %%EOF wasn't read and we're not at the end of file.
> However, this is never reached because readExpectedString() throws an
> exception before this can happen. To fix this, I changed
> readExpectedString() to readString() and left the manual check to see if the
> proper %%EOF flag was found. If not, it'll output a warning. If we're not
> at the end of the file, we'll still throw an exception. I've seen corrupted
> and missing %%EOF flags at the end of a file, but never in the middle.
> Since this doesn't seem to happen, and if it did the PDF would clearly be
> out of spec and the resulting problems much harder to deal with, throwing
> an exception in that case still seems like a reasonable thing to do.
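
The behaviour described above can be sketched roughly as follows. This is not the patch committed in revision 988694 and it does not use PDFBox's parser classes; the class LenientEofCheck and the method checkEofMarker are hypothetical names chosen for illustration, and the sketch assumes the stream is positioned exactly where the %%EOF marker should start.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of PDFBox.
class LenientEofCheck {
    static final String EOF_MARKER = "%%EOF";

    /**
     * Returns true if the marker is intact. Returns false (after printing a
     * warning) if the marker is corrupt or missing but the stream has ended.
     * Throws IOException if the marker is wrong and more data follows.
     */
    static boolean checkEofMarker(InputStream in) throws IOException {
        byte[] buf = new byte[EOF_MARKER.length()];
        int n = 0;
        // Read without assuming the marker is there (the readString() idea),
        // rather than failing on the first mismatching byte.
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break; // hit end of stream early: truncated marker
            }
            n += r;
        }
        String read = new String(buf, 0, n, StandardCharsets.ISO_8859_1);
        if (EOF_MARKER.equals(read)) {
            return true;
        }
        if (in.read() == -1) {
            // Bad marker, but nothing follows it: tolerate and warn.
            System.err.println("Warning: expected '" + EOF_MARKER
                    + "' but read '" + read + "' at end of file");
            return false;
        }
        // Bad marker in the middle of the file: still an error.
        throw new IOException("Expected '" + EOF_MARKER
                + "' but read '" + read + "' before end of file");
    }
}
```

With this sketch, a stream ending in the four bytes '%%E' followed by a NUL (as in the stacktrace above) produces a warning and returns false, while a mismatched marker followed by further data still raises an IOException.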