[ https://issues.apache.org/jira/browse/PDFBOX-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Nichols resolved PDFBOX-802.
---------------------------------

    Resolution: Fixed

Patch committed in revision 988694

> Better handle corrupt/missing %%EOF flags at the end of a file
> --------------------------------------------------------------
>
>                 Key: PDFBOX-802
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-802
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Adam Nichols
>            Assignee: Adam Nichols
>             Fix For: 1.3.0
>
>
> Currently, when the %%EOF flag at the end of the file is missing or corrupt,
> an IOException is thrown, producing a stack trace like this:
> java.io.IOException: Error: Expected to read '%%EOF' instead started reading '%%E^@'
>         at org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(BaseParser.java:1090)
>         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:463)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:859)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:826)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:751)
> While these PDFs are non-conforming, it'd be an improvement to allow them to
> be read and processed, since we're only a few bytes from the end of the file
> anyway.
> There's existing code which checks whether what was read was %%EOF and
> throws an exception if %%EOF wasn't read and we're not at the end of the
> file. However, this check is never reached because readExpectedString()
> throws an exception first. To fix this, I changed readExpectedString() to
> readString() and kept the manual check for the proper %%EOF flag. If the
> flag isn't found, it now outputs a warning, and if we're not at the end of
> the file, it still throws an exception. I've seen corrupted and missing %%EOF
> flags at the end of a file, but never in the middle. Since that case doesn't
> seem to occur, and a PDF that did exhibit it would be clearly out of spec and
> much harder to deal with, throwing an exception there still seems
> reasonable.
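
For contrast, a strict reader in the spirit of readExpectedString() fails on the first byte that does not match, which is why the later %%EOF check was never reached. The sketch below is hypothetical: StrictEofCheck and readExpectedStrict are invented names, not PDFBox code, and only the error-message wording is taken from the trace above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StrictEofCheck {

    // Hypothetical strict reader (not the PDFBox implementation): every byte
    // must match the expected string, otherwise an IOException is thrown at once.
    static void readExpectedStrict(InputStream in, String expected) throws IOException {
        StringBuilder soFar = new StringBuilder();
        for (int i = 0; i < expected.length(); i++) {
            int b = in.read();
            if (b != -1) {
                soFar.append((char) b);
            }
            // -1 (end of stream) never equals a char, so a truncated marker fails too.
            if (b != expected.charAt(i)) {
                throw new IOException("Error: Expected to read '" + expected
                        + "' instead started reading '" + soFar + "'");
            }
        }
    }

    public static void main(String[] args) {
        // A corrupted trailer: the fourth byte is NUL, which the original trace shows as ^@.
        byte[] tail = "%%E\0\0".getBytes(StandardCharsets.ISO_8859_1);
        try {
            readExpectedStrict(new ByteArrayInputStream(tail), "%%EOF");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the exception escapes before any caller-side check runs, there is no opportunity to decide whether the damage sits harmlessly at the very end of the file.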

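The revised behavior (read the marker without validating it, warn if it is damaged at the end of the file, still fail if more data follows) can be sketched in Java as below. This is a minimal, hypothetical illustration rather than the committed patch: LenientEofCheck, checkEofMarker and onlyWhitespaceRemains are invented names, a raw five-byte read stands in for PDFBox's readString(), "at the end of the file" is assumed to mean that only PDF white-space bytes follow, and java.util.logging stands in for whatever logger PDFBox uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

public class LenientEofCheck {

    private static final Logger LOG = Logger.getLogger(LenientEofCheck.class.getName());
    private static final String EOF_MARKER = "%%EOF";

    // Hypothetical trailer check: returns true for a proper marker, false when a
    // damaged or missing marker is tolerated because nothing but white space
    // follows it, and throws when real data follows the damaged marker.
    static boolean checkEofMarker(InputStream in) throws IOException {
        // Read up to five bytes without validating them (the readString() idea).
        byte[] buf = new byte[EOF_MARKER.length()];
        int n = in.readNBytes(buf, 0, buf.length);
        String token = new String(buf, 0, n, StandardCharsets.ISO_8859_1);

        if (EOF_MARKER.equals(token)) {
            return true;
        }
        if (onlyWhitespaceRemains(in)) {
            // Non-conforming, but we are at the end of the file: warn and carry on.
            LOG.warning("Expected '%%EOF' at end of file but read '" + token + "'");
            return false;
        }
        // Damaged marker with more data after it: still treated as an error.
        throw new IOException("Expected '%%EOF' but read '" + token + "' before end of file");
    }

    // Consumes the rest of the stream; true if it holds only PDF white-space
    // characters (NUL, HT, LF, FF, CR, SP).
    static boolean onlyWhitespaceRemains(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            if (b != 0x00 && b != 0x09 && b != 0x0A && b != 0x0C && b != 0x0D && b != 0x20) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String[] tails = { "%%EOF\n", "%%E\0\0", "%%E" };
        for (String tail : tails) {
            InputStream in = new ByteArrayInputStream(tail.getBytes(StandardCharsets.ISO_8859_1));
            // prints true, then false, then false (the last two also log a warning to stderr)
            System.out.println(checkEofMarker(in));
        }
    }
}
```

Returning a boolean rather than silently swallowing the problem lets a caller decide whether a tolerated trailer should be reported, while a damaged marker in the middle of the data still fails loudly, matching the reasoning in the issue.
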
-- 
This message is automatically generated by JIRA.