[ https://issues.apache.org/jira/browse/PDFBOX-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Nichols resolved PDFBOX-798.
---------------------------------
Resolution: Fixed
Committed in revision 989843
> Better handle out of spec PDFs
> ------------------------------
>
> Key: PDFBOX-798
> URL: https://issues.apache.org/jira/browse/PDFBOX-798
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Environment: 32-bit Windows Vista, Java 1.5, HEAD tag of PDFBox
> Reporter: Adam Nichols
> Assignee: Adam Nichols
> Fix For: 1.3.0
>
> Attachments: PDFBOX-798.patch
>
>
> I came across another out-of-spec issue which causes PDFBox to crash. Here's
> the object:
> 5 0 obj
> <</Type /Page
> /Parent 6 0 R
> /MediaBox [ 0 0 610.560 783.360
> endstream
> endobj
> There are numerous issues here. The mediabox doesn't have a closing right
> square bracket, there's no ">>" to end the dictionary, and there's an
> "endstream" stuck in there for no apparent reason. This is something I
> found in the wild, but I do not know whether it's a bug in the creation
> program, data corruption, or something else. I do know, however, that
> Adobe Reader parses it without crashing. Since this is not a
> conforming PDF, the result is undefined, so crashing (which is what PDFBox
> will eventually do, when trying to process the next object in the file) is a
> perfectly acceptable thing to do.
> However, I'd like PDFBox to be able to detect that the array is
> completed when it sees endstream, then ignore the rogue endstream, and then
> know that the object has ended when it sees "endobj". I'm actually going to
> go one step further and also accept the same object even if endstream or
> endobj is missing. In addition to the above object, I also tested it with
> these objects:
> % endobj, without the endstream
> 5 0 obj
> <</Type /Page
> /Parent 6 0 R
> /MediaBox [ 0 0 610.560 783.360
> endobj
> % endstream, without the endobj
> 5 0 obj
> <</Type /Page
> /Parent 6 0 R
> /MediaBox [ 0 0 610.560 783.360
> endstream
> % properly ended array, dictionary and object (aka conforming PDF)
> 5 0 obj
> <</Type /Page
> /Parent 6 0 R
> /MediaBox [ 0 0 610.560 783.360 ]
> >>
> endobj
> Although this change will only affect PDFs which do not conform to the spec,
> I want to put the patch up for review before committing it to SVN since it is
> a modification to BaseParser.java. If I do not hear any objections/concerns
> in the next few days, I'll go ahead and commit it.
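
For readers without the patch at hand, the lenient behaviour described above can be sketched roughly as follows. This is a minimal illustration only, not the code from PDFBOX-798.patch or from BaseParser.java: the class name LenientArrayReader and the method readItems are hypothetical, and the sketch works on plain whitespace-separated tokens rather than on PDFBox's stream and COS object types. It treats "]" as the normal end of an array, and treats a stray "endstream", an "endobj", or the end of the input as an implicit end.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of lenient array reading; not the actual PDFBox code.
 */
final class LenientArrayReader {

    private LenientArrayReader() {
    }

    /**
     * Reads array items from the text that follows the opening "[".
     * Stops at "]" (conforming input), or at "endstream", "endobj" or the
     * end of the input (malformed input with no closing bracket).
     */
    static List<String> readItems(String textAfterOpenBracket) {
        List<String> items = new ArrayList<String>();
        for (String token : textAfterOpenBracket.trim().split("\\s+")) {
            if (token.isEmpty()) {
                break; // nothing left to read
            }
            if (token.equals("endstream") || token.equals("endobj")) {
                break; // the array was never closed; treat the keyword as its end
            }
            int close = token.indexOf(']');
            if (close >= 0) {
                if (close > 0) {
                    // bracket glued to the last item, e.g. "783.360]"
                    items.add(token.substring(0, close));
                }
                break; // properly closed array
            }
            items.add(token);
        }
        return items;
    }
}
```

Under these assumptions, all four MediaBox variants quoted above yield the same four items (0, 0, 610.560, 783.360). A real parser would also have to skip the rogue "endstream" and close the enclosing dictionary before moving on to the next object.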