Better handle out of spec PDFs
------------------------------

                 Key: PDFBOX-798
                 URL: https://issues.apache.org/jira/browse/PDFBOX-798
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
         Environment: 32-bit Windows Vista, Java 1.5, HEAD tag of PDFBox
            Reporter: Adam Nichols
            Assignee: Adam Nichols
             Fix For: 1.3.0


I came across another out-of-spec issue which causes PDFBox to crash.  Here's 
the object:
5 0 obj
<</Type /Page
/Parent 6 0 R
/MediaBox [ 0 0 610.560 783.360
endstream
endobj

There are numerous issues here.  The MediaBox array has no closing right
square bracket, there's no ">>" to end the dictionary, and there's an
"endstream" stuck in there for no apparent reason.  I actually found this in
the wild, but I do not know whether it was caused by a bug in the creation
program, data corruption, or something else.  I do know that Adobe Reader
parses it without crashing.  Since this is not a conforming PDF, the result is
undefined, so crashing (which is what PDFBox eventually does when it tries to
process the next object in the file) is technically acceptable behavior.

However, I'd like PDFBox to detect that the array is complete when it sees
"endstream", ignore the rogue "endstream", and recognize that the object has
ended when it sees "endobj".  I'm going one step further and also accepting
the same object when either "endstream" or "endobj" is missing.  In addition
to the object above, I tested the change with these objects:

% endobj, without the endstream
5 0 obj
<</Type /Page
/Parent 6 0 R
/MediaBox [ 0 0 610.560 783.360
endobj

% endstream, without the endobj
5 0 obj
<</Type /Page
/Parent 6 0 R
/MediaBox [ 0 0 610.560 783.360
endstream

% properly ended array, dictionary and object (aka conforming PDF)
5 0 obj
<</Type /Page
/Parent 6 0 R
/MediaBox [ 0 0 610.560 783.360 ]
>>
endobj
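
As a rough illustration of the lenient rule described above (this is not the
actual patch to BaseParser.java), a minimal Java sketch might look like the
following.  The class, method, and field names are hypothetical, and the
sketch assumes tokens are separated by whitespace, which holds for the sample
objects in this report but not for PDFs in general.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT PDFBox's BaseParser code.
final class LenientArraySketch {

    /** What the scan found: the array elements plus which terminators appeared. */
    static final class Result {
        final List<String> elements = new ArrayList<String>();
        boolean closedByBracket; // true only if a real "]" ended the array
        boolean sawEndstream;    // true if a (possibly rogue) "endstream" followed
        boolean sawEndobj;       // true if "endobj" ended the object
    }

    /**
     * Scans the text that follows an opening "[".
     * Assumption: tokens are whitespace-separated (so "]" stands alone).
     */
    static Result scan(String afterOpenBracket) {
        Result r = new Result();
        String[] tokens = afterOpenBracket.trim().split("\\s+");
        int i = 0;

        // Phase 1: collect elements until "]" or an out-of-place keyword.
        while (i < tokens.length) {
            String t = tokens[i];
            if (t.equals("]")) {
                r.closedByBracket = true;
                i++; // consume the bracket
                break;
            }
            if (t.equals("endstream") || t.equals("endobj")) {
                break; // treat the array as implicitly closed; do not consume
            }
            if (!t.isEmpty()) {
                r.elements.add(t);
            }
            i++;
        }

        // Phase 2: skip ">>" and a rogue "endstream"; stop at "endobj"
        // or at anything else (which would belong to the next object).
        while (i < tokens.length) {
            String t = tokens[i++];
            if (t.equals(">>")) {
                continue;
            } else if (t.equals("endstream")) {
                r.sawEndstream = true;
            } else if (t.equals("endobj")) {
                r.sawEndobj = true;
                break;
            } else {
                break;
            }
        }
        return r;
    }
}
```

Calling scan() on the text after "[" in each of the sample objects yields the
same four MediaBox elements in every case; only the closedByBracket,
sawEndstream, and sawEndobj flags differ.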


Although this change only affects PDFs that do not conform to the spec, I
want to put the patch up for review before committing it to SVN, since it
modifies BaseParser.java.  If I do not hear any objections or concerns in the
next few days, I'll go ahead and commit it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
