[
https://issues.apache.org/jira/browse/PDFBOX-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520135#comment-17520135
]
Andreas Lehmkühler commented on PDFBOX-5412:
--------------------------------------------
There are two versions of object {{99 0}}.
The original, well-formed version:
{code}
99 0 obj
<< /Length 225 /UndoLevel 98 0 R >>
stream
q 0 0 612 792 re W n BT /F9 8 Tf 0.4000 g 1.0 0.0 0.0 1.0 61.4840 10.0 Tm 0 Tw
0 Tc (Downloaded 21 Jun 2001 to 129.6.104.142. Redistribution subject to AIP
license or copyright, see http://ojps.aip.org/phf/phfcr.jsp) Tj ET Q
endstream
endobj
100 0 obj
<< /Length 3 >>
stream
q
endstream
endobj
{code}
And a second version, which was not really updated but simply written again. It
is glued to the following object, which results in a malformed PDF:
{code}
99 0 obj
<< /Length 225 /UndoLevel 98 0 R >>
stream
q 0 0 612 792 re W n BT /F9 8 Tf 0.4000 g 1.0 0.0 0.0 1.0 61.4840 10.0 Tm 0 Tw
0 Tc (Downloaded 21 Jun 2001 to 129.6.104.142. Redistribution subject to AIP
license or copyright, see http://ojps.aip.org/phf/phfcr.jsp) Tj ET Q
endstr36 0 obj
<<
/Type /Pages
/Kids [ 112 0 R 42 0 R 1 0 R 5 0 R 9 0 R ]
/Count 5
>>
endobj
{code}
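One way to see the damage in the second copy is to look at the position where the end-of-stream marker should start, given the declared {{/Length}}. The following is only a minimal sketch under assumptions: {{StreamLengthCheckSketch}} and {{endsWithMarker}} are hypothetical names, not PDFBox code, and the sample data is shortened.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of PDFBox.
final class StreamLengthCheckSketch {

    // Returns true if "endstream" follows the stream body of the declared length
    // (after an optional end-of-line sequence).
    static boolean endsWithMarker(byte[] data, int streamStart, int declaredLength) {
        if (declaredLength < 0) {
            return false;
        }
        byte[] marker = "endstream".getBytes(StandardCharsets.US_ASCII);
        int pos = streamStart + declaredLength;
        // Skip the EOL that may separate the stream data from the marker.
        while (pos < data.length && (data[pos] == '\r' || data[pos] == '\n')) {
            pos++;
        }
        if (pos + marker.length > data.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (data[pos + i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String content = "q 0 0 612 792 re W n Q";
        byte[] good = ("stream\n" + content + "\nendstream\nendobj\n")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] glued = ("stream\n" + content + "\nendstr36 0 obj\n<< /Type /Pages >>\nendobj\n")
                .getBytes(StandardCharsets.US_ASCII);
        int start = "stream\n".length();
        System.out.println(endsWithMarker(good, start, content.length()));  // prints true
        System.out.println(endsWithMarker(glued, start, content.length())); // prints false
    }
}
```

In the glued copy the check fails because the bytes at that position read {{endstr36 0 obj}} instead of {{endstream}}.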
The updated object is unknown to the PDF as it isn't referenced in the xref table.
But when the brute force parser is triggered due to the malformed object number
of the object {{36 0}}, it finds the updated version and stumbles over the missing
end of stream marker. The {{readUntilEndStream}} fallback mechanism reads until
the next {{endobj}}, which belongs to {{36 0}}, so that the stream is extended by
that object, which leads to the exception.
That said, the overly strict implementation of {{findObjectKey}} reveals an
issue with {{readUntilEndStream}}. I've found a way to make {{findObjectKey}}
more lenient so that the issue isn't triggered any more.
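The over-read described above can be reproduced with a small stand-alone sketch. It is an assumption-laden illustration, not the actual {{readUntilEndStream}} code: {{EndStreamFallbackSketch}} and {{readStreamBody}} are hypothetical names, and the scan works on a string instead of a byte source.

```java
// Hypothetical illustration of an "endstream, else endobj" fallback; not PDFBox code.
final class EndStreamFallbackSketch {

    // Returns the stream body that starts at `start`.
    static String readStreamBody(String data, int start) {
        int end = data.indexOf("endstream", start);
        if (end < 0) {
            // Fallback: no end-of-stream marker found, stop at the next "endobj".
            end = data.indexOf("endobj", start);
        }
        if (end < 0) {
            // Neither marker found, take everything that is left.
            end = data.length();
        }
        return data.substring(start, end);
    }

    public static void main(String[] args) {
        String glued = "99 0 obj\n<< /Length 225 /UndoLevel 98 0 R >>\nstream\n"
                + "q BT (Downloaded 21 Jun 2001) Tj ET Q\n"
                + "endstr36 0 obj\n<<\n/Type /Pages\n"
                + "/Kids [ 112 0 R 42 0 R 1 0 R 5 0 R 9 0 R ]\n/Count 5\n>>\nendobj\n";
        int start = glued.indexOf("stream\n") + "stream\n".length();
        String body = readStreamBody(glued, start);
        // The body now also contains the dictionary of object 36 0,
        // including the indirect reference from /Kids.
        System.out.println(body.contains("112 0 R")); // prints true
    }
}
```

Because the only {{endobj}} after the truncated marker belongs to {{36 0}}, the returned body swallows the {{/Kids}} array, so a content stream parser later encounters {{112 0 R}} where no object reference is allowed.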
> IOException: object reference 112 0 R at offset 18355 in content stream
> -----------------------------------------------------------------------
>
> Key: PDFBOX-5412
> URL: https://issues.apache.org/jira/browse/PDFBOX-5412
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.25, 3.0.0 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Labels: regression
> Attachments: 124760.pdf
>
>
> didn't happen in 2.0.25
> {noformat}
> java.io.IOException: object reference 112 0 R at offset 18355 in content stream
> org.apache.pdfbox.pdfparser.BaseParser.getObjectFromPool(BaseParser.java:196)
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:654)
> org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:875)
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154)
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:303)
> org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:228)
> org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:159)
> {noformat}