[jira] [Commented] (PDFBOX-5412) IOException: object reference 112 0 R at offset 18355 in content stream

Jira Sun, 10 Apr 2022 03:29:07 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520135#comment-17520135
 ]


Andreas Lehmkühler commented on PDFBOX-5412:
--------------------------------------------

There are two versions of object {{99 0}}.

The original, well-formed version:
{code}
99 0 obj
<< /Length 225 /UndoLevel 98 0 R >> 
stream
 q 0 0 612 792 re W n BT /F9 8 Tf 0.4000 g 1.0 0.0 0.0 1.0 61.4840 10.0 Tm 0 Tw 
0 Tc (Downloaded 21 Jun 2001 to 129.6.104.142. Redistribution subject to AIP 
license or copyright, see http://ojps.aip.org/phf/phfcr.jsp) Tj ET Q
endstream
endobj
100 0 obj
<< /Length 3 >> 
stream
 q 
endstream
endobj
{code}

And the version that was not really updated but simply written again. It is 
glued to the following object, which results in a malformed PDF:
{code}
99 0 obj
<< /Length 225 /UndoLevel 98 0 R >> 
stream
 q 0 0 612 792 re W n BT /F9 8 Tf 0.4000 g 1.0 0.0 0.0 1.0 61.4840 10.0 Tm 0 Tw 
0 Tc (Downloaded 21 Jun 2001 to 129.6.104.142. Redistribution subject to AIP 
license or copyright, see http://ojps.aip.org/phf/phfcr.jsp) Tj ET Q
endstr36 0 obj
<< 
/Type /Pages 
/Kids [ 112 0 R 42 0 R 1 0 R 5 0 R 9 0 R ] 
/Count 5 
>> 
endobj
{code}

The updated object is unknown to the PDF as it isn't referenced in the xref 
table. But when the brute-force parser is triggered due to the malformed object 
number of object {{36 0}}, it finds the updated version and stumbles over the 
missing end-of-stream marker. The {{readUntilEndStream}} fallback mechanism 
reads until the next {{endobj}}, which belongs to {{36 0}}, so the stream is 
extended by that object, which leads to the exception.
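
To illustrate the over-read, here is a minimal Java sketch. It is not the 
PDFBox implementation: the class {{EndStreamFallbackSketch}} and its methods 
{{findStreamEnd}} and {{extractStream}} are hypothetical, the input is a 
shortened version of the objects shown above, and the only assumption is that 
the fallback stops at the first {{endstream}} or {{endobj}} keyword it sees.

```java
// Hypothetical sketch, NOT PDFBox code: shows how a keyword-based fallback
// over-reads a stream whose "endstream" marker has been truncated.
class EndStreamFallbackSketch {

    // Assumed fallback rule: the stream ends at the first "endstream" or
    // "endobj" keyword found after the start of the stream data.
    static int findStreamEnd(String pdf, int from) {
        int endStream = pdf.indexOf("endstream", from);
        int endObj = pdf.indexOf("endobj", from);
        if (endStream < 0) {
            return endObj;
        }
        if (endObj < 0) {
            return endStream;
        }
        return Math.min(endStream, endObj);
    }

    // Returns the data of the first stream in the given text.
    static String extractStream(String pdf) {
        String keyword = "stream\n";
        int start = pdf.indexOf(keyword) + keyword.length();
        int end = findStreamEnd(pdf, start);
        if (end < 0) {
            end = pdf.length(); // no terminator at all: take the rest
        }
        return pdf.substring(start, end);
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the two versions of object 99 0.
        String wellFormed = "99 0 obj\n<< /Length 3 >>\nstream\n q \nendstream\nendobj\n"
                + "36 0 obj\n<< /Type /Pages /Kids [ 112 0 R 42 0 R ] >>\nendobj\n";
        String glued = "99 0 obj\n<< /Length 3 >>\nstream\n q \nendstr"
                + "36 0 obj\n<< /Type /Pages /Kids [ 112 0 R 42 0 R ] >>\nendobj\n";

        // Well-formed: the stream stops at "endstream".
        System.out.println(extractStream(wellFormed).contains("112 0 R"));
        // Glued: the stream swallows the /Pages dictionary of object 36 0.
        System.out.println(extractStream(glued).contains("112 0 R"));
    }
}
```

In the glued case the extracted stream contains the {{/Kids}} array with its 
indirect references, so a content stream parser that later tokenizes this data 
runs into {{112 0 R}}, which matches the exception message in the report.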

That said, the overly strict implementation of {{findObjectKey}} reveals an 
issue with {{readUntilEndStream}}. I've found a way to make {{findObjectKey}} 
more lenient so that the issue isn't triggered any more.


> IOException: object reference 112 0 R at offset 18355 in content stream
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-5412
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5412
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.25, 3.0.0 PDFBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>         Attachments: 124760.pdf
>
>
> didn't happen in 2.0.25
> {noformat}
> java.io.IOException: object reference 112 0 R at offset 18355 in content stream
>     org.apache.pdfbox.pdfparser.BaseParser.getObjectFromPool(BaseParser.java:196)
>     org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:654)
>     org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:875)
>     org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154)
>     org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:303)
>     org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:228)
>     org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:159)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
