[ 
https://issues.apache.org/jira/browse/PDFBOX-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079839#comment-18079839
 ] 

Andreas Lehmkühler commented on PDFBOX-6194:
--------------------------------------------

In the end the solution is quite simple: don't use static instances of a 
PD-object such as PDType1Font.

PD-objects are not immutable and, more importantly, they are backed by a 
COS-object. Many of them might end up as indirect objects in a PDF, especially 
if compressed object streams are used, which consist of indirect objects. Those 
indirect objects get an object number, which is saved as a COSObjectKey in the 
affected COS-object once they are written to a PDF.

In the given case the static font instance is new and has no such key at the 
beginning. Once the first document is saved, the font is written as an indirect 
object and gets an object key, which is stored in the underlying COSDictionary 
of the static font. If the static font instance is then used within another 
document, its object key might collide with an existing key of the source 
document, and the resulting PDF is corrupt.
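
The mechanism can be illustrated with a toy model in plain Java. This is only a 
sketch: ToyObject, ToyDocument and SharedKeyDemo are hypothetical names, and the 
numbering logic is a deliberate simplification, not the actual PDFBox writer. It 
shows how an object that remembers its number from an earlier save can replace 
an unrelated object in another document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a COS object: mutable, and it remembers the object number
// assigned by a writer (a simplified, hypothetical analogue of COSObjectKey).
final class ToyObject {
    final String label;
    Integer key; // null until a number has been assigned

    ToyObject(String label) { this.label = label; }
}

// Toy stand-in for a document together with its writer.
final class ToyDocument {
    private final List<ToyObject> objects = new ArrayList<>();

    void add(ToyObject o) { objects.add(o); }

    // Objects that already carry a key keep it; the others get the lowest
    // number not written so far. Returns number -> label as "written".
    Map<Integer, String> save() {
        Map<Integer, String> written = new HashMap<>();
        int next = 1;
        for (ToyObject o : objects) {
            if (o.key == null) {
                while (written.containsKey(next)) next++;
                o.key = next;
            }
            written.put(o.key, o.label); // on a clash the later object wins
        }
        return written;
    }
}

final class SharedKeyDemo {
    static final ToyObject STATIC_FONT = new ToyObject("Courier font");

    // Simulates a parsed document whose objects already have numbers 1..n.
    static ToyDocument loaded(String... labels) {
        ToyDocument doc = new ToyDocument();
        int n = 1;
        for (String label : labels) {
            ToyObject o = new ToyObject(label);
            o.key = n++;
            doc.add(o);
        }
        return doc;
    }

    static Map<Integer, String> saveWithFont(ToyDocument doc, ToyObject font) {
        doc.add(font);
        return doc.save();
    }

    public static void main(String[] args) {
        // First document: the shared font has no key yet and receives number 5.
        saveWithFont(loaded("catalog", "pages", "page", "contents"), STATIC_FONT);

        // Second document already uses number 5 for an image mask.
        Map<Integer, String> bad = saveWithFont(
            loaded("catalog", "pages", "page", "image", "image mask"), STATIC_FONT);
        System.out.println(bad.get(5)); // prints "Courier font"

        // A fresh font instance has no key and gets a free number instead.
        Map<Integer, String> good = saveWithFont(
            loaded("catalog", "pages", "page", "image", "image mask"),
            new ToyObject("Courier font"));
        System.out.println(good.get(5) + " / " + good.get(6)); // prints "image mask / Courier font"
    }
}
```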

I have no idea how to add some magic to the writer to detect and repair such 
cases. IMHO the only reasonable solution is not to use static instances of a 
PD-object.
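
For the flow described in the issue, a minimal sketch of that approach could 
look like the following, assuming PDFBox 3.0.x. The class name OcrTextAppender, 
the text position, the font size and the command-line arguments are made up for 
illustration; the reporter's actual code is not known. The point is only that 
the font is created once per document instead of being held in a static field.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import org.apache.pdfbox.pdmodel.graphics.state.RenderingMode;

final class OcrTextAppender {

    // Problematic pattern: one font object, and therefore one COSDictionary,
    // shared by every document that is processed.
    // private static final PDFont FONT =
    //         new PDType1Font(Standard14Fonts.FontName.COURIER);

    static void appendInvisibleText(PDDocument doc, String text) throws IOException {
        // One font instance per document; it is never reused for another PDDocument.
        PDFont font = new PDType1Font(Standard14Fonts.FontName.COURIER);
        for (PDPage page : doc.getPages()) {
            try (PDPageContentStream cs =
                     new PDPageContentStream(doc, page, AppendMode.APPEND, true, true)) {
                cs.beginText();
                cs.setFont(font, 10);
                // Neither fill nor stroke, so the text is not painted.
                cs.setRenderingMode(RenderingMode.NEITHER);
                cs.newLineAtOffset(20, 20); // arbitrary position for the sketch
                cs.showText(text);
                cs.endText();
            }
        }
    }

    // Usage: java OcrTextAppender input.pdf output.pdf
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
            appendInvisibleText(doc, "hypothetical OCR text");
            doc.save(new File(args[1]));
        }
    }
}
```

Reusing the same font instance for all pages of one document is fine; the 
problem described above only arises when the instance outlives the document.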



> COSStream becomes COSDictionary after save — shared XObject reference 
> replaced by Font
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-6194
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6194
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 3.0.7 PDFBox
>         Environment: Windows Server 2016, Java 21, PDFBox 3.0.7
>            Reporter: HABA
>            Priority: Major
>         Attachments: 000012.pdf, 000016.pdf, 000025.pdf, bad-000016.pdf, 
> bad-000025.pdf, image-2026-04-20-12-33-11-057.png, 
> image-2026-04-20-13-52-20-247.png, image-2026-04-20-13-52-44-302.png, 
> image-2026-05-01-19-07-19-330.png, screenshot-1.png
>
>
> Hi,
> `document.save()` corrupts an `/XObject` on page 3 of a 3-page PDF.
> Before save:
> - `Obj5` = `COSStream` (ImageMask)
> After save:
> - `Obj5` = `COSDictionary` (Courier font)
> Pages 1–2 are unaffected. All pages share the same indirect XObject refs 
> (`Obj4`, `Obj5`).
> Flow:
> - load PDF
> - render pages via `PDFRenderer.renderImageWithDPI()`
> - append invisible OCR text using `PDPageContentStream` (AppendMode.APPEND, 
> Courier)
> - save document → corruption occurs
> Result:
> java.io.IOException: Unexpected object type: COSDictionary
>  
> Reproduced consistently on:
>  * Windows Server 2016, Java 21, PDFBox 3.0.7
> Not reproducible on:
>  * Windows 11, Java 21 (same code + input)
> Likely related to shared indirect XObject being overwritten during save.
> Cannot share original PDF (confidential), but can test with synthetic 
> reproducer if needed.



