[ https://issues.apache.org/jira/browse/PDFBOX-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832259#comment-17832259 ]

Andreas Lehmkühler commented on PDFBOX-5786:
--------------------------------------------

The given PDF is malformed and triggers the brute-force parser. During parsing,
some of the object numbers got disordered because the xref table from the
brute-force parser isn't transferred to the COSDocument. I've fixed that.

But the main reason for the NPE was that the object referenced by the key
"53 0 R" can't be parsed, which led to a null reference for the object, AND that
the double usage of the object number "53 0 R" triggers the repair mechanism,
which deletes the duplicate key from the COSObject and replaces it with the
(fixed) key from the referenced object. The latter was missing, the code didn't
take that into account, and an NPE occurred.
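
The NPE itself is raised by java.util.Hashtable, whose put() throws a
NullPointerException when the key or the value is null (see the first frame of
the stack trace below). The following is a minimal, hypothetical sketch of that
failure mode and of a defensive null check. It assumes a simplified map from
written objects to their keys; ObjectKeyDemo, ObjectKey, repairUnguarded,
repairGuarded and the fallback key are illustrative names, not PDFBox's
COSWriter code, and the fallback is not necessarily what the actual fix does.

```java
import java.util.Hashtable;
import java.util.Map;

/**
 * Hypothetical, simplified model of the failure described above.
 * This is NOT PDFBox's COSWriter code; all names here are illustrative.
 */
public class ObjectKeyDemo {

    /** Illustrative stand-in for an indirect object key such as "53 0 R". */
    record ObjectKey(long number, int generation) {}

    /**
     * Unguarded repair step: stores the key taken from the referenced object.
     * If that object could not be parsed, referencedKey is null and
     * Hashtable.put throws a NullPointerException.
     */
    static ObjectKey repairUnguarded(Map<Object, ObjectKey> keys, Object obj,
                                     ObjectKey referencedKey) {
        keys.put(obj, referencedKey);
        return referencedKey;
    }

    /**
     * Guarded repair step: uses a caller-supplied, non-null fallback key
     * (hypothetical, e.g. a freshly allocated object number) when the
     * referenced object is missing, so the map never receives a null value.
     */
    static ObjectKey repairGuarded(Map<Object, ObjectKey> keys, Object obj,
                                   ObjectKey referencedKey, ObjectKey fallback) {
        ObjectKey key = (referencedKey != null) ? referencedKey : fallback;
        keys.put(obj, key);
        return key;
    }

    public static void main(String[] args) {
        // Hashtable (unlike HashMap) rejects null keys and null values.
        Map<Object, ObjectKey> keys = new Hashtable<>();
        Object obj = new Object();
        try {
            repairUnguarded(keys, obj, null);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        ObjectKey key = repairGuarded(keys, obj, null, new ObjectKey(54, 0));
        System.out.println("guarded: " + key);
    }
}
```

Because Hashtable fails fast on null, the missing key surfaces at the put() call
itself rather than somewhere later in the write process.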

The first part of the fix already resolves the NPE. But I can imagine a case
where a malformed PDF is merged with another one using the same object numbers,
so that the issue might be triggered again. So in the end I also applied the
proposal [~tilman] made, just to be on the safe side.

[~tilman], thanks for creating the report.

> NPE in COSWriter.getObjectKey() when saving broken file
> -------------------------------------------------------
>
>                 Key: PDFBOX-5786
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5786
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 3.0.1 PDFBox, 4.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>
> This happens with the broken file from PDFBOX-5782 when this code is run:
> {code:java}
> PDDocument doc = Loader.loadPDF(new File("PDFBOX-5782.pdf"));
> PDFRenderer r = new PDFRenderer(doc);
> r.renderImage(0);
> doc.save(OutputStream.nullOutputStream());
> {code}
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
>       at java.base/java.util.Hashtable.put(Hashtable.java:475)
>       at org.apache.pdfbox.pdfwriter.COSWriter.getObjectKey(COSWriter.java:1082)
>       at org.apache.pdfbox.pdfwriter.COSWriter.writeReference(COSWriter.java:1391)
>       at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1231)
>       at org.apache.pdfbox.pdfwriter.COSWriter.writeDictionary(COSWriter.java:1179)
>       at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1226)
>       at org.apache.pdfbox.pdfwriter.COSWriter.writeDictionary(COSWriter.java:1179)
>       at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDictionary(COSWriter.java:1226)
>       at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1413)
>       at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:381)
>       at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:606)
>       at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBodyCompressed(COSWriter.java:492)
>       at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1319)
>       at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:429)
>       at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1593)
>       at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1469)
>       at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1044)
>       at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:968)
> {noformat}
> It does not happen if {{r.renderImage(0);}} is removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
