[ https://issues.apache.org/jira/browse/PDFBOX-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046107#comment-18046107 ]
Andreas Lehmkühler commented on PDFBOX-6036:
--------------------------------------------
The regression is caused by overlapping object numbers. Some of the
indirect objects of the imported page use the same object numbers as other
objects in the target PDF. Depending on the order in which the objects are
processed while writing the result, this may lead to a mixed-up PDF.
IMHO the only solution is to assign new numbers to the imported indirect
objects so that their numbers cannot overlap. Now the object numbers of all
indirect objects of the imported page are set to null, and new numbers are
assigned when the PDF is written.
[~tilman] I've added your test from PDFBOX-5752
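The collision can be illustrated with a toy model in plain Java. This is a minimal sketch, not PDFBox code: `ToyObject`, `ToyWriter` and both write methods are hypothetical, a written file is reduced to a map from object number to content, and the sketch assumes that every target object is already numbered. Keeping the source numbers lets an imported object silently replace a target object with the same number, while clearing the imported numbers and assigning fresh ones at write time keeps every object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not PDFBox code: an indirect object whose
// object number may be null ("assign a number when writing").
final class ToyObject {
    Integer number;
    final String content;

    ToyObject(Integer number, String content) {
        this.number = number;
        this.content = content;
    }
}

final class ToyWriter {
    // Writes every object under its current number. Objects sharing a number
    // overwrite each other, so the result depends on the processing order.
    static Map<Integer, String> writeKeepingNumbers(List<ToyObject> objects) {
        Map<Integer, String> out = new LinkedHashMap<>();
        for (ToyObject o : objects) {
            out.put(o.number, o.content);
        }
        return out;
    }

    // Clears the numbers of the imported objects and, while writing, assigns
    // fresh ones above the highest number used by the target.
    // Assumes every target object already has a number.
    static Map<Integer, String> writeRenumberingImports(List<ToyObject> target,
                                                        List<ToyObject> imported) {
        int highest = 0;
        for (ToyObject o : target) {
            highest = Math.max(highest, o.number);
        }
        for (ToyObject o : imported) {
            o.number = null;             // drop the number from the source document
        }
        List<ToyObject> all = new ArrayList<>(target);
        all.addAll(imported);
        Map<Integer, String> out = new LinkedHashMap<>();
        for (ToyObject o : all) {
            if (o.number == null) {
                o.number = ++highest;    // new number, unused in the target
            }
            out.put(o.number, o.content);
        }
        return out;
    }
}
```

In this toy run, writing `catalog` (1) and `target page` (2) together with an imported `imported page` (2) and `imported font` (3) keeps only three objects when the numbers are reused, but all four (numbered 1 to 4) after renumbering.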
> StackOverflowError in COSWriterCompressionPool for large number of bookmarks
> ----------------------------------------------------------------------------
>
> Key: PDFBOX-6036
> URL: https://issues.apache.org/jira/browse/PDFBOX-6036
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 3.0.6 PDFBox, 4.0.0
> Reporter: Bernhard Fey
> Assignee: Andreas Lehmkühler
> Priority: Critical
> Fix For: 3.0.7 PDFBox, 4.0.0
>
>
> Saving a document containing thousands of bookmarks causes a
> {{StackOverflowError}} in {{COSWriterCompressionPool}}.
>
> The stack trace alternates between the following two methods:
>
>
> {code:java}
> at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:190)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:204)
> {code}
>
>
> You can replicate the issue with the following class:
>
>
> {code:java}
> import java.io.ByteArrayOutputStream;
> import java.text.DecimalFormat;
>
> import org.apache.pdfbox.pdfwriter.compress.CompressParameters;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
> import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
>
> public class StackOverflBookm {
>     public static void main(String[] args) {
>         // Double the number of bookmarks on each pass (1, 2, 4, ... up to 2^20).
>         for (int i = 1; i <= 1_111_111; i *= 2) {
>             System.out.println(new DecimalFormat("#,###").format(i));
>             try (PDDocument document = new PDDocument()) {
>                 PDDocumentOutline outline = new PDDocumentOutline();
>                 document.getDocumentCatalog().setDocumentOutline(outline);
>                 // Add i sibling bookmarks at the top level of the outline.
>                 for (int j = 0; j < i; j++) {
>                     outline.addLast(new PDOutlineItem());
>                 }
>                 document.save(new ByteArrayOutputStream(),
>                         CompressParameters.DEFAULT_COMPRESSION);
>                 // NO_COMPRESSION avoids the Error
>             } catch (Throwable e) {
>                 // Catch Throwable, not Exception, so the StackOverflowError is reported.
>                 e.printStackTrace(System.out);
>                 return;
>             }
>         }
>     }
> }{code}
>
>
> Without compression, the class creates documents with more than a million
> bookmarks (given sufficient heap; 1 gigabyte is enough).
> With the default compression, however, the StackOverflowError is thrown (on my
> Windows VMs, with a default stack size of about 1 megabyte, before reaching
> ten thousand bookmarks).
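Because the failure depends on the thread's stack size, one possible stopgap, not proposed in the report, is to run `save` on a dedicated thread that requests a larger stack through the standard `Thread(ThreadGroup, Runnable, String, long)` constructor. The helper `LargeStackRunner` below is hypothetical, and the requested size is only a hint that the JVM may round or ignore; the `NO_COMPRESSION` setting noted in the reproducer also avoids the error.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical helper, not part of PDFBox: runs a task on a new thread that
// requests a larger stack than the default. The stackSize argument is only a
// hint; the JVM may round it or ignore it on some platforms.
final class LargeStackRunner {
    static <T> T call(long stackSizeBytes, Callable<T> task)
            throws InterruptedException, ExecutionException {
        FutureTask<T> future = new FutureTask<>(task);
        Thread worker = new Thread(null, future, "large-stack-worker", stackSizeBytes);
        worker.start();
        // Blocks until the task finishes; any Throwable it threw, including a
        // StackOverflowError, is rethrown wrapped in an ExecutionException.
        return future.get();
    }
}
```

A call could look like `LargeStackRunner.call(64L * 1024 * 1024, () -> { document.save(new ByteArrayOutputStream(), CompressParameters.DEFAULT_COMPRESSION); return null; })` inside the reproducer's `try` block; the stack size actually needed for a given number of bookmarks is not known from the report.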
>
> Apparently, the recursion depth of the two methods grows with the number of
> bookmarks until the Error is thrown.
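The growing recursion depth can be sketched independently of PDFBox. The model below is hypothetical: `RefNode` stands in for a dictionary that references others (such as an outline item and its `/Next` and `/Prev` siblings), and `Traversal` contrasts a recursive walk, which needs roughly one nested call per item in the chain, with an iterative walk that keeps pending work in an `ArrayDeque` on the heap. It shows the general recursion-to-iteration technique and is not a claim about how `COSWriterCompressionPool` was changed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a dictionary that references other dictionaries,
// such as an outline item pointing to its /Next and /Prev siblings.
final class RefNode {
    final List<RefNode> refs = new ArrayList<>(2);
}

final class Traversal {
    // Recursive walk: each followed reference adds a stack frame, so a chain of
    // n items needs roughly n nested calls.
    static int countRecursive(RefNode node, Set<RefNode> seen) {
        if (!seen.add(node)) {
            return 0;                        // already visited (e.g. via /Prev)
        }
        int count = 1;
        for (RefNode ref : node.refs) {
            count += countRecursive(ref, seen);
        }
        return count;
    }

    // Iterative walk: pending nodes live in a heap-allocated deque, so the
    // call depth stays constant regardless of the chain length.
    static int countIterative(RefNode root) {
        Set<RefNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<RefNode> work = new ArrayDeque<>();
        work.push(root);
        int count = 0;
        while (!work.isEmpty()) {
            RefNode node = work.pop();
            if (!seen.add(node)) {
                continue;                    // pushed twice before being visited
            }
            count++;
            for (RefNode ref : node.refs) {
                if (!seen.contains(ref)) {
                    work.push(ref);
                }
            }
        }
        return count;
    }

    // Builds n items linked in both directions, like n sibling bookmarks.
    static RefNode chain(int n) {
        RefNode first = new RefNode();
        RefNode prev = first;
        for (int i = 1; i < n; i++) {
            RefNode item = new RefNode();
            prev.refs.add(item);             // /Next
            item.refs.add(prev);             // /Prev
            prev = item;
        }
        return first;
    }
}
```

With `Traversal.chain(200_000)`, `countIterative` visits all 200,000 items at constant call depth, whereas `countRecursive` would nest one call per item and can exhaust a default-sized stack.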
>
> I have chosen the priority Critical because an {{Error}} is thrown, which,
> unlike an Exception, is likely not handled in integrations.
> Additionally, most integrations likely use the {{save}} methods without the
> second parameter and therefore get compression enabled by default.
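The point about `Error` versus `Exception` follows from the JDK class hierarchy: `StackOverflowError` extends `VirtualMachineError`, which extends `Error`, so a `catch (Exception e)` block never sees it. The minimal sketch below (the class `ErrorVsException` is hypothetical and unrelated to PDFBox) forces a stack overflow with unbounded recursion and reports which handler caught it.

```java
// Hypothetical demo class, not part of PDFBox.
final class ErrorVsException {
    // Unbounded recursion: always ends in a StackOverflowError.
    static int recurse(int depth) {
        return recurse(depth + 1);
    }

    // Shows which handler the StackOverflowError reaches.
    static String run() {
        try {
            try {
                recurse(0);
                return "no failure";
            } catch (Exception e) {
                // Typical integration code: not reached for an Error.
                return "caught as Exception";
            }
        } catch (StackOverflowError e) {
            return "escaped the Exception handler";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints: escaped the Exception handler
        System.out.println(Exception.class.isAssignableFrom(StackOverflowError.class));  // prints: false
        System.out.println(Error.class.isAssignableFrom(StackOverflowError.class));      // prints: true
    }
}
```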
--
This message was sent by Atlassian Jira
(v8.20.10#820010)