[
https://issues.apache.org/jira/browse/PDFBOX-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575631#comment-17575631
]
Owen McGovern commented on PDFBOX-5485:
---------------------------------------
Ah, I think I can at least reproduce this with one of your PDFs, after running
my code against all the PDFs in your repo.
Try pdfbox/pdfbox/target/test-classes/input/PDFBOX-3110-poems-beads.pdf as the
source document.
Then try loading pages 1-1 or 1-2 and writing them back out, e.g.:
{code:java}
val pdfDoc = org.apache.pdfbox.Loader.loadPDF(pdfFile)
if (pdfDoc.pages.count > 1) {
    println("Found PDF with ${pdfDoc.pages.count} pages : $pdfFile")
    val pageExtractor = PageExtractor(pdfDoc, 1, 1)
    val pdfPages = pageExtractor.extract()
    pdfPages.save(xmlFile)
}
{code}
> Stackoverflow writing out a subset of PDF pages - COSWriterObjectStream
> -----------------------------------------------------------------------
>
> Key: PDFBOX-5485
> URL: https://issues.apache.org/jira/browse/PDFBOX-5485
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 3.0.0 PDFBox
> Environment: MacOS, but likely not OS specific.
> Reporter: Owen McGovern
> Priority: Major
>
> Version: org.apache.pdfbox:pdfbox:3.0.0-alpha3
>
> In a subset of the PDFs I process, I cannot extract a range of pages and
> write them out to a new PDF (as part of test code).
> Here's the Kotlin code I use:
> {code:java}
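> import java.nio.file.Path
> import java.nio.file.Paths
> import org.apache.pdfbox.multipdf.PageExtractor
>
> // Extracts pages fromPage..toPage (1-based, inclusive) into a new PDF file.
> fun extractPages(documentName: String, fromPage: Int, toPage: Int): Path {
>     val pdfFile = Paths.get("data", "input", "PDFS", "${documentName}.pdf")
>     val pdfPagesFile = Paths.get("data", "input", "PDFS",
>         "${documentName}_Page_$fromPage-$toPage.pdf")
>     val pdfDoc = org.apache.pdfbox.Loader.loadPDF(pdfFile.toFile())
>     val pageExtractor = PageExtractor(pdfDoc, fromPage, toPage)
>     val pdfPages = pageExtractor.extract()
>     // The StackOverflowError is thrown from this save() call.
>     pdfPages.save(pdfPagesFile.toFile())
>     return pdfPagesFile
> }
> {code}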
> It doesn't occur in all PDFs, maybe 10-20% of the PDFs I use.
>
> A slice of the stack trace is:
> {code:java}
> java.lang.StackOverflowError
> at java.base/java.util.HashMap.tableSizeFor(HashMap.java:380)
> at java.base/java.util.HashMap.<init>(HashMap.java:453)
> at java.base/java.util.LinkedHashMap.<init>(LinkedHashMap.java:347)
> at java.base/java.util.HashSet.<init>(HashSet.java:162)
> at java.base/java.util.LinkedHashSet.<init>(LinkedHashSet.java:154)
> at org.apache.pdfbox.util.SmallMap.entrySet(SmallMap.java:380)
> at org.apache.pdfbox.cos.COSDictionary.entrySet(COSDictionary.java:1225)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:336)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
> at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
> {code}
> As I mentioned, this hits some PDFs, not all.
> I legally cannot share the original source PDFs, but it looks like a recursive
> loop between writeCOSDictionary and writeObject in COSWriterObjectStream.
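
For illustration only, here is a minimal sketch of how a depth-first writer can recurse without end on a cyclic object graph, and how tracking in-progress containers by identity stops it. It is not PDFBox code and does not claim to be the cause of, or the fix for, this issue: Node, Leaf, Arr, Dict, write and buildPageTree are all hypothetical names. The only PDF fact assumed is that a page dictionary has a /Parent entry pointing back at the page tree node whose /Kids array lists that page.

```kotlin
import java.util.Collections
import java.util.IdentityHashMap

// Hypothetical stand-ins for a COS-like object graph; these are NOT PDFBox classes.
sealed class Node
class Leaf(val text: String) : Node()
class Arr(val items: MutableList<Node> = mutableListOf()) : Node()
class Dict(val entries: MutableMap<String, Node> = linkedMapOf()) : Node()

// Writes the graph depth-first. Containers that are currently being written are
// tracked by identity; meeting one again emits a placeholder instead of recursing.
// Without the two inProgress checks, the cyclic graph below recurses until the
// JVM throws StackOverflowError.
fun write(
    node: Node,
    out: StringBuilder,
    inProgress: MutableSet<Node> = Collections.newSetFromMap(IdentityHashMap<Node, Boolean>())
) {
    when (node) {
        is Leaf -> out.append(node.text)
        is Arr -> {
            if (!inProgress.add(node)) { out.append("<cycle>"); return }
            out.append("[")
            node.items.forEachIndexed { i, item ->
                if (i > 0) out.append(" ")
                write(item, out, inProgress)
            }
            out.append("]")
            inProgress.remove(node)
        }
        is Dict -> {
            if (!inProgress.add(node)) { out.append("<cycle>"); return }
            out.append("<<")
            for ((key, value) in node.entries) {
                out.append("/").append(key).append(" ")
                write(value, out, inProgress)
            }
            out.append(">>")
            inProgress.remove(node)
        }
    }
}

// Builds a two-node cycle: pages -> Kids -> page -> Parent -> pages.
fun buildPageTree(): Dict {
    val pages = Dict()
    val page = Dict()
    page.entries["Type"] = Leaf("/Page")
    page.entries["Parent"] = pages
    pages.entries["Type"] = Leaf("/Pages")
    pages.entries["Kids"] = Arr(mutableListOf<Node>(page))
    return pages
}

fun main() {
    val out = StringBuilder()
    write(buildPageTree(), out)
    println(out)
}
```

A real PDF writer would emit an indirect object reference at the point where this sketch writes the placeholder. The call pattern in the trace (writeObject, writeCOSDictionary, writeCOSArray, then writeObject again) would be consistent with a back-reference of this kind being followed inline, but that is a guess until it is confirmed against the file named in the comment.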
--
This message was sent by Atlassian Jira
(v8.20.10#820010)