Manfred Schauer created PDFBOX-5479:
---------------------------------------

             Summary: PDFTextStripper needs 1GB heap for a 3.6 MB pdf
                 Key: PDFBOX-5479
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5479
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.26
         Environment: JDK11.0.2 on MacOS 12.4
            Reporter: Manfred Schauer
         Attachments: heapDump.png, x.pdf


Extracting text from the attached x.pdf:

    PDDocument pdDocument = PDDocument.load(new File("/tmp/x.pdf"));
    PDFTextStripper stripper = new PDFTextStripper();
    stripper.getText(pdDocument);

succeeds with -Xmx1G but throws an OutOfMemoryError with -Xmx900m.

The heap dump shows 2923 instances of TrueTypeFont. PDResources.cache contains SoftReferences to lots of fonts, keyed by different COSObjects.
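
To illustrate what the heap dump seems to suggest, here is a toy model in plain Java. It does not use PDFBox and is not PDFBox's implementation. All names in it (SoftCacheSketch, Key, ParsedFont) are hypothetical. It only shows that a cache of SoftReferences keyed by distinct objects cannot share entries: each new key triggers a fresh parse, and the parsed objects stay reachable until the JVM is under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Toy model only; these are NOT PDFBox classes.
public class SoftCacheSketch {

    // Hypothetical stand-in for an indirect object used as cache key.
    // It deliberately keeps identity-based equals/hashCode.
    static final class Key {
        final int objectNumber;
        Key(int objectNumber) { this.objectNumber = objectNumber; }
    }

    // Hypothetical stand-in for a parsed font; the array mimics table data.
    static final class ParsedFont {
        final byte[] tables;
        ParsedFont(int size) { this.tables = new byte[size]; }
    }

    private final Map<Key, SoftReference<ParsedFont>> cache = new HashMap<>();
    private int parseCount = 0;

    // Returns the cached font for this key, "parsing" a new one on a miss.
    ParsedFont get(Key key, int size) {
        SoftReference<ParsedFont> ref = cache.get(key);
        ParsedFont font = (ref != null) ? ref.get() : null;
        if (font == null) {
            font = new ParsedFont(size);
            parseCount++;
            cache.put(key, new SoftReference<>(font));
        }
        return font;
    }

    int parseCount() { return parseCount; }
    int entryCount() { return cache.size(); }

    public static void main(String[] args) {
        SoftCacheSketch cache = new SoftCacheSketch();
        // 100 lookups, each through its own key object: no lookup hits.
        for (int page = 0; page < 100; page++) {
            cache.get(new Key(page), 1024);
        }
        System.out.println("entries=" + cache.entryCount()
                + " parses=" + cache.parseCount());
    }
}
```

In this model the 100 lookups produce 100 cache entries and 100 parses, because no two keys are equal. Whether the fonts in x.pdf are in fact duplicates of each other is an assumption that the attached heap dump alone does not confirm.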