[ https://issues.apache.org/jira/browse/PDFBOX-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-5479:
---------------------------------------
    Issue Type: Improvement  (was: Bug)

> PDFTextStripper needs 1GB heap for a 3.6 MB pdf
> -----------------------------------------------
>
>                 Key: PDFBOX-5479
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5479
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.26
>         Environment: JDK11.0.2 on MacOS 12.4
>            Reporter: Manfred Schauer
>            Priority: Minor
>         Attachments: heapDump.png, x.pdf
>
>
> Extracting text from the attached x.pdf:
>     PDDocument pdDocument = PDDocument.load(new File("/tmp/x.pdf"));
>     PDFTextStripper stripper = new PDFTextStripper();
>     stripper.getText(pdDocument);
> succeeds with -Xmx1G but throws an OutOfMemoryError (OOME) with -Xmx900m.
> The heap dump shows 2923 instances of TrueTypeFont; PDResources.cache contains
> SoftReferences to lots of fonts keyed by different COSObjects.
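
The heap dump observation in the report can be pictured with a small sketch. This is not PDFBox code: `SoftCacheSketch`, its `get` method and the `byte[]` standing in for a parsed font are all hypothetical, and it assumes the cache keys compare by identity. Under that assumption, a cache of SoftReferences keyed by distinct objects creates one entry per key, so the same font data reached through many different keys is never shared. The count 2923 is borrowed from the report only as a sample size.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a resource cache; NOT the PDFBox implementation.
final class SoftCacheSketch {
    // Keys use Object identity (assumption), values are softly referenced.
    private final Map<Object, SoftReference<byte[]>> cache = new HashMap<>();

    // Returns the cached value for this key, "parsing" a new one on a miss
    // or when the garbage collector has already cleared the soft reference.
    byte[] get(Object key) {
        SoftReference<byte[]> ref = cache.get(key);
        byte[] font = (ref == null) ? null : ref.get();
        if (font == null) {
            font = new byte[1024]; // stand-in for an expensive parsed font
            cache.put(key, new SoftReference<>(font));
        }
        return font;
    }

    int size() {
        return cache.size();
    }

    // Looks up each of `distinctKeys` keys `lookupsPerKey` times and
    // returns how many cache entries exist afterwards.
    static int entriesAfter(int distinctKeys, int lookupsPerKey) {
        SoftCacheSketch cache = new SoftCacheSketch();
        Object[] keys = new Object[distinctKeys];
        for (int i = 0; i < distinctKeys; i++) {
            keys[i] = new Object(); // each key is a distinct identity
        }
        for (int round = 0; round < lookupsPerKey; round++) {
            for (Object key : keys) {
                cache.get(key);
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("entries = " + entriesAfter(2923, 3));
    }
}
```

Repeated lookups with the same key reuse one entry, but every new key adds another. Soft references are only cleared when the JVM is under memory pressure, so a cache like this can grow close to the heap limit before anything is released.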