[ https://issues.apache.org/jira/browse/PDFBOX-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-4896:
---------------------------------------
    Fix Version/s: 3.0.0 PDFBox
                   2.0.21

> Don't save and restore graphic states around showGlyph in LegacyPDFStreamEngine
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4896
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4896
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.20, 3.0.0 PDFBox
>            Reporter: Alfred
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>              Labels: Optimization
>             Fix For: 2.0.21, 3.0.0 PDFBox
>
>         Attachments: PDFBOX-4896.patch
>
>
> One of the major performance bottlenecks in text extraction was the
> clone + push and pop + clone operations on the graphics state before and
> after each call to showGlyph.
> Not only was cloning slow, it also consumed large amounts of memory,
> making the garbage collector work harder.
> When extracting text, showGlyph does not modify the graphics state, so there
> is no need to save and restore the state around it.
> The same could be true in general, not just for text extraction, but I do not
> understand the code well enough to decide.
> To be safe, I have only modified the behavior of LegacyPDFStreamEngine.
> The showGlyph operation sounds like a read-only operation that should not
> modify anything.
>
> I have the code ready and I will submit a patch and a review.
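
The cost described in the issue can be illustrated with a small toy model. This is only a sketch and not the attached patch: GraphicsStateSketch, State, showGlyphWithSaveRestore and showGlyphDirect are hypothetical names invented for illustration and are not PDFBox classes or methods. The model also simplifies the pattern to a single copy per save, and it assumes, as the report does, that showGlyph never modifies the graphics state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: these classes are hypothetical and are not PDFBox types.
class GraphicsStateSketch {
    /** Stand-in for a graphics state; a real one holds far more data. */
    static final class State {
        final double[] matrix;
        State(double[] matrix) { this.matrix = matrix; }
        State copy() { return new State(matrix.clone()); }
    }

    private final Deque<State> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    int cloneCount = 0; // number of state copies made so far

    GraphicsStateSketch() {
        stack.push(new State(new double[] {1, 0, 0, 1, 0, 0}));
    }

    // Read-only with respect to the graphics state: it only records the glyph.
    private void showGlyph(char glyph) { text.append(glyph); }

    // Old pattern: copy and push the state, show the glyph, then pop.
    void showGlyphWithSaveRestore(char glyph) {
        stack.push(stack.peek().copy());
        cloneCount++;
        showGlyph(glyph);
        stack.pop();
    }

    // Proposed pattern: call showGlyph directly, with no state copy.
    void showGlyphDirect(char glyph) { showGlyph(glyph); }

    String text() { return text.toString(); }
    int stackDepth() { return stack.size(); }

    public static void main(String[] args) {
        GraphicsStateSketch guarded = new GraphicsStateSketch();
        GraphicsStateSketch direct = new GraphicsStateSketch();
        for (char c : "Hello".toCharArray()) {
            guarded.showGlyphWithSaveRestore(c);
            direct.showGlyphDirect(c);
        }
        System.out.println(guarded.text() + " copies=" + guarded.cloneCount);
        System.out.println(direct.text() + " copies=" + direct.cloneCount);
    }
}
```

Both paths extract the same text and leave the stack at the same depth; the only difference in this model is that the first makes one state copy per glyph, which is the allocation the issue proposes to avoid.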