[ https://issues.apache.org/jira/browse/PDFBOX-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151154#comment-17151154 ]
Tilman Hausherr commented on PDFBOX-4896:
-----------------------------------------
The two lines were added in r1634938 (https://svn.apache.org/r1634938). However,
the lines were just moved there from elsewhere, so that part was more of a
refactoring. I ran the visual regression tests and everything worked fine.
Finally, I looked at the code, and I believe that the save / restore is also
done by the code in processType3Stream(), where {{saveGraphicsStack()}} and
{{restoreGraphicsStack()}} are called, i.e. that one should keep us safe even if
type3 processing destroys something.
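
To illustrate the argument, here is a minimal sketch of the two kinds of guard. It is a toy model under stated assumptions, not PDFBox code: SketchState, GraphicsStackSketch, copies() and showGlyphs() are hypothetical names, and the method bodies that borrow the names {{saveGraphicsStack()}} / {{restoreGraphicsStack()}} are simplified stand-ins for whatever the real engine does. It only shows that a single save / restore around the type3 stream keeps the outer state intact, while a per-glyph save costs one state copy per glyph.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a graphics state; not PDFBox's own class. */
final class SketchState {
    double fontSize = 12.0;

    SketchState copy() {
        SketchState c = new SketchState();
        c.fontSize = fontSize;
        return c;
    }
}

/** Toy engine; all bodies are illustrative assumptions, not PDFBox code. */
final class GraphicsStackSketch {
    private Deque<SketchState> stack = new ArrayDeque<>();
    private int copies = 0; // counts how many state copies were made

    GraphicsStackSketch() {
        stack.push(new SketchState());
    }

    SketchState current() { return stack.peek(); }
    int copies() { return copies; }

    // Per-glyph guard: copy the top state and push it.
    void saveGraphicsState() {
        copies++;
        stack.push(stack.peek().copy());
    }

    void restoreGraphicsState() { stack.pop(); }

    // Outer guard: swap in a fresh stack seeded with a copy of the current state.
    Deque<SketchState> saveGraphicsStack() {
        Deque<SketchState> saved = stack;
        stack = new ArrayDeque<>();
        copies++;
        stack.push(saved.peek().copy());
        return saved;
    }

    void restoreGraphicsStack(Deque<SketchState> saved) { stack = saved; }

    // Read-only glyph callback, as assumed for text extraction.
    double showGlyph() { return current().fontSize * 0.5; }

    // Simulated type3 glyph stream that deliberately clobbers the state.
    void processType3Stream() {
        Deque<SketchState> saved = saveGraphicsStack();
        try {
            current().fontSize = 99.0;
        } finally {
            restoreGraphicsStack(saved);
        }
    }

    // Shows n glyphs, optionally wrapping each one in save / restore.
    void showGlyphs(int n, boolean savePerGlyph) {
        for (int i = 0; i < n; i++) {
            if (savePerGlyph) saveGraphicsState();
            showGlyph();
            if (savePerGlyph) restoreGraphicsState();
        }
    }

    public static void main(String[] args) {
        GraphicsStackSketch guarded = new GraphicsStackSketch();
        guarded.showGlyphs(1000, true);
        System.out.println("copies with per-glyph save: " + guarded.copies());

        GraphicsStackSketch lean = new GraphicsStackSketch();
        lean.showGlyphs(1000, false);
        lean.processType3Stream();
        System.out.println("copies without per-glyph save: " + lean.copies());
        System.out.println("font size after type3 stream: " + lean.current().fontSize);
    }
}
```

In this model the state seen after processType3Stream() is the one from before the call, regardless of what happened inside, which is the property I am relying on above.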
> Don't save and restore graphic states around showGlyph in
> LegacyPDFStreamEngine
> -------------------------------------------------------------------------------
>
> Key: PDFBOX-4896
> URL: https://issues.apache.org/jira/browse/PDFBOX-4896
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 2.0.20, 3.0.0 PDFBox
> Reporter: Alfred
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Labels: Optimization
> Attachments: PDFBOX-4896.patch
>
>
> One of the major performance bottlenecks in text extraction was the
> clone + push and the pop + clone operations on the graphic state before and
> after the call to showGlyph.
> Not only was cloning slow, it also consumed large amounts of memory,
> making the garbage collector work harder.
> When extracting text, showGlyph does not modify the graphic state so there's
> no need to save / restore the state.
> The same could be true in general, not just for text extraction, but I do not
> understand the code well enough to decide.
> I have only modified the behavior for the LegacyPDFStreamEngine, to be safe.
> The showGlyph operation sounds like a read-only operation that should not
> modify anything.
>
> I have the code ready and I will submit a patch and a review.