[ https://issues.apache.org/jira/browse/PDFBOX-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163428#comment-17163428 ]
Michael Klink edited comment on PDFBOX-4909 at 7/23/20, 11:07 AM:
------------------------------------------------------------------
{quote}Do you happen to have a PDF that would exhibit that problem?
{quote}
No, I don't, at least not that I'm aware of. I merely stumbled over the problem
while looking at the code: storing a datum based on the current graphics state
in a text stripper member seemed outright wrong.
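To illustrate why such a member is fragile, here is a toy Java model. The class and method names are hypothetical and this is not PDFBox code. It relies on one fact from the PDF specification: the current font is part of the graphics state, which the {{q}} and {{Q}} operators save and restore. A {{Q}} can therefore switch the font back without any {{Tf}} (SetFontAndSize) operator being processed, and a height cached in a member when {{Tf}} was last seen goes stale.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/**
 * Hypothetical toy model (not PDFBox code): the current font lives in the
 * graphics state, which q/Q save and restore, so a height cached in a single
 * member when Tf is processed can go stale after Q.
 */
public class StaleFontHeightDemo {
    // Hypothetical per-font heights standing in for the expensive calculation.
    static final Map<String, Float> HEIGHTS = Map.of("F1", 10f, "F2", 20f);

    // Graphics state stack; only the font name is modelled.
    private final Deque<String> stateStack = new ArrayDeque<>();
    private String currentFont = "F1";
    // The problematic member: updated only when Tf is processed.
    private float cachedHeight = HEIGHTS.get("F1");

    void saveState() {                 // q
        stateStack.push(currentFont);
    }

    void restoreState() {              // Q: restores the font but not cachedHeight
        currentFont = stateStack.pop();
    }

    void setFont(String font) {        // Tf
        currentFont = font;
        cachedHeight = HEIGHTS.get(font);
    }

    float heightFromMember() {
        return cachedHeight;
    }

    float heightFromCurrentFont() {
        return HEIGHTS.get(currentFont);
    }

    public static void main(String[] args) {
        StaleFontHeightDemo engine = new StaleFontHeightDemo();
        engine.saveState();      // q
        engine.setFont("F2");    // /F2 12 Tf
        engine.restoreState();   // Q: current font is F1 again
        System.out.println(engine.heightFromMember());      // 20.0 (stale: F2's height)
        System.out.println(engine.heightFromCurrentFont()); // 10.0
    }
}
```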
{quote}It would be great if the height was saved in the font.
{quote}
On one hand *yes, indeed,* as it really only depends on the font in question.
On the other hand, though, *no, please not,* as this number is an artificial
value which is coupled tightly with the text extraction code of the
{{LegacyPDFStreamEngine}} and {{PDFTextStripper}}, optimized for this usage by
trial and error, and not necessarily meaningful beyond that.
Furthermore, an advantage of the current solution is the option of *overriding*
the calculation of this value (see [this stack overflow
answer|https://stackoverflow.com/a/63052240/1729265]), an option that can indeed
make sense and therefore should remain.
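A minimal sketch of a design that keeps both properties argued for above, written as a self-contained toy rather than against the PDFBox API. The height is cached per font object, so it is computed at most once per font instead of once per glyph, and a {{Q}} cannot leave it stale. The calculation itself stays in the extraction engine as a {{protected}} method that a subclass may override. All names ({{FontHeightCacheSketch}}, {{Font}}, {{getFontHeight}}, {{computeFontHeight}}) and the placeholder formula are hypothetical; this is not PDFBox's actual implementation.

```java
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hypothetical sketch (not the PDFBox implementation): cache the font height
 * per font object and keep the calculation in a protected, overridable method.
 */
public class FontHeightCacheSketch {

    /** Hypothetical stand-in for a font object (not PDFBox's PDFont). */
    static final class Font {
        final float bboxHeight; // hypothetical input to the height calculation

        Font(float bboxHeight) {
            this.bboxHeight = bboxHeight;
        }
    }

    // Weak keys: fonts no longer referenced elsewhere can be garbage collected.
    private final Map<Font, Float> fontHeights = new WeakHashMap<>();
    int computeCalls = 0; // counts invocations of the expensive calculation

    /** The expensive calculation; the formula here is only a placeholder. */
    protected float computeFontHeight(Font font) {
        computeCalls++;
        return font.bboxHeight / 2;
    }

    /** Called per glyph; computes the height at most once per font. */
    float getFontHeight(Font font) {
        // this::computeFontHeight dispatches virtually, so overrides are honoured.
        return fontHeights.computeIfAbsent(font, this::computeFontHeight);
    }

    public static void main(String[] args) {
        FontHeightCacheSketch engine = new FontHeightCacheSketch();
        Font f1 = new Font(10f);
        for (int glyph = 0; glyph < 1000; glyph++) {
            engine.getFontHeight(f1); // simulated per-glyph lookups
        }
        System.out.println(engine.computeCalls); // 1
    }
}
```

A subclass that needs a different value, e.g. {{new FontHeightCacheSketch() { @Override protected float computeFontHeight(Font font) { return font.bboxHeight; } }}}, replaces only the calculation; the per-font caching still applies unchanged.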
> Don't calculate font height for every glyph
> -------------------------------------------
>
> Key: PDFBOX-4909
> URL: https://issues.apache.org/jira/browse/PDFBOX-4909
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 2.0.0, 3.0.0 PDFBox
> Reporter: Alfred
> Assignee: Tilman Hausherr
> Priority: Major
> Labels: Optimization
> Fix For: 2.0.21, 3.0.0 PDFBox
>
> Attachments: PDFBOX-4909.patch
>
>
> LegacyPDFStreamEngine computes the font height for every glyph, and the
> computation is rather heavy, as it works around all known problems.
> Instead of computing it for every glyph, we can recompute it only when the
> font changes. The SetFontAndSize operator is invoked when the font changes, so
> we can use it to compute and store the height so that it is ready when needed
> in showGlyph.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)