[
https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983199#comment-14983199
]
Daniel Persson commented on PDFBOX-3075:
----------------------------------------
John: I've read your reply at
http://mail-archives.apache.org/mod_mbox/pdfbox-users/201510.mbox/%[email protected]%3E
And as you say there you need to rethink the font height so it works with
PDFTextStripper. My changes made it through the test cases, so I think the
stripper can't be that dependent on the actual text height. It uses the font's
bounding box height, not font.getHeight(int code), which gives you the height
of a specific glyph.
Furthermore, not all font types have glyphs defined. That could be wrong
behavior, but in those cases you can only approximate the height. My patch
gave me a unified font height in the 1000-unit em system, so I could make
accurate calculations on the position and height of glyphs.
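
To illustrate the kind of calculation a unified 1000-unit height makes possible, here is a minimal sketch in plain Java. It is not part of the patch or of PDFBox: the class GlyphSpaceSketch and its methods are hypothetical names, and it assumes the usual convention that glyph space is 1/1000 of a text space unit (Type 3 fonts define their own font matrix, so this does not hold for them). It also ignores the text matrix, text rise and horizontal scaling.

```java
// Hypothetical helper for illustration only; not part of PDFBox.
final class GlyphSpaceSketch {
    // Assumption: 1000 glyph-space units per em (true for most font types,
    // but not for Type 3 fonts, which carry their own font matrix).
    static final float UNITS_PER_EM = 1000f;

    // Scale a glyph-space measurement (height, ascent, ...) by the font size
    // to get the same measurement in text space units.
    static float toTextSpace(float glyphUnits, float fontSize) {
        return glyphUnits / UNITS_PER_EM * fontSize;
    }

    // Top edge of a glyph, given the baseline y position and the glyph's
    // extent above the baseline in glyph-space units.
    static float glyphTop(float baselineY, float ascentUnits, float fontSize) {
        return baselineY + toTextSpace(ascentUnits, fontSize);
    }

    public static void main(String[] args) {
        // Example values are made up: a 500-unit-high glyph set at 10 pt
        // on a baseline at y = 100.
        System.out.println(toTextSpace(500f, 10f));
        System.out.println(glyphTop(100f, 500f, 10f));
    }
}
```

With a per-glyph height in the same 1000-unit system, the same scaling applies whether the value comes from a specific glyph or from the font's bounding box; only the accuracy of the input differs.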
I've been running many tests on these functions, and I would like to
contribute back because the help I've gotten from PDFBOX is great. When it
comes to the width advance, it's pretty accurate as long as I make small changes
for vertical text and text that is written from right to left. But we've
solved those too.
The API documentation only states:
Description copied from interface: PDFontLike
Returns the height of the given character, in glyph space. This can be
expensive to calculate. Results are only approximate.
Which is not that descriptive. So what do you recommend that I do going forward?
I would like to build my solution on PDFBOX, and I have time allotted by my
company to contribute code back to PDFBOX when our work requires changes in the
PDFBOX engine.
This can only be done if we go in the same direction. Should all fonts have
glyphs?
> Changed to the getHeight function for fonts so it will return a more accurate
> height
> ------------------------------------------------------------------------------------
>
> Key: PDFBOX-3075
> URL: https://issues.apache.org/jira/browse/PDFBOX-3075
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Daniel Persson
> Priority: Minor
> Labels: github-import
> Fix For: 2.0.0
>
> Attachments: get_height.patch
>
>
> The getHeight in the fonts gave back approximated heights and in some cases
> only height the first time the function was called. Tried to clean up the
> functions and return a more accurate height for each glyph.