[ 
https://issues.apache.org/jira/browse/PDFBOX-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983199#comment-14983199
 ] 

Daniel Persson commented on PDFBOX-3075:
----------------------------------------

John: I've read your reply at
http://mail-archives.apache.org/mod_mbox/pdfbox-users/201510.mbox/%[email protected]%3E

And as you say there, you need to rethink the font height so that it works with
PDFTextStripper. My changes made it through the test cases, so I don't think the
stripper can be that dependent on the actual text height. It uses the font's
bounding-box height, not font.getHeight(int code), which gives you the height of
a specific glyph.
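
To make the distinction concrete, here is a minimal Java sketch of the two
measures. It is a hypothetical model, not the PDFBox API: the class FontHeights,
its Box record and the sample metrics (loosely based on Helvetica's AFM values)
are illustrative assumptions. fontBBoxHeight() returns the same value for every
glyph, while glyphHeight(code) returns one glyph's own height, or nothing when
the font carries no bounds for that code.

```java
import java.util.Map;
import java.util.OptionalDouble;

/**
 * Hypothetical model (not the PDFBox API) of a font-wide height versus a
 * per-glyph height. All coordinates are in 1000-unit glyph space.
 */
public class FontHeights {

    /** Axis-aligned box from (llx, lly) to (urx, ury). */
    record Box(float llx, float lly, float urx, float ury) {
        float height() {
            return ury - lly;
        }
    }

    private final Box fontBBox;                 // the font's overall bounding box
    private final Map<Integer, Box> glyphBoxes; // per-glyph bounds, when known

    FontHeights(Box fontBBox, Map<Integer, Box> glyphBoxes) {
        this.fontBBox = fontBBox;
        this.glyphBoxes = glyphBoxes;
    }

    /** One value for every glyph: the height of the font's bounding box. */
    float fontBBoxHeight() {
        return fontBBox.height();
    }

    /** The glyph's own height, or empty if the font has no bounds for this code. */
    OptionalDouble glyphHeight(int code) {
        Box b = glyphBoxes.get(code);
        return b == null ? OptionalDouble.empty() : OptionalDouble.of(b.height());
    }

    public static void main(String[] args) {
        // Illustrative values, loosely based on Helvetica's AFM metrics.
        FontHeights helv = new FontHeights(
                new Box(-166, -225, 1000, 931),
                Map.of((int) 'x', new Box(11, 0, 490, 523),
                       (int) 'H', new Box(79, 0, 643, 718)));
        System.out.println(helv.fontBBoxHeight()); // 1156.0
        System.out.println(helv.glyphHeight('x'));  // OptionalDouble[523.0]
        System.out.println(helv.glyphHeight('?'));  // OptionalDouble.empty
    }
}
```

The bounding-box height is the same for 'x' and 'H', which is why it can serve
a line-oriented tool like a text stripper but says little about an individual
glyph.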

Furthermore, not all font types have glyphs defined. That could be wrong
behavior, but in those cases you can only approximate the height. My patch
gave me a unified font height in the 1000-units-per-em glyph space, so I could
make accurate calculations of the position and height of glyphs.
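
A minimal sketch of the conversion out of 1000-units-per-em glyph space,
assuming the usual FontMatrix of [0.001 0 0 0.001 0 0] (Type 3 fonts define
their own FontMatrix and would need a different factor). The class GlyphSpace
and its methods are hypothetical; when no per-glyph height is available it
falls back to the font's bounding-box height, which is only an approximation.

```java
import java.util.OptionalDouble;

/**
 * Hypothetical helper (not part of PDFBox): scales glyph-space heights to
 * text space, assuming 1000 glyph units per em.
 */
public class GlyphSpace {

    static final double UNITS_PER_EM = 1000.0;

    /** Glyph-space height times font size, divided by 1000 (multiplied first to limit rounding). */
    static double toTextSpace(double glyphSpaceHeight, double fontSize) {
        return glyphSpaceHeight * fontSize / UNITS_PER_EM;
    }

    /** Uses the glyph's own height if known, else the font bounding-box height (approximate). */
    static double heightFor(OptionalDouble glyphHeight, double fontBBoxHeight, double fontSize) {
        return toTextSpace(glyphHeight.orElse(fontBBoxHeight), fontSize);
    }

    public static void main(String[] args) {
        System.out.println(heightFor(OptionalDouble.of(700), 1156, 12)); // 8.4
        System.out.println(heightFor(OptionalDouble.empty(), 1156, 12)); // 13.872
    }
}
```

Keeping every height in the same 1000-unit space until the last step means a
single scale factor (font size / 1000) is enough to place glyphs consistently.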

I've been running many tests on these functions, and I would like to
contribute back because the help I've gotten from PDFBox has been great. The
width advance is pretty accurate as long as I make small adjustments for
vertical text and for text written right to left, and we've solved those
cases too.
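
For reference, the width advance in text space can be sketched with the
formulas from ISO 32000-1, section 9.4.4: tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th
for horizontal writing and ty = (w1 - Tj/1000) * Tfs + Tc + Tw for vertical
writing. The GlyphAdvance class and its parameter names are hypothetical, this
is not PDFBox's own code, and right-to-left ordering is out of its scope.

```java
/**
 * Hypothetical sketch of the text-space advance after one glyph, following
 * ISO 32000-1, 9.4.4. Widths are given in 1000-unit glyph space.
 */
public class GlyphAdvance {

    /**
     * @param width       displacement in glyph space (w0 horizontal, w1 vertical)
     * @param tjAdjust    number from a TJ array preceding the glyph, or 0
     * @param fontSize    Tfs
     * @param charSpacing Tc, applied to every glyph
     * @param wordSpacing Tw, applied only to the single-byte code 32
     * @param hScale      Th (Tz / 100), used in horizontal mode only
     * @param isSpace     true if the glyph is the single-byte code 32
     * @param vertical    true for vertical writing mode
     */
    static double advance(double width, double tjAdjust, double fontSize,
                          double charSpacing, double wordSpacing, double hScale,
                          boolean isSpace, boolean vertical) {
        double tw = isSpace ? wordSpacing : 0.0;
        // (w - Tj/1000) * Tfs with w = width/1000, computed as (width - Tj) * Tfs / 1000
        double base = (width - tjAdjust) * fontSize / 1000.0 + charSpacing + tw;
        return vertical ? base : base * hScale; // Th does not apply vertically
    }

    public static void main(String[] args) {
        System.out.println(advance(500, 0, 10, 0, 0, 1.0, false, false));   // 5.0
        System.out.println(advance(250, 0, 10, 0, 2, 1.0, true, false));    // 4.5
        System.out.println(advance(-1000, 0, 12, 0, 0, 1.0, false, true));  // -12.0
    }
}
```

In vertical mode w1 is usually negative (the default vertical displacement is
-1000), so the pen moves down the page.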

The API documentation only states:

Description copied from interface: PDFontLike
Returns the height of the given character, in glyph space. This can be 
expensive to calculate. Results are only approximate.

That is not very descriptive. So what do you recommend I do going forward? I
would like to build my solution on PDFBox, and my company has allotted me time
to contribute code back to PDFBox when our work requires changes in the PDFBox
engine.

That can only work if we are heading in the same direction. Should all fonts
have glyphs?

> Changed to the getHeight function for fonts so it will return a more accurate 
> height
> ------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3075
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3075
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Daniel Persson
>            Priority: Minor
>              Labels: github-import
>             Fix For: 2.0.0
>
>         Attachments: get_height.patch
>
>
> The getHeight methods in the fonts returned approximate heights and, in some
> cases, returned a height only the first time the method was called. Tried to
> clean up the methods and return a more accurate height for each glyph.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
