Tilman Hausherr created PDFBOX-2023:
---------------------------------------

             Summary: zero font height
                 Key: PDFBOX-2023
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2023
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.0
            Reporter: Tilman Hausherr


Fred Andrews posted this to the user list:

I am using PDFTextStripper on some PDF statements from Bank of America, and
everything is coming through as zero height. I traced it down to getFontHeight
in org.apache.pdfbox.pdmodel.font.PDSimpleFont, which is indeed getting zero.
The font is a Type 3 font and I'm not sure how it should work, but
getFontHeight calls getAFM(), which returns null because it's not a Type 1
font. Then, in the next section of getFontHeight, there are no font
descriptors, so the zero flows all the way through getFontHeight.
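
A minimal sketch of the lookup order described above, written as plain Java rather than PDFBox code. The class and method names (FontHeightChain, heightAsReported, heightWithType3Fallback) are hypothetical, and the AFM and font descriptor heights are passed in as nullable values instead of being read from a real font. It shows how a Type 3 font with neither source ends up with zero, plus one possible fallback to the font's /FontBBox. That fallback value is still in glyph space and would need the FontMatrix scaling before it means anything in text space.

```java
public final class FontHeightChain {

    // Hypothetical model of the order described in the report:
    // 1) AFM metrics, 2) the font descriptor, 3) otherwise the result stays 0.
    static float heightAsReported(Float afmHeight, Float descriptorHeight) {
        float height = 0f;
        if (afmHeight != null) {
            height = afmHeight;
        } else if (descriptorHeight != null) {
            height = descriptorHeight;
        }
        return height; // a Type 3 font with neither source returns 0 here
    }

    // Hypothetical fallback: if both sources are missing, use the height of a
    // Type 3 /FontBBox [llx lly urx ury]. The result is in glyph space, and an
    // all-zero bbox still yields 0, so a caller may need yet another fallback.
    static float heightWithType3Fallback(Float afmHeight, Float descriptorHeight, float[] fontBBox) {
        float height = heightAsReported(afmHeight, descriptorHeight);
        if (height == 0f && fontBBox != null && fontBBox.length == 4) {
            height = Math.abs(fontBBox[3] - fontBBox[1]);
        }
        return height;
    }

    public static void main(String[] args) {
        System.out.println(heightAsReported(null, null)); // 0.0
        float[] bbox = {0, -200, 800, 700}; // invented values for illustration
        System.out.println(heightWithType3Fallback(null, null, bbox)); // 900.0
    }
}
```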

I searched for anything I could key on to calculate the font height but
couldn't find it. getFontSize() reports a font size of 20, although it
appears to be closer to 8. I did trace it to a font size command of 20, but
I assume that value needs to be scaled somehow, and I can't see where that
scaling would come from.
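
A gap between the Tf size and the apparent size is the kind of difference the text matrix (Tm) and the current transformation matrix (CTM) can introduce, because PDF applies both on top of the Tf size. Whether that is the cause in this particular file is an assumption; the report does not say. The sketch below (hypothetical class EffectiveFontSize) multiplies Tm by the CTM using PDF's [a b c d e f] row-vector convention and measures how long a vertical vector of length fontSize becomes. It ignores horizontal scaling (Tz) and text rise, and the Tm scale of 0.4 in main is an invented value, not something read from the statement.

```java
public final class EffectiveFontSize {

    // PDF matrices are written [a b c d e f] and applied to row vectors:
    // [x' y' 1] = [x y 1] * [[a b 0] [c d 0] [e f 1]].
    // Returns m1 * m2 in the same six-element form.
    static double[] multiply(double[] m1, double[] m2) {
        return new double[] {
            m1[0] * m2[0] + m1[1] * m2[2],
            m1[0] * m2[1] + m1[1] * m2[3],
            m1[2] * m2[0] + m1[3] * m2[2],
            m1[2] * m2[1] + m1[3] * m2[3],
            m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
            m1[4] * m2[1] + m1[5] * m2[3] + m2[5]
        };
    }

    // Length of (0, tfSize) after the linear part of Tm * CTM.
    // (0, s) maps to (s*c, s*d), so the length is s * hypot(c, d);
    // this also stays correct for rotated text.
    static double effectiveSize(double tfSize, double[] textMatrix, double[] ctm) {
        double[] m = multiply(textMatrix, ctm);
        return tfSize * Math.hypot(m[2], m[3]);
    }

    public static void main(String[] args) {
        double[] identity = {1, 0, 0, 1, 0, 0};
        double[] scaledTm = {0.4, 0, 0, 0.4, 72, 700}; // hypothetical Tm
        System.out.println(effectiveSize(20, identity, identity)); // 20.0
        System.out.println(effectiveSize(20, scaledTm, identity));
    }
}
```

In this hypothetical case the second call comes out at about 8, matching the kind of discrepancy described; a scale in the CTM instead of Tm would produce the same result.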

The font width, on the other hand, looks accurate, and I would think
something similar is needed for the height, but I would really appreciate
some guidance on how it should work. If I have a clue how it should
work, I can see what I can do to implement it.

This file displays fine in Acrobat and edits fine in Nitro, so it can't
be that invalid.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
