[
https://issues.apache.org/jira/browse/PDFBOX-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-2023:
------------------------------------
Description:
Fred Andrews posted this to the user list:
I am using PDFTextStripper on some PDF statements from Bank of America, and
everything is coming through as zero height. I traced it down to getFontHeight
in org.apache.pdfbox.pdmodel.font.PDSimpleFont, which is indeed getting zero.
The font is a Type 3 font and I'm not sure how it should work, but
getFontHeight is calling getAFM() and that is returning null because it's not
a Type 1 font. Then in the next section of getFontHeight there are no font
descriptors, and the zero just flows all the way through getFontHeight.
I searched for anything I could key on to calculate the font height but
couldn't find it. The font size is claimed to be 20 by getFontSize(), although
it appears to be more like 8. I did trace to where it got a font size command
of twenty, but somehow I'm assuming that would need to be scaled, and I can't
see where that might come from.
The font width on the other hand looks accurate, and I would think something
similar to that would be needed, but would really appreciate some guidance on
how it should work. If I have a clue on how it should work I can see what I can
do to implement it.
This file displays fine in Acrobat and edits fine in Nitro, so it can't be that
invalid.
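
A minimal sketch of one way a height could be derived for a Type 3 font, which has no AFM metrics and may have no font descriptor. It assumes the font dictionary's /FontBBox and /FontMatrix entries are available as plain float arrays, and that the matrix has no rotation or skew. The class name, method name, and sample numbers are hypothetical: this is not PDFBox code, and the numbers are not taken from the attached file. The sketch only illustrates how a Tf size of 20 could yield a much smaller height once the glyph-space scale is applied.

```java
/** Hypothetical helper for illustration only; not part of PDFBox. */
public final class Type3HeightSketch {

    private Type3HeightSketch() {
    }

    /**
     * Estimates a font height in unscaled text space units.
     *
     * @param fontBBox   the /FontBBox entry as [llx, lly, urx, ury], in glyph space
     * @param fontMatrix the /FontMatrix entry as [a, b, c, d, e, f]
     * @param fontSize   the size operand of the Tf operator
     * @return the estimated height, or 0 if the inputs are unusable
     */
    public static float fontHeight(float[] fontBBox, float[] fontMatrix, float fontSize) {
        if (fontBBox == null || fontBBox.length != 4
                || fontMatrix == null || fontMatrix.length != 6) {
            return 0f;
        }
        // Height of the bounding box in glyph space. An all-zero FontBBox
        // carries no size information, so the result is 0 in that case too.
        float glyphSpaceHeight = fontBBox[3] - fontBBox[1];

        // Assumption: no rotation or skew, so the d component alone gives
        // the vertical scale from glyph space to text space.
        float textSpaceHeight = glyphSpaceHeight * Math.abs(fontMatrix[3]);

        // The Tf font size scales text space; the text matrix and CTM would
        // scale the result further and are left out of this sketch.
        return textSpaceHeight * fontSize;
    }

    public static void main(String[] args) {
        // Made-up values: a glyph box 10 units tall and a 0.04 vertical scale.
        float[] bbox = {0f, -2f, 10f, 8f};
        float[] matrix = {0.04f, 0f, 0f, 0.04f, 0f, 0f};
        System.out.println(fontHeight(bbox, matrix, 20f));
    }
}
```

With these made-up values the height comes out near 8 for a font size of 20. Whether the same ratio applies to the attached file would have to be checked against its actual /FontMatrix.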
> zero font height
> ----------------
>
> Key: PDFBOX-2023
> URL: https://issues.apache.org/jira/browse/PDFBOX-2023
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Attachments: PDFBOX-2023.pdf
>