[ https://issues.apache.org/jira/browse/PDFBOX-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965403#comment-14965403 ]
ASF subversion and git services commented on PDFBOX-3038:
---------------------------------------------------------

Commit 1709647 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1709647 ]
PDFBOX-3038: return BBox from font descriptor if font BBox empty

> Text extraction shows glyphs with zero height
> ---------------------------------------------
>
>                 Key: PDFBOX-3038
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3038
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>              Labels: regression
>             Fix For: 2.0.0
>
>         Attachments: PDFBOX-3038-001033-p2.pdf
>
>
> This happens with file 001033.pdf:
> 2.0:
> {code}
> String[108.0,663.6 fs=6.96 xscale=6.96 height=0.0 space=12.1104 width=3.4800034]1
> String[144.0,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.996994]I
> String[147.417,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=4.5]n
> String[152.337,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.25] 
> String[154.88701,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.501999]t
> String[157.809,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=4.5]h
> String[162.729,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=3.9960022]e
> String[167.145,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.25] 
> {code}
> 1.8:
> {code}
> String[108.0,663.6 fs=6.96 xscale=6.96 height=4.57272 space=1.74 width=3.4800034]1
> String[144.0,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.996994]I
> String[147.417,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=4.5]n
> String[152.337,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.25] 
> String[154.88701,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.501999]t
> String[157.809,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=4.5]h
> String[162.729,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=3.9960022]e
> String[167.145,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.25] 
> {code}
> The font has an empty bbox:
> {code}
> def
> /FontBBox {0 0 0 0}
> {code}
> 1.8 had this code to get the height (in PDSimpleFont):
> {code}
> PDRectangle fontBBox = desc.getFontBoundingBox();
> if (fontBBox != null)
> {
>     retval = fontBBox.getHeight() / 2;
> }
> if( retval == 0 )
> {
>     retval = desc.getCapHeight();
> }
> if( retval == 0 )
> {
>     retval = desc.getAscent();
> }
> if( retval == 0 )
> {
>     retval = desc.getXHeight();
>     if (retval > 0)
>     {
>         retval -= desc.getDescent();
>     }
> }
> {code}
> 2.0 has only this:
> {code}
> float glyphHeight = font.getBoundingBox().getHeight() / 2;
> {code}
> So 2.0 takes the height from the font itself and has no Plan B.
> Getting the BBox from the font descriptor brings correct heights (and better text extraction).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
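A minimal, self-contained sketch of the fallback chain discussed in the issue: it combines the commit's approach (use the font descriptor's /FontBBox when the font program's BBox is empty) with the 1.8-style fallbacks to CapHeight, Ascent, and XHeight minus Descent. This is not the committed PDFBox code. Rect, Descriptor, boundingBox() and glyphHeight() are hypothetical stand-ins for PDRectangle, PDFontDescriptor and the real font methods, and treating a box with zero width or zero height as "empty" is an assumption. All values are assumed to be in glyph space units (1000 per em, as with the usual Type 1 FontMatrix).

```java
/**
 * Sketch only: Rect and Descriptor are hypothetical stand-ins for PDFBox's
 * PDRectangle / PDFontDescriptor. Values are in glyph space (1000 units per em).
 */
final class GlyphHeightSketch {

    /** Hypothetical box, laid out like a PDF /FontBBox array [llx lly urx ury]. */
    record Rect(float llx, float lly, float urx, float ury) {
        float width()  { return urx - llx; }
        float height() { return ury - lly; }

        /** Assumption: zero width or zero height (e.g. {0 0 0 0}) counts as empty. */
        boolean isEmpty() { return width() == 0 || height() == 0; }
    }

    /** Hypothetical subset of font descriptor entries; fontBBox may be null. */
    record Descriptor(Rect fontBBox, float capHeight, float ascent,
                      float xHeight, float descent) {}

    /** Prefer the font program's BBox; if it is missing or empty, use the descriptor's. */
    static Rect boundingBox(Rect fontProgramBBox, Descriptor desc) {
        boolean programBoxUnusable = fontProgramBBox == null || fontProgramBBox.isEmpty();
        if (programBoxUnusable && desc != null && desc.fontBBox() != null) {
            return desc.fontBBox();
        }
        return fontProgramBBox;
    }

    /** Half the BBox height, then the same fallbacks as the 1.8 code quoted above. */
    static float glyphHeight(Rect fontProgramBBox, Descriptor desc) {
        Rect bbox = boundingBox(fontProgramBBox, desc);
        float h = (bbox != null) ? bbox.height() / 2 : 0;
        if (desc == null) {
            return h;
        }
        if (h == 0) {
            h = desc.capHeight();
        }
        if (h == 0) {
            h = desc.ascent();
        }
        if (h == 0) {
            h = desc.xHeight();
            if (h > 0) {
                // Descent is normally negative, so subtracting it adds its magnitude.
                h -= desc.descent();
            }
        }
        return h;
    }

    public static void main(String[] args) {
        Rect empty = new Rect(0, 0, 0, 0); // like /FontBBox {0 0 0 0} in the font program
        // Hypothetical descriptor: BBox height 1014 - (-300) = 1314, other metrics unset.
        Descriptor desc = new Descriptor(new Rect(0, -300, 1000, 1014), 0, 0, 0, 0);
        float h = glyphHeight(empty, desc);       // 1314 / 2 = 657 glyph units
        System.out.println(h * 9f / 1000f);       // scaled to a 9 pt font size
    }
}
```

The descriptor values in main() are made up, chosen so that half the box height is 657 units, the per-1000 ratio implied by the 1.8 output above (5.913 / 9 = 4.57272 / 6.96 = 0.657); scaling by a 9 pt font size then gives about 5.913. Consulting the descriptor only when the font program's box is degenerate leaves well-formed fonts unaffected; PDFBox's actual precedence between the two boxes may differ.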