Unicode characters displayed with wrong Advance
-----------------------------------------------
Key: PDFBOX-1283
URL: https://issues.apache.org/jira/browse/PDFBOX-1283
Project: PDFBox
Issue Type: Bug
Components: PDFReader
Affects Versions: 1.6.0
Reporter: Daniel Schwinn
Attachments: AnnahmeReport_MitRussischTest.pdf
The file AnnahmeReport_MitRussischTest.pdf is not displayed correctly. The
advance of the characters is calculated wrong. The document is displayed
correctly in Adobe Reader.
In PDCIDFont.java the method extractWidths() fills widthCache with the
character widths based on the array in the "W" Dictionary. The widthCache seems
to translate from from Unicode to character width but the "W" Dictionary
translates from CID-code to character width.
In this PDF file the TTF font is embedded and the CID code is identical to the
glyph code in the TTF font. A cmap maps from unicode directly to the cid/gid in
the ttf font.
So this cache is filled in the wrong way or when accessing the cache it is not
taken into account that this array containes the widths based on the cid/gid.
The cmap encoding has to be used when filling the cache or when reading the
values from the cache
I checked if Adobe Reader uses the values in /W to determine the widths to rule
out the case that
the PDF file is faulty and adobe reader just ignores the faulty /W array.
When changing the entries for the glyphs number 20..23 in the /W array of the
bold font
(first 4 values in the second line of the array which match to characters
'1'..'4')
then the numbers are displayed with wrong widths in AdobeReader while nothing
changes in PDFBox.
(file AnnahmeReport_MitRussischTest_Modified.pdf)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira