[ https://issues.apache.org/jira/browse/PDFBOX-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275912#comment-14275912 ]
John Hewson commented on PDFBOX-2599:
-------------------------------------

Ok, I've solved the main problem with 2.0. The file has a number of errors: the fonts use Identity-H encoding but are not embedded, the CID2GIDMap is missing, and the ToUnicodeMap that we would expect to fall back to is missing too.

Broken fonts like these fall under an ambiguous part of the PDF spec:

{quote}
The conforming reader shall select glyphs by translating characters from the encoding specified by the predefined CMap to one of the encodings in the TrueType font's 'cmap' table. The means by which this is accomplished are implementation-dependent.
{quote}

We try to emulate Acrobat's undocumented behaviour as much as possible. In this case it was not working because we were expecting a ToUnicodeMap to fall back to. The fix is to check whether the ToUnicodeMap is missing and, if so, fall back to using an Identity encoding.

> failure to render file with utf8 CID TT fonts
> ---------------------------------------------
>
>                 Key: PDFBOX-2599
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2599
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.8.8, 1.8.9, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: John Hewson
>         Attachments: PDFBOX-2599.pdf, rendering-1.8.6.png
>
>
> The glyphs in the attached file are not rendered correctly. From Sanyam G. in
> the user mailing list:
> {quote}
> I tried to convert the first page of the attached pdf to image and got the
> attached resulting output
> Please note This PDF uses UTF8 character set and not ASCII character set.
> For ASCII character set pdfs it works fine.
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
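
For readers following the fallback order described in the comment above, here is a minimal sketch of one way such a chain could look. It is not PDFBox's actual code: the class GlyphFallbackSketch, the method selectGlyph, and the use of plain maps to stand in for the CID2GIDMap, the ToUnicodeMap, and the TrueType 'cmap' table are all hypothetical. It also assumes that "Identity encoding" means using the CID directly as the glyph ID, which is only one possible reading of the comment.

```java
import java.util.Map;

// Hypothetical illustration only; these names do not exist in PDFBox.
final class GlyphFallbackSketch {

    /**
     * Picks a glyph ID for a character code of a non-embedded Identity-H font.
     *
     * @param code        character code from the content stream
     * @param cidToGid    stand-in for the CID2GIDMap, or null if it is missing
     * @param toUnicode   stand-in for the ToUnicodeMap (code to Unicode), or null if missing
     * @param unicodeCmap stand-in for the font's 'cmap' table (Unicode to glyph ID)
     * @return the selected glyph ID, or 0 (.notdef) if no mapping is found
     */
    static int selectGlyph(int code,
                           Map<Integer, Integer> cidToGid,
                           Map<Integer, Integer> toUnicode,
                           Map<Integer, Integer> unicodeCmap) {
        // With Identity-H the CID is the character code itself.
        int cid = code;

        // 1. Prefer an explicit CID-to-GID mapping when the file provides one.
        if (cidToGid != null) {
            return cidToGid.getOrDefault(cid, 0);
        }

        // 2. Otherwise go through Unicode: code -> Unicode -> 'cmap' -> glyph ID.
        if (toUnicode != null) {
            Integer unicode = toUnicode.get(code);
            return unicode == null ? 0 : unicodeCmap.getOrDefault(unicode, 0);
        }

        // 3. Both maps are missing: assume an identity mapping (CID used as GID).
        return cid;
    }
}
```

With both maps passed as null, selectGlyph(36, null, null, cmap) returns 36, which corresponds to the situation in the attached file. When a ToUnicode stand-in is supplied, the lookup goes through the 'cmap' stand-in instead.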