[ 
https://issues.apache.org/jira/browse/PDFBOX-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275912#comment-14275912
 ] 

John Hewson commented on PDFBOX-2599:
-------------------------------------

OK, I've solved the main problem in 2.0. The file has a number of errors: the 
fonts use Identity-H encoding but are not embedded, and the CIDToGIDMap is 
missing. We expect to fall back to the ToUnicodeMap, but that is missing too. 
Broken fonts like these fall under an ambiguous part of the PDF spec:

{quote}
The conforming reader shall select glyphs by translating characters from the 
encoding specified by the predefined CMap to one of the encodings in the 
TrueType font's 'cmap' table. The means by which this is accomplished are 
implementation-dependent.
{quote}

We try to emulate Acrobat's undocumented behaviour as much as possible. In this 
case it wasn't working because we were expecting a ToUnicodeMap to fall back 
to. The fix is to check whether the ToUnicodeMap is missing and, if so, fall 
back to using an Identity encoding.
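
For anyone following along, the fallback can be sketched roughly as below. This 
is only an illustration, not the committed code: the class name, method name 
and the plain Map parameters are hypothetical stand-ins, and it assumes that 
"Identity" here means using the character code directly as the glyph ID. 
Returning 0 (.notdef) for a code with no mapping is also an assumption.

```java
import java.util.Map;

/** Hypothetical sketch of the glyph lookup; these are not PDFBox's real classes. */
final class GlyphLookupSketch {
    private GlyphLookupSketch() {
    }

    /**
     * Maps a character code to a glyph ID for a non-embedded Identity-H font.
     *
     * @param code        character code read from the content stream
     * @param toUnicode   code to Unicode code point, or null if the font has no such map
     * @param unicodeCmap Unicode code point to glyph ID, standing in for the
     *                    substitute TrueType font's 'cmap' table
     */
    static int codeToGid(int code, Map<Integer, Integer> toUnicode,
                         Map<Integer, Integer> unicodeCmap) {
        if (toUnicode == null) {
            // ToUnicode map is missing: fall back to Identity (assumed: code == GID).
            return code;
        }
        Integer unicode = toUnicode.get(code);
        if (unicode == null) {
            // Assumption: an unmapped code resolves to glyph 0 (.notdef).
            return 0;
        }
        // Normal path: translate via Unicode into the font's 'cmap' table.
        return unicodeCmap.getOrDefault(unicode, 0);
    }
}
```

With a null ToUnicode map the code passes through unchanged; otherwise it is 
translated to Unicode first and then looked up in the 'cmap' stand-in.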

> failure to render file with utf8 CID TT fonts
> ---------------------------------------------
>
>                 Key: PDFBOX-2599
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2599
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.8.8, 1.8.9, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: John Hewson
>         Attachments: PDFBOX-2599.pdf, rendering-1.8.6.png
>
>
> The glyphs in the attached file are not rendered correctly. From Sanyam G. in 
> the user mailing list:
> {quote}
> I tried to convert the first page of the attached pdf to image and got the 
> attached resulting output
> Please note This PDF uses UTF8 character set and not ASCII character set.
> For ASCII  character set pdfs it works fine.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
