[ 
https://issues.apache.org/jira/browse/PDFBOX-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117728#comment-15117728
 ] 

John Hewson edited comment on PDFBOX-3092 at 1/26/16 6:51 PM:
--------------------------------------------------------------

A cmap table doesn't need to define mappings for all glyphs. Many glyphs have 
no code points, e.g. composite glyphs (such as accents) and contextually 
substituted glyphs (from GSUB). Arial Unicode is certainly not a broken font! 
Microsoft are responsible for most of the TrueType / OTL spec and it's a 
flagship font on Windows.

The cmap table in Arial Unicode contains 2496 entries corresponding to 38916 
code point to glyph mappings. FontBox returns only about 100, so we're 
missing most of the cmap entries, which is why PDFBox fails to render glyphs 
that we know exist in both the cmap table and the font. I suspect this is 
because there are over thirty-eight thousand mappings, far more than FontBox 
is used to handling. 
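
For reference, here is a minimal sketch of how a format 4 subtable expands 
into individual mappings, following the subtable layout in the OpenType 
specification. It is not FontBox's CmapSubtable#processSubtype4() code; the 
class and method names (Format4Sketch, decode, readUnsigned) are hypothetical 
and exist only for illustration. Each segment covers a contiguous range of 
code points, which is how a few thousand entries can yield tens of thousands 
of mappings.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative format 4 decoder (hypothetical, not the FontBox implementation). */
final class Format4Sketch {

    /**
     * Expands a format 4 cmap subtable into code point -> glyph ID mappings.
     * The buffer must be positioned at the subtable's "format" field and use
     * big-endian byte order (the ByteBuffer default).
     */
    static Map<Integer, Integer> decode(ByteBuffer buf) {
        int format = buf.getShort() & 0xFFFF;
        if (format != 4) {
            throw new IllegalArgumentException("not a format 4 subtable: " + format);
        }
        int length = buf.getShort() & 0xFFFF;          // subtable length in bytes
        buf.getShort();                                // language (unused here)
        int segCount = (buf.getShort() & 0xFFFF) / 2;  // stored as segCountX2
        buf.getShort();                                // searchRange (unused)
        buf.getShort();                                // entrySelector (unused)
        buf.getShort();                                // rangeShift (unused)

        // Every field is read as an unsigned 16-bit value.
        int[] endCode = readUnsigned(buf, segCount);
        buf.getShort();                                // reservedPad
        int[] startCode = readUnsigned(buf, segCount);
        int[] idDelta = readUnsigned(buf, segCount);   // applied modulo 65536 below
        int[] idRangeOffset = readUnsigned(buf, segCount);

        // Whatever remains of the subtable is the glyphIdArray.
        int headerBytes = 16 + 8 * segCount;
        int[] glyphIdArray = readUnsigned(buf, Math.max(0, (length - headerBytes) / 2));

        Map<Integer, Integer> mappings = new LinkedHashMap<>();
        for (int i = 0; i < segCount; i++) {
            for (int c = startCode[i]; c <= endCode[i]; c++) {
                int gid;
                if (idRangeOffset[i] == 0) {
                    // Glyph ID is computed directly from the code point.
                    gid = (c + idDelta[i]) & 0xFFFF;
                } else {
                    // idRangeOffset is a byte offset measured from its own
                    // position, so convert it to an index into glyphIdArray.
                    int index = idRangeOffset[i] / 2 + (c - startCode[i]) - (segCount - i);
                    if (index < 0 || index >= glyphIdArray.length) {
                        continue;                      // malformed entry, skip it
                    }
                    gid = glyphIdArray[index];
                    if (gid != 0) {
                        gid = (gid + idDelta[i]) & 0xFFFF;
                    }
                }
                if (gid != 0) {                        // glyph 0 means "missing glyph"
                    mappings.put(c, gid);
                }
            }
        }
        return mappings;
    }

    private static int[] readUnsigned(ByteBuffer buf, int count) {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getShort() & 0xFFFF;
        }
        return values;
    }
}
```

Comparing the size of the map returned by a decoder like this against what 
FontBox reports for the same subtable would be one way to confirm how many 
mappings are being dropped.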


> Format 4 TTF cmap table is parsed incorrectly
> ---------------------------------------------
>
>                 Key: PDFBOX-3092
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3092
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>             Fix For: 2.1.0
>
>
> Certain large Format 4 cmap tables aren't being parsed correctly by 
> CmapSubtable#processSubtype4(), for example in the font "ArialUnicodeMS".
> This results in missing glyphs when rendering the file from PDFBOX-2950, when 
> "ArialUnicodeMS" is used as a substitute. You can force this to happen by 
> changing the following line of PDCIDFontType2:
> {code}
> // find font or substitute
> CIDFontMapping mapping = FontMappers.instance()
>                                     .getCIDFont(getBaseFont(),
>                                                 getFontDescriptor(),
>                                                 getCIDSystemInfo());
> {code}
> Replace getBaseFont() with "ArialUnicodeMS".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
