Stefan Ziegler created PDFBOX-6197:
--------------------------------------

             Summary: TTFSubsetter: add support for custom cmap subtables (addCustomCmapEntry / addCustomCmap)
                 Key: PDFBOX-6197
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6197
             Project: PDFBox
          Issue Type: Improvement
          Components: FontBox
    Affects Versions: 3.0.7 PDFBox
            Reporter: Stefan Ziegler
         Attachments: TTFSubsetter.java

*Summary*

{{TTFSubsetter}} currently only writes a single Windows Unicode BMP cmap 
subtable (platform 3, encoding 1), and only when {{addAll()}} has been called. 
There is no API to inject additional cmap subtables. This makes it impossible 
to correctly re-subset TrueType fonts that rely on non-Unicode cmaps for 
rendering, in particular fonts produced by Ghostscript using its 
{{TT_BIAS=0xF000}} strategy.
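As background for the proposed API: a {{cmap}} table is a 4-byte header followed by one 8-byte encoding record per subtable, so supporting several subtables is mostly a matter of emitting more records. The sketch below is an illustration under stated assumptions, not the attached patch: {{CmapTableSketch}}, {{Subtable}} and {{buildCmap}} are hypothetical names, and each subtable is assumed to be serialized already. It follows the OpenType rule that encoding records are sorted by platform ID, then encoding ID.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: assembles a 'cmap' table from already-serialized subtables. */
class CmapTableSketch {

    /** One encoding record plus the raw bytes of its subtable (format 4, 6, ...). */
    record Subtable(int platformId, int encodingId, byte[] data) {}

    static byte[] buildCmap(List<Subtable> subtables) {
        // Encoding records must be sorted by platformID, then encodingID.
        List<Subtable> sorted = new ArrayList<>(subtables);
        sorted.sort(Comparator.comparingInt(Subtable::platformId)
                .thenComparingInt(Subtable::encodingId));

        // Header: version (uint16) + numTables (uint16), then 8 bytes per encoding record.
        int headerSize = 4 + 8 * sorted.size();
        int totalSize = headerSize;
        for (Subtable s : sorted) {
            totalSize += s.data().length;
        }

        ByteBuffer buf = ByteBuffer.allocate(totalSize); // big-endian, as TrueType requires
        buf.putShort((short) 0);                 // version
        buf.putShort((short) sorted.size());     // numTables

        int offset = headerSize;                 // Offset32, measured from the start of 'cmap'
        for (Subtable s : sorted) {
            buf.putShort((short) s.platformId());
            buf.putShort((short) s.encodingId());
            buf.putInt(offset);
            offset += s.data().length;
        }
        for (Subtable s : sorted) {
            buf.put(s.data());
        }
        return buf.array();
    }
}
```

Passing a (3,0) and a (1,0) subtable in either order yields a header that lists (1,0) first, with each offset pointing just past the previous subtable's bytes.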



*Problem*

Ghostscript embeds TrueType subsets in PDFs using a character code bias of 
{{0xF000}}. The resulting TTF contains two cmap subtables:
 * *Mac Roman (platform 1, encoding 0, format 6):* maps {{code N → glyph}} directly; this is the subtable viewers use for rendering
 * *Windows Symbol (platform 3, encoding 0, format 4):* maps {{0xF000+N → glyph}}; used by Windows-based viewers
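The two subtables are two views of one code-to-glyph mapping that differ only by the bias. A minimal sketch, assuming the font's single-byte character codes are available as a {{code → GID}} map; {{GhostscriptBiasSketch}} and its methods are hypothetical names used only for illustration:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the two cmap views in a Ghostscript TT_BIAS=0xF000 subset. */
class GhostscriptBiasSketch {

    static final int TT_BIAS = 0xF000;

    /** Mac Roman (1,0) view: code N maps directly to its glyph. */
    static Map<Integer, Integer> macRomanView(Map<Integer, Integer> codeToGid) {
        checkSingleByte(codeToGid);
        return new TreeMap<>(codeToGid);
    }

    /** Windows Symbol (3,0) view: the same glyph is reached through 0xF000 + N. */
    static Map<Integer, Integer> windowsSymbolView(Map<Integer, Integer> codeToGid) {
        checkSingleByte(codeToGid);
        Map<Integer, Integer> result = new TreeMap<>();
        codeToGid.forEach((code, gid) -> result.put(TT_BIAS + code, gid));
        return result;
    }

    // Assumption: the simple-font character codes are single bytes (0x00..0xFF).
    private static void checkSingleByte(Map<Integer, Integer> codeToGid) {
        for (int code : codeToGid.keySet()) {
            if (code < 0 || code > 0xFF) {
                throw new IllegalArgumentException("not a single-byte code: " + code);
            }
        }
    }
}
```

For example, an entry {{0x01 → 3}} appears as {{0x01 → 3}} in the (1,0) view and as {{0xF001 → 3}} in the (3,0) view; a re-subset font needs both to keep rendering in the viewers described above.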

When {{TTFSubsetter}} is used to re-subset such a font, both subtables are lost:
 # {{buildCmapTable()}} returns {{null}} when {{uniToGID}} is empty (i.e. when only {{addGlyphIds()}} was used)
 # Even when {{addAll()}} is called with PUA codepoints ({{U+F001}}, etc.), only the Windows Unicode BMP subtable (platform 3, encoding 1) is written, not the Mac Roman subtable that viewers depend on for rendering
 # There is no API to inject additional cmap subtables with arbitrary platform/encoding combinations
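To keep the Mac Roman subtable across a re-subset, it has to be rebuilt against the new glyph numbering, because subsetting renumbers GIDs. The following is a hypothetical sketch, not the attached patch: it assumes the caller already has an {{old GID → new GID}} map (for example by inverting {{TTFSubsetter.getGIDMap()}}, assumed here to map new GIDs to old ones) and serializes the result as a format 6 (trimmed table) subtable: {{format}}, {{length}}, {{language}}, {{firstCode}}, {{entryCount}}, then one 16-bit glyph ID per code in the range.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: rebuild a Mac Roman (1,0) subtable for a re-subset font. */
class Format6Sketch {

    /** code -> old GID becomes code -> new GID; glyphs dropped by the subset become 0 (.notdef). */
    static Map<Integer, Integer> remap(Map<Integer, Integer> codeToOldGid,
                                       Map<Integer, Integer> oldToNewGid) {
        Map<Integer, Integer> result = new TreeMap<>();
        codeToOldGid.forEach((code, oldGid) ->
                result.put(code, oldToNewGid.getOrDefault(oldGid, 0)));
        return result;
    }

    /** Serializes a format 6 (trimmed table) subtable covering firstCode..lastCode. */
    static byte[] format6(Map<Integer, Integer> codeToGid) {
        if (codeToGid.isEmpty()) {
            throw new IllegalArgumentException("format 6 needs at least one code");
        }
        TreeMap<Integer, Integer> sorted = new TreeMap<>(codeToGid);
        int firstCode = sorted.firstKey();
        int entryCount = sorted.lastKey() - firstCode + 1;
        int length = 10 + 2 * entryCount;   // five uint16 header fields + glyphIdArray

        ByteBuffer buf = ByteBuffer.allocate(length); // big-endian
        buf.putShort((short) 6);            // format
        buf.putShort((short) length);       // length in bytes
        buf.putShort((short) 0);            // language: 0 = not language-specific
        buf.putShort((short) firstCode);
        buf.putShort((short) entryCount);
        for (int code = firstCode; code < firstCode + entryCount; code++) {
            // Codes inside the range with no mapping point at glyph 0 (.notdef).
            buf.putShort(sorted.getOrDefault(code, 0).shortValue());
        }
        return buf.array();
    }
}
```

Format 6 stores every code between the first and last mapped code, so it is compact for the dense, single-byte code ranges Ghostscript produces; the resulting bytes could then be registered as a (1,0) subtable alongside the existing (3,1) one.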

The result is a re-subsetted TTF in which viewers cannot find glyphs for the 
character codes present in the PDF content stream, so text in Thai, CJK, and 
other scripts that goes through this Ghostscript encoding path renders with 
blank or missing glyphs.
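A simplified model of why glyphs go missing: a viewer resolving a single-byte code in such a font consults a (3,0) subtable (addressed through the {{0xF000}} bias) and/or a (1,0) subtable (addressed by the raw code), and falls back to glyph 0 ({{.notdef}}) when neither resolves it. The lookup order and the hypothetical {{GlyphLookupSketch}} types are assumptions for illustration; real viewers differ in which subtables they consult and in what order.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of how a viewer resolves a single-byte code. */
class GlyphLookupSketch {

    record CmapId(int platformId, int encodingId) {}

    static final CmapId MAC_ROMAN = new CmapId(1, 0);
    static final CmapId WINDOWS_SYMBOL = new CmapId(3, 0);
    static final int TT_BIAS = 0xF000;

    /** Returns the GID for a code, or 0 (.notdef, rendered blank) if no subtable resolves it. */
    static int lookup(Map<CmapId, Map<Integer, Integer>> cmaps,
                      List<CmapId> preference, int code) {
        for (CmapId id : preference) {
            Map<Integer, Integer> subtable = cmaps.get(id);
            if (subtable == null) {
                continue; // subtable absent, e.g. dropped during re-subsetting
            }
            // (3,0) is addressed through the bias, (1,0) by the raw code.
            int key = id.equals(WINDOWS_SYMBOL) ? TT_BIAS + code : code;
            Integer gid = subtable.get(key);
            if (gid != null && gid != 0) {
                return gid;
            }
        }
        return 0;
    }
}
```

If the font only carries the (3,1) Unicode subtable that {{addAll()}} writes, this lookup returns 0 for every code, which matches the blank rendering described above.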

The patched {{TTFSubsetter.java}} is attached, based on the current development 
branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
