Stefan Ziegler created PDFBOX-6197:
--------------------------------------
Summary: TTFSubsetter: add support for custom cmap subtables
(addCustomCmapEntry / addCustomCmap)
Key: PDFBOX-6197
URL: https://issues.apache.org/jira/browse/PDFBOX-6197
Project: PDFBox
Issue Type: Improvement
Components: FontBox
Affects Versions: 3.0.7 PDFBox
Reporter: Stefan Ziegler
Attachments: TTFSubsetter.java
*Summary*
{{TTFSubsetter}} currently only writes a single Windows Unicode BMP cmap
subtable (platform 3, encoding 1), and only when {{addAll()}} has been called.
There is no API to inject additional cmap subtables. This makes it impossible
to correctly re-subset TrueType fonts that rely on non-Unicode cmaps for
rendering — in particular fonts produced by Ghostscript using its
{{TT_BIAS=0xF000}} strategy.
*Problem*
Ghostscript embeds TrueType subsets in PDFs using a character code bias of
{{0xF000}}. The resulting TTF contains two cmap subtables:
* *Mac Roman (platform 1, encoding 0, format 6):* maps {{code N → glyph}}
directly — this is the subtable viewers use for rendering
* *Windows Symbol (platform 3, encoding 0, format 4):* maps {{0xF000+N →
glyph}} — used by Windows-based viewers
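
To make the relationship between the two subtables concrete, here is a minimal Java sketch of the two lookups. It is illustrative only and is not taken from Ghostscript or from the attached patch; the class name, method names, and sample glyph IDs are hypothetical, and only the {{0xF000}} bias and the platform/encoding pairs come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: models the two cmap views described above as plain maps.
class BiasedCmapSketch {
    // Bias value named in this issue (Ghostscript's TT_BIAS).
    static final int TT_BIAS = 0xF000;

    // Mac Roman view (platform 1, encoding 0): code N -> glyph ID.
    // Codes are assigned consecutively starting at firstCode (an assumption for this sketch).
    static Map<Integer, Integer> macRoman(int[] glyphIds, int firstCode) {
        Map<Integer, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < glyphIds.length; i++) {
            map.put(firstCode + i, glyphIds[i]);
        }
        return map;
    }

    // Windows Symbol view (platform 3, encoding 0): 0xF000 + N -> same glyph ID.
    static Map<Integer, Integer> windowsSymbol(Map<Integer, Integer> macRoman) {
        Map<Integer, Integer> map = new LinkedHashMap<>();
        macRoman.forEach((code, gid) -> map.put(TT_BIAS + code, gid));
        return map;
    }

    public static void main(String[] args) {
        int[] glyphIds = {5, 9, 12}; // hypothetical glyph IDs in a subset font
        Map<Integer, Integer> mac = macRoman(glyphIds, 1);
        Map<Integer, Integer> win = windowsSymbol(mac);
        mac.forEach((code, gid) ->
                System.out.printf("mac 0x%02X -> gid %d%n", code, gid));
        win.forEach((code, gid) ->
                System.out.printf("win 0x%04X -> gid %d%n", code, gid));
    }
}
```

Both maps point at the same glyph IDs; a re-subsetter that renumbers glyphs would have to rewrite both consistently.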
When {{TTFSubsetter}} is used to re-subset such a font, both subtables are lost:
# {{buildCmapTable()}} returns {{null}} when {{uniToGID}} is empty (i.e. when
only {{addGlyphIds()}} was used)
# Even when {{addAll()}} is called with PUA code points ({{U+F001}} etc.),
only the Windows Unicode BMP subtable is written — not the Mac Roman subtable
that viewers depend on for rendering
# There is no API to inject additional cmap subtables with arbitrary
platform/encoding combinations
The result is a re-subsetted TTF where viewers cannot find glyphs for the
character codes present in the PDF content stream, producing blank/missing
glyph rendering for Thai, CJK, and other scripts that go through this
Ghostscript encoding path.
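
For context on what an injected subtable would have to contain, the following Java sketch serializes a format 6 (trimmed table mapping) cmap subtable, the format used by the Mac Roman subtable described above. It follows the OpenType layout for format 6 but is a standalone illustration under stated assumptions: the class and method names are hypothetical, and it does not reproduce the attached patch or any existing {{TTFSubsetter}} method.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: builds the bytes of a cmap format 6 subtable.
class CmapFormat6Sketch {
    // Layout (all fields uint16, big-endian):
    //   format = 6, length, language, firstCode, entryCount, glyphIdArray[entryCount]
    static byte[] writeFormat6(int firstCode, int[] glyphIds) {
        int length = 10 + 2 * glyphIds.length;        // 5 header fields + glyph array
        ByteBuffer buf = ByteBuffer.allocate(length); // ByteBuffer is big-endian by default
        buf.putShort((short) 6);                      // format
        buf.putShort((short) length);                 // subtable length in bytes
        buf.putShort((short) 0);                      // language (0 = not language-specific)
        buf.putShort((short) firstCode);              // first character code covered
        buf.putShort((short) glyphIds.length);        // entryCount
        for (int gid : glyphIds) {
            buf.putShort((short) gid);                // glyph ID for firstCode + index
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // Hypothetical input: codes 1..3 map to (already renumbered) glyph IDs 5, 9, 12.
        byte[] subtable = writeFormat6(1, new int[] {5, 9, 12});
        System.out.println("format 6 subtable length: " + subtable.length);
    }
}
```

A complete cmap table would additionally need an encoding record (platform 1, encoding 0, offset) pointing at these bytes, and the glyph IDs must be the new IDs assigned by the subsetter, not the original ones.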
The patched {{TTFSubsetter.java}} is attached, based on the current development
branch.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)