Stefan Ziegler created PDFBOX-6193:
--------------------------------------
Summary: TrueTypeEmbedder.getTag() throws
StringIndexOutOfBoundsException for negative Map.hashCode()
Key: PDFBOX-6193
URL: https://issues.apache.org/jira/browse/PDFBOX-6193
Project: PDFBox
Issue Type: Bug
Affects Versions: 3.0.7 PDFBox
Reporter: Stefan Ziegler
{{TrueTypeEmbedder.getTag(Map<Integer, Integer> gidToCid)}} produces a
six-character subset prefix from the hash code of the GID map. When the hash
code is negative, the method crashes with {{StringIndexOutOfBoundsException}}
instead of returning a valid tag.
The relevant code (lines 365–388 of {{TrueTypeEmbedder.java}} in 3.0.7):
{code:java}
public String getTag(Map<Integer, Integer> gidToCid)
{
    // deterministic
    long num = gidToCid.hashCode();

    // base25 encode
    StringBuilder sb = new StringBuilder();
    do
    {
        long div = num / 25;
        int mod = (int)(num % 25);
        sb.append(BASE25.charAt(mod)); // <-- crashes when mod is negative
        num = div;
    } while (num != 0 && sb.length() < 6);
    ...
}
{code}
*Root cause*
In Java, the {{%}} operator returns a result whose sign matches the sign of the
left operand. Therefore, when {{num}} is negative, {{num % 25}} is in the range
{{[-24, 0]}}, and {{BASE25.charAt(negativeIndex)}} immediately throws
{{StringIndexOutOfBoundsException}}.
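The sign behaviour can be checked in isolation. The following is a minimal sketch; {{RemainderSignDemo}} and its method names are hypothetical and not part of PDFBox.

```java
// Illustrative only: RemainderSignDemo is a hypothetical class, not PDFBox code.
final class RemainderSignDemo
{
    // Truncated remainder, as used by the current code:
    // the result takes the sign of the dividend.
    static int truncatedRemainder(long num)
    {
        return (int) (num % 25);
    }

    // Floored remainder: always in [0, 24] for the positive divisor 25.
    static int flooredRemainder(long num)
    {
        return (int) Math.floorMod(num, 25L);
    }

    public static void main(String[] args)
    {
        long num = -7;
        System.out.println(truncatedRemainder(num)); // -7, not a valid charAt index
        System.out.println(flooredRemainder(num));   // 18
    }
}
```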
{{Map.hashCode()}} is the sum of the entry hash codes per the {{Map.hashCode()}}
contract, computed in {{int}} arithmetic; for sufficiently large or
value-diverse maps this sum can overflow and become negative. In practice this
means the bug is triggered probabilistically, depending on the specific glyphs
being subset.
The crash occurs naturally when subsetting larger fonts whose resulting
{{gidToCid}} map happens to hash to a negative value; there is roughly a 50%
chance per subset, depending on the glyphs used in the document.
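To make the hash code contract concrete, here is a small sketch that recomputes the sum by hand and compares it with {{Map.hashCode()}}. The class {{MapHashDemo}} is hypothetical, and the map contents are contrived values chosen only to force an {{int}} overflow; they are not real glyph data.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: MapHashDemo is a hypothetical class, not PDFBox code.
final class MapHashDemo
{
    // Recomputes the hash as the Map.hashCode() contract specifies: the int sum
    // of (key.hashCode() ^ value.hashCode()) over all entries.
    static int contractHash(Map<Integer, Integer> map)
    {
        int sum = 0;
        for (Map.Entry<Integer, Integer> e : map.entrySet())
        {
            sum += e.getKey().hashCode() ^ e.getValue().hashCode();
        }
        return sum;
    }

    // Contrived map (assumption, not glyph data) whose entry hashes sum past
    // Integer.MAX_VALUE and therefore wrap around to a negative value.
    static Map<Integer, Integer> overflowingMap()
    {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(0, Integer.MAX_VALUE); // entry hash: 0 ^ MAX_VALUE = MAX_VALUE
        map.put(2, 3);                 // entry hash: 2 ^ 3 = 1
        return map;
    }

    public static void main(String[] args)
    {
        Map<Integer, Integer> map = overflowingMap();
        System.out.println(map.hashCode() == contractHash(map)); // true
        System.out.println(map.hashCode() < 0);                  // true
    }
}
```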
*Suggested fix*
Use {{Math.floorMod}}, which is defined to return a non-negative result for
a positive divisor:
{code:java}
public String getTag(Map<Integer, Integer> gidToCid)
{
    long num = gidToCid.hashCode();

    StringBuilder sb = new StringBuilder();
    do
    {
        int mod = (int) Math.floorMod(num, 25L);
        sb.append(BASE25.charAt(mod));
        num = Math.floorDiv(num, 25L);
    } while (num != 0 && sb.length() < 6);

    while (sb.length() < 6)
    {
        sb.insert(0, 'A');
    }
    sb.append('+');
    return sb.toString();
}
{code}
Note that {{num / 25}} is also replaced with {{Math.floorDiv(num, 25L)}} so that
quotient and remainder stay consistent, i.e. {{num == 25 * div + mod}} holds in
every iteration and the tag remains a true base-25 encoding of the hash code.
For a negative {{num}} the floored quotient converges to -1 rather than 0, so
{{num != 0}} never becomes false and the loop is ended by the
{{sb.length() < 6}} guard instead; the result is still a valid six-character
tag.
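The suggested loop can be exercised outside PDFBox with a standalone sketch. {{TagSketch}} and {{tagFor}} are hypothetical names, and the {{BASE25}} alphabet below is an assumption standing in for the constant in {{TrueTypeEmbedder}}.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: TagSketch is a hypothetical class, not PDFBox code.
final class TagSketch
{
    // Assumption: a 25-letter alphabet standing in for PDFBox's BASE25 constant.
    static final String BASE25 = "BCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Mirrors the signature from the report; delegates to tagFor for testing.
    static String getTag(Map<Integer, Integer> gidToCid)
    {
        return tagFor(gidToCid.hashCode());
    }

    // Encodes num with floored division and remainder, so every index
    // passed to charAt is in [0, 24] even for negative input.
    static String tagFor(long num)
    {
        StringBuilder sb = new StringBuilder();
        do
        {
            sb.append(BASE25.charAt((int) Math.floorMod(num, 25L)));
            num = Math.floorDiv(num, 25L);
        } while (num != 0 && sb.length() < 6);

        // Left-pad short results to six characters, then add the separator.
        while (sb.length() < 6)
        {
            sb.insert(0, 'A');
        }
        return sb.append('+').toString();
    }

    public static void main(String[] args)
    {
        System.out.println(tagFor(-1));                  // ZZZZZZ+
        System.out.println(tagFor(0));                   // AAAAAB+
        System.out.println(getTag(new HashMap<>()));     // AAAAAB+ (empty map hashes to 0)
    }
}
```

For an input of -1, {{Math.floorDiv(-1, 25)}} is again -1, so each of the six iterations appends the last letter of the alphabet and the length guard ends the loop.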
Alternatively, {{Math.abs(num) % 25}} works for {{int}}-derived hash codes
(since {{Math.abs}} of a widened {{int}} is always representable as
{{long}}), but {{floorMod}} is the more idiomatic and self-documenting
choice.
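The widening matters for this alternative, because {{Math.abs(Integer.MIN_VALUE)}} is still negative in {{int}} arithmetic. A minimal sketch of the difference, using the hypothetical class {{AbsDemo}}:

```java
// Illustrative only: AbsDemo is a hypothetical class, not PDFBox code.
final class AbsDemo
{
    // Widen to long first, as getTag does; Math.abs then cannot overflow
    // for any value that started out as an int.
    static int indexViaLongAbs(int hash)
    {
        long num = hash;
        return (int) (Math.abs(num) % 25);
    }

    // Pitfall: Math.abs on the int itself returns Integer.MIN_VALUE unchanged,
    // so the remainder is still negative for that one input.
    static int indexViaIntAbs(int hash)
    {
        return Math.abs(hash) % 25;
    }

    public static void main(String[] args)
    {
        System.out.println(indexViaLongAbs(Integer.MIN_VALUE)); // 23
        System.out.println(indexViaIntAbs(Integer.MIN_VALUE));  // -23
    }
}
```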
*Impact*
Any caller of {{TrueTypeEmbedder.subset()}} — i.e., the standard font-embedding
path used by {{PDType0Font.load()}} and {{PDTrueTypeFont.load()}} when
subsetting is enabled (the default) — can hit this crash. The likelihood scales
with the number and identity of glyphs in the subset, so it manifests
intermittently and is hard to attribute without reading the stack trace
carefully.