Stefan Ziegler created PDFBOX-6193:
--------------------------------------

             Summary: TrueTypeEmbedder.getTag() throws 
StringIndexOutOfBoundsException for negative Map.hashCode()
                 Key: PDFBOX-6193
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6193
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 3.0.7 PDFBox
            Reporter: Stefan Ziegler


{{TrueTypeEmbedder.getTag(Map<Integer, Integer> gidToCid)}} produces a 
six-character subset prefix from the hash code of the GID map. When the hash 
code is negative, the method crashes with {{StringIndexOutOfBoundsException}} 
instead of returning a valid tag.

The relevant code (lines 365–388 of {{TrueTypeEmbedder.java}} in 3.0.7):

 
{code:java}
public String getTag(Map<Integer, Integer> gidToCid)
{
    // deterministic
    long num = gidToCid.hashCode();
    // base25 encode
    StringBuilder sb = new StringBuilder();
    do
    {
        long div = num / 25;
        int mod = (int)(num % 25);
        sb.append(BASE25.charAt(mod));   // <-- crashes when mod is negative
        num = div;
    } while (num != 0 && sb.length() < 6);
    ...
}
{code}
 

*Root cause*

In Java, the {{%}} operator returns a result whose sign matches the sign of the 
left operand. Therefore, when {{num}} is negative, {{num % 25}} is in the range 
{{{}[-24, 0]{}}}, and {{BASE25.charAt(negativeIndex)}} immediately throws 
{{{}StringIndexOutOfBoundsException{}}}.

{{Map.hashCode()}} is the sum of entry hash codes per the {{Map.hashCode()}} 
contract; for sufficiently large or value-diverse maps this sum frequently 
overflows or otherwise becomes negative. In practice this means the bug is 
triggered probabilistically depending on the specific glyphs being subset.

 

The crash also occurs naturally when subsetting larger fonts whose resulting 
{{gidToCid}} map happens to hash to a negative value — there is roughly a 50% 
chance per subset, depending on the glyphs used in the document.

 

*Suggested fix*

Use {{{}Math.floorMod{}}}, which is defined to return a non-negative result for 
a positive divisor:
{code:java}
public String getTag(Map<Integer, Integer> gidToCid)
{
    long num = gidToCid.hashCode();    StringBuilder sb = new StringBuilder();
    do
    {
        int mod = (int) Math.floorMod(num, 25L);
        sb.append(BASE25.charAt(mod));
        num = Math.floorDiv(num, 25L);
    } while (num != 0 && sb.length() < 6);    while (sb.length() < 6)
    {
        sb.insert(0, 'A');
    }    sb.append('+');
    return sb.toString();
}{code}
Note that {{num / 25}} must also be replaced with {{Math.floorDiv(num, 25)}} to 
keep the loop's termination condition ({{{}num != 0{}}}) consistent with the 
new {{mod}} semantics. Without that change, the loop can spin between -1 and 1 
indefinitely (capped at 6 iterations by the length guard, but producing wrong 
tags).

Alternatively, {{Math.abs(num) % 25}} works for {{{}int{}}}-derived hash codes 
(since {{Math.abs}} of a widened {{int}} is always representable as 
{{{}long{}}}), but {{floorMod}} is the more idiomatic and self-documenting 
choice.

 

*Impact*

Any caller of {{TrueTypeEmbedder.subset()}} — i.e., the standard font-embedding 
path used by {{PDType0Font.load()}} and {{PDTrueTypeFont.load()}} when 
subsetting is enabled (the default) — can hit this crash. The likelihood scales 
with the number and identity of glyphs in the subset, so it manifests 
intermittently and is hard to attribute without reading the stack trace 
carefully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to