https://bugs.kde.org/show_bug.cgi?id=520200

            Bug ID: 520200
           Summary: cachedCharset() using qstricmp on non-terminated data,
                    causing cache growth and 100% CPU
    Classification: Frameworks and Libraries
           Product: frameworks-kcodecs
Version First Reported In: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: critical
          Priority: NOR
         Component: general
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

DESCRIPTION

`cachedCharset(QByteArrayView)` in `src/kcodecs.cpp` uses
`qstricmp(name.data(), charset.data())` to look up charsets in the cache.
However, `QByteArrayView::data()` is not guaranteed to be null-terminated: the
view may point into the middle of a larger buffer.

When parsing RFC 2047 encoded-words like `=?us-ascii?Q?...?=`, the
`maybeCharset` QByteArrayView in `parseEncodedWord()` points to
`"us-ascii?Q?<encoded data>..."` (not null-terminated at the charset boundary).
`qstricmp` compares beyond the charset name into the encoded data, so the
comparison with cached entries (e.g. `"US-ASCII\0"`) always fails at the `?` vs
`\0` position.
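
The mismatch can be reproduced without Qt. The following is a minimal sketch,
not the KCodecs code: `nulTerminatedCompare()` is a hand-written stand-in for a
NUL-terminated case-insensitive compare in the style of `qstricmp`, and
`charsetOf()` is a hypothetical helper that slices the charset name out of an
encoded-word the way a view would.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Stand-in for a NUL-terminated, case-insensitive compare (qstricmp-like).
// It ignores any view length and reads until a mismatch or a '\0'.
int nulTerminatedCompare(const char *a, const char *b)
{
    for (;; ++a, ++b) {
        const int ca = std::tolower(static_cast<unsigned char>(*a));
        const int cb = std::tolower(static_cast<unsigned char>(*b));
        if (ca != cb) {
            return ca - cb;
        }
        if (ca == 0) {
            return 0; // both strings ended at the same position
        }
    }
}

// Hypothetical helper: return a view of the charset part of "=?charset?...".
// The view shares storage with the header, so data() is not terminated
// at the end of the charset name.
std::string_view charsetOf(std::string_view encodedWord)
{
    if (encodedWord.size() < 2) {
        return {};
    }
    const std::size_t start = 2; // skip the leading "=?"
    const std::size_t end = encodedWord.find('?', start);
    return encodedWord.substr(start, end - start);
}
```

With the header `"=?us-ascii?Q?Hello?="`, `charsetOf()` yields a view of length
8, yet `nulTerminatedCompare(view.data(), "US-ASCII")` is non-zero because the
ninth byte behind the view is `?`, not `\0`.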

As a result:
1. The lookup never finds a match
2. A new "US-ASCII" entry is appended to `charsetCache` on every call
3. The cache grows without bound (one entry per encoded-word parsed)
4. Each lookup scans the whole cache linearly, so the total work grows as O(n²)
5. After hours, the cache contains millions of duplicate entries
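
The growth pattern in the list above can be sketched with a toy cache. This is
an illustration under assumptions, not the actual `charsetCache`:
`CharsetCache`, `neverMatches()` and `exactMatch()` are hypothetical names, and
`neverMatches()` simply models a comparison that always fails.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Comparison predicate used by the toy cache.
using Matches = bool (*)(std::string_view cached, std::string_view name);

// Hypothetical stand-in for a linearly searched charset cache.
struct CharsetCache {
    std::vector<std::string> entries;
    std::size_t comparisons = 0;

    // Linear search; append the name when no cached entry matches.
    void lookup(std::string_view name, Matches matches)
    {
        for (const std::string &cached : entries) {
            ++comparisons;
            if (matches(cached, name)) {
                return;
            }
        }
        entries.emplace_back(name);
    }
};

// Models the broken compare: no lookup ever succeeds.
bool neverMatches(std::string_view, std::string_view)
{
    return false;
}

// Models a working compare.
bool exactMatch(std::string_view cached, std::string_view name)
{
    return cached == name;
}
```

Calling `lookup("US-ASCII", neverMatches)` n times leaves n entries and performs
n(n-1)/2 comparisons, while the same calls with `exactMatch` leave one entry and
perform n-1 comparisons.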

This manifests as KMail (or any application using KCodecs to parse RFC 2047
headers) consuming 100% CPU and multiple GB of RAM, appearing completely
frozen. In my case, KMail accumulated 935 minutes of CPU time and 2 GB of RSS
on a single email with 18 `=?us-ascii?Q?...?=` encoded-words in
X-SG-EID/X-SG-ID headers (SendGrid tracking headers). This is extremely common
in commercial/transactional emails.


STEPS TO REPRODUCE

1. Receive an email with headers containing multiple consecutive
`=?us-ascii?Q?...?=` encoded-words (e.g. SendGrid X-SG-EID/X-SG-ID headers)
2. Open KMail and let it sync the folder containing that email
3. Observe KMail consuming 100% CPU indefinitely


OBSERVED RESULT

KMail freezes at 100% CPU on the main thread, stuck in `qstricmp()` called from
`cachedCharset()` → `decodeRFC2047String()` → `KMime::Content::parse()`.


EXPECTED RESULT

The email headers should be parsed in microseconds.


SOFTWARE/OS VERSIONS

KCodecs 6.25.0 (also confirmed on 6.27.0)
KMail 25.04
Qt 6.x
Arch Linux


ADDITIONAL INFORMATION

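A possible fix is to replace the `qstricmp()` call with the length-aware
`name.compare(charset, Qt::CaseInsensitive)`, which compares only the bytes
inside each view.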
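
For illustration, the idea behind a length-aware comparison can be sketched in
standard C++. This is not the Qt implementation; `equalsIgnoreCase()` is a
hypothetical helper that shows why respecting the view length makes the lookup
succeed.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Case-insensitive equality that never reads past either view's length.
bool equalsIgnoreCase(std::string_view a, std::string_view b)
{
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return false;
        }
    }
    return true;
}
```

A view covering only the first 8 bytes of `"us-ascii?Q?Hello?="` now compares
equal to `"US-ASCII"`, regardless of what follows it in the buffer.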
