Subbu created COMPRESS-723:
------------------------------
Summary: Harden TAR PAX header parsing: enforce memory bound to
mitigate resource exhaustion from oversized headers
Key: COMPRESS-723
URL: https://issues.apache.org/jira/browse/COMPRESS-723
Project: Commons Compress
Issue Type: Bug
Components: Archivers
Reporter: Subbu
PAX extended header parsing in `TarUtils.parsePaxHeaders()` is the only
allocation path in commons-compress without an enforced, configurable memory
bound. The soft limit is set to `Long.MAX_VALUE`, which disables the
`MemoryLimitException` check entirely for this code path.
This leaves applications that process untrusted TAR archives (CI/CD pipelines,
container registries, backup restoration) unable to enforce a policy-driven cap
on PAX header allocation. A crafted `.tar.gz` whose PAX header block is highly
repetitive text (compressing at more than 1000:1 with gzip) can force heap
consumption far out of proportion to its on-wire size. An implicit hard check
against `totalMemory()` does exist deeper in the call stack, but it is not an
intentional security control and cannot be configured.
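To illustrate the amplification described above, here is a minimal, self-contained sketch that gzips a buffer of repetitive bytes and reports the ratio. It uses only the JDK and does not touch commons-compress; the class and method names (`PaxCompressionRatio`, `repetitiveHeaderBody`, `ratio`) are made up for this example, and the buffer only stands in for the payload of a PAX header block rather than being a valid header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class PaxCompressionRatio {

    /** Stand-in for the body of an oversized PAX header: one repeated ASCII byte. */
    static byte[] repetitiveHeaderBody(int size) {
        byte[] raw = new byte[size];
        Arrays.fill(raw, (byte) 'A');
        return raw;
    }

    /** Compresses the buffer with the JDK's gzip implementation at its default level. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Uncompressed size divided by compressed size. */
    static double ratio(int size) {
        byte[] raw = repetitiveHeaderBody(size);
        return (double) raw.length / gzip(raw).length;
    }

    public static void main(String[] args) {
        int size = 8 * 1024 * 1024; // 8 MiB of header text once decompressed
        byte[] compressed = gzip(repetitiveHeaderBody(size));
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.0f:1%n",
                size, compressed.length, (double) size / compressed.length);
    }
}
```

A parser that sizes its buffer from the decompressed header pays for the left-hand number, while the attacker only has to ship the right-hand one.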
*Solution*
Enforce a configurable memory bound on PAX header parsing via a new
`maxPaxHeaderSize` builder option on `TarArchiveInputStream` and `TarFile`. The
default is 10 MB (`TarConstants.DEFAULT_MAX_PAX_HEADER_SIZE`), enforced through
the existing `MemoryLimitException.checkBytes()` mechanism. This closes the
last unbounded allocation surface in the TAR parsing pipeline and follows the
same defense-in-depth pattern already established for entry names and 7z
headers.
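The shape of the proposed check can be sketched without the library. The example below is an assumption-laden illustration, not the patch itself: `BoundedPaxSketch`, `PaxLimitExceeded`, and `parse` are hypothetical names, the exception is a local stand-in for `MemoryLimitException`, and the 10 MB default is interpreted here as 10 MiB. It rejects a header whose declared size exceeds the limit before allocating anything, then parses the standard PAX record layout `<length> <key>=<value>\n`, where the length counts the whole record. It needs Java 11 or later for `InputStream.readNBytes(int)`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedPaxSketch {

    /** Assumed default for this sketch: 10 MiB. */
    static final long DEFAULT_MAX_PAX_HEADER_SIZE = 10L * 1024 * 1024;

    /** Hypothetical stand-in for the library's limit exception. */
    static class PaxLimitExceeded extends RuntimeException {
        PaxLimitExceeded(long requested, long limit) {
            super("PAX header of " + requested + " bytes exceeds limit of " + limit + " bytes");
        }
    }

    /** Checks the declared size against the limit first, then reads and parses the records. */
    static Map<String, String> parse(InputStream in, long declaredSize, long maxPaxHeaderSize) {
        if (declaredSize < 0 || declaredSize > maxPaxHeaderSize) {
            throw new PaxLimitExceeded(declaredSize, maxPaxHeaderSize); // no allocation yet
        }
        byte[] block;
        try {
            block = in.readNBytes(Math.toIntExact(declaredSize));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (block.length != declaredSize) {
            throw new IllegalArgumentException("truncated PAX header");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        int pos = 0;
        while (pos < block.length) {
            // The decimal length field runs up to the first space.
            int space = pos;
            while (space < block.length && block[space] != ' ') {
                space++;
            }
            if (space == block.length) {
                throw new IllegalArgumentException("missing record length");
            }
            int len = Integer.parseInt(new String(block, pos, space - pos, StandardCharsets.US_ASCII));
            // The record must hold the length field, the space, and a newline, and fit in the block.
            if (len < space - pos + 2 || len > block.length - pos || block[pos + len - 1] != '\n') {
                throw new IllegalArgumentException("malformed record at offset " + pos);
            }
            int end = pos + len;
            String record = new String(block, space + 1, end - space - 2, StandardCharsets.UTF_8);
            int eq = record.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("record without '=' at offset " + pos);
            }
            headers.put(record.substring(0, eq), record.substring(eq + 1));
            pos = end;
        }
        return headers;
    }

    public static void main(String[] args) {
        byte[] data = "18 path=some/file\n12 uid=1000\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(new ByteArrayInputStream(data), data.length, DEFAULT_MAX_PAX_HEADER_SIZE));
    }
}
```

Checking the declared size before reading is the point of the design: the cost of rejecting an oversized header stays constant regardless of how large the attacker claims it is.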
--
This message was sent by Atlassian Jira
(v8.20.10#820010)