Subbu created COMPRESS-723:
------------------------------

             Summary: Harden TAR PAX header parsing: enforce memory bound to 
mitigate resource exhaustion from oversized headers
                 Key: COMPRESS-723
                 URL: https://issues.apache.org/jira/browse/COMPRESS-723
             Project: Commons Compress
          Issue Type: Bug
          Components: Archivers
            Reporter: Subbu


PAX extended header parsing in `TarUtils.parsePaxHeaders()` is the only 
allocation path in commons-compress without an enforced, configurable memory 
bound. The soft limit is set to `Long.MAX_VALUE`, which disables the 
`MemoryLimitException` check entirely for this code path.

This leaves applications that process untrusted TAR archives (CI/CD pipelines, 
container registries, backup restoration) unable to enforce a policy-driven cap 
on PAX header allocation. A crafted `.tar.gz` with a large PAX header block 
(text that compresses at >1000:1 with gzip) can force disproportionate heap 
consumption relative to its on-wire size. While an implicit hard check against 
`totalMemory()` exists deeper in the call stack, it is not an intentional 
security control and does not allow granular configuration.
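
The amplification can be illustrated with a minimal sketch that uses only the JDK and does not touch commons-compress. It builds a single PAX record of the form `<len> <key>=<value>\n` with a highly repetitive value and gzips it. The class name `PaxRatioDemo`, the `comment` key, and the 4 MiB value size are assumptions chosen for illustration; they are not taken from the issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Illustrative only: shows how small a repetitive PAX record becomes under gzip.
public class PaxRatioDemo {

    /** Builds one PAX record "<len> <key>=<value>\n" whose value is valueLen copies of 'A'. */
    public static byte[] paxRecord(String key, int valueLen) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Everything except the length digits: space, key, '=', value, newline.
        int rest = 1 + keyBytes.length + 1 + valueLen + 1;
        // The length field counts its own digits, so iterate until it is self-consistent.
        int total = rest + 1;
        while (String.valueOf(total).length() + rest != total) {
            total = String.valueOf(total).length() + rest;
        }
        byte[] record = new byte[total];
        byte[] prefix = (total + " " + key + "=").getBytes(StandardCharsets.UTF_8);
        System.arraycopy(prefix, 0, record, 0, prefix.length);
        Arrays.fill(record, prefix.length, total - 1, (byte) 'A');
        record[total - 1] = '\n';
        return record;
    }

    /** Compresses data with the JDK's gzip implementation at its default settings. */
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] record = paxRecord("comment", 4 * 1024 * 1024);
        byte[] compressed = gzip(record);
        System.out.println("record bytes:     " + record.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("ratio:            " + (record.length / compressed.length) + ":1");
    }
}
```

A parser that buffers the whole record before validating it would need the full uncompressed size on the heap, even though only a few kilobytes arrived over the wire.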

*Solution*


Enforce a configurable memory bound on PAX header parsing via a new 
`maxPaxHeaderSize` builder option on `TarArchiveInputStream` and `TarFile`. The 
default is 10 MB (`TarConstants.DEFAULT_MAX_PAX_HEADER_SIZE`), enforced through 
the existing `MemoryLimitException.checkBytes()` mechanism. This closes the 
last unbounded allocation surface in the TAR parsing pipeline and follows the 
same defense-in-depth pattern already established for entry names and 7z 
headers.
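
The idea of checking the declared header size against a cap before buffering it can be sketched as follows. This is a self-contained illustration, not the proposed patch: the class `BoundedPaxSketch`, the exception `HeaderTooLargeException`, and the local `checkBytes` helper are hypothetical stand-ins, and the code does not call the real `MemoryLimitException` or any commons-compress builder. Only the 10 MB default mirrors the value stated above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a size-bounded PAX header parser (JDK 11+ for readNBytes).
public class BoundedPaxSketch {

    /** Mirrors the 10 MB default described in the issue. */
    public static final long DEFAULT_MAX_PAX_HEADER_SIZE = 10L * 1024 * 1024;

    /** Hypothetical stand-in for the library's memory-limit exception. */
    public static class HeaderTooLargeException extends RuntimeException {
        public HeaderTooLargeException(long requested, long max) {
            super("PAX header of " + requested + " bytes exceeds limit of " + max + " bytes");
        }
    }

    /** Rejects the request before any buffer is allocated. */
    static void checkBytes(long requested, long max) {
        if (requested < 0 || requested > max) {
            throw new HeaderTooLargeException(requested, max);
        }
    }

    /** Parses "<len> <key>=<value>\n" records from a header block of declaredSize bytes. */
    public static Map<String, String> parse(InputStream in, long declaredSize, long maxPaxHeaderSize) {
        // A Java array cannot exceed Integer.MAX_VALUE elements, so clamp the limit.
        checkBytes(declaredSize, Math.min(maxPaxHeaderSize, Integer.MAX_VALUE));
        byte[] buf;
        try {
            buf = in.readNBytes((int) declaredSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (buf.length != declaredSize) {
            throw new IllegalArgumentException("truncated PAX header");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        int pos = 0;
        while (pos < buf.length) {
            int sp = pos;
            while (sp < buf.length && buf[sp] != ' ') {
                sp++;
            }
            if (sp == buf.length) {
                throw new IllegalArgumentException("missing length delimiter");
            }
            int len = Integer.parseInt(new String(buf, pos, sp - pos, StandardCharsets.US_ASCII));
            // The record must extend past its own length field and stay inside the buffer.
            if (len <= sp - pos + 1 || len > buf.length - pos || buf[pos + len - 1] != '\n') {
                throw new IllegalArgumentException("malformed PAX record at offset " + pos);
            }
            String body = new String(buf, sp + 1, pos + len - 1 - (sp + 1), StandardCharsets.UTF_8);
            int eq = body.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("PAX record without '=' at offset " + pos);
            }
            headers.put(body.substring(0, eq), body.substring(eq + 1));
            pos += len;
        }
        return headers;
    }
}
```

In this sketch the limit is checked against the size declared for the header entry, so an oversized header is rejected before any of its bytes are read into memory.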



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
