Piotr Karwasz created COMPRESS-726:
--------------------------------------

             Summary: Build decompression-bomb protection into 
CompressorInputStream
                 Key: COMPRESS-726
                 URL: https://issues.apache.org/jira/browse/COMPRESS-726
             Project: Commons Compress
          Issue Type: Improvement
            Reporter: Piotr Karwasz


Every concrete {{CompressorInputStream}} already tracks both the compressed 
bytes it has consumed and the uncompressed bytes it has emitted, through 
{{InputStreamStatistics}}. The per-format differences in how the compressed 
count is sourced (raw-input counter vs. internal bit/byte reader) amount only 
to a bounded read-ahead skew (at most one decompressor buffer), which is 
irrelevant for decompression-bomb detection.

We could therefore rely on {{InputStreamStatistics}} to introduce 
decompression-bomb protection into each {{CompressorInputStream}}. Concretely:

# Make {{CompressorInputStream}} implement {{InputStreamStatistics}}, so the 
compressed/uncompressed counts are part of the base type for every format.
# Let users configure decompression-bomb protection parameters on the streams 
the builders / {{CompressorStreamFactory}} return, enforced centrally in the 
base class.

h2. Background: InputStreamStatistics is sufficient

{{getUncompressedCount()}} is uniformly exact: it is the base 
{{CompressorInputStream.getBytesRead()}} counter, fed by {{count(...)}} on 
every {{read}}, so it equals the number of bytes actually handed to the caller.

{{getCompressedCount()}} is a live counter in every implementation. Some source 
it from a {{BoundedInputStream}} wrapping the raw input (so it includes a fixed 
amount of decompressor read-ahead), others from the format's own bit/byte 
reader (read-ahead corrected). Because the read-ahead is bounded by a constant 
buffer size, the ratio {{uncompressedCount / compressedCount}} converges to the 
true ratio as the stream grows, and the absolute output count is always exact. 
Both protection strategies below are robust to this skew once a small grace 
period has elapsed.

h2. Proposed change 1: CompressorInputStream implements InputStreamStatistics

The base class already defines {{getUncompressedCount()}}. Add 
{{getCompressedCount()}} and declare {{implements InputStreamStatistics}}.

All concrete subclasses except {{Pack200CompressorInputStream}} already 
override {{getCompressedCount()}}, so the base declaration should provide a 
default (return {{-1}} meaning "unknown") rather than be {{abstract}}, to 
preserve source and binary compatibility for external subclasses. Subclasses 
that have a real count keep overriding it.

h2. Proposed change 2: configurable bomb protection on the returned streams

Expose two independent, composable limits, settable through the existing 
builders and through {{CompressorStreamFactory}} (which already carries 
{{memoryLimitInKb}}, so this is a natural companion):

* *Absolute output cap*: throw once {{getUncompressedCount()}} exceeds a 
configured maximum. Exact and format independent.
* *Mean ratio guard*: once output passes a configurable grace threshold, throw 
if {{getUncompressedCount() / getCompressedCount()}} exceeds a configured 
maximum ratio. The grace period tolerates legitimately high local ratios at the 
start (headers, dictionaries, long runs) while still enforcing a sane mean 
ratio.

h2. Gap: Pack200CompressorInputStream

{{Pack200CompressorInputStream}} is the one exception. It overrides 
{{read(...)}} to delegate straight to an internal bridge stream, never calls 
{{count(...)}} (its Javadoc states {{getBytesRead()}} always returns 0), and 
does not implement {{InputStreamStatistics}}. To bring it under the same 
protection it must route output through {{count(...)}} and provide a real 
{{getCompressedCount()}} (counting bytes pulled from the original input). 
Otherwise it should be documented as unprotected, with {{getCompressedCount()}} 
returning {{-1}} and the ratio guard skipped (the absolute cap also cannot fire 
while output is uncounted).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to