[
https://issues.apache.org/jira/browse/COMPRESS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085549#comment-18085549
]
Piotr Karwasz commented on COMPRESS-726:
----------------------------------------
Sure, you can implement both the absolute cap and mean ratio guard using
{{BoundedInputStream}}, but this still requires *every* user to apply the cap.
I believe that enforcing a default (configurable) {{100x}} limit on the
decompressed size, would protect most users from decompression bombs without
causing any disturbance. Usual decompression rates are between {{3x}} and
{{10x}}.
A decompression ratio guard would also work more quickly: it would be triggered
as soon as the output size goes beyond {{compressRatioLimit}} times the
*actually consumed* bytes, not the total payload size.
If Compress is used to decompress the payload sent by an HTTP client, the
client can *both* include a small decompression bomb and fake a huge payload
size.
> Build decompression-bomb protection into CompressorInputStream
> --------------------------------------------------------------
>
> Key: COMPRESS-726
> URL: https://issues.apache.org/jira/browse/COMPRESS-726
> Project: Commons Compress
> Issue Type: Improvement
> Reporter: Piotr Karwasz
> Priority: Major
>
> Every concrete {{CompressorInputStream}} already tracks both the compressed
> bytes it has consumed and the uncompressed bytes it has emitted, through
> {{InputStreamStatistics}}. The per-format differences in how the compressed
> count is sourced (raw-input counter vs. internal bit/byte reader) amount only
> to a bounded read-ahead skew (at most one decompressor buffer), which is
> irrelevant for decompression-bomb detection.
> We could therefore rely on {{InputStreamStatistics}} to introduce
> decompression-bomb protection into each {{CompressorInputStream}}. Concretely:
> # Make {{CompressorInputStream}} implement {{InputStreamStatistics}}, so the
> compressed/uncompressed counts are part of the base type for every format.
> # Let users configure decompression-bomb protection parameters on the streams
> the builders / {{CompressorStreamFactory}} return, enforced centrally in the
> base class.
> h2. Background: InputStreamStatistics is sufficient
> {{getUncompressedCount()}} is uniformly exact: it is the base
> {{CompressorInputStream.getBytesRead()}} counter, fed by {{count(...)}} on
> every {{read}}, so it equals the number of bytes actually handed to the
> caller.
> {{getCompressedCount()}} is a live counter in every implementation. Some
> source it from a {{BoundedInputStream}} wrapping the raw input (so it
> includes a fixed amount of decompressor read-ahead), others from the format's
> own bit/byte reader (read-ahead corrected). Because the read-ahead is bounded
> by a constant buffer size, the ratio {{uncompressedCount / compressedCount}}
> converges to the true ratio as the stream grows, and the absolute output
> count is always exact. Both protection strategies below are robust to this
> skew once a small grace period has elapsed.
> h2. Proposed change 1: CompressorInputStream implements InputStreamStatistics
> The base class already defines {{getUncompressedCount()}}. Add
> {{getCompressedCount()}} and declare {{implements InputStreamStatistics}}.
> All concrete subclasses except {{Pack200CompressorInputStream}} already
> override {{getCompressedCount()}}, so the base declaration should provide a
> default (return {{-1}} meaning "unknown") rather than be {{abstract}}, to
> preserve source and binary compatibility for external subclasses. Subclasses
> that have a real count keep overriding it.
> h2. Proposed change 2: configurable bomb protection on the returned streams
> Expose two independent, composable limits, settable through the existing
> builders and through {{CompressorStreamFactory}} (which already carries
> {{memoryLimitInKb}}, so this is a natural companion):
> * *Absolute output cap*: throw once {{getUncompressedCount()}} exceeds a
> configured maximum. Exact and format independent.
> * *Mean ratio guard*: once output passes a configurable grace threshold,
> throw if {{getUncompressedCount() / getCompressedCount()}} exceeds a
> configured maximum ratio. The grace period tolerates legitimately high local
> ratios at the start (headers, dictionaries, long runs) while still enforcing
> a sane mean ratio.
> h2. Gap: Pack200CompressorInputStream
> {{Pack200CompressorInputStream}} is the one exception. It overrides
> {{read(...)}} to delegate straight to an internal bridge stream, never calls
> {{count(...)}} (its Javadoc states {{getBytesRead()}} always returns 0), and
> does not implement {{InputStreamStatistics}}. To bring it under the same
> protection it must route output through {{count(...)}} and provide a real
> {{getCompressedCount()}} (counting bytes pulled from the original input).
> Otherwise it should be documented as unprotected, with
> {{getCompressedCount()}} returning {{-1}} and the ratio guard skipped (the
> absolute cap also cannot fire while output is uncounted).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)