[ 
https://issues.apache.org/jira/browse/COMPRESS-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085547#comment-18085547
 ] 

Gary D. Gregory commented on COMPRESS-726:
------------------------------------------

Why not simply document using Commons IO'Is BoundedInoutStream? That's exactly 
what it's for. I think it is better to compose solutions with what we have 
instead of adding complexity. 

See 
https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BoundedInputStream.html


> Build decompression-bomb protection into CompressorInputStream
> --------------------------------------------------------------
>
>                 Key: COMPRESS-726
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-726
>             Project: Commons Compress
>          Issue Type: Improvement
>            Reporter: Piotr Karwasz
>            Priority: Major
>
> Every concrete {{CompressorInputStream}} already tracks both the compressed 
> bytes it has consumed and the uncompressed bytes it has emitted, through 
> {{InputStreamStatistics}}. The per-format differences in how the compressed 
> count is sourced (raw-input counter vs. internal bit/byte reader) amount only 
> to a bounded read-ahead skew (at most one decompressor buffer), which is 
> irrelevant for decompression-bomb detection.
> We could therefore rely on {{InputStreamStatistics}} to introduce 
> decompression-bomb protection into each {{CompressorInputStream}}. Concretely:
> # Make {{CompressorInputStream}} implement {{InputStreamStatistics}}, so the 
> compressed/uncompressed counts are part of the base type for every format.
> # Let users configure decompression-bomb protection parameters on the streams 
> the builders / {{CompressorStreamFactory}} return, enforced centrally in the 
> base class.
> h2. Background: InputStreamStatistics is sufficient
> {{getUncompressedCount()}} is uniformly exact: it is the base 
> {{CompressorInputStream.getBytesRead()}} counter, fed by {{count(...)}} on 
> every {{read}}, so it equals the number of bytes actually handed to the 
> caller.
> {{getCompressedCount()}} is a live counter in every implementation. Some 
> source it from a {{BoundedInputStream}} wrapping the raw input (so it 
> includes a fixed amount of decompressor read-ahead), others from the format's 
> own bit/byte reader (read-ahead corrected). Because the read-ahead is bounded 
> by a constant buffer size, the ratio {{uncompressedCount / compressedCount}} 
> converges to the true ratio as the stream grows, and the absolute output 
> count is always exact. Both protection strategies below are robust to this 
> skew once a small grace period has elapsed.
> h2. Proposed change 1: CompressorInputStream implements InputStreamStatistics
> The base class already defines {{getUncompressedCount()}}. Add 
> {{getCompressedCount()}} and declare {{implements InputStreamStatistics}}.
> All concrete subclasses except {{Pack200CompressorInputStream}} already 
> override {{getCompressedCount()}}, so the base declaration should provide a 
> default (return {{-1}} meaning "unknown") rather than be {{abstract}}, to 
> preserve source and binary compatibility for external subclasses. Subclasses 
> that have a real count keep overriding it.
> h2. Proposed change 2: configurable bomb protection on the returned streams
> Expose two independent, composable limits, settable through the existing 
> builders and through {{CompressorStreamFactory}} (which already carries 
> {{memoryLimitInKb}}, so this is a natural companion):
> * *Absolute output cap*: throw once {{getUncompressedCount()}} exceeds a 
> configured maximum. Exact and format independent.
> * *Mean ratio guard*: once output passes a configurable grace threshold, 
> throw if {{getUncompressedCount() / getCompressedCount()}} exceeds a 
> configured maximum ratio. The grace period tolerates legitimately high local 
> ratios at the start (headers, dictionaries, long runs) while still enforcing 
> a sane mean ratio.
> h2. Gap: Pack200CompressorInputStream
> {{Pack200CompressorInputStream}} is the one exception. It overrides 
> {{read(...)}} to delegate straight to an internal bridge stream, never calls 
> {{count(...)}} (its Javadoc states {{getBytesRead()}} always returns 0), and 
> does not implement {{InputStreamStatistics}}. To bring it under the same 
> protection it must route output through {{count(...)}} and provide a real 
> {{getCompressedCount()}} (counting bytes pulled from the original input). 
> Otherwise it should be documented as unprotected, with 
> {{getCompressedCount()}} returning {{-1}} and the ratio guard skipped (the 
> absolute cap also cannot fire while output is uncounted).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to