Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

Archie Cobbs Thu, 06 Jun 2024 10:04:38 -0700

> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
> order to do this, after a GZIP trailer frame is read, it attempts to read a 
> GZIP header frame and, if successful, proceeds onward to decompress the new 
> stream. If the attempt to decode a GZIP header frame fails, or happens to 
> trigger an `IOException`, it just ignores the trailing garbage and/or the 
> `IOException` and returns EOF.
> 
> There are several issues with this:
> 
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
> trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
> or other data corruption that an application would rather be notified about. 
> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
> data is read, but obviously that doesn't happen in the trailing garbage 
> scenario (so N concatenated streams where the last one has a corrupted header 
> frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
> stream concatenation.
> 
> On the other hand, `GZIPInputStream` is an old class with lots of existing 
> usage, so it's important to preserve the existing behavior, warts and all 
> (note: my the definition of "existing behavior" here includes the bug fix in 
> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
> 
> So this patch adds a new constructor that takes two new parameters 
> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
> enabled by setting both to true; otherwise, they do what they sound like. In 
> particular, when `allowTrailingGarbage` is false, then the underlying input 
> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
> exception is guaranteed.


Archie Cobbs has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains 12 commits:

 - Bump @since from 23 to 24.
 - Merge branch 'master' into JDK-8322256
 - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
 - Simplify code by eliminating an impossible case.
 - Field name change and Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Javadoc wording tweaks.
 - Merge branch 'master' into JDK-8322256
 - Clarify exceptions: sometimes ZipException, sometimes EOFException.
 - Merge branch 'master' into JDK-8322256
 - ... and 2 more: https://git.openjdk.org/jdk/compare/75dc2f85...f845a75b

-------------

Changes: https://git.openjdk.org/jdk/pull/18385/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=04
  Stats: 318 lines in 2 files changed: 292 ins; 13 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/18385.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385

PR: https://git.openjdk.org/jdk/pull/18385

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

Reply via email to