Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

Eirik Bjørsnøs Fri, 19 Jul 2024 03:06:12 -0700

On Mon, 15 Jul 2024 10:01:47 GMT, Jaikiran Pai <j...@openjdk.org> wrote:


>> Archie Cobbs has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 12 commits:
>> 
>>  - Bump @since from 23 to 24.
>>  - Merge branch 'master' into JDK-8322256
>>  - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
>>  - Simplify code by eliminating an impossible case.
>>  - Field name change and Javadoc wording tweaks.
>>  - Merge branch 'master' into JDK-8322256
>>  - Javadoc wording tweaks.
>>  - Merge branch 'master' into JDK-8322256
>>  - Clarify exceptions: sometimes ZipException, sometimes EOFException.
>>  - Merge branch 'master' into JDK-8322256
>>  - ... and 2 more: https://git.openjdk.org/jdk/compare/75dc2f85...f845a75b
>
> Hello Archie, sorry it has taken long to review this PR.
> 
> The proposal here is to have `java.util.zip.GZIPInputStream` specify (and 
> improve) how it deals with concatenated GZIP stream(s) and also allow for 
> applications instantiating `GZIPInputStream` to decide whether or not the 
> underlying `InputStream` passed to the `GZIPInputStream` instance is expected 
> to contain a concatenated GZIP stream(s).
> 
> I'll include here the text that I had sent in a private communication to 
> Archie:
> 
> I think this comes down to introducing a new optional boolean parameter to 
> the constructor of GZIPInputStream.
> 
> The boolean when "true" will attempt to read a potential additional GZIP 
> stream after the previous GZIP stream's trailer has been read. In that case 
> we need to handle 2 cases - one where we successfully find the header of the 
> next (concatenated) GZIP stream and the other where we don't find a valid 
> GZIP stream header. In the first case where we do find a header successfully, 
> then we continue reading the stream as usual and return the uncompressed data 
> from the "read()" call. In the case where we fail to find a valid header and 
> yet there were bytes past the previous GZIP stream, then I think the 
> GZIPInputStream.read() should throw an IOException, since that stream no 
> longer represents a GZIP stream (remember, we are talking about this only 
> when the GZIPInputStream was constructed with the new boolean parameter = 
> true).
> 
> Coming to the case where the GZIPInputStream was constructed using boolean = 
> false - In this case when we reach and read the trailer of a GZIP stream, if 
> there is no more bytes, then we consider this a completed GZIP stream and 
> return the uncompressed data. If there is any more bytes past the GZIP 
> stream's trailer, then I think we should throw a IOException (since we aren't 
> expecting any concatenated GZIP stream).
> 
> As for the default value of this optional boolean parameter, I think it 
> should default to "true" implying it will read any concatenated GZIP streams. 
> That would match the current implementation of GZIPInputStream.read() which 
> has the ability to read (valid) GZIP concatenated input stream.
> 
> I think this would then allow us to keep the implementation simple as well as 
> allow the calling application to have control over whether or not the passed 
> should be considered as having a concatenated GZIP stream.

This PR is open, but notifications don't seem to reach the mailing lists. I 
experienced a similar situation a while back where I believe @jaikiran reached 
out to the Skara team to get that PR out of the silent state.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2238816809

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

Reply via email to