On Mon, 15 Jul 2024 10:01:47 GMT, Jaikiran Pai <j...@openjdk.org> wrote:
>> Archie Cobbs has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains 12 commits: >> >> - Bump @since from 23 to 24. >> - Merge branch 'master' into JDK-8322256 >> - Relabel "trailing garbage" as "extra bytes" to sound less accusatory. >> - Simplify code by eliminating an impossible case. >> - Field name change and Javadoc wording tweaks. >> - Merge branch 'master' into JDK-8322256 >> - Javadoc wording tweaks. >> - Merge branch 'master' into JDK-8322256 >> - Clarify exceptions: sometimes ZipException, sometimes EOFException. >> - Merge branch 'master' into JDK-8322256 >> - ... and 2 more: https://git.openjdk.org/jdk/compare/75dc2f85...f845a75b > > Hello Archie, sorry it has taken long to review this PR. > > The proposal here is to have `java.util.zip.GZIPInputStream` specify (and > improve) how it deals with concatenated GZIP stream(s) and also allow for > applications instantiating `GZIPInputStream` to decide whether or not the > underlying `InputStream` passed to the `GZIPInputStream` instance is expected > to contain a concatenated GZIP stream(s). > > I'll include here the text that I had sent in a private communication to > Archie: > > I think this comes down to introducing a new optional boolean parameter to > the constructor of GZIPInputStream. > > The boolean when "true" will attempt to read a potential additional GZIP > stream after the previous GZIP stream's trailer has been read. In that case > we need to handle 2 cases - one where we successfully find the header of the > next (concatenated) GZIP stream and the other where we don't find a valid > GZIP stream header. In the first case where we do find a header successfully, > then we continue reading the stream as usual and return the uncompressed data > from the "read()" call. In the case where we fail to find a valid header and > yet there were bytes past the previous GZIP stream, then I think the > GZIPInputStream.read() should throw an IOException, since that stream no > longer represents a GZIP stream (remember, we are talking about this only > when the GZIPInputStream was constructed with the new boolean parameter = > true). > > Coming to the case where the GZIPInputStream was constructed using boolean = > false - In this case when we reach and read the trailer of a GZIP stream, if > there is no more bytes, then we consider this a completed GZIP stream and > return the uncompressed data. If there is any more bytes past the GZIP > stream's trailer, then I think we should throw a IOException (since we aren't > expecting any concatenated GZIP stream). > > As for the default value of this optional boolean parameter, I think it > should default to "true" implying it will read any concatenated GZIP streams. > That would match the current implementation of GZIPInputStream.read() which > has the ability to read (valid) GZIP concatenated input stream. > > I think this would then allow us to keep the implementation simple as well as > allow the calling application to have control over whether or not the passed > should be considered as having a concatenated GZIP stream. This PR is open, but notifications don't seem to reach the mailing lists. I experienced a similar situation a while back where I believe @jaikiran reached out to the Skara team to get that PR out of the silent state. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2238816809