Re: RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v6]

Archie Cobbs Mon, 26 Feb 2024 07:25:00 -0800

On Mon, 26 Feb 2024 06:51:12 GMT, Jaikiran Pai <j...@openjdk.org> wrote:


>> Archie Cobbs has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains six additional 
>> commits since the last revision:
>> 
>>  - Merge branch 'master' into JDK-7036144
>>  - Merge branch 'master' into JDK-7036144
>>  - Address third round of review comments.
>>  - Address second round of review comments.
>>  - Address review comments.
>>  - Fix bug in GZIPInputStream when underlying available() returns short.
>
> Hello Archie, the proposal to not depend on the `available()` method of the 
> underlying `InputStream` to decide whether to read additional bytes from the 
> underlying stream to detect the "next" header seems reasonable.
> 
> What's being proposed here is that we proceed and read the underlying 
> stream's few additional bytes to detect the presence or absence of a GZIP 
> member header and if that attempt fails (with an IOException) then we 
> consider that we have reached the end of GZIP stream and just return back.  
> 
> For this change, I think we would also need to consider whether we should 
> "unread" the read bytes from the `InputStream` if those don't correspond to a 
> "next" GZIP member header. That way any underlying `InputStream` which was 
> implemented in a way that it would return availability as 0 when it knew that 
> the GZIP stream was done and yet had additional (non GZIP) data to read on 
> the underlying stream, would still be able to read that data after this 
> change. It's arguable whether we should have been doing that "unread" even 
> when we were doing the `available() > 0` check and the decision that comes 
> out of https://bugs.openjdk.org/browse/JDK-8322256 might cover that.

Hi @jaikiran,

I agree with your comments. My only question is whether we should do all of 
this in one stage or two stages.

My initial thought is to do this in two stages:
* A narrow fix for the bug described here, as implemented by this PR
* A larger change (requiring a separate bug, CSR, and PR) to:
  * More precisely define and specify the expected behavior, with support for
    concatenated streams
  * Eliminate situations where we read beyond the end-of-stream (i.e.,
    "unreading" if/when necessary)

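To make the "unreading" idea concrete, here is a minimal sketch of one way it could work, assuming the caller is willing to wrap the underlying stream in a `PushbackInputStream`. The class and method names (`GzipMemberPeek`, `wrap`, `nextLooksLikeGzipMember`) are hypothetical and are not part of `java.util.zip` or of this PR. The sketch only checks the two GZIP magic bytes (`0x1f 0x8b`), not a complete member header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper for illustration only; not JDK code and not code from this PR.
final class GzipMemberPeek {
    private static final int MAGIC_0 = 0x1f;
    private static final int MAGIC_1 = 0x8b;

    /** Wraps a stream so that up to two peeked bytes can be pushed back. */
    static PushbackInputStream wrap(InputStream in) {
        return new PushbackInputStream(in, 2);
    }

    /**
     * Reads up to two bytes, pushes them back, and reports whether they are
     * the GZIP magic number. The stream position is unchanged on return.
     */
    static boolean nextLooksLikeGzipMember(PushbackInputStream in) throws IOException {
        byte[] buf = new byte[2];
        int n = in.readNBytes(buf, 0, 2);   // requires Java 9 or later
        if (n > 0) {
            in.unread(buf, 0, n);           // "unread" whatever was consumed
        }
        return n == 2
            && (buf[0] & 0xff) == MAGIC_0
            && (buf[1] & 0xff) == MAGIC_1;
    }
}
```

With something along these lines, trailing non-GZIP data would stay readable by whoever owns the underlying stream. Whether `GZIPInputStream` itself should do this is exactly the kind of question the second stage would need to specify.
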
I think this two-stage approach is appropriate because there is no downside to 
doing it this way. The problem you describe, reading beyond the end-of-stream, 
is _already_ a problem in the current code. The only exception is the one 
corner case where this bug fix applies, namely, when `in.available()` returns 
zero and yet there actually _is_ more data available.
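
For illustration, the corner case can be set up with a stream like the following. This is a sketch and not the PR's regression test. `ZeroAvailableDemo` and `ZeroAvailableStream` are hypothetical names, and the one-byte-at-a-time `read` is an assumption I added so that the second member is not already sitting in the inflater's buffer when the first trailer is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class for illustration only.
final class ZeroAvailableDemo {

    /** A stream that always reports available() == 0 and returns short reads. */
    static final class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;   // permitted by the InputStream contract, even with data pending
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));   // at most one byte per call
        }
    }

    /** Compresses a string into a single GZIP member. */
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Two GZIP members back to back, i.e. a concatenated stream. */
    static byte[] twoMembers() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(gzip("hello"));
        out.write(gzip("world"));
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ZeroAvailableStream(new ByteArrayInputStream(twoMembers()));
        try (GZIPInputStream gz = new GZIPInputStream(raw)) {
            // What gets printed depends on whether the JDK in use contains this fix.
            System.out.println(new String(gz.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

I would expect a JDK without the fix to stop after the first member here and a JDK with it to continue into the second, but the output is deliberately not asserted because it depends on the JDK in use.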

Your thoughts?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1964372772
