On Fri, 19 Jul 2024 23:16:07 GMT, Archie Cobbs <[email protected]> wrote:
>> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 143:
>>
>>> 141: * @param allowConcatenation true to allow multiple concatenated
>>> compressed data streams,
>>> 142: * or false to expect exactly one
>>> compressed data stream
>>> 143: * @param ignoreExtraBytes true to tolerate and ignore extra
>>> bytes, false to throw
>>
>> The term "extra" here feels somewhat open to interpretation. Specifically,
>> "extra" sounds like something that is out of the ordinary, but not uncommon
>> or wrong. It could be used when describing an optional feature in a format
>> specification.
>>
>> The API description referes to "unexpected data". Perhaps the word
>> "unexpected" is a more precise and descriptive term to use in this option?
>> So `ignoreUnexpectedBytes` or `ignoreUnexpectedData`.
>>
>> I think my eyebrows would be raised more when seeing someone ignoring
>> 'unexpected data' rather than when ignoring 'extra data'.
>>
>> I know this might smell of bikeshedding, but naming is important (and hard!).
>
>> The term "extra" here feels somewhat open to interpretation. Specifically,
>> "extra" sounds like something that is out of the ordinary, but not uncommon
>> or wrong. It could be used when describing an optional feature in a format
>> specification.
>>
>> The API description referes to "unexpected data". Perhaps the word
>> "unexpected" is a more precise and descriptive term to use in this option?
>> So ignoreUnexpectedBytes or ignoreUnexpectedData.
>
> This is a good point.. because it's actually a bit subtle what kind of data
> is being referred to here as being "ignored". Even "unexpected" doesn't quite
> capture it.
>
> To be precise, here's what bytes the `ignoreExtraBytes` parameter is going
> to cause to be ignored:
> * In the case `allowConcatenation == false`, it really does refer to "extra"
> or "extraneous" data, that is: one or more bytes having any value appearing
> _after_ the end of the GZIP trailer frame.
> * In the case `allowConcatenation == true`, it means a truncated or invalid
> GZIP _header frame_ appearing after the end of any GZIP trailer frame.
>
> The second case means "unexpected" seems wrong because why would unexpected
> data in a header frame be treated any differently from unexpected data any
> other frame (which of course is not going to be ignored but instead will
> trigger an exception)?
>
> So maybe this is just more evidence that we shouldn't use boolean parameters
> - because they're not really orthogonal.
>
> What about something like:
>
>
> public enum ConcatPolicy {
> DISALLOW_STRICT, // one GZIP stream, nothing else
> DISALLOW_LENIENT, // one GZIP stream, ignore any extra byte(s)
> ALLOW_STRICT, // N GZIP streams, nothing else
> ALLOW_LENIENT; // N GZIP streams, stop at, and ignore, an invalid
> header frame
> }
>
> The legacy behavior would of course correspond to `ALLOW_LENIENT`.
Another thought:
Are we sure we would want to introduce a new mode now which does _not_ allow
concatenation, but _does_ ignore trailing data?
If the ignore mode is really discouraged, why open a new mode of config
allowing it?
In other words, could we instead (at least conceptually) have the policies
LEGACY, SINGLE_STREAM, CONCATENATED_STREAM where the latter two always reject
trailing data?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1686942202