Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Joe Darcy Wed, 10 Jan 2024 21:50:26 -0800

On Wed, 10 Jan 2024 17:02:05 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:


> > Sounds like a CSR is needed for the behavioral change proposed here.
> 
> Thanks for flagging this @jddarcy
> 
> I'm personally not 100% convinced a CSR is warranted in this case, but I'm of 
> course happy to extend the following into a CSR if you or other reviewer 
> think that's the best way forward:
> 
> Let's review the intended and unintended behavioral changes in this PR:
> 
> The _intended_ behavioral change is to extend the set of ZIP entries parsable 
> by `ZipInputStream` to include "small Zip64 entries". Such entries meet the 
> following criteria:
> 
>     1. They are clearly marked as using the Zip64 format, meaning that the 
> LOC's 'compressed size' and 'uncompressed size' fields are set to the "Zip64 
> magic value" 0xFFFFFFFF and that the LOC's extra field includes a valid 
> _Zip64 Extended Information Field_.
> 
>     2. They use _streaming mode_, meaning that the 'general purpose bit flag' 
> 3 is set, that the Zip64 'Original Size' and 'Compressed Size' are both set 
> to zero and that file data is followed by a 'Data Descriptor' containing the 
> actual size values.
> 
>     3. The Data Descriptor represents the 'compressed size' and 
> 'uncompressed' size fields using 8 byte fields, instead of the regular 4 byte 
> fields.
> 
> 
> These entires are currently not parsable by `ZipInputStream`. Trying to parse 
> them fails with an exception similar to the following:
> 
> ```
> java.util.zip.ZipException: invalid entry size (expected 0 but got 6 bytes)
>       at 
> java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:616)
> ```
> 
> The _unintended_ behavioral change in this PR is that ZIP entries marked as 
> Zip64, but still having a Data Descriptor using 4 bytes to represent the size 
> fields will now become unparsable. Such entries meet critera 1 and 2 above, 
> but not 3.
> 
> While I have not performed a corpus search, I believe such entries are rare, 
> if they exist at all:
> 
>     * They would be in clear violation of the `APPNOTE.txt` spec, which in 
> 4.3.9.2 says  _ZIP64 format MAY be used regardless of the size of a file.  
> When extracting, if the zip64 extended information extra field is present for 
> the file the compressed and uncompressed sizes will be 8 byte values_
> 
>     * The `zipdetails` tool fails to parse such a file, instead reading the 
> two 4-byte numbers into one 8-byte 'compressed size'
> 
>     * The file cannot be parsed by external APIs similar to `ZipInputStream`, 
> such as the Python `stream-unzip` module.
> 
> 
> To avoid unintended behavioral changes in the implementation, care has been 
> taken to make the Zip64 extra field validation code robust to invalid input, 
> including out of bounds array access.
> 
> To summarize, this PR aims to allow a set of useful, valid ZIP files to be 
> parsed, at the cost of potentially making a small set of invalid ZIP files 
> unparsable.

@eirbjo, either of the intended or unintended effects of the PR on their own 
would merit a CSR and this is a case where the inclusive-OR applies so a CSR is 
definitely needed; thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1886328597

Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Reply via email to