Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Eirik Bjørsnøs Wed, 10 Jan 2024 09:05:40 -0800

On Mon, 8 Jan 2024 18:21:06 GMT, Joe Darcy <[email protected]> wrote:


> Sounds like a CSR is needed for the behavioral change proposed here.

Thanks for flagging this @jddarcy 

I'm personally not 100% convinced a CSR is warranted in this case, but I'm of 
course happy to extend the following into a CSR if you or other reviewer think 
that's the best way forward:  

Let's review the intended and unintended behavioral changes in this PR:  

The _intended_ behavioral change is to extend the set of ZIP entries parsable 
by `ZipInputStream` to include "small Zip64 entries". Such entries meet the 
following criteria:
 
1. They are clearly marked as using the Zip64 format, meaning that the LOC's 
'compressed size' and 'uncompressed size' fields are set to the "Zip64 magic 
value" 0xFFFFFFFF and that the LOC's extra field includes a valid _Zip64 
Extended Information Field_.
2. They use _streaming mode_, meaning that the 'general purpose bit flag' 3 is 
set, that the Zip64 'Original Size' and 'Compressed Size' are both set to zero 
and that file data is followed by a 'Data Descriptor' containing the actual 
size values.
3. The Data Descriptor represents the 'compressed size' and 'uncompressed' size 
fields using 8 byte fields, instead of the regular 4 byte fields.

These entires are currently not parsable by `ZipInputStream`. Trying to parse 
them fails with an exception similar to the following:


java.util.zip.ZipException: invalid entry size (expected 0 but got 6 bytes)
        at 
java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:616)


The _unintended_ behavioral change in this PR is that ZIP entries marked as 
Zip64, but still having a Data Descriptor using 4 bytes to represent the size 
fields will now become unparsable. Such entries meet critera 1 and 2 above, but 
not 3.

While I have not performed a corpus search, I believe such entries are rare, if 
they exist at all:

- They would be in clear violation of the `APPNOTE.txt` spec, which in 4.3.9.2 
says  _ZIP64 format MAY be used regardless of the size of a file.  When 
extracting, if the zip64 extended information extra field is present for the 
file the compressed and uncompressed sizes will be 8 byte values_
- The `zipdetails` tool fails to parse such a file, instead reading the two 
4-byte numbers into one 8-byte 'compressed size'
- The file cannot be parsed by external APIs similar to `ZipInputStream`, such 
as the Python `stream-unzip` module.

To avoid unintended behavioral changes in the implementation, care has been 
taken to make the Zip64 extra field validation code robust to invalid input, 
including out of bounds array access. 

To summarize, this PR aims to allow a set of useful, valid ZIP files to be 
parsed, at the cost of potentially making a small set of invalid ZIP files 
unparsable.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1885245195

Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Reply via email to