On Mon, 8 Jan 2024 18:21:06 GMT, Joe Darcy <[email protected]> wrote:
> Sounds like a CSR is needed for the behavioral change proposed here.
Thanks for flagging this @jddarcy
I'm personally not 100% convinced a CSR is warranted in this case, but I'm of
course happy to extend the following into a CSR if you or other reviewer think
that's the best way forward:
Let's review the intended and unintended behavioral changes in this PR:
The _intended_ behavioral change is to extend the set of ZIP entries parsable
by `ZipInputStream` to include "small Zip64 entries". Such entries meet the
following criteria:
1. They are clearly marked as using the Zip64 format, meaning that the LOC's
'compressed size' and 'uncompressed size' fields are set to the "Zip64 magic
value" 0xFFFFFFFF and that the LOC's extra field includes a valid _Zip64
Extended Information Field_.
2. They use _streaming mode_, meaning that the 'general purpose bit flag' 3 is
set, that the Zip64 'Original Size' and 'Compressed Size' are both set to zero
and that file data is followed by a 'Data Descriptor' containing the actual
size values.
3. The Data Descriptor represents the 'compressed size' and 'uncompressed' size
fields using 8 byte fields, instead of the regular 4 byte fields.
These entires are currently not parsable by `ZipInputStream`. Trying to parse
them fails with an exception similar to the following:
java.util.zip.ZipException: invalid entry size (expected 0 but got 6 bytes)
at
java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:616)
The _unintended_ behavioral change in this PR is that ZIP entries marked as
Zip64, but still having a Data Descriptor using 4 bytes to represent the size
fields will now become unparsable. Such entries meet critera 1 and 2 above, but
not 3.
While I have not performed a corpus search, I believe such entries are rare, if
they exist at all:
- They would be in clear violation of the `APPNOTE.txt` spec, which in 4.3.9.2
says _ZIP64 format MAY be used regardless of the size of a file. When
extracting, if the zip64 extended information extra field is present for the
file the compressed and uncompressed sizes will be 8 byte values_
- The `zipdetails` tool fails to parse such a file, instead reading the two
4-byte numbers into one 8-byte 'compressed size'
- The file cannot be parsed by external APIs similar to `ZipInputStream`, such
as the Python `stream-unzip` module.
To avoid unintended behavioral changes in the implementation, care has been
taken to make the Zip64 extra field validation code robust to invalid input,
including out of bounds array access.
To summarize, this PR aims to allow a set of useful, valid ZIP files to be
parsed, at the cost of potentially making a small set of invalid ZIP files
unparsable.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1885245195