On Wed, 10 Jan 2024 17:02:05 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:
> > Sounds like a CSR is needed for the behavioral change proposed here. > > Thanks for flagging this @jddarcy > > I'm personally not 100% convinced a CSR is warranted in this case, but I'm of > course happy to extend the following into a CSR if you or other reviewer > think that's the best way forward: > > Let's review the intended and unintended behavioral changes in this PR: > > The _intended_ behavioral change is to extend the set of ZIP entries parsable > by `ZipInputStream` to include "small Zip64 entries". Such entries meet the > following criteria: > > 1. They are clearly marked as using the Zip64 format, meaning that the > LOC's 'compressed size' and 'uncompressed size' fields are set to the "Zip64 > magic value" 0xFFFFFFFF and that the LOC's extra field includes a valid > _Zip64 Extended Information Field_. > > 2. They use _streaming mode_, meaning that the 'general purpose bit flag' > 3 is set, that the Zip64 'Original Size' and 'Compressed Size' are both set > to zero and that file data is followed by a 'Data Descriptor' containing the > actual size values. > > 3. The Data Descriptor represents the 'compressed size' and > 'uncompressed' size fields using 8 byte fields, instead of the regular 4 byte > fields. > > > These entires are currently not parsable by `ZipInputStream`. Trying to parse > them fails with an exception similar to the following: > > ``` > java.util.zip.ZipException: invalid entry size (expected 0 but got 6 bytes) > at > java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:616) > ``` > > The _unintended_ behavioral change in this PR is that ZIP entries marked as > Zip64, but still having a Data Descriptor using 4 bytes to represent the size > fields will now become unparsable. Such entries meet critera 1 and 2 above, > but not 3. > > While I have not performed a corpus search, I believe such entries are rare, > if they exist at all: > > * They would be in clear violation of the `APPNOTE.txt` spec, which in > 4.3.9.2 says _ZIP64 format MAY be used regardless of the size of a file. > When extracting, if the zip64 extended information extra field is present for > the file the compressed and uncompressed sizes will be 8 byte values_ > > * The `zipdetails` tool fails to parse such a file, instead reading the > two 4-byte numbers into one 8-byte 'compressed size' > > * The file cannot be parsed by external APIs similar to `ZipInputStream`, > such as the Python `stream-unzip` module. > > > To avoid unintended behavioral changes in the implementation, care has been > taken to make the Zip64 extra field validation code robust to invalid input, > including out of bounds array access. > > To summarize, this PR aims to allow a set of useful, valid ZIP files to be > parsed, at the cost of potentially making a small set of invalid ZIP files > unparsable. @eirbjo, either of the intended or unintended effects of the PR on their own would merit a CSR and this is a case where the inclusive-OR applies so a CSR is definitely needed; thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1886328597