On Tue, 9 Jan 2024 13:06:56 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:

>> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the
>> number of compressed or uncompressed bytes read from the inflater is larger
>> than the Zip64 magic value.
>>
>> While the ZIP format mandates that the data descriptor `SHOULD be stored in
>> ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it
>> also states that `ZIP64 format MAY be used regardless of the size of a
>> file`. For such small entries, the above assumption does not hold.
>>
>> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the
>> ZipEntry includes a Zip64 extra information field. This brings
>> ZipInputStream into alignment with the APPNOTE format spec:
>>
>>
>> When extracting, if the zip64 extended information extra
>> field is present for the file the compressed and
>> uncompressed sizes will be 8 byte values.
>>
>>
>> While small Zip64 files with 8-byte data descriptors are not commonly found
>> in the wild, it is possible to create one using the Info-ZIP command line
>> `-fd` flag:
>>
>> `echo hello | zip -fd > hello.zip`
>>
>> The PR also adds a test verifying that such a small Zip64 file can be parsed
>> by ZipInputStream.
>
> Eirik Bjørsnøs has updated the pull request incrementally with two additional
> commits since the last revision:
>
>  - Move hasZip64Extra(e) to the end of the 4/8-byte data descriptor check
>  - hasZip64 does not throw IOException

Marking this PR ready for review again with the following changes applied:

- The decision of whether to expect 64-bit data descriptors is moved from `readEnd` to `readLOC`. This allows access to the 'compressed size' and 'uncompressed size' of the LOC as well as direct access to the extra data, addressing the concerns raised by @jaikiran about trusting the extra data of any passed-in `ZipEntry`.
- Checking of the LOC is tightened to require that the 'compressed size' and 'uncompressed size' fields are both set to the Zip64 magic marker 0xFFFFFFFF (required for Zip64 entries).
- Checking of the Zip64 field is tightened to require that the 'Original Size' and 'Compressed Size' are both present and set to zero (required for Zip64 entries using data descriptors).
- A new internal boolean field `ZipInputStream.expect64BitDataDescriptor` passes the decision made in `readLOC` on to the `readEnd` method.

The two support methods added by the PR have been renamed:

- `hasZip64Extra` is now called `expect64BitDataDescriptor`
- `isZip64ExtBlockSizeValid` is now called `isZip64DataDescriptorField`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1884837398
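
For readers following along, the LOC checks described in this comment can be sketched roughly as below. This is a standalone illustration, not the code in the PR: the class `Zip64DescriptorSketch`, the constants, and the method signature are assumptions made for the example, and the real `ZipInputStream` methods may differ in signature and detail. It only mirrors the three conditions listed: both LOC sizes equal the Zip64 magic value, a Zip64 extra field (header ID 0x0001) is present, and that field carries both 8-byte sizes set to zero.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical standalone sketch; not the JDK implementation.
public class Zip64DescriptorSketch {
    // Marker stored in the 4-byte LOC size fields of a Zip64 entry.
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;
    // Header ID of the Zip64 extended information extra field (APPNOTE).
    static final int ZIP64_EXTRA_ID = 0x0001;

    /**
     * Returns true if the data descriptor following the entry should be
     * read with 8-byte size fields, based only on what the LOC header says.
     */
    public static boolean expect64BitDataDescriptor(long locCompressedSize,
                                                    long locUncompressedSize,
                                                    byte[] extra) {
        // Both LOC size fields must carry the Zip64 magic marker.
        if (locCompressedSize != ZIP64_MAGIC || locUncompressedSize != ZIP64_MAGIC) {
            return false;
        }
        if (extra == null) {
            return false;
        }
        // Extra data is a sequence of (2-byte ID, 2-byte size, payload), little-endian.
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = Short.toUnsignedInt(buf.getShort());
            int size = Short.toUnsignedInt(buf.getShort());
            if (size > buf.remaining()) {
                return false; // malformed extra data: payload runs past the end
            }
            if (id == ZIP64_EXTRA_ID) {
                // 'Original Size' and 'Compressed Size' must both be present (8 bytes each).
                if (size < 16) {
                    return false;
                }
                long originalSize = buf.getLong(buf.position());
                long compressedSize = buf.getLong(buf.position() + 8);
                // With a data descriptor the real sizes are not yet known, so both are zero.
                return originalSize == 0 && compressedSize == 0;
            }
            buf.position(buf.position() + size); // skip an unrelated extra field
        }
        return false; // no Zip64 extra field found
    }
}
```

In this sketch the result would be computed once while reading the LOC and stored in a boolean, so the code that later reads the data descriptor does not need to consult any `ZipEntry` supplied by the caller.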