On Mon, 8 Jan 2024 15:43:34 GMT, Jaikiran Pai <j...@openjdk.org> wrote:
> The crucial part here is that the "current" `ZipEntry` is the one which was > created by a previous call to `ZipInputStream.getNextEntry()` method, which > is a public overridable method. Furthermore, the `ZipEntry.getExtra()` method > too is overridable. So in theory, the byte[] data returned by the > `ZipEntry.getExtra()` isn't controlled or validated and can be anything that > the overridden method might return. Yes, we cannot rely on the extra field contents to be well-formed and valid. Because of this, `hasZip64Extra` must be coded in a defensive fashion, meaning it returns true only when it finds a valid Zip64 extra field, false otherwise. It should also never throw exceptions. This is the case regardless of ZipEntry concerns, since the input ZIP data may already be malformed. > The bigger question however is, should `readEnd()` now be entrusted the > responsibility of reading the `ZipEntry.getExtra()` byte[] contents that can > be anything and even if the content of the byte[] seem to represent a zip64 > entry (i.e. the newly introduced `hasZip64Extra()` method returning `true`), > should that decision be trusted to further lead the `readEnd()` method to > read the compressed/uncompressed sizes in data descriptor as 8 bytes? Would > this need additional checks of some form? To avoid readEND relying on "untrusted" ZipEntry contents, I moved the call to `hasZip64Extra` from `readEND` to `readLOC`. This allows `hasZip64Extra` to work on the raw extra data. For this to work, I needed to add a new internal boolean field `ZipInputStream.expectEightBitDataDescriptor`. When true, this flag indicates to `readEND` that the next data descriptor has 8-byte size fields. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12524#issuecomment-1883339627