This looks like a legitimate bug to me (the zip file system implementation is sensitive to the order of extra parameters and may throw an unexpected error on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no permissions - but I think it'd be worth adding one?
Dawid On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.we...@gmail.com> wrote: > Hi Lance, > > Yes, this is exactly the point where the problem is in JDK code. It's > directly > related to the zip entry beyond max int offset. The code tries to read data > from a local zip entry header at locoff, here: > > if (zipfs.readFullyAt(buf, 0, buf.length , locoff) > > but the locoff is set to ~0 as per the spec for files exceeding > 4 gigabytes, which says: > > 4.4.16 relative offset of local header: (4 bytes) > > This is the offset from the start of the first disk on > which this file appears, to where the local header SHOULD > be found. If an archive is in ZIP64 format and the value > in this field is 0xFFFFFFFF, the size will be in the > corresponding 8 byte zip64 extended information extra field. > > A proper fix would be to read the local header from zip64 extra data > first. I don't know how this interferes with the rest of the code though - > didn't have enough time to look at it. As it stands, zip entries > beyond 4GB cause > an unchecked exception while attribute-scanning. This simple snippet > is enough to demonstrate it, given a ZIP entry beyond 4GB range: > > try (FileSystem fs = > FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) { > for (Path root : fs.getRootDirectories()) { > Files.walkFileTree(root, new SimpleFileVisitor<>() { > @Override > public FileVisitResult visitFile(Path file, BasicFileAttributes > attrs) throws IOException { > return FileVisitResult.CONTINUE; > } > }); > } > } > > The walkFileTree method is key here as it attempts to harvest > attributes (why we use it is another story -- this saves a lot of time > on large, network-mounted regular directories when you're collecting > file metadata). > > > Dawid > > > On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen > <lance.ander...@oracle.com> wrote: > > > > Hi David, > > > > From a quick look at ZipFileSystem this appears to be an optimization > to avoid looking at the LOC extended Timestamp Extra field > > > > If a Info-ZIP Extended Timestamp (0x5455)is found then: > > > > If the "zipinfo-time" entry was set to “false” in the Map specified > when creating the Zip FileSystem, > > > > FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false") > > > > and the data size of the CEN extended Timestamp is 5 (flag + mod time) > > > > The modified time is used from the CEN Extended Timestamp extra field > > > > Otherwise get the modified time, creation time, and access time from the > LOC Extended Timestamp extra field. > > > > > > ————— > > > > Extended Timestamp Extra Field: > > > > ============================== > > > > The following is the layout of the extended-timestamp extra > block. > > (Last Revision 19970118) > > > > Local-header version: > > > > Value Size Description > > ----- ---- ----------- > > (time) 0x5455 Short tag for this extra block type ("UT") > > TSize Short total data size for this block > > Flags Byte info bits > > (ModTime) Long time of last modification (UTC/GMT) > > (AcTime) Long time of last access (UTC/GMT) > > (CrTime) Long time of original creation (UTC/GMT) > > > > Central-header version: > > > > Value Size Description > > ----- ---- ----------- > > (time) 0x5455 Short tag for this extra block type ("UT") > > TSize Short total data size for this block > > Flags Byte info bits (refers to local header!) > > (ModTime) Long time of last modification (UTC/GMT) > > > > The central-header extra field contains the modification time > only, > > or no timestamp at all. TSize is used to flag its presence or > > absence. But note: > > > > If "Flags" indicates that Modtime is present in the local > header > > field, it MUST be present in the central header field, too! > > This correspondence is required because the modification > time > > value may be used to support trans-timezone freshening and > > updating operations with zip archives. > > > > The time values are in standard Unix signed-long format, > indicating > > the number of seconds since 1 January 1970 00:00:00. The times > > are relative to Coordinated Universal Time (UTC), also > sometimes > > referred to as Greenwich Mean Time (GMT). To convert to local > time, > > the software must know the local timezone offset from UTC/GMT. > > > > The lower three bits of Flags in both headers indicate which > time- > > stamps are present in the LOCAL extra field: > > > > bit 0 if set, modification time is present > > bit 1 if set, access time is present > > bit 2 if set, creation time is present > > bits 3-7 reserved for additional timestamps; not > set > > > > Those times that are present will appear in the order > indicated, but > > any combination of times may be omitted. (Creation time may be > > present without access time, for example.) TSize should equal > > (1 + 4*(number of set bits in Flags)), as the block is > currently > > defined. Other timestamps may be added in the future. > > > > > > -------------- > > > > It's hard to comment on why you received the error that you did but it > is possible the tool that was used for writing the entry did something > unexpected. > > > > Out of curiosity, have you tried using ZipFile/ZipEntry to access the > entry? > > > > > > Best, > > Lance > > > > > > On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: > > > > Hello, > > > > We've encountered a seemingly valid ZIP file (zipinfo -v shows all its > > entries are intact) that causes a runtime exception when scanning its > > contents with ZipFileSystem. The exception indicates an invalid > > signature when parsing EXTID_EXTT. I don't quite understand this > > comment in the code: > > > > case EXTID_EXTT: > > // spec says the Extened timestamp in cen only has mtime > > // need to read the loc to get the extra a/ctime, if flag > > // "zipinfo-time" is not specified to false; > > // there is performance cost (move up to loc and read) to > > // access the loc table foreach entry; > > if (zipfs.noExtt) { > > if (sz == 5) > > mtime = unixToJavaTime(LG(extra, pos + 1)); > > break; > > } > > ... > > > > but this ZIP file has the extra data block of exactly 5 bytes, as > > indicated by zipinfo: > > > > Central directory entry #6: > > --------------------------- > > ... > > file system or operating system of origin: Unix > > version of encoding software: 3.0 > > minimum file system compatibility required: MS-DOS, OS/2 or NT FAT > > minimum software version required to extract: 2.0 > > compression method: deflated > > ... > > extended local header: no > > file last modified on (DOS date/time): 2018 Mar 1 04:56:20 > > file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 > local > > file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC > > ... > > Unix file attributes (100000 octal): ---------- > > MS-DOS file attributes (01 hex): read-only > > > > The central-directory extra field contains: > > - A subfield with ID 0x5455 (universal time) and 5 data bytes. > > The local extra field has UTC/GMT modification/access times. > > > > The above conditional block checking for length == 5 would have worked > > in ZipFileSystem but it's surrounded by a condition over an > > externally-provided property - zipfs.noExtt is only set to true if: > > > > this.noExtt = "false".equals(env.get("zipinfo-time")); > > > > I can't share this particular ZIP file with you and I don't know how > > it was created but it seems like that "zipinfo-time" flag could be > > omitted if the length of the extra data field is exactly 5? > > > > Dawid > > > > > > > > Best > > Lance > > ------------------ > > > > > > > > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 > > Oracle Java Engineering > > 1 Network Drive > > Burlington, MA 01803 > > lance.ander...@oracle.com > > > > > > > > >