Hi Dawid, If you believe you have a a fix, please submit a patch along with a JTREG test case. Ideally the test case would create zip file as part of the test.
Best Lance > On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: > > Just for the archives - I'm not sure if this ended up being filed to > the bug system - the repro below demonstrates the bug on all JDKs up > to the newest one. > > # create a single random file of 250 megabytes > head -c 250M < /dev/urandom > rnd.bin > # create a bunch of hardlinks to the same file. > for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done > # create a zip archive exceeding 4gb (in stored mode so that it's faster) > zip -0 archive.zip rnd*.bin > # show the content of the archive - for all entries beyond 4gb this should > # show extra data blocks like below (note the order of extra data > blocks - this is important). > # > # The central-directory extra field contains: > # - A subfield with ID 0x5455 (universal time) and 5 data bytes. > # The local extra field has UTC/GMT modification/access times. > # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes: > # 01 04 ea 03 00 00 04 ea 03 00 00. > # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes: > # a4 06 a0 86 01 00 00 00. > # > zipinfo -v archive.zip > > # Bug repro code: > cat > Test.java <<EOF > import java.io.*; > import java.nio.*; > import java.nio.file.*; > import java.nio.file.attribute.*; > > public class Test { > public static void main(String[]args) throws IOException { > try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) { > for (Path root : fs.getRootDirectories()) { > Files.walkFileTree(root, new SimpleFileVisitor<>() { > public FileVisitResult visitFile(Path file, > BasicFileAttributes attrs) throws IOException { > return FileVisitResult.CONTINUE; > } > }); > } > } > } > } > EOF > # and run it. > java Test.java > > The above ends with: > > Exception in thread "main" java.util.zip.ZipException: loc: wrong sig > ->841cd111 > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532) > at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767) > at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777) > at > jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276) > at java.base/java.nio.file.Files.readAttributes(Files.java:1843) > at > java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) > at > java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) > at java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373) > at java.base/java.nio.file.Files.walkFileTree(Files.java:2840) > at java.base/java.nio.file.Files.walkFileTree(Files.java:2876) > at Test.main(Test.java:10) > > The fix is relatively simple I think - the current code assumes entry > offset is < 4GB when parsing extra data for > ID 0x5455. This should be evaluated lazily after all extra data blocks > are parsed to ensure the file offset is correctly > updated based on block ID 0x0001, regardless of its ordering within > other extra data blocks. > > > > On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.we...@gmail.com> wrote: >> >> >> This looks like a legitimate bug to me (the zip file system implementation >> is sensitive to the order of extra parameters and may throw an unexpected >> error >> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no >> permissions - but I think it'd be worth adding one? >> >> Dawid >> >> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.we...@gmail.com> wrote: >>> >>> Hi Lance, >>> >>> Yes, this is exactly the point where the problem is in JDK code. It's >>> directly >>> related to the zip entry beyond max int offset. The code tries to read data >>> from a local zip entry header at locoff, here: >>> >>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff) >>> >>> but the locoff is set to ~0 as per the spec for files exceeding >>> 4 gigabytes, which says: >>> >>> 4.4.16 relative offset of local header: (4 bytes) >>> >>> This is the offset from the start of the first disk on >>> which this file appears, to where the local header SHOULD >>> be found. If an archive is in ZIP64 format and the value >>> in this field is 0xFFFFFFFF, the size will be in the >>> corresponding 8 byte zip64 extended information extra field. >>> >>> A proper fix would be to read the local header from zip64 extra data >>> first. I don't know how this interferes with the rest of the code though - >>> didn't have enough time to look at it. As it stands, zip entries >>> beyond 4GB cause >>> an unchecked exception while attribute-scanning. This simple snippet >>> is enough to demonstrate it, given a ZIP entry beyond 4GB range: >>> >>> try (FileSystem fs = >>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) { >>> for (Path root : fs.getRootDirectories()) { >>> Files.walkFileTree(root, new SimpleFileVisitor<>() { >>> @Override >>> public FileVisitResult visitFile(Path file, BasicFileAttributes >>> attrs) throws IOException { >>> return FileVisitResult.CONTINUE; >>> } >>> }); >>> } >>> } >>> >>> The walkFileTree method is key here as it attempts to harvest >>> attributes (why we use it is another story -- this saves a lot of time >>> on large, network-mounted regular directories when you're collecting >>> file metadata). >>> >>> >>> Dawid >>> >>> >>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen >>> <lance.ander...@oracle.com> wrote: >>>> >>>> Hi David, >>>> >>>> From a quick look at ZipFileSystem this appears to be an optimization to >>>> avoid looking at the LOC extended Timestamp Extra field >>>> >>>> If a Info-ZIP Extended Timestamp (0x5455)is found then: >>>> >>>> If the "zipinfo-time" entry was set to “false” in the Map specified when >>>> creating the Zip FileSystem, >>>> >>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false") >>>> >>>> and the data size of the CEN extended Timestamp is 5 (flag + mod time) >>>> >>>> The modified time is used from the CEN Extended Timestamp extra field >>>> >>>> Otherwise get the modified time, creation time, and access time from the >>>> LOC Extended Timestamp extra field. >>>> >>>> >>>> ————— >>>> >>>> Extended Timestamp Extra Field: >>>> >>>> ============================== >>>> >>>> The following is the layout of the extended-timestamp extra block. >>>> (Last Revision 19970118) >>>> >>>> Local-header version: >>>> >>>> Value Size Description >>>> ----- ---- ----------- >>>> (time) 0x5455 Short tag for this extra block type ("UT") >>>> TSize Short total data size for this block >>>> Flags Byte info bits >>>> (ModTime) Long time of last modification (UTC/GMT) >>>> (AcTime) Long time of last access (UTC/GMT) >>>> (CrTime) Long time of original creation (UTC/GMT) >>>> >>>> Central-header version: >>>> >>>> Value Size Description >>>> ----- ---- ----------- >>>> (time) 0x5455 Short tag for this extra block type ("UT") >>>> TSize Short total data size for this block >>>> Flags Byte info bits (refers to local header!) >>>> (ModTime) Long time of last modification (UTC/GMT) >>>> >>>> The central-header extra field contains the modification time >>>> only, >>>> or no timestamp at all. TSize is used to flag its presence or >>>> absence. But note: >>>> >>>> If "Flags" indicates that Modtime is present in the local >>>> header >>>> field, it MUST be present in the central header field, too! >>>> This correspondence is required because the modification time >>>> value may be used to support trans-timezone freshening and >>>> updating operations with zip archives. >>>> >>>> The time values are in standard Unix signed-long format, >>>> indicating >>>> the number of seconds since 1 January 1970 00:00:00. The times >>>> are relative to Coordinated Universal Time (UTC), also sometimes >>>> referred to as Greenwich Mean Time (GMT). To convert to local >>>> time, >>>> the software must know the local timezone offset from UTC/GMT. >>>> >>>> The lower three bits of Flags in both headers indicate which time- >>>> stamps are present in the LOCAL extra field: >>>> >>>> bit 0 if set, modification time is present >>>> bit 1 if set, access time is present >>>> bit 2 if set, creation time is present >>>> bits 3-7 reserved for additional timestamps; not set >>>> >>>> Those times that are present will appear in the order indicated, >>>> but >>>> any combination of times may be omitted. (Creation time may be >>>> present without access time, for example.) TSize should equal >>>> (1 + 4*(number of set bits in Flags)), as the block is currently >>>> defined. Other timestamps may be added in the future. >>>> >>>> >>>> -------------- >>>> >>>> It's hard to comment on why you received the error that you did but it is >>>> possible the tool that was used for writing the entry did something >>>> unexpected. >>>> >>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the >>>> entry? >>>> >>>> >>>> Best, >>>> Lance >>>> >>>> >>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: >>>> >>>> Hello, >>>> >>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its >>>> entries are intact) that causes a runtime exception when scanning its >>>> contents with ZipFileSystem. The exception indicates an invalid >>>> signature when parsing EXTID_EXTT. I don't quite understand this >>>> comment in the code: >>>> >>>> case EXTID_EXTT: >>>> // spec says the Extened timestamp in cen only has mtime >>>> // need to read the loc to get the extra a/ctime, if flag >>>> // "zipinfo-time" is not specified to false; >>>> // there is performance cost (move up to loc and read) to >>>> // access the loc table foreach entry; >>>> if (zipfs.noExtt) { >>>> if (sz == 5) >>>> mtime = unixToJavaTime(LG(extra, pos + 1)); >>>> break; >>>> } >>>> ... >>>> >>>> but this ZIP file has the extra data block of exactly 5 bytes, as >>>> indicated by zipinfo: >>>> >>>> Central directory entry #6: >>>> --------------------------- >>>> ... >>>> file system or operating system of origin: Unix >>>> version of encoding software: 3.0 >>>> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT >>>> minimum software version required to extract: 2.0 >>>> compression method: deflated >>>> ... >>>> extended local header: no >>>> file last modified on (DOS date/time): 2018 Mar 1 04:56:20 >>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local >>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC >>>> ... >>>> Unix file attributes (100000 octal): ---------- >>>> MS-DOS file attributes (01 hex): read-only >>>> >>>> The central-directory extra field contains: >>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes. >>>> The local extra field has UTC/GMT modification/access times. >>>> >>>> The above conditional block checking for length == 5 would have worked >>>> in ZipFileSystem but it's surrounded by a condition over an >>>> externally-provided property - zipfs.noExtt is only set to true if: >>>> >>>> this.noExtt = "false".equals(env.get("zipinfo-time")); >>>> >>>> I can't share this particular ZIP file with you and I don't know how >>>> it was created but it seems like that "zipinfo-time" flag could be >>>> omitted if the length of the extra data field is exactly 5? >>>> >>>> Dawid >>>> >>>> >>>> >>>> Best >>>> Lance >>>> ------------------ >>>> >>>> >>>> >>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 >>>> Oracle Java Engineering >>>> 1 Network Drive >>>> Burlington, MA 01803 >>>> lance.ander...@oracle.com >>>> >>>> >>>> >>>> Best Lance ------------------ Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com