Hi Dawid, Thank you for the follow up.
> On Nov 28, 2020, at 9:34 AM, Dawid Weiss <dawid.we...@gmail.com> wrote: > > > I know what the problem is and I know what the fix should be (and tried to > convey it) but I don't have a patch. Given you believe you know what the fix would be, it would be great to create a patch for ZipFileSystem and see if it addresses the issue and then contribute the fix back to the OpenJDK community :-). > Writing the test case is not a problem but it'll be a lengthy test on slower > drives (has to create a zip file that exceeds 4gb of disk space). The test case would be configured as a manual test and would look similar to test/jdk/jdk/nio/zipfs/TestLocOffsetFromZip64EF.java <https://github.com/openjdk/jdk/pull/987#diff-73b401685399c9d59851cda49e9c0d377093086119555708253c8df99649e3c9> which also creates a 4GB zip. > I don't think there is a way to fake the file system by using a sparse file, > sadly. Would this be acceptable? If so then sure - I can provide a > reproducible test case that will fail for the current implementation. The above test runs in a reasonable time so I would expect this test case would as well. I will look to reproduce the issue over the weekend or early next week and log a bug with what you provided in your earlier email. Best Lance > > D. > > On Fri, Nov 27, 2020 at 8:37 PM Lance Andersen <lance.ander...@oracle.com > <mailto:lance.ander...@oracle.com>> wrote: > Hi Dawid, > > If you believe you have a a fix, please submit a patch along with a JTREG > test case. Ideally the test case would create zip file as part of the test. > > Best > Lance > >> On Nov 27, 2020, at 7:37 AM, Dawid Weiss <dawid.we...@gmail.com >> <mailto:dawid.we...@gmail.com>> wrote: >> >> Just for the archives - I'm not sure if this ended up being filed to >> the bug system - the repro below demonstrates the bug on all JDKs up >> to the newest one. >> >> # create a single random file of 250 megabytes >> head -c 250M < /dev/urandom > rnd.bin >> # create a bunch of hardlinks to the same file. >> for i in `seq -w 1 25`; do ln rnd.bin rnd-$i.bin; done >> # create a zip archive exceeding 4gb (in stored mode so that it's faster) >> zip -0 archive.zip rnd*.bin >> # show the content of the archive - for all entries beyond 4gb this should >> # show extra data blocks like below (note the order of extra data >> blocks - this is important). >> # >> # The central-directory extra field contains: >> # - A subfield with ID 0x5455 (universal time) and 5 data bytes. >> # The local extra field has UTC/GMT modification/access times. >> # - A subfield with ID 0x7875 (Unix UID/GID (any size)) and 11 data bytes: >> # 01 04 ea 03 00 00 04 ea 03 00 00. >> # - A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 8 data bytes: >> # a4 06 a0 86 01 00 00 00. >> # >> zipinfo -v archive.zip >> >> # Bug repro code: >> cat > Test.java <<EOF >> import java.io >> <https://urldefense.com/v3/__http://java.io__;!!GqivPVa7Brio!KhvWnawkon4E5sVeCpHmwDPJusjNPLMtqPh-ZBleD-U4Dv6tmWKB3cq3sT68yelqdA$>.*; >> import java.nio.*; >> import java.nio.file.*; >> import java.nio.file.attribute.*; >> >> public class Test { >> public static void main(String[]args) throws IOException { >> try (FileSystem fs = FileSystems.newFileSystem(Paths.get("archive.zip"))) >> { >> for (Path root : fs.getRootDirectories()) { >> Files.walkFileTree(root, new SimpleFileVisitor<>() { >> public FileVisitResult visitFile(Path file, >> BasicFileAttributes attrs) throws IOException { >> return FileVisitResult.CONTINUE; >> } >> }); >> } >> } >> } >> } >> EOF >> # and run it. >> java Test.java >> >> The above ends with: >> >> Exception in thread "main" java.util.zip.ZipException: loc: wrong sig >> ->841cd111 >> at >> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readExtra(ZipFileSystem.java:2899) >> at >> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.readCEN(ZipFileSystem.java:2600) >> at >> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem$Entry.<init>(ZipFileSystem.java:2536) >> at >> jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.getFileAttributes(ZipFileSystem.java:532) >> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:767) >> at jdk.zipfs/jdk.nio.zipfs.ZipPath.readAttributes(ZipPath.java:777) >> at >> jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.readAttributes(ZipFileSystemProvider.java:276) >> at java.base/java.nio.file.Files.readAttributes(Files.java:1843) >> at >> java.base/java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) >> at >> java.base/java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) >> at >> java.base/java.nio.file.FileTreeWalker.next(FileTreeWalker.java:373) >> at java.base/java.nio.file.Files.walkFileTree(Files.java:2840) >> at java.base/java.nio.file.Files.walkFileTree(Files.java:2876) >> at Test.main(Test.java:10) >> >> The fix is relatively simple I think - the current code assumes entry >> offset is < 4GB when parsing extra data for >> ID 0x5455. This should be evaluated lazily after all extra data blocks >> are parsed to ensure the file offset is correctly >> updated based on block ID 0x0001, regardless of its ordering within >> other extra data blocks. >> >> >> >> On Fri, Oct 23, 2020 at 9:12 AM Dawid Weiss <dawid.we...@gmail.com >> <mailto:dawid.we...@gmail.com>> wrote: >>> >>> >>> This looks like a legitimate bug to me (the zip file system implementation >>> is sensitive to the order of extra parameters and may throw an unexpected >>> error >>> on zip64 archives exceeding 4 gigabytes). I can't file a Jira issue - no >>> permissions - but I think it'd be worth adding one? >>> >>> Dawid >>> >>> On Wed, Oct 21, 2020 at 8:36 PM Dawid Weiss <dawid.we...@gmail.com >>> <mailto:dawid.we...@gmail.com>> wrote: >>>> >>>> Hi Lance, >>>> >>>> Yes, this is exactly the point where the problem is in JDK code. It's >>>> directly >>>> related to the zip entry beyond max int offset. The code tries to read data >>>> from a local zip entry header at locoff, here: >>>> >>>> if (zipfs.readFullyAt(buf, 0, buf.length , locoff) >>>> >>>> but the locoff is set to ~0 as per the spec for files exceeding >>>> 4 gigabytes, which says: >>>> >>>> 4.4.16 relative offset of local header: (4 bytes) >>>> >>>> This is the offset from the start of the first disk on >>>> which this file appears, to where the local header SHOULD >>>> be found. If an archive is in ZIP64 format and the value >>>> in this field is 0xFFFFFFFF, the size will be in the >>>> corresponding 8 byte zip64 extended information extra field. >>>> >>>> A proper fix would be to read the local header from zip64 extra data >>>> first. I don't know how this interferes with the rest of the code though - >>>> didn't have enough time to look at it. As it stands, zip entries >>>> beyond 4GB cause >>>> an unchecked exception while attribute-scanning. This simple snippet >>>> is enough to demonstrate it, given a ZIP entry beyond 4GB range: >>>> >>>> try (FileSystem fs = >>>> FileSystems.newFileSystem(Paths.get("zipWithEntryBeyond4Gb.zip"))) { >>>> for (Path root : fs.getRootDirectories()) { >>>> Files.walkFileTree(root, new SimpleFileVisitor<>() { >>>> @Override >>>> public FileVisitResult visitFile(Path file, BasicFileAttributes >>>> attrs) throws IOException { >>>> return FileVisitResult.CONTINUE; >>>> } >>>> }); >>>> } >>>> } >>>> >>>> The walkFileTree method is key here as it attempts to harvest >>>> attributes (why we use it is another story -- this saves a lot of time >>>> on large, network-mounted regular directories when you're collecting >>>> file metadata). >>>> >>>> >>>> Dawid >>>> >>>> >>>> On Wed, Oct 21, 2020 at 6:39 PM Lance Andersen >>>> <lance.ander...@oracle.com <mailto:lance.ander...@oracle.com>> wrote: >>>>> >>>>> Hi David, >>>>> >>>>> From a quick look at ZipFileSystem this appears to be an optimization to >>>>> avoid looking at the LOC extended Timestamp Extra field >>>>> >>>>> If a Info-ZIP Extended Timestamp (0x5455)is found then: >>>>> >>>>> If the "zipinfo-time" entry was set to “false” in the Map specified >>>>> when creating the Zip FileSystem, >>>>> >>>>> FileSystems.newFileSystem(zipFile, Map.of("zipinfo-time", "false") >>>>> >>>>> and the data size of the CEN extended Timestamp is 5 (flag + mod time) >>>>> >>>>> The modified time is used from the CEN Extended Timestamp extra field >>>>> >>>>> Otherwise get the modified time, creation time, and access time from the >>>>> LOC Extended Timestamp extra field. >>>>> >>>>> >>>>> ————— >>>>> >>>>> Extended Timestamp Extra Field: >>>>> >>>>> ============================== >>>>> >>>>> The following is the layout of the extended-timestamp extra >>>>> block. >>>>> (Last Revision 19970118) >>>>> >>>>> Local-header version: >>>>> >>>>> Value Size Description >>>>> ----- ---- ----------- >>>>> (time) 0x5455 Short tag for this extra block type ("UT") >>>>> TSize Short total data size for this block >>>>> Flags Byte info bits >>>>> (ModTime) Long time of last modification (UTC/GMT) >>>>> (AcTime) Long time of last access (UTC/GMT) >>>>> (CrTime) Long time of original creation (UTC/GMT) >>>>> >>>>> Central-header version: >>>>> >>>>> Value Size Description >>>>> ----- ---- ----------- >>>>> (time) 0x5455 Short tag for this extra block type ("UT") >>>>> TSize Short total data size for this block >>>>> Flags Byte info bits (refers to local header!) >>>>> (ModTime) Long time of last modification (UTC/GMT) >>>>> >>>>> The central-header extra field contains the modification time >>>>> only, >>>>> or no timestamp at all. TSize is used to flag its presence or >>>>> absence. But note: >>>>> >>>>> If "Flags" indicates that Modtime is present in the local >>>>> header >>>>> field, it MUST be present in the central header field, too! >>>>> This correspondence is required because the modification time >>>>> value may be used to support trans-timezone freshening and >>>>> updating operations with zip archives. >>>>> >>>>> The time values are in standard Unix signed-long format, >>>>> indicating >>>>> the number of seconds since 1 January 1970 00:00:00. The times >>>>> are relative to Coordinated Universal Time (UTC), also sometimes >>>>> referred to as Greenwich Mean Time (GMT). To convert to local >>>>> time, >>>>> the software must know the local timezone offset from UTC/GMT. >>>>> >>>>> The lower three bits of Flags in both headers indicate which >>>>> time- >>>>> stamps are present in the LOCAL extra field: >>>>> >>>>> bit 0 if set, modification time is present >>>>> bit 1 if set, access time is present >>>>> bit 2 if set, creation time is present >>>>> bits 3-7 reserved for additional timestamps; not set >>>>> >>>>> Those times that are present will appear in the order indicated, >>>>> but >>>>> any combination of times may be omitted. (Creation time may be >>>>> present without access time, for example.) TSize should equal >>>>> (1 + 4*(number of set bits in Flags)), as the block is currently >>>>> defined. Other timestamps may be added in the future. >>>>> >>>>> >>>>> -------------- >>>>> >>>>> It's hard to comment on why you received the error that you did but it is >>>>> possible the tool that was used for writing the entry did something >>>>> unexpected. >>>>> >>>>> Out of curiosity, have you tried using ZipFile/ZipEntry to access the >>>>> entry? >>>>> >>>>> >>>>> Best, >>>>> Lance >>>>> >>>>> >>>>> On Oct 21, 2020, at 4:55 AM, Dawid Weiss <dawid.we...@gmail.com >>>>> <mailto:dawid.we...@gmail.com>> wrote: >>>>> >>>>> Hello, >>>>> >>>>> We've encountered a seemingly valid ZIP file (zipinfo -v shows all its >>>>> entries are intact) that causes a runtime exception when scanning its >>>>> contents with ZipFileSystem. The exception indicates an invalid >>>>> signature when parsing EXTID_EXTT. I don't quite understand this >>>>> comment in the code: >>>>> >>>>> case EXTID_EXTT: >>>>> // spec says the Extened timestamp in cen only has mtime >>>>> // need to read the loc to get the extra a/ctime, if flag >>>>> // "zipinfo-time" is not specified to false; >>>>> // there is performance cost (move up to loc and read) to >>>>> // access the loc table foreach entry; >>>>> if (zipfs.noExtt) { >>>>> if (sz == 5) >>>>> mtime = unixToJavaTime(LG(extra, pos + 1)); >>>>> break; >>>>> } >>>>> ... >>>>> >>>>> but this ZIP file has the extra data block of exactly 5 bytes, as >>>>> indicated by zipinfo: >>>>> >>>>> Central directory entry #6: >>>>> --------------------------- >>>>> ... >>>>> file system or operating system of origin: Unix >>>>> version of encoding software: 3.0 >>>>> minimum file system compatibility required: MS-DOS, OS/2 or NT FAT >>>>> minimum software version required to extract: 2.0 >>>>> compression method: deflated >>>>> ... >>>>> extended local header: no >>>>> file last modified on (DOS date/time): 2018 Mar 1 04:56:20 >>>>> file last modified on (UT extra field modtime): 2018 Mar 1 05:56:19 local >>>>> file last modified on (UT extra field modtime): 2018 Mar 1 04:56:19 UTC >>>>> ... >>>>> Unix file attributes (100000 octal): ---------- >>>>> MS-DOS file attributes (01 hex): read-only >>>>> >>>>> The central-directory extra field contains: >>>>> - A subfield with ID 0x5455 (universal time) and 5 data bytes. >>>>> The local extra field has UTC/GMT modification/access times. >>>>> >>>>> The above conditional block checking for length == 5 would have worked >>>>> in ZipFileSystem but it's surrounded by a condition over an >>>>> externally-provided property - zipfs.noExtt is only set to true if: >>>>> >>>>> this.noExtt = "false".equals(env.get("zipinfo-time")); >>>>> >>>>> I can't share this particular ZIP file with you and I don't know how >>>>> it was created but it seems like that "zipinfo-time" flag could be >>>>> omitted if the length of the extra data field is exactly 5? >>>>> >>>>> Dawid >>>>> >>>>> >>>>> >>>>> Best >>>>> Lance >>>>> ------------------ >>>>> >>>>> >>>>> >>>>> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 >>>>> Oracle Java Engineering >>>>> 1 Network Drive >>>>> Burlington, MA 01803 >>>>> lance.ander...@oracle.com <mailto:lance.ander...@oracle.com> >>>>> >>>>> >>>>> >>>>> > > > Best > Lance > ------------------ > > <oracle_sig_logo.gif> > > > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 > Oracle Java Engineering > 1 Network Drive > Burlington, MA 01803 > lance.ander...@oracle.com <mailto:lance.ander...@oracle.com> Best Lance ------------------ Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com