[jira] [Commented] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats
[ https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018043#comment-17018043 ]

Jakob Sultan Ericsson commented on COMPRESS-501:

Some of my commented-out code was left in intentionally, to show what is actually taking time in the code and to start a discussion, as we have done. :-)

Another thing I noticed while doing this is that some parts, such as parsing the date and time, can be somewhat time consuming. Maybe we could just save the raw value (the DOS timestamp) and convert it to a proper millisecond timestamp later, when/if getTime() is actually called.
If I uncomment the rows below, my naive test goes from 2s to about 3.9s.
{code:java}
long ts = ZipLong.getValue(cfhBuf, off);
final long time = ZipUtil.dosToJavaTime(ts);
ze.setTime(time);
{code}
I have also commented out reading the zip64 extra information because we don't need it in our use case. I believe this might be a compatibility issue for general usage of commons-compress, but if I'm not mistaken, disabling it speeds up reading.

> Possibility to introduce a fast Zip open with some caveats
> --
>
> Key: COMPRESS-501
> URL: https://issues.apache.org/jira/browse/COMPRESS-501
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.19
> Environment: OSX 10.14.6 and Linux
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Attachments: zipfile-speed-improvements.diff
>
>
> About a year ago I created an improvement
> (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things
> in commons-compress for Zip-files. This helped us quite a lot but we wanted
> it to be even faster so I optimised away some stuff that I thought was not
> that important for us.
> I was able to improve opening of a 34GB zip file from ~12s to ~2s.
> Now to my question, do you think it would be possible to introduce some of my
> fixes (diff included) into master?
> Yes, I know that I shortcut some features for some specific zip files and > don't expose everything anymore. > I haven't really made a good switchable solution for it because we just use > our own build locally with this path. > But with some hints from you I might be able to do it somehow. I'm happy to > help and would love to get this speed open into master (it is always > cumbersome with custom changes to public libraries). > {code:java} > diff --git > a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java > > b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java > index 767f615d..d441b12d 100644 > --- > a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java > +++ > b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java > @@ -146,6 +146,7 @@ > private boolean isStreamContiguous = false; > private NameSource nameSource = NameSource.NAME; > private CommentSource commentSource = CommentSource.COMMENT; > +private byte[] cdExtraData = null; > > > /** > @@ -397,6 +398,14 @@ public void setAlignment(int alignment) { > this.alignment = alignment; > } > > +public void setRawCentralDirectoryExtra(byte[] cdExtraData) { > +this.cdExtraData = cdExtraData; > +} > + > +public byte[] getRawCentralDirectoryExtra() { > +return this.cdExtraData; > +} > + > /** > * Replaces all currently attached extra fields with the new array. 
> * @param fields an array of extra fields > diff --git > a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java > b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java > index 152272b5..bb33b50f 100644 > --- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java > +++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java > @@ -691,10 +691,10 @@ protected void finalize() throws Throwable { > final HashMap noUTF8Flag = > new HashMap<>(); > > -positionAtCentralDirectory(); > +ByteBuffer ceDir = positionAtCentralDirectory(); > > wordBbuf.rewind(); > -IOUtils.readFully(archive, wordBbuf); > +ceDir.get(wordBuf); > long sig = ZipLong.getValue(wordBuf); > > if (sig != CFH_SIG && startsWithLocalFileHeader()) { > @@ -703,9 +703,12 @@ protected void finalize() throws Throwable { > } > > while (sig == CFH_SIG) { > -readCentralDirectoryEntry(noUTF8Flag); > +readCentralDirectoryEntry(ceDir, noUTF8Flag); > wordBbuf.rewind(); > -IOUtils.readFully(archive, wordBbuf); > +if (ceDir.remaining() == 0) { > +
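A minimal sketch of the lazy-timestamp idea raised in the comment on this issue: keep the raw 32-bit DOS value while the central directory is parsed and convert it only when the time is first requested. `LazyTimeEntry` and its methods are hypothetical names used for illustration and are not part of commons-compress. The conversion here uses java.time instead of ZipUtil.dosToJavaTime, so it rejects malformed field values rather than normalizing them.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

/** Hypothetical entry type that stores the raw DOS timestamp and decodes it on demand. */
final class LazyTimeEntry {
    private final long rawDosTime; // 32-bit packed date/time as stored in the central directory
    private long cachedMillis;     // valid only once decoded is true
    private boolean decoded;

    LazyTimeEntry(long rawDosTime) {
        this.rawDosTime = rawDosTime & 0xFFFFFFFFL;
    }

    /** Unpacks the MS-DOS date/time bit fields; throws DateTimeException on invalid fields. */
    static LocalDateTime decode(long dos) {
        int year = (int) ((dos >> 25) & 0x7F) + 1980;
        int month = (int) ((dos >> 21) & 0x0F);
        int day = (int) ((dos >> 16) & 0x1F);
        int hour = (int) ((dos >> 11) & 0x1F);
        int minute = (int) ((dos >> 5) & 0x3F);
        int second = (int) (dos & 0x1F) * 2; // stored with two-second resolution
        return LocalDateTime.of(year, month, day, hour, minute, second);
    }

    /** True once the conversion has been performed. */
    boolean isDecoded() {
        return decoded;
    }

    /** Converts on first use, interpreting the fields in the default time zone, then caches. */
    long getTime() {
        if (!decoded) {
            cachedMillis = decode(rawDosTime)
                    .atZone(ZoneId.systemDefault())
                    .toInstant()
                    .toEpochMilli();
            decoded = true;
        }
        return cachedMillis;
    }

    public static void main(String[] args) {
        // 2020-01-17 10:30:42 packed into the DOS layout.
        long dos = (40L << 25) | (1L << 21) | (17L << 16) | (10L << 11) | (30L << 5) | 21L;
        LazyTimeEntry entry = new LazyTimeEntry(dos);
        System.out.println(entry.isDecoded());  // false: nothing converted while "parsing"
        System.out.println(decode(dos));        // 2020-01-17T10:30:42
        System.out.println(entry.getTime() > 0 && entry.isDecoded());
    }
}
```

With this shape, the parsing loop only pays for a field read per entry, and callers that never ask for the time never pay for the conversion.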
[jira] [Updated] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats
[ https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Sultan Ericsson updated COMPRESS-501:
---
Description:
About a year ago I created an improvement (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things in commons-compress for Zip files. This helped us quite a lot, but we wanted it to be even faster, so I optimised away some things that I thought were not that important for us.

I was able to improve the opening time of a 34GB zip file from ~12s to ~2s.

Now to my question: do you think it would be possible to introduce some of my fixes (diff included) into master?

Yes, I know that I shortcut some features for some specific zip files and don't expose everything anymore.
I haven't really made a good switchable solution for it because we just use our own build locally with this patch.
But with some hints from you I might be able to do it somehow. I'm happy to help and would love to get this fast open into master (maintaining custom changes to public libraries is always cumbersome).
{code:java}
diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
index 767f615d..d441b12d 100644
--- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
+++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
@@ -146,6 +146,7 @@
     private boolean isStreamContiguous = false;
     private NameSource nameSource = NameSource.NAME;
     private CommentSource commentSource = CommentSource.COMMENT;
+    private byte[] cdExtraData = null;


     /**
@@ -397,6 +398,14 @@ public void setAlignment(int alignment) {
         this.alignment = alignment;
     }

+    public void setRawCentralDirectoryExtra(byte[] cdExtraData) {
+        this.cdExtraData = cdExtraData;
+    }
+
+    public byte[] getRawCentralDirectoryExtra() {
+        return this.cdExtraData;
+    }
+
     /**
      * Replaces all currently attached extra fields with the new array.
      * @param fields an array of extra fields
diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
index 152272b5..bb33b50f 100644
--- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
+++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
@@ -691,10 +691,10 @@ protected void finalize() throws Throwable {
         final HashMap noUTF8Flag =
             new HashMap<>();

-        positionAtCentralDirectory();
+        ByteBuffer ceDir = positionAtCentralDirectory();

         wordBbuf.rewind();
-        IOUtils.readFully(archive, wordBbuf);
+        ceDir.get(wordBuf);
         long sig = ZipLong.getValue(wordBuf);

         if (sig != CFH_SIG && startsWithLocalFileHeader()) {
@@ -703,9 +703,12 @@ protected void finalize() throws Throwable {
         }

         while (sig == CFH_SIG) {
-            readCentralDirectoryEntry(noUTF8Flag);
+            readCentralDirectoryEntry(ceDir, noUTF8Flag);
             wordBbuf.rewind();
-            IOUtils.readFully(archive, wordBbuf);
+            if (ceDir.remaining() == 0) {
+                break;
+            }
+            ceDir.get(wordBuf);
             sig = ZipLong.getValue(wordBuf);
         }
         return noUTF8Flag;
@@ -721,10 +724,10 @@ protected void finalize() throws Throwable {
      * added to this map.
      */
     private void
-        readCentralDirectoryEntry(final Map noUTF8Flag)
+        readCentralDirectoryEntry(ByteBuffer ceDir, final Map noUTF8Flag)
         throws IOException {
         cfhBbuf.rewind();
-        IOUtils.readFully(archive, cfhBbuf);
+        ceDir.get(cfhBuf);
         int off = 0;
         final Entry ze = new Entry();

@@ -752,8 +755,9 @@ protected void finalize() throws Throwable {
         ze.setMethod(ZipShort.getValue(cfhBuf, off));
         off += SHORT;

-        final long time = ZipUtil.dosToJavaTime(ZipLong.getValue(cfhBuf, off));
-        ze.setTime(time);
+        //long ts = ZipLong.getValue(cfhBuf, off);
+        //final long time = ZipUtil.dosToJavaTime(ts);
+        //ze.setTime(time);
         off += WORD;

         ze.setCrc(ZipLong.getValue(cfhBuf, off));
@@ -784,7 +788,7 @@ protected void finalize() throws Throwable {
         off += WORD;

         final byte[] fileName = new byte[fileNameLen];
-        IOUtils.readFully(archive, ByteBuffer.wrap(fileName));
+        ceDir.get(fileName);
         ze.setName(entryEncoding.decode(fileName), fileName);

         // LFH offset,
@@ -792,19 +796,22 @@ protected void finalize() throws Throwable {
         // data offset will be filled later
         entries.add(ze);

+        //ceDir.position(ceDir.position() + extraLen + commentLen);
         final byte[] cdExtraData =
[jira] [Updated] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats
[ https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Sultan Ericsson updated COMPRESS-501:
---
Attachment: zipfile-speed-improvements.diff

> Possibility to introduce a fast Zip open with some caveats
> --
>
> Key: COMPRESS-501
> URL: https://issues.apache.org/jira/browse/COMPRESS-501
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.19
> Environment: OSX 10.14.6 and Linux
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Attachments: zipfile-speed-improvements.diff
[jira] [Created] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats
Jakob Sultan Ericsson created COMPRESS-501:
--

Summary: Possibility to introduce a fast Zip open with some caveats
Key: COMPRESS-501
URL: https://issues.apache.org/jira/browse/COMPRESS-501
Project: Commons Compress
Issue Type: Improvement
Components: Archivers
Affects Versions: 1.19
Environment: OSX 10.14.6 and Linux
Reporter: Jakob Sultan Ericsson
Attachments: zipfile-speed-improvements.diff

About a year ago I created an improvement (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things in commons-compress for Zip files. This helped us quite a lot, but we wanted it to be even faster, so I optimised away some things that I thought were not that important for us.

I was able to improve the opening time of a 34GB zip file from ~12s to ~2s.

Now to my question: do you think it would be possible to introduce some of my fixes (diff included) into master?

Yes, I know that I shortcut some features for some specific zip files and don't expose everything anymore.
I haven't really made a good switchable solution for it because we just use our own build locally with this patch.
But with some hints from you I might be able to do it somehow. I'm happy to help and would love to get this fast open into master (maintaining custom changes to public libraries is always cumbersome).
[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641482#comment-16641482 ]

Jakob Sultan Ericsson commented on COMPRESS-466:

One question, though: why does {{getRawInputStream()}} return null in this case? Isn't it basically the same as {{getInputStream()}}?

One thing that might not be directly related to this: why is {{ZipArchiveEntry.getLocalHeaderOffset()}} protected? We might have a problem with paying the X-second penalty (18 in my test) for opening the file every time we read it. If {{getLocalHeaderOffset}} were public, I could basically find out where the data starts and decompress it myself.

> Opening of a very large zip file is extremely slow compared to
> java.util.zip.ZipFile
>
>
> Key: COMPRESS-466
> URL: https://issues.apache.org/jira/browse/COMPRESS-466
> Project: Commons Compress
> Issue Type: Improvement
> Components: Compressors
> Affects Versions: 1.18
> Environment: Tested both on Linux and OSX 10.13.6.
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Fix For: 1.19
>
>
> We have a quite large zip file 35 gb and try to open this with ZipFile.
> {code:java}
> try (ZipFile zf = new ZipFile(new File("35gb.zip"))) {
>     System.out.println("File opened..." + (System.currentTimeMillis() - start));
> }
> {code}
> This code takes about 300 000 - 400 000 ms (5-6 minutes).
> If I run this with JDK-builtin java.util.zip.ZipFile, same code takes 300 ms
> (less than a second).
> I'm not totally sure what it is the problem but I did some debugging and
> basically all time is spent in
> {code:java}
> private void resolveLocalFileHeaderData(final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag)
> {code}
> Anything that can be done to improve this?

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
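To illustrate the "find out where the data starts" step mentioned in the comment above, here is a minimal sketch that derives an entry's data offset from its local file header. It assumes the caller already knows the local header offset and has read the 30 fixed header bytes at that position; `LocalHeaderMath` and `dataOffset` are hypothetical names, not commons-compress API. The field positions follow the ZIP local file header layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: computes where an entry's (possibly compressed) data begins. */
final class LocalHeaderMath {
    static final int LFH_SIG = 0x04034b50; // local file header signature, "PK\3\4" on disk
    static final int LFH_FIXED_SIZE = 30;  // bytes before the variable-length name and extra field

    /**
     * @param fixedHeader       buffer positioned at the first of the 30 fixed header bytes
     * @param localHeaderOffset absolute offset of the header within the archive
     * @return absolute offset of the first data byte of the entry
     */
    static long dataOffset(ByteBuffer fixedHeader, long localHeaderOffset) {
        // Work on a little-endian view so the caller's buffer state is left untouched.
        ByteBuffer h = fixedHeader.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        if (h.remaining() < LFH_FIXED_SIZE) {
            throw new IllegalArgumentException("need " + LFH_FIXED_SIZE + " header bytes");
        }
        int base = h.position();
        if (h.getInt(base) != LFH_SIG) {
            throw new IllegalArgumentException("not a local file header");
        }
        int nameLen = h.getShort(base + 26) & 0xFFFF;  // file name length
        int extraLen = h.getShort(base + 28) & 0xFFFF; // extra field length
        return localHeaderOffset + LFH_FIXED_SIZE + nameLen + extraLen;
    }

    public static void main(String[] args) {
        ByteBuffer h = ByteBuffer.allocate(LFH_FIXED_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        h.putInt(0, LFH_SIG);
        h.putShort(26, (short) 5); // e.g. a five-byte name such as "a.txt"
        h.putShort(28, (short) 0); // no extra field
        System.out.println(dataOffset(h, 1000L)); // prints 1035
    }
}
```

The lengths are read from the local header itself because the extra field stored there can differ in size from the one in the central directory, so the central directory alone is not enough to compute this offset.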
[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641193#comment-16641193 ]

Jakob Sultan Ericsson commented on COMPRESS-466:

Looks good and is working fine. Thanks. You did a better refactor than I dared to do. :-)
[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641124#comment-16641124 ]

Jakob Sultan Ericsson commented on COMPRESS-466:

Yes, I realized that. :-) I made a working patch last week but forgot to update my fork with it. I can publish it later tonight.

I also tested reading everything in one go and parsing it from memory. It was a bit faster, but not as much as I thought: 18s when reading only from the central directory, and about 12s when parsing from memory.
[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633774#comment-16633774 ]

Jakob Sultan Ericsson commented on COMPRESS-466:

I made a change to support reading only the central directory. I haven't added any support for multiple entries with the same name, or any Unicode support in comments.
https://github.com/jakeri/commons-compress/tree/COMPRESS-466

Opening my 35gb.zip went from 5-6 minutes to 17-18 seconds. The time is now spent building the central directory information.

Pure speculation, but maybe this time could be decreased even more if you read the central directory into memory once (sacrificing memory for speed) and then build the directory information by reading from a large ByteBuffer.
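The speculation above, parsing the central directory from one large in-memory ByteBuffer instead of issuing a read per field, might look roughly like the following sketch. It assumes the central directory bytes have already been loaded into the buffer and that names are UTF-8; real archives may use another encoding unless the language encoding flag is set. `InMemoryCentralDirectory`, `readNames`, and `syntheticEntry` are hypothetical names, and only the entry names are extracted to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser that walks central directory entries held entirely in memory. */
final class InMemoryCentralDirectory {
    static final int CFH_SIG = 0x02014b50; // central file header signature
    static final int CFH_FIXED_SIZE = 46;  // bytes before the variable-length file name

    /** Returns the entry names found in a buffer that starts at the first central file header. */
    static List<String> readNames(ByteBuffer centralDirectory) {
        ByteBuffer cd = centralDirectory.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        List<String> names = new ArrayList<>();
        // Stop at the first record that is not a central file header (e.g. the end record).
        while (cd.remaining() >= CFH_FIXED_SIZE && cd.getInt(cd.position()) == CFH_SIG) {
            int start = cd.position();
            int nameLen = cd.getShort(start + 28) & 0xFFFF;
            int extraLen = cd.getShort(start + 30) & 0xFFFF;
            int commentLen = cd.getShort(start + 32) & 0xFFFF;
            if (cd.remaining() < CFH_FIXED_SIZE + nameLen + extraLen + commentLen) {
                throw new IllegalArgumentException("truncated central directory entry");
            }
            byte[] name = new byte[nameLen];
            cd.position(start + CFH_FIXED_SIZE);
            cd.get(name);
            names.add(new String(name, StandardCharsets.UTF_8));
            // Skip extra field and comment without copying them.
            cd.position(cd.position() + extraLen + commentLen);
        }
        return names;
    }

    /** Builds one synthetic entry (zeroed fields, zero-filled extra data) for demonstration. */
    static byte[] syntheticEntry(String name, int extraLen) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer b = ByteBuffer.allocate(CFH_FIXED_SIZE + n.length + extraLen)
                .order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(0, CFH_SIG);
        b.putShort(28, (short) n.length);
        b.putShort(30, (short) extraLen);
        b.position(CFH_FIXED_SIZE);
        b.put(n);
        return b.array();
    }

    public static void main(String[] args) {
        byte[] first = syntheticEntry("a.txt", 4);
        byte[] second = syntheticEntry("dir/b.bin", 0);
        ByteBuffer cd = ByteBuffer.allocate(first.length + second.length);
        cd.put(first).put(second);
        cd.flip();
        System.out.println(readNames(cd)); // prints [a.txt, dir/b.bin]
    }
}
```

The trade-off is the one named in the comment: the whole central directory is held in memory at once, in exchange for replacing many small channel reads with plain buffer accesses.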
[jira] [Created] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile
Jakob Sultan Ericsson created COMPRESS-466:
--

Summary: Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile
Key: COMPRESS-466
URL: https://issues.apache.org/jira/browse/COMPRESS-466
Project: Commons Compress
Issue Type: Bug
Components: Compressors
Affects Versions: 1.18
Environment: Tested both on Linux and OSX 10.13.6.
Reporter: Jakob Sultan Ericsson

We have a quite large zip file (35 GB) and try to open it with ZipFile.
{code:java}
// Record the start time so the elapsed open time can be printed
final long start = System.currentTimeMillis();
try (ZipFile zf = new ZipFile(new File("35gb.zip"))) {
    System.out.println("File opened..." + (System.currentTimeMillis() - start));
}
{code}
This code takes about 300 000 - 400 000 ms (5-6 minutes).
If I run this with the JDK's built-in java.util.zip.ZipFile, the same code takes 300 ms (less than a second).

I'm not totally sure what the problem is, but I did some debugging and basically all the time is spent in
{code:java}
private void resolveLocalFileHeaderData(final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag)
{code}
Is there anything that can be done to improve this?