[jira] [Commented] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats

2020-01-17 Thread Jakob Sultan Ericsson (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018043#comment-17018043
 ] 

Jakob Sultan Ericsson commented on COMPRESS-501:


Some of my commented-out code was left in intentionally, to show what is actually 
taking time in the code and to start a discussion, as we have now done. :-)

Another thing I noticed while doing this is that some parts, such as parsing the 
actual date and time, can be somewhat time consuming. Maybe we could just save the 
raw value (the DOS timestamp) and parse it into a correct millisecond timestamp 
later, when/if getTime() is actually called.

If I uncomment the rows below, my naive test goes from 2s to about 3.9s.
{code:java}
long ts = ZipLong.getValue(cfhBuf, off);
final long time = ZipUtil.dosToJavaTime(ts);
ze.setTime(time);
{code}
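
The deferred parsing suggested above could look roughly like the following. This is a minimal sketch and not code from the attached diff: the class name LazyDosTime and its methods are hypothetical, and it decodes with java.time rather than ZipUtil.dosToJavaTime, so it rejects out-of-range fields instead of tolerating them.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

/** Hypothetical holder: keeps the raw DOS timestamp and decodes it on first use. */
final class LazyDosTime {
    private final long rawDosTime; // 32-bit value as stored in the central directory
    private long millis;           // cached result of the conversion
    private boolean decoded;       // true once millis has been computed

    LazyDosTime(long rawDosTime) {
        this.rawDosTime = rawDosTime & 0xFFFFFFFFL;
    }

    /** Converts on the first call only; later calls return the cached value. */
    long getTime() {
        if (!decoded) {
            millis = dosToMillis(rawDosTime, ZoneId.systemDefault());
            decoded = true;
        }
        return millis;
    }

    /**
     * Decodes the packed DOS date/time fields (two-second resolution).
     * Assumption: fields are in range; LocalDateTime.of throws otherwise.
     */
    static long dosToMillis(long dos, ZoneId zone) {
        int year = (int) ((dos >> 25) & 0x7f) + 1980;
        int month = (int) ((dos >> 21) & 0x0f);
        int day = (int) ((dos >> 16) & 0x1f);
        int hour = (int) ((dos >> 11) & 0x1f);
        int minute = (int) ((dos >> 5) & 0x3f);
        int second = (int) ((dos << 1) & 0x3e);
        return LocalDateTime.of(year, month, day, hour, minute, second)
                .atZone(zone).toInstant().toEpochMilli();
    }
}
```

With something like this, opening the file only stores a long per entry, and the conversion cost is paid only for entries whose time is actually requested.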

I have also commented out reading the zip64 extra information because we don't 
need it in our use case. I believe this might be a compatibility issue for 
general usage of commons-compress, but if I'm not mistaken, disabling it 
speeds up reading.
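
For context on the compatibility concern: in the zip format, a size or offset field of 0xFFFFFFFF (or a disk number of 0xFFFF) in a central directory header means the real value is stored in the zip64 extended information extra field. One possible middle ground, sketched here with hypothetical names and not measured, would be to parse the zip64 extra only for entries that carry such a sentinel.

```java
/** Hypothetical helper: decides whether an entry needs its zip64 extra field parsed. */
final class Zip64Check {
    /** Sentinel for 32-bit size and offset fields. */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;
    /** Sentinel for the 16-bit disk number field. */
    static final int ZIP64_MAGIC_SHORT = 0xFFFF;

    /**
     * Returns true if any central directory field holds a zip64 sentinel,
     * meaning the real value must come from the zip64 extra field.
     */
    static boolean needsZip64Extra(long compressedSize, long size,
                                   long localHeaderOffset, int diskStart) {
        return compressedSize == ZIP64_MAGIC
                || size == ZIP64_MAGIC
                || localHeaderOffset == ZIP64_MAGIC
                || diskStart == ZIP64_MAGIC_SHORT;
    }
}
```

In a 34GB archive, entries whose local header lies beyond the 4 GiB mark would carry the offset sentinel, so they would still need the extra field to be read.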

> Possibility to introduce a fast Zip open with some caveats
> --
>
> Key: COMPRESS-501
> URL: https://issues.apache.org/jira/browse/COMPRESS-501
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.19
> Environment: OSX 10.14.6 and Linux
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Attachments: zipfile-speed-improvements.diff
>
>
> About a year ago I created an improvement 
> (https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things 
> in commons-compress for zip files. This helped us quite a lot, but we wanted 
> it to be even faster, so I optimised away some things that I thought were not 
> that important for us.
> I was able to improve opening of a 34GB zip file from ~12s to ~2s.
> Now to my question: do you think it would be possible to introduce some of my 
> fixes (diff included) into master?
> Yes, I know that I shortcut some features for some specific zip files and 
> don't expose everything anymore.
> I haven't really made a good switchable solution for it because we just use 
> our own build locally with this patch.
> But with some hints from you I might be able to do it somehow. I'm happy to 
> help and would love to get this fast open into master (custom changes to 
> public libraries are always cumbersome). 
> {code:java}
> diff --git 
> a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
>  
> b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> index 767f615d..d441b12d 100644
> --- 
> a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> +++ 
> b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
> @@ -146,6 +146,7 @@
>  private boolean isStreamContiguous = false;
>  private NameSource nameSource = NameSource.NAME;
>  private CommentSource commentSource = CommentSource.COMMENT;
> +private byte[] cdExtraData = null;
>  
>  
>  /**
> @@ -397,6 +398,14 @@ public void setAlignment(int alignment) {
>  this.alignment = alignment;
>  }
>  
> +public void setRawCentralDirectoryExtra(byte[] cdExtraData) {
> +this.cdExtraData = cdExtraData;
> +}
> +
> +public byte[] getRawCentralDirectoryExtra() {
> +return this.cdExtraData;
> +}
> +
>  /**
>   * Replaces all currently attached extra fields with the new array.
>   * @param fields an array of extra fields
> diff --git 
> a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java 
> b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> index 152272b5..bb33b50f 100644
> --- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> +++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
> @@ -691,10 +691,10 @@ protected void finalize() throws Throwable {
>  final HashMap<ZipArchiveEntry, NameAndComment> noUTF8Flag =
>  new HashMap<>();
>  
> -positionAtCentralDirectory();
> +ByteBuffer ceDir = positionAtCentralDirectory();
>  
>  wordBbuf.rewind();
> -IOUtils.readFully(archive, wordBbuf);
> +ceDir.get(wordBuf);
>  long sig = ZipLong.getValue(wordBuf);
>  
>  if (sig != CFH_SIG && startsWithLocalFileHeader()) {
> @@ -703,9 +703,12 @@ protected void finalize() throws Throwable {
>  }
>  
>  while (sig == CFH_SIG) {
> -readCentralDirectoryEntry(noUTF8Flag);
> +readCentralDirectoryEntry(ceDir, noUTF8Flag);
>  wordBbuf.rewind();
> -IOUtils.readFully(archive, wordBbuf);
> +if (ceDir.remaining() == 0) {
> +  

[jira] [Updated] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats

2020-01-10 Thread Jakob Sultan Ericsson (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Sultan Ericsson updated COMPRESS-501:
---
Description: 
About a year ago I created an improvement 
(https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things in 
commons-compress for zip files. This helped us quite a lot, but we wanted it to 
be even faster, so I optimised away some things that I thought were not that 
important for us.
I was able to improve opening of a 34GB zip file from ~12s to ~2s.

Now to my question: do you think it would be possible to introduce some of my 
fixes (diff included) into master?

Yes, I know that I shortcut some features for some specific zip files and don't 
expose everything anymore.
I haven't really made a good switchable solution for it because we just use our 
own build locally with this patch.

But with some hints from you I might be able to do it somehow. I'm happy to 
help and would love to get this fast open into master (custom changes to public 
libraries are always cumbersome). 

{code:java}
diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
index 767f615d..d441b12d 100644
--- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
+++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
@@ -146,6 +146,7 @@
 private boolean isStreamContiguous = false;
 private NameSource nameSource = NameSource.NAME;
 private CommentSource commentSource = CommentSource.COMMENT;
+private byte[] cdExtraData = null;
 
 
 /**
@@ -397,6 +398,14 @@ public void setAlignment(int alignment) {
 this.alignment = alignment;
 }
 
+public void setRawCentralDirectoryExtra(byte[] cdExtraData) {
+this.cdExtraData = cdExtraData;
+}
+
+public byte[] getRawCentralDirectoryExtra() {
+return this.cdExtraData;
+}
+
 /**
  * Replaces all currently attached extra fields with the new array.
  * @param fields an array of extra fields
diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
index 152272b5..bb33b50f 100644
--- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
+++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
@@ -691,10 +691,10 @@ protected void finalize() throws Throwable {
 final HashMap<ZipArchiveEntry, NameAndComment> noUTF8Flag =
 new HashMap<>();
 
-positionAtCentralDirectory();
+ByteBuffer ceDir = positionAtCentralDirectory();
 
 wordBbuf.rewind();
-IOUtils.readFully(archive, wordBbuf);
+ceDir.get(wordBuf);
 long sig = ZipLong.getValue(wordBuf);
 
 if (sig != CFH_SIG && startsWithLocalFileHeader()) {
@@ -703,9 +703,12 @@ protected void finalize() throws Throwable {
 }
 
 while (sig == CFH_SIG) {
-readCentralDirectoryEntry(noUTF8Flag);
+readCentralDirectoryEntry(ceDir, noUTF8Flag);
 wordBbuf.rewind();
-IOUtils.readFully(archive, wordBbuf);
+if (ceDir.remaining() == 0) {
+break;
+}
+ceDir.get(wordBuf);
 sig = ZipLong.getValue(wordBuf);
 }
 return noUTF8Flag;
@@ -721,10 +724,10 @@ protected void finalize() throws Throwable {
  * added to this map.
  */
 private void
-readCentralDirectoryEntry(final Map<ZipArchiveEntry, NameAndComment> noUTF8Flag)
+readCentralDirectoryEntry(ByteBuffer ceDir, final Map<ZipArchiveEntry, NameAndComment> noUTF8Flag)
 throws IOException {
 cfhBbuf.rewind();
-IOUtils.readFully(archive, cfhBbuf);
+ceDir.get(cfhBuf);
 int off = 0;
 final Entry ze = new Entry();
 
@@ -752,8 +755,9 @@ protected void finalize() throws Throwable {
 ze.setMethod(ZipShort.getValue(cfhBuf, off));
 off += SHORT;
 
-final long time = ZipUtil.dosToJavaTime(ZipLong.getValue(cfhBuf, off));
-ze.setTime(time);
+//long ts = ZipLong.getValue(cfhBuf, off);
+//final long time = ZipUtil.dosToJavaTime(ts);
+//ze.setTime(time);
 off += WORD;
 
 ze.setCrc(ZipLong.getValue(cfhBuf, off));
@@ -784,7 +788,7 @@ protected void finalize() throws Throwable {
 off += WORD;
 
 final byte[] fileName = new byte[fileNameLen];
-IOUtils.readFully(archive, ByteBuffer.wrap(fileName));
+ceDir.get(fileName);
 ze.setName(entryEncoding.decode(fileName), fileName);
 
 // LFH offset,
@@ -792,19 +796,22 @@ protected void finalize() throws Throwable {
 // data offset will be filled later
 entries.add(ze);
 
+//ceDir.position(ceDir.position() + extraLen + commentLen);
 final byte[] cdExtraData = 

[jira] [Updated] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats

2020-01-10 Thread Jakob Sultan Ericsson (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Sultan Ericsson updated COMPRESS-501:
---
Attachment: zipfile-speed-improvements.diff


[jira] [Created] (COMPRESS-501) Possibility to introduce a fast Zip open with some caveats

2020-01-10 Thread Jakob Sultan Ericsson (Jira)
Jakob Sultan Ericsson created COMPRESS-501:
--

 Summary: Possibility to introduce a fast Zip open with some caveats
 Key: COMPRESS-501
 URL: https://issues.apache.org/jira/browse/COMPRESS-501
 Project: Commons Compress
  Issue Type: Improvement
  Components: Archivers
Affects Versions: 1.19
 Environment: OSX 10.14.6 and Linux
Reporter: Jakob Sultan Ericsson
 Attachments: zipfile-speed-improvements.diff

About a year ago I created an improvement 
(https://issues.apache.org/jira/browse/COMPRESS-466) to speed up some things in 
commons-compress for zip files. This helped us quite a lot, but we wanted it to 
be even faster, so I optimised away some things that I thought were not that 
important for us.
I was able to improve opening of a 34GB zip file from ~12s to ~2s.

Now to my question: do you think it would be possible to introduce some of my 
fixes (diff included) into master?

Yes, I know that I shortcut some features for some specific zip files and don't 
expose everything anymore.

I haven't really made a good switchable solution for it because we just use our 
own build locally with this patch.

But with some hints from you I might be able to do it somehow. I'm happy to 
help and would love to get this fast open into master (custom changes to public 
libraries are always cumbersome). 

{code:java}
diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
index 767f615d..d441b12d 100644
--- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
+++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveEntry.java
@@ -146,6 +146,7 @@
 private boolean isStreamContiguous = false;
 private NameSource nameSource = NameSource.NAME;
 private CommentSource commentSource = CommentSource.COMMENT;
+private byte[] cdExtraData = null;
 
 
 /**
@@ -397,6 +398,14 @@ public void setAlignment(int alignment) {
 this.alignment = alignment;
 }
 
+public void setRawCentralDirectoryExtra(byte[] cdExtraData) {
+this.cdExtraData = cdExtraData;
+}
+
+public byte[] getRawCentralDirectoryExtra() {
+return this.cdExtraData;
+}
+
 /**
  * Replaces all currently attached extra fields with the new array.
  * @param fields an array of extra fields
diff --git a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
index 152272b5..bb33b50f 100644
--- a/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
+++ b/src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java
@@ -691,10 +691,10 @@ protected void finalize() throws Throwable {
 final HashMap<ZipArchiveEntry, NameAndComment> noUTF8Flag =
 new HashMap<>();
 
-positionAtCentralDirectory();
+ByteBuffer ceDir = positionAtCentralDirectory();
 
 wordBbuf.rewind();
-IOUtils.readFully(archive, wordBbuf);
+ceDir.get(wordBuf);
 long sig = ZipLong.getValue(wordBuf);
 
 if (sig != CFH_SIG && startsWithLocalFileHeader()) {
@@ -703,9 +703,12 @@ protected void finalize() throws Throwable {
 }
 
 while (sig == CFH_SIG) {
-readCentralDirectoryEntry(noUTF8Flag);
+readCentralDirectoryEntry(ceDir, noUTF8Flag);
 wordBbuf.rewind();
-IOUtils.readFully(archive, wordBbuf);
+if (ceDir.remaining() == 0) {
+break;
+}
+ceDir.get(wordBuf);
 sig = ZipLong.getValue(wordBuf);
 }
 return noUTF8Flag;
@@ -721,10 +724,10 @@ protected void finalize() throws Throwable {
  * added to this map.
  */
 private void
-readCentralDirectoryEntry(final Map<ZipArchiveEntry, NameAndComment> noUTF8Flag)
+readCentralDirectoryEntry(ByteBuffer ceDir, final Map<ZipArchiveEntry, NameAndComment> noUTF8Flag)
 throws IOException {
 cfhBbuf.rewind();
-IOUtils.readFully(archive, cfhBbuf);
+ceDir.get(cfhBuf);
 int off = 0;
 final Entry ze = new Entry();
 
@@ -752,8 +755,9 @@ protected void finalize() throws Throwable {
 ze.setMethod(ZipShort.getValue(cfhBuf, off));
 off += SHORT;
 
-final long time = ZipUtil.dosToJavaTime(ZipLong.getValue(cfhBuf, off));
-ze.setTime(time);
+//long ts = ZipLong.getValue(cfhBuf, off);
+//final long time = ZipUtil.dosToJavaTime(ts);
+//ze.setTime(time);
 off += WORD;
 
 ze.setCrc(ZipLong.getValue(cfhBuf, off));
@@ -784,7 +788,7 @@ protected void finalize() throws Throwable {
 off += WORD;
 
 final byte[] fileName = new byte[fileNameLen];
-IOUtils.readFully(archive, ByteBuffer.wrap(fileName));
+ceDir.get(fileName);
 

[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-08 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641482#comment-16641482
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


One thing though: why does {{getRawInputStream()}} return null in this case?
Isn't it basically the same as {{getInputStream()}}?

One thing that might not be totally related to this: why is 
{{ZipArchiveEntry.getLocalHeaderOffset()}} protected?
We might have problems with taking the X seconds (18 in my test) penalty for 
opening the file and reading it every time. If {{getLocalHeaderOffset()}} were 
public, I could basically find out where the data starts and decompress it myself.
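
To illustrate the "find out where the data starts" idea: a local file header has 30 fixed bytes followed by the file name and the extra field, so the data offset is the header offset plus those three lengths. The following is a minimal sketch that assumes the local header offset is already known; LocalHeaderMath is a hypothetical helper and not part of commons-compress.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical helper: computes where an entry's data starts from its local file header. */
final class LocalHeaderMath {
    static final int LFH_SIG = 0x04034b50;  // "PK\3\4" read as a little-endian int
    static final int LFH_FIXED_LEN = 30;    // fixed part of the local file header

    /** Takes the 30 fixed header bytes (from the buffer's position) and the header's file offset. */
    static long dataOffset(ByteBuffer fixedHeader, long lfhOffset) {
        ByteBuffer b = fixedHeader.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int base = b.position();
        if (b.remaining() < LFH_FIXED_LEN || b.getInt(base) != LFH_SIG) {
            throw new IllegalArgumentException("not a local file header");
        }
        int nameLen = b.getShort(base + 26) & 0xFFFF;   // file name length
        int extraLen = b.getShort(base + 28) & 0xFFFF;  // extra field length
        return lfhOffset + LFH_FIXED_LEN + nameLen + extraLen;
    }

    /** Reads the fixed header from the channel at lfhOffset, then delegates to the method above. */
    static long dataOffset(SeekableByteChannel channel, long lfhOffset) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(LFH_FIXED_LEN);
        channel.position(lfhOffset);
        while (buf.hasRemaining()) {
            if (channel.read(buf) < 0) {
                throw new EOFException("truncated local file header");
            }
        }
        buf.flip();
        return dataOffset(buf, lfhOffset);
    }
}
```

The lengths have to be read from the local header itself, because its extra field can differ in length from the one stored in the central directory.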

> Opening of a very large zip file is extremely slow compared to 
> java.util.zip.ZipFile
> 
>
> Key: COMPRESS-466
> URL: https://issues.apache.org/jira/browse/COMPRESS-466
> Project: Commons Compress
> Issue Type: Improvement
> Components: Compressors
> Affects Versions: 1.18
> Environment: Tested both on Linux and OSX 10.13.6.
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Fix For: 1.19
>
>
> We have a quite large zip file (35 GB) and try to open it with ZipFile. 
> {code:java}
> final long start = System.currentTimeMillis();
> try (ZipFile zf = new ZipFile(new File("35gb.zip"))) {
>     System.out.println("File opened..." + (System.currentTimeMillis() - start));
> }
> {code}
> This code takes about 300 000 - 400 000 ms (5-6 minutes).
> If I run this with the JDK built-in java.util.zip.ZipFile, the same code takes 
> 300 ms (less than a second). 
> I'm not totally sure what the problem is, but I did some debugging and 
> basically all the time is spent in
> {code:java}
> private void resolveLocalFileHeaderData(final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag)
> {code}
> Anything that can be done to improve this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-07 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641193#comment-16641193
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


Looks good and is working fine. 
Thanks. You did a better refactor than I dared to do. :-)







[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-07 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641124#comment-16641124
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


Yes, I realized that. :-) I made a working patch last week but forgot to 
update it on my fork. I can publish it later tonight. 
I also tested reading everything in one go and parsing from memory. It was a 
bit faster, but not as much as I thought. 

18s when only reading from the central directory, and about 12s when parsing 
from memory. 






[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-01 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633774#comment-16633774
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


I made a change to support reading only the central directory. 

I haven't added any support for multiple entries with the same name or any 
Unicode support in comments.

https://github.com/jakeri/commons-compress/tree/COMPRESS-466

My 35gb.zip went from 5-6 minutes to 17-18 seconds. The time is now spent in 
building the central directory information.

Pure speculation, but maybe this time could be decreased even more if you read 
the central directory into memory once (sacrificing memory for speed) and then 
build the directory information by reading from a large ByteBuffer.
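
The read-once idea could be sketched as follows: load the whole central directory with positional reads, then walk the 46-byte fixed headers in memory. This is a minimal sketch and not the code on the fork: InMemoryCentralDirectory is a hypothetical class, the central directory offset and size are assumed to be already known (from the end of central directory record), and names are decoded as UTF-8 for simplicity.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses central directory headers from one in-memory buffer. */
final class InMemoryCentralDirectory {
    static final int CFH_SIG = 0x02014b50;  // "PK\1\2" read as a little-endian int
    static final int CFH_FIXED_LEN = 46;    // fixed part of a central file header

    /** Reads cdSize bytes starting at cdOffset (an int size limits this sketch to 2 GiB). */
    static ByteBuffer load(FileChannel channel, long cdOffset, int cdSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(cdSize);
        long pos = cdOffset;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("central directory truncated");
            }
            pos += n;
        }
        buf.flip();
        return buf;
    }

    /** Collects entry names; stops at the first header without a CFH signature. */
    static List<String> entryNames(ByteBuffer centralDirectory) {
        ByteBuffer b = centralDirectory.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        List<String> names = new ArrayList<>();
        while (b.remaining() >= CFH_FIXED_LEN && b.getInt(b.position()) == CFH_SIG) {
            int start = b.position();
            int nameLen = b.getShort(start + 28) & 0xFFFF;
            int extraLen = b.getShort(start + 30) & 0xFFFF;
            int commentLen = b.getShort(start + 32) & 0xFFFF;
            byte[] name = new byte[nameLen];
            b.position(start + CFH_FIXED_LEN);
            b.get(name); // throws BufferUnderflowException on truncated input
            names.add(new String(name, StandardCharsets.UTF_8));
            // Skip the extra field and comment to reach the next header.
            b.position(b.position() + extraLen + commentLen);
        }
        return names;
    }
}
```

The trade-off is the one named above: the whole directory is held in memory at once in exchange for replacing many small channel reads with buffer accesses.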






[jira] [Created] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-09-26 Thread Jakob Sultan Ericsson (JIRA)
Jakob Sultan Ericsson created COMPRESS-466:
--

 Summary: Opening of a very large zip file is extremely slow 
compared to java.util.zip.ZipFile
 Key: COMPRESS-466
 URL: https://issues.apache.org/jira/browse/COMPRESS-466
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.18
 Environment: Tested both on Linux and OSX 10.13.6.
Reporter: Jakob Sultan Ericsson


We have a quite large zip file (35 GB) and try to open it with ZipFile. 

{code:java}
final long start = System.currentTimeMillis();
try (ZipFile zf = new ZipFile(new File("35gb.zip"))) {
    System.out.println("File opened..." + (System.currentTimeMillis() - start));
}
{code}

This code takes about 300 000 - 400 000 ms (5-6 minutes).
If I run this with the JDK built-in java.util.zip.ZipFile, the same code takes 
300 ms (less than a second). 

I'm not totally sure what the problem is, but I did some debugging and 
basically all the time is spent in
{code:java}
private void resolveLocalFileHeaderData(final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag)
{code}

Anything that can be done to improve this?


