[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-09-27 Thread Stefan Bodewig (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630569#comment-16630569
 ] 

Stefan Bodewig commented on COMPRESS-466:
-

Commons Compress parses the extra fields of the local file headers in addition to 
the extra fields of the central directory - which the java.util.zip version does 
not.

The less technical description is that java.util.zip.ZipFile may be missing 
important data for the entries that the Commons Compress version provides. In 
many if not most cases there will be no difference, though.

Right now there is no way around it, but it would certainly be possible to add 
a flag to ZipFile's constructor that says "I know that parsing the central 
directory is enough" and skips this step.

There is at least one thing I'm aware of that won't work if we skip reading the 
local file header: reading entry names or comments from Unicode extra fields. 
See http://commons.apache.org/proper/commons-compress/zip.html#Encoding

The resolveLocalFileHeaderData method does a few additional things that would 
need to be handled in a different way if it were skipped (making sure we know 
all entries that share the same name and ensuring we find the proper start of 
the data stream).

> Opening of a very large zip file is extremely slow compared to 
> java.util.zip.ZipFile
>
> Key: COMPRESS-466
> URL: https://issues.apache.org/jira/browse/COMPRESS-466
> Project: Commons Compress
> Issue Type: Bug
> Components: Compressors
> Affects Versions: 1.18
> Environment: Tested both on Linux and OSX 10.13.6.
> Reporter: Jakob Sultan Ericsson
> Priority: Major
>
> We have a fairly large zip file (35 GB) and try to open it with ZipFile. 
> {code:java}
> // start timestamp; its declaration was not shown in the original snippet
> long start = System.currentTimeMillis();
> try (ZipFile zf = new ZipFile(new File("35gb.zip"))) {
>     System.out.println("File opened..." + (System.currentTimeMillis() - start));
> }
> {code}
> This code takes about 300 000 - 400 000 ms (5-6 minutes).
> If I run this with the JDK's built-in java.util.zip.ZipFile, the same code takes 
> 300 ms (less than a second). 
> I'm not totally sure what the problem is, but I did some debugging and 
> basically all the time is spent in
> {code:java}
> private void resolveLocalFileHeaderData(final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag)
> {code}
> Is there anything that can be done to improve this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-01 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633774#comment-16633774
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


I made a change to support reading only the central directory. 

I haven't added any support for multiple entries with the same name or for 
Unicode in comments.

https://github.com/jakeri/commons-compress/tree/COMPRESS-466

Opening my 35gb.zip went from 5-6 minutes to 17-18 seconds. The time is now 
spent building the central directory information.

This is pure speculation, but maybe the time could be decreased even more if 
you read the central directory into memory once (sacrificing memory for speed) 
and then build the directory information by reading from a large ByteBuffer.



[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-07 Thread Stefan Bodewig (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641106#comment-16641106
 ] 

Stefan Bodewig commented on COMPRESS-466:
-

I'm afraid there is more to it. In your version {{ZipFile.getEntry}} will 
always return {{null}} as {{nameMap}} hasn't been populated, and we need a 
few more adjustments. I'm using your patch as a starting point for adding the 
ability to skip parsing of the local file headers when you know you don't 
need the extra field data of the local file header.

We do know the total size of the central directory as it is stored inside the 
"End of central directory record" or "Zip64 end of central directory record". 
Things might get faster if we read things in one go, but we'd probably want to 
measure whether the difference is actually significant (and it would be a 
different issue :) )
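
As an illustration of where that size comes from, here is a minimal sketch that locates the "End of central directory record" in a small in-memory archive and reads the central directory's size, offset and entry count from it. It uses only the JDK; the class and method names ({{EocdSketch}}, {{readEocd}}, {{sampleZip}}) are made up for this example and are not part of Commons Compress. It assumes a non-Zip64 archive; in a Zip64 archive these fields hold 0xFFFF / 0xFFFFFFFF and the Zip64 record would have to be consulted instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only; not Commons Compress code.
class EocdSketch {
    static final int EOCD_SIGNATURE = 0x06054b50;
    static final int EOCD_MIN_SIZE = 22; // fixed part of the record, without archive comment

    /** Returns {centralDirectorySize, centralDirectoryOffset, entryCount}. */
    static long[] readEocd(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // Search backwards: the record may be followed by a comment of up to 65535 bytes.
        int lowest = Math.max(0, zip.length - EOCD_MIN_SIZE - 0xFFFF);
        for (int pos = zip.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
            if (buf.getInt(pos) == EOCD_SIGNATURE) {
                long entries = buf.getShort(pos + 10) & 0xFFFFL;   // total number of entries
                long size = buf.getInt(pos + 12) & 0xFFFFFFFFL;    // size of the central directory
                long offset = buf.getInt(pos + 16) & 0xFFFFFFFFL;  // offset of its first header
                return new long[] {size, offset, entries};
            }
        }
        throw new IllegalArgumentException("no end of central directory record found");
    }

    /** Builds a small archive in memory so the sketch is self-contained. */
    static byte[] sampleZip(String... names) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(bytes)) {
                for (String name : names) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(name.getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the size and offset known, the whole central directory could be fetched with a single positioned read rather than many small ones; whether that is worth it would still need measuring.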



[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-07 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641124#comment-16641124
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


Yes, I realized that. :-) I made a working patch last week but forgot to 
update my fork. I can publish it later tonight. 
I also tested reading everything in one go and parsing from memory. It was a 
bit faster, but not as much as I had expected. 

18s when reading only from the central directory, and about 12s when parsing 
from memory. 
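
For readers who want to see what "read everything in one go and parse from memory" can look like, here is a rough sketch using only the JDK. It is not the patch discussed in this issue; {{CentralDirectoryInMemory}}, {{entryNames}} and {{sampleZip}} are hypothetical names. It assumes an archive without an archive comment and without Zip64 structures, and a central directory smaller than 2 GiB.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch for illustration only; not Commons Compress code.
class CentralDirectoryInMemory {
    /** Fills buf completely, reading from the given file position, then flips it. */
    private static void readFully(FileChannel ch, ByteBuffer buf, long position) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf, position + buf.position()) < 0) {
                throw new IOException("unexpected end of file");
            }
        }
        buf.flip();
    }

    /** Reads the central directory with one buffer fill and returns the entry names. */
    static List<String> entryNames(Path zip) {
        try (FileChannel ch = FileChannel.open(zip, StandardOpenOption.READ)) {
            // Assumption: no archive comment, so the EOCD record is the last 22 bytes.
            ByteBuffer eocd = ByteBuffer.allocate(22).order(ByteOrder.LITTLE_ENDIAN);
            readFully(ch, eocd, ch.size() - 22);
            if (eocd.getInt(0) != 0x06054b50) throw new IOException("EOCD not at end of file");
            int count = eocd.getShort(10) & 0xFFFF;
            int cdSize = eocd.getInt(12);                  // assumed < 2 GiB, no Zip64
            long cdOffset = eocd.getInt(16) & 0xFFFFFFFFL;

            // The whole central directory goes into memory here.
            ByteBuffer cd = ByteBuffer.allocate(cdSize).order(ByteOrder.LITTLE_ENDIAN);
            readFully(ch, cd, cdOffset);

            List<String> names = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                int start = cd.position();
                if (cd.getInt(start) != 0x02014b50) throw new IOException("bad central directory header");
                int nameLen = cd.getShort(start + 28) & 0xFFFF;
                int extraLen = cd.getShort(start + 30) & 0xFFFF;
                int commentLen = cd.getShort(start + 32) & 0xFFFF;
                // Assumes UTF-8 names, which is what ZipOutputStream writes by default.
                names.add(new String(cd.array(), start + 46, nameLen, StandardCharsets.UTF_8));
                cd.position(start + 46 + nameLen + extraLen + commentLen);
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temporary archive so the sketch is self-contained. */
    static Path sampleZip(String... names) {
        try {
            Path zip = Files.createTempFile("cd-sketch", ".zip");
            zip.toFile().deleteOnExit();
            try (OutputStream os = Files.newOutputStream(zip);
                 ZipOutputStream out = new ZipOutputStream(os)) {
                for (String name : names) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(name.getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is the one already mentioned in this thread: the buffer costs as much memory as the central directory is large, in exchange for avoiding many small reads.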



[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-07 Thread Stefan Bodewig (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641163#comment-16641163
 ] 

Stefan Bodewig commented on COMPRESS-466:
-

I've just committed my version of the patch to master. It would be good if you 
could give it a try.



[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-07 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641193#comment-16641193
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


Looks good and is working fine. 
Thanks. You did a better refactor than I dared to do. :-)




[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

2018-10-08 Thread Jakob Sultan Ericsson (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641482#comment-16641482
 ] 

Jakob Sultan Ericsson commented on COMPRESS-466:


One thing, though: why does {{getRawInputStream()}} return null in this case?
Isn't it basically the same as {{getInputStream()}}?

One thing that might not be totally related to this: why is 
{{ZipArchiveEntry.getLocalHeaderOffset()}} protected?
We might have problems with taking the X-second penalty (18 in my test) for 
opening the file and reading it every time. If {{getLocalHeaderOffset}} were 
public, I could basically find out where the data starts and decompress it myself.
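
To make the "find out where the data starts" step concrete, here is a small JDK-only sketch. Given a local header offset, the data begins after the 30-byte fixed part of the local file header plus that header's own name and extra field lengths, which is why the local header still has to be read even when only the central directory is parsed. {{DataOffsetSketch}} and its methods are hypothetical names, and the sketch assumes a non-Zip64 archive without an archive comment whose first entry is DEFLATED.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch for illustration only; not Commons Compress code.
class DataOffsetSketch {
    /** Offset of the first data byte of the entry whose local file header starts at lfhOffset. */
    static long dataOffset(ByteBuffer zip, long lfhOffset) {
        int p = (int) lfhOffset;
        if (zip.getInt(p) != 0x04034b50) throw new IllegalArgumentException("not a local file header");
        int nameLen = zip.getShort(p + 26) & 0xFFFF;
        int extraLen = zip.getShort(p + 28) & 0xFFFF; // may differ from the central directory's value
        return lfhOffset + 30 + nameLen + extraLen;
    }

    /** Inflates the first entry by hand, using sizes and offset from the central directory. */
    static byte[] readFirstEntry(byte[] zipBytes) {
        ByteBuffer zip = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        // Assumption: no archive comment and no Zip64, so the EOCD is the last 22 bytes.
        int cd = zip.getInt(zipBytes.length - 22 + 16);
        if (zip.getInt(cd) != 0x02014b50) throw new IllegalArgumentException("central directory not found");
        int method = zip.getShort(cd + 10) & 0xFFFF;
        int compressedSize = zip.getInt(cd + 20);
        int size = zip.getInt(cd + 24);
        long lfhOffset = zip.getInt(cd + 42) & 0xFFFFFFFFL;
        if (method != ZipEntry.DEFLATED) throw new IllegalArgumentException("sketch only handles DEFLATED");

        int start = (int) dataOffset(zip, lfhOffset);
        Inflater inflater = new Inflater(true); // true = raw deflate stream, as stored in zip entries
        try {
            inflater.setInput(zipBytes, start, compressedSize);
            byte[] out = new byte[size];
            int n = 0;
            while (n < size) {
                int r = inflater.inflate(out, n, size - n);
                if (r == 0) throw new IllegalStateException("compressed data ended early");
                n += r;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    /** Builds a one-entry archive in memory so the sketch is self-contained. */
    static byte[] sampleZip(String name, byte[] payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(bytes)) {
                out.putNextEntry(new ZipEntry(name));
                out.write(payload);
                out.closeEntry();
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An application that caches the local header offset, compressed size and uncompressed size per entry could use this approach to avoid re-parsing the archive on every open, at the price of taking over the format handling that the library otherwise does.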

> Opening of a very large zip file is extremely slow compared to 
> java.util.zip.ZipFile
>
> Key: COMPRESS-466
> URL: https://issues.apache.org/jira/browse/COMPRESS-466
> Project: Commons Compress
> Issue Type: Improvement
> Components: Compressors
> Affects Versions: 1.18
> Environment: Tested both on Linux and OSX 10.13.6.
> Reporter: Jakob Sultan Ericsson
> Priority: Major
> Fix For: 1.19
>
>