Very good, thank you!

Also note that the "new" implementation requires *a lot* more heap, since 
all *uncompressed* file content is copied to the heap before deflating.
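
For readers without the gist at hand, the pattern under discussion can be sketched roughly as follows. This is a minimal illustration and not the utility from the gist: the class name ParallelZipSketch and the method zip are made up for this example, and it assumes the input files have distinct file names. It times the parallel copy phase and the final close separately, so the two can be compared across JDK versions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the code from the gist.
final class ParallelZipSketch {

    /**
     * Copies all inputs into a new zip archive using a parallel stream.
     * Returns {copyMillis, closeMillis}: time spent opening the file system
     * and copying, and time spent closing the zip file system afterwards.
     */
    static long[] zip(Path archive, List<Path> inputs) throws IOException {
        URI uri = URI.create("jar:" + archive.toUri());
        long start = System.nanoTime();
        long copied = start;
        // "create" = "true" asks the zip provider to create the archive if it is missing.
        try (FileSystem zipFs = FileSystems.newFileSystem(
                uri, Collections.singletonMap("create", "true"))) {
            inputs.parallelStream().forEach(in -> {
                // Assumption: file names are unique, so entries do not collide.
                Path entry = zipFs.getPath("/" + in.getFileName());
                try (OutputStream out = Files.newOutputStream(entry)) {
                    Files.copy(in, out);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            copied = System.nanoTime();
        } // the zip file system is closed here
        long closed = System.nanoTime();
        return new long[] {(copied - start) / 1_000_000, (closed - copied) / 1_000_000};
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ParallelZipSketch <archive.zip> <input files...>
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Paths.get(args[i]));
        }
        long[] ms = zip(Paths.get(args[0]), inputs);
        System.out.println("copy: " + ms[0] + " ms, close: " + ms[1] + " ms");
    }
}
```

If the analysis in this thread is right, the "close" figure should dominate on JDK 12, whereas on earlier releases most of the time should show up in the "copy" phase.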

Best regards,

/Lennart Börjeson

> On 15 Apr 2019, at 18:34, Claes Redestad <claes.redes...@oracle.com> wrote:
> 
> Hi Lennart,
> 
> I can reproduce this locally, and have narrowed it down to
> https://bugs.openjdk.java.net/browse/JDK-8034802 as the cause.
> 
> As you say the compression is deferred to ZipFileSystem.close() now,
> whereas previously it happened eagerly. We will have to analyze the
> changes more in-depth to try and see why this is the case.
> 
> Thanks!
> 
> /Claes
> 
> On 2019-04-15 15:32, Lennart Börjeson wrote:
>> I have made a small command-line utility which creates zip archives by 
>> compressing the input files in parallel.
>> I do this by calling Files.copy(input, zipOutputStream) in a parallel Stream 
>> over all input files.
>> I have run this with Java 1.8, 9, 10, and 11, on both my local laptop and on 
>> server-class machines with up to 40 cores.
>> It scales rather well and I get the expected speedup, i.e. roughly 
>> proportional to the number of cores.
>> With OpenJDK 12, however, I get no speedup whatsoever. My understanding is 
>> that in JDK 12, the copy method simply
>> transfers the contents of the input stream to a ByteArrayOutputStream, which 
>> is then deflated when I close the ZipFileSystem (by the ZipFileSystem.sync() 
>> method).
>> Previously, the deflation was done in the call to Files.copy, thus 
>> executed in parallel, and the final ZipFileSystem.close() didn't do anything 
>> much.
>> (But I may of course be wrong. In case there's a simpler explanation and 
>> something I can remedy in my code, please let me know.)
>> I have a small GitHub gist for the utility here: 
>> https://gist.github.com/lenborje/6d2f92430abe4ba881e3c5ff83736923 
>> Steps to reproduce (timings on my 8-core MacBook Pro):
>> 1) Get the contents of the gist as a single file, Zip.java
>> 2) Compile it using Java 8.
>> $ export JAVA_HOME=<PATH-TO-JDK8-HOME>
>> $ javac -encoding utf8 Zip.java
>> 3) Run on a sufficiently large number of files to exercise the parallelism: 
>> (I've used about 70 text files ca 60MB each)
>> $ time java -Xmx6g Zip -p /tmp/test.zip <WHATEVER>/*.log.
>> Working on ZIP FileSystem jar:file:/tmp/test.zip, using the options 
>> [PARALLEL]
>> ...
>> completed zip archive, now closing... done!
>> real 0m35.558s
>> user 3m58.134s
>> sys  0m5.543s
>> As is evident from the ratio between "user time" and "real time", all cores 
>> have been busy most of the time.
>> (Running with JDK 9, 10, and 11 produces similar timings.)
>> But running with JDK 12 defeats the parallelism:
>> $ export JAVA_HOME=<PATH-TO-JDK12-HOME>
>> $ rm /tmp/test.zip # From previous run
>> $ time java -Xmx6g Zip -p /tmp/test.zip <WHATEVER>/*.log.
>> Working on ZIP FileSystem jar:file:/tmp/test.zip, using the options 
>> [PARALLEL]
>> ...
>> completed zip archive, now closing... done!
>> real 3m1.187s
>> user 3m5.422s
>> sys  0m12.396s
>> Now there's almost no speedup. When observing the output, note that the 
>> ZipFileSystem.close() method is called immediately after the "now 
>> closing..." output, and "done!" is written when it returns, and when running 
>> with JDK 12 almost all running time is apparently spent there.
>> I'm hoping the previous behaviour could somehow be restored, i.e. that 
>> deflation actually happens when I'm copying the input files to the 
>> ZipFileSystem, and not when I close it.
>> Best regards,
>> /Lennart Börjeson
>>> On 12 Apr 2019, at 14:25, Lennart Börjeson <lenbo...@gmail.com> wrote:
>>> 
>>> I've found what I believe is a rather severe performance regression in 
>>> ZipFileSystem. 1.8 and 11 run OK, 12 does not.
>>> 
>>> Is this the right forum to report such issues?
>>> 
>>> Best regards,
>>> 
>>> /Lennart Börjeson
