Hi Sherman,
I think that you will get significant benefit from generating the data
structures in the background threads.
I think that if you profile the usage you will see that the generation
of the header information is the dominant cost.
That is why I parallelised the writing process.
There are several bottlenecks, such as the encoding of the name,
and (although you dismiss it) the calculation of the DOS time format
is a CPU hog (hence the -D qualifier). I think that it is about 10% of
the overall CPU load.
This is, by the way, pretty much in line with the extraction feature
below, added in Java 6, so I can't see that there is a great reason
against it.
After all, why spend time storing information that (in most use cases)
is not read, either because the jar utility does not by default
maintain it, or because most jar files are probably not expanded
anyway?
    /**
     * If true, maintain compatibility with JDK releases prior to 6.0 by
     * timestamping extracted files with the time at which they are
     * extracted. Default is to use the time given in the archive.
     */
    private static final boolean useExtractionTime =
        Boolean.getBoolean("sun.tools.jar.useExtractionTime");
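To make the DOS time point concrete, here is a minimal sketch of the
per-entry conversion, assuming the standard MS-DOS date/time packing
that ZIP headers use (two-second resolution, years counted from 1980).
The class and method names are hypothetical, and this is not the JDK's
internal code. It only shows the kind of work that a -D style option
could do once for the whole archive instead of once per entry.

```java
import java.time.LocalDateTime;

/** Hypothetical helper for illustration; not the JDK's internal code. */
final class DosTimeSketch {

    /** Packs a local date-time into the 32-bit MS-DOS date/time layout. */
    static long toDosTime(LocalDateTime t) {
        int year = t.getYear();
        if (year < 1980) {
            // DOS time cannot represent earlier dates; clamp to 1980-01-01 00:00:00.
            return (1L << 21) | (1L << 16);
        }
        return ((long) (year - 1980) << 25)
                | ((long) t.getMonthValue() << 21)
                | ((long) t.getDayOfMonth() << 16)
                | ((long) t.getHour() << 11)
                | ((long) t.getMinute() << 5)
                | (t.getSecond() >> 1); // seconds are stored divided by two
    }
}
```

With a "date of now" option the value would be computed a single time,
for example `long now = DosTimeSketch.toDosTime(LocalDateTime.now());`,
and reused for every entry.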
Here are the times that I get running the code that you wrote on my setup:
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf, cf, 10279
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT1, cfT1, 9652
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT2, cfT2, 6139
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT3, cfT3, 5683
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT4, cfT4, 6102
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT5, cfT5, 6172
I think that the reason that performance tails off is rather that you
are overloading the system with the background threads. You have many
threads (i.e. more than cores) loading the files, and they are
contending for the CPU, so the writer thread is not getting its share
of time. With 3 threads plus the initial file scanning and the writer
thread, there are more threads than can be serviced.
I would suggest introducing an ArrayBlockingQueue for both the
scanning -> compression and the compression -> writing hand-offs,
and also getting rid of the CPU-bound (until the scanner gets going)
polling like
    while (true) {
        Object o = elist.poll();
        if (o == null)
            continue;
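As a rough illustration of the blocking hand-off, here is a minimal
sketch that replaces the poll-and-continue loop with take() on a
bounded ArrayBlockingQueue. The class name, the END sentinel and the
use of plain strings as work items are assumptions made for the
example; the real code passes other objects through elist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: a scanner thread feeding a consumer through a bounded queue. */
final class BlockingHandOffSketch {

    // Sentinel marking the end of the stream; compared by identity, never by value.
    private static final String END = new String("<end>");

    static List<String> drain(List<String> names, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        Thread scanner = new Thread(() -> {
            try {
                for (String n : names) {
                    queue.put(n);      // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        scanner.start();

        List<String> seen = new ArrayList<>();
        try {
            while (true) {
                String item = queue.take();   // parks the thread instead of spinning
                if (item == END) {
                    break;
                }
                seen.add(item);
            }
            scanner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

The consumer uses no CPU while the queue is empty, and the bounded
capacity also stops the producer from running arbitrarily far ahead.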
I don't think that you have the separation of the loading and storing
sorted out. The code adds the future to elist, and the worker thread
reads it whether or not it has completed, so sometimes the loading is
done on the background thread before the main thread reads it, and
sometimes it blocks, even when other jobs have completed. I think that
a completion queue works better for this. It will complicate the END
processing though.
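A minimal sketch of the completion-queue idea, assuming the standard
ExecutorCompletionService is acceptable here. The class name and the
"loaded:" strings are hypothetical stand-ins for the real load jobs.
The END handling is reduced to counting submissions, which is only
one possible approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: consume jobs in completion order, not submission order. */
final class CompletionQueueSketch {

    static List<String> loadAll(List<String> names, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);
        try {
            int submitted = 0;
            for (String n : names) {
                done.submit(() -> "loaded:" + n);   // stand-in for load + compress
                submitted++;
            }
            List<String> written = new ArrayList<>();
            for (int i = 0; i < submitted; i++) {
                // take() returns whichever job finished first, so the writer
                // never waits on a slow job while finished ones are available.
                written.add(done.take().get());
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

One consequence is that entries would be written in completion order
rather than scan order, which may or may not matter for a jar file.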
If I am reading the code correctly, I think that there are potential
memory issues.
An unlimited number of jobs is submitted to the executor. While it
only executes T jobs at a time, the jobs may still queue up in elist,
and each job can buffer 50MB of data. If the writing of the output is
too slow you could run out of memory.
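One way to cap the memory, sketched here under the assumption that a
Semaphore around job submission is acceptable: the scanner takes a
permit before submitting, and the writer gives it back once the
buffer has been written. The class name, the 1 KB dummy buffers and
the peak counter are hypothetical and exist only to make the bound
visible.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: limit how many buffered jobs may be outstanding. */
final class BoundedInFlightSketch {

    /** Returns the largest number of jobs that were submitted but not yet written. */
    static int run(int jobs, int threads, int maxInFlight) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Semaphore permits = new Semaphore(maxInFlight);
        BlockingQueue<Future<byte[]>> elist = new LinkedBlockingQueue<>();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Callable<byte[]> job = () -> new byte[1024];   // stand-in for a loaded file

        Thread scanner = new Thread(() -> {
            try {
                for (int i = 0; i < jobs; i++) {
                    permits.acquire();   // blocks while maxInFlight buffers are outstanding
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    elist.put(pool.submit(job));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        scanner.start();
        try {
            for (int i = 0; i < jobs; i++) {
                elist.take().get();      // the "write" step would go here
                inFlight.decrementAndGet();
                permits.release();       // let the scanner submit another job
            }
            scanner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

With a limit of N and a 50MB worst case per job, the buffered data
would stay below roughly N x 50MB however slow the writer is.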
Lines 666 and 672 (both return statements) I think should be continue;
With T1 there is no effective pipelining as I see it. The scanning
thread has to complete before the loading thread can start (as there
is only 1 CPU). So with the blocking thread model we have to start at 2
threads, as otherwise it may deadlock itself.
With a blocking queue (and minor changes caused or implied by a
blocking queue) I get:
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf, cf, 10274
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT2, cfT2, 7201
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT3, cfT3, 5836
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT4, cfT4, 5884
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT5, cfT5, 5890
I tried to reproduce the exception that you have, but I can't.
I don't have a Java 8 install on this machine, or Unix.
It does seem very strange that there is a file being written as "../"
in the first place though (let alone a duplicate).
I didn't think that any of the API would return ../
Is it only on Z3 that this error occurs?
I will install a Java 8 with the patch, but it will be at the start of
next week.
regards
Mike
------------------------------------------------------------------------
*From:* Xueming Shen <xueming.s...@oracle.com>
*To:* core-libs-dev@openjdk.java.net
*Sent:* Thursday, 27 October 2011, 0:19
*Subject:* Re: performance updates to jar and zip
Hi Mark
It appears the patch you provided throws unexpected exception
(attached at the end of my
email) when I tried it out on the latest JDK8 repository. Since
I only did a quick scan of your
patch, I'm not sure what went wrong here.
This patch includes lots of stuff that obviously you are
trying/testing on, as you "warned" us in
your email, I can see at least it tries to
(1) to support different compression level 0-9
(2) parallel Zip file writing
(3) with various m-thread strategy -Z
(4) Files.walkFileTree instead of File.list
(5) the -D :-) which I would really not recommend to do
(6) small optimization in various places.
which makes the code a little hard to read and the resulting data
hard to compare with.
I would suggest to divide this proposal to separate pieces and
work on them one by one,
for example maybe we can try to solve the main puzzle (2) + (3)
first, and then the other
optimization opportunities.
To collect some data, I followed your lead to write a simple MT
support implementation in the Jar Main class, as shown at
http://cr.openjdk.java.net/~sherman/mtjar2/webrev2/
which I guess is similar to what you are doing. It uses a
"simple" strategy:
(1) A dedicated thread (from the ExecutorService thread pool) to
iterate the file system
tree to "collect" the target files, submit a "compression
job" for each of these files
to the ExecutorService and keep the returned "Future" (from
the submission) in a
queue "elist".
(2) Threads from the ExecutorService use temporary buffer memory to
read and compress the file in memory.
(3) The main thread polls the queue "elist", waiting for each
"compression job" to complete, and writes the result into the target
ZipOutputStream.
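The three steps above can be sketched roughly as follows. This is not
the code from the webrev: the class and method names are hypothetical,
the "files" are plain byte arrays, the scanning step is reduced to a
loop on the calling thread, and the compressed buffers are returned
rather than written to a ZipOutputStream. It uses the default zlib
wrapping of Deflater/Inflater to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: compress in pool threads, collect in submission order. */
final class ParallelCompressSketch {

    static byte[] deflate(byte[] input) {
        Deflater d = new Deflater();
        try {
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!d.finished()) {
                out.write(buf, 0, d.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            d.end();   // release native zlib memory
        }
    }

    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length && !inf.finished()) {
                int n = inf.inflate(out, off, out.length - off);
                if (n == 0 && inf.needsInput()) {
                    break;   // truncated input; stop rather than loop forever
                }
                off += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }

    static List<byte[]> compressAll(List<byte[]> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> elist = new ArrayList<>();
            for (byte[] f : files) {                 // step 1: submit one job per file
                Callable<byte[]> job = () -> deflate(f);
                elist.add(pool.submit(job));         // step 2 runs in the pool
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> fut : elist) {       // step 3: wait in submission order
                result.add(fut.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```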
The resulting data looks promising. I'm seeing the jar-ing speed
doubled when jar-ing the rt.jar and a jdk7 binary tree, on a "slow"
but 4-core Linux VM (I have a similar result on a 2-core Linux
machine as well):
java Jar cf jdk.jar jdk1.7.0 Jar TotalTime:17278
java Jar cfT1 jdk.jar jdk1.7.0 Jar TotalTime:12345
java Jar cfT2 jdk.jar jdk1.7.0 Jar TotalTime:7559
java Jar cfT3 jdk.jar jdk1.7.0 Jar TotalTime:7572
java Jar cfT4 jdk.jar jdk1.7.0 Jar TotalTime:7801
java Jar cfT5 jdk.jar jdk1.7.0 Jar TotalTime:8112
The new "T" option for "n-thread", the digit number followed is to
specify the
fixed thread number for the executor service's thread pool. It
appears that we can
achieve the "best" result with only 3 threads in this
configuration. One thread for
scanning the file system, one thread for the compression and the
main thread for
the writing out. My guess is that the fact we have to "write out"
to a single file
(the resulting jar) limits the potential benefit of having more
"compressing" threads.
I also tried to measure the "file scanning" speed in my
mini-benchmark FIter
http://cr.openjdk.java.net/~sherman/mtjar2/FIter.java
Here are the "surprising" results.
"nio" is the walkFileTree,
"io" is the File.list()
"io2" is the File.listFiles().
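The nio Files.walkFileTree is about 3 to 5 times faster on Linux, and
about 8 to 14 times faster on Windows, than the "traditional"
recursion with File.list() or File.listFiles() in the runs below.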
wow!
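For reference, a minimal sketch of the walkFileTree style of
iteration. This is not the FIter benchmark itself; the class and
method names are hypothetical. The visitor is handed each file's
BasicFileAttributes along with its path, so size and type can be read
without a separate lookup per file, which is presumably part of the
difference.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical sketch: count regular files and sum their sizes with walkFileTree. */
final class WalkSketch {

    /** Returns {fileCount, totalSize} for all regular files under root. */
    static long[] countAndSize(Path root) {
        long[] acc = new long[2];
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.isRegularFile()) {
                        acc[0]++;                // number of files
                        acc[1] += attrs.size();  // size comes with the visit
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return acc;
    }

    /** Builds a tiny temporary tree (3 + 2 bytes in two files) and walks it. */
    static long[] demo() {
        try {
            Path dir = Files.createTempDirectory("walk-sketch");
            Files.write(dir.resolve("a.txt"), new byte[3]);
            Path sub = Files.createDirectory(dir.resolve("sub"));
            Files.write(sub.resolve("b.txt"), new byte[2]);
            return countAndSize(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```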
Linux--------------------------------------------------------------------------
sherman@sherman-linux:~/Workspace/test$ java FIter ../jdk8_mtJar/src
java.io.File iteration
------------------
nio.totalSize:137149279
fileNum:12222
checkSum:16122691809689000
Time:85
------------------
io.totalSize:137149279
fileNum:12222
checkSum:16122691809689000
Time:269
------------------
io2.totalSize:137149279
fileNum:12222
checkSum:16122691809689000
Time:450
Windows7---------------------------------------------------------------------------------
$ /cygdrive/c/Program\ Files\ \(x86\)/Java/jdk1.7.0/bin/java FIter
../sqa/jdk8/src
java.io.File iteration
------------------
nio.totalSize:136695871
fileNum:12199
checkSum:15997350823839479
Time:323
------------------
io.totalSize:136695871
fileNum:12199
checkSum:15997350823839479
Time:2633
------------------
io2.totalSize:136695871
fileNum:12199
checkSum:15997350823839479
Time:4592
----------------------------------------------------------------------
sherman@sherman-linux:~/Workspace/test$ ../jdk8_mtJar/build/linux-i586/bin/jar cf6DZ3 rt0.jar rtjar
duplicate path
java.util.zip.ZipException: duplicate entry: ../
    at java.util.zip.AbstractZipWriter.writeHeader(AbstractZipWriter.java:647)
    at java.util.zip.AbstractZipWriter.startWritingStored(AbstractZipWriter.java:384)
    at java.util.zip.AbstractZipWriter.writeWithResource(AbstractZipWriter.java:350)
    at java.util.zip.AbstractZipWriter.writeAll(AbstractZipWriter.java:273)
    at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:410)
    at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:350)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
java.util.concurrent.ExecutionException: java.util.zip.ZipException: duplicate entry: ../
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
    at sun.tools.jar.Main.waitFor(Main.java:810)
    at sun.tools.jar.Main.run(Main.java:679)
    at sun.tools.jar.Main.main(Main.java:1842)
Caused by: java.util.zip.ZipException: duplicate entry: ../
-Sherman
On 10/20/2011 3:55 PM, Mike Skells wrote:
> Hi All,
> I have some performance updates for the jar tool and for the
Zip/Jar writing components, including some code to allow parallel
writing of Jar and ZIP files (in java.util)
> This work is not finished as yet but I am looking to see if
anyone has any views as to the shape this should move in
> Currently it is a testbed for comparing different techniques,
but largely based on the Jar utility
>
> The changes allow the work to be spread across multiple CPUs and
> optimise some of the code and I/O paths
>
> These comparative figures do not include the effect of the nio
> changes that I proposed in earlier emails
>
> Command line changes
> 0-9 - I have added support for specifying different compression
> levels (the existing jar command just allows default compression
> or '0' for no compression)
> D - This allows the files to all be written with the date of now,
> rather than the file date (the conversion of the date to zip
> format is a CPU hog, and not needed in some use-cases)
> Z0-5 - these are the different mechanisms to allow different
> parallel execution models - I would not expect this to be a
> production qualifier
>
> The test environment is a 4 core Intel core2 pc running windows
vista 64, the test case is jaring up the content of rt.jar to a
jar file. Each test is repeated 6 times and the last 5 are
averaged to produce the answers. Each test is run in a fresh VM
>
> The performance figures are below as a CSV. The last column is
the duration of the task in ms.
>
> In summary the existing jar utility takes (for uncompressed,
compressed) 8.4 , 9.4 seconds to complete and this can be reduced
to 1.6, 2.3 seconds
> The different parallel algorithms are:
> 0 - none, all in one thread as before
> 1 - file scanning in one core, 10 threads loading and buffering
> files, zip writing in a single thread using the existing
> ZipOutputStream
> 2 - file scanning in one core, 10 threads loading and buffering
> files, zip writing mostly multithreaded (e.g. parallel compression,
> single write to the output stream)
> 3 - as 2 but writes to a file rather than a stream
> 4 - as 2 but uses channels to write with direct buffers
> 5 - as 4 but using heap buffers
>
> 3-5 have the zip capability in the code to seek and update
headers that are incomplete, but this is not much tested
>
>
>
> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program
Files\Java\jdk1.6.0_24\lib\tools.jar, -cf0, java 1.6 rt -cf0, 8482
> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program
Files\Java\jdk1.6.0_24\lib\tools.jar, -cf, java 1.6 rt -cf, 9318
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program
Files\Java\jdk1.7.0\lib\tools.jar, -cf0, java 1.7 rt -cf0, 8497
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program
Files\Java\jdk1.7.0\lib\tools.jar, -cf, java 1.7 rt -cf, 9518
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\Test\Archive\baseline.jar, -cf0, orig 1.7 rt -cf0, 8448
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\Test\Archive\baseline.jar, -cf, orig 1.7 rt -cf, 9484
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0,
project 1.7 rt -cf0, 3133
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0D,
project 1.7 rt -cf0D, 2824
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0Z0,
project 1.7 rt -cf0 parallel 0, 3026
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ0,
project 1.7 rt -cf0D parallel 0, 2961
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ1,
project 1.7 rt -cf0D parallel 1, 2022
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ2,
project 1.7 rt -cf0D parallel 2, 1757
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ3,
project 1.7 rt -cf0D parallel 3, 1632
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ4,
project 1.7 rt -cf0D parallel 4, 1994
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ5,
project 1.7 rt -cf0D parallel 5, 1978
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1,
project 1.7 rt -cf1, 5237
>
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1D,
project 1.7 rt -cf1D, 5073
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1Z0,
project 1.7 rt -cf1 parallel 0, 5367
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ0,
project 1.7 rt -cf1D parallel 0, 5002
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ1,
project 1.7 rt -cf1D parallel 1, 5125
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ2,
project 1.7 rt -cf1D parallel 2, 2257
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ3,
project 1.7 rt -cf1D parallel 3, 2145
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ4,
project 1.7 rt -cf1D parallel 4, 2505
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ5,
project 1.7 rt -cf1D parallel 5, 2549
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf2,
project 1.7 rt -cf2, 5371
>
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf3,
project 1.7 rt -cf3, 5409
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf4,
project 1.7 rt -cf4, 5778
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf5,
project 1.7 rt -cf5, 5906
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6,
project 1.7 rt -cf6, 6082
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf7,
project 1.7 rt -cf7, 6070
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf8,
project 1.7 rt -cf8, 6251
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9,
project 1.7 rt -cf9, 6191
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6D,
project 1.7 rt -cf6D, 5843
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6Z0,
project 1.7 rt -cf6 parallel 0, 6095
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ0,
project 1.7 rt -cf6D parallel 0, 5907
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ1,
project 1.7 rt -cf6D parallel 1, 5957
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ2,
project 1.7 rt -cf6D parallel 2, 2388
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ3,
project 1.7 rt -cf6D parallel 3, 2351
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ4,
project 1.7 rt -cf6D parallel 4, 2694
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ5,
project 1.7 rt -cf6D parallel 5, 2830
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9D,
project 1.7 rt -cf9D, 6134
>
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9Z0,
project 1.7 rt -cf9 parallel 0, 6258
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ0,
project 1.7 rt -cf9D parallel 0, 6066
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ1,
project 1.7 rt -cf9D parallel 1, 6203
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ2,
project 1.7 rt -cf9D parallel 2, 2490
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ3,
project 1.7 rt -cf9D parallel 3, 2361
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ4,
project 1.7 rt -cf9D parallel 4, 2788
> C:\Program Files\Java\jdk1.7.0\bin\java.exe,
C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ5,
project 1.7 rt -cf9D parallel 5, 2847
>
> regards
> Mike