> On 1 May 2016, at 12:53, Alan Bateman <alan.bate...@oracle.com> wrote:
> 
> On 01/05/2016 03:02, Claes Redestad wrote:
>> Hi,
>> 
>> Alan asked me to take a look at jmod performance (also jlink, but saving 
>> that for another day), so I set
>> up a naive benchmark[1] and started profiling.
>> 
>> ... and saw nothing really suspicious except that time is split between 
>> doing I/O and executing native code in
>> libz.so, which I guess isn't surprising. Oddly enough the only java methods 
>> that even show up in
>> profiles are related to writing, so I figured taking a closer look at the 
>> code for writing output from jmod
>> wouldn't hurt. Turns out I was wrong, since I soon found that the output 
>> stream used by JmodTask is
>> unbuffered...
>> 
>> Applied a trivial patch[2] and results of running the micro with -f 10 -i 1 
>> -bm ss (which is more or less like
>> running jmod standalone):
>> 
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> JmodBenchmark.jmodJavaBase    ss   10  1.966 ± 0.297   s/op # before
>> JmodBenchmark.jmodJavaBase    ss   10  1.196 ± 0.142   s/op # after
>> 
>> Seems like a notable reduction right there. Timing runs of jmod standalone 
>> gives analogous results on
>> real time, but user time is still almost as high.
>> 
>> Poking around further and it's obvious JIT threads are eating a larger 
>> portion of my cycles now - likely C2 is
>> ramping up but not having time to get much done in the short life-time of 
>> jmod, which is mostly spent in
>> native code anyhow. Switching to running short-running apps with only C1 can 
>> be profitable, especially on
>> machines with a lot of cores (like the 2x8x2 machine I'm running this on), 
>> so I ran the numbers:
>> 
>> Again, with time:
>> 
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> JmodBenchmark.jmodJavaBase    ss   10  1.175 ± 0.147   s/op
>> 
>> real    0m17.140s
>> user    0m54.868s
>> sys    0m4.172s
>> 
>> -XX:TieredStopAtLevel=1
>> 
>> Benchmark                    Mode  Cnt  Score   Error  Units
>> JmodBenchmark.jmodJavaBase  thrpt   10  1.075 ± 0.194  ops/s
>> 
>> real    0m14.810s
>> user    0m15.556s
>> sys    0m1.584s
>> 
>> Yep, only running "C1" improves things a lot in this case and on my 
>> environment.
>> 
>> I suggest accepting the patch[2] as well as switching the jmod runner to run 
>> with -XX:TieredStopAtLevel=1
>> or similar. Both are likely needed for most to see any effect on build times.
>> 
>> A long term alternative to consider might be to implement a server-based 
>> jmod akin to the javac server.
> Thanks Claes, this is good analysis!

Yes, this is great work. Thanks Claes.

> The create method should be using a BufferedOutputStream,

This was an oversight in the original implementation. The output
should be buffered.
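
For anyone wondering why buffering matters this much, here is a minimal sketch. It is not the actual JmodTask code or Claes's patch. The class name, the CountingSink helper, and the 16-byte chunk size are all made up for illustration. It counts how many write calls reach the underlying stream with and without a BufferedOutputStream wrapper:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedWriteSketch {
    /** Hypothetical sink that only counts how many write calls reach it. */
    static final class CountingSink extends OutputStream {
        long calls;
        @Override public void write(int b) { calls++; }
        @Override public void write(byte[] b, int off, int len) { calls++; }
    }

    /** Writes n small (16-byte) chunks to out, then flushes. */
    static void writeChunks(OutputStream out, int n) throws IOException {
        byte[] chunk = new byte[16];
        for (int i = 0; i < n; i++) {
            out.write(chunk, 0, chunk.length);
        }
        out.flush();
    }

    /** Every small write goes straight to the sink. */
    public static long callsUnbuffered(int n) throws IOException {
        CountingSink sink = new CountingSink();
        writeChunks(sink, n);
        return sink.calls;
    }

    /** Small writes are collected in an 8 KiB buffer before reaching the sink. */
    public static long callsBuffered(int n) throws IOException {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, 8192)) {
            writeChunks(out, n);
        }
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered write calls: " + callsUnbuffered(10_000));
        System.out.println("buffered write calls:   " + callsBuffered(10_000));
    }
}
```

With a real file stream underneath, each of those calls on the unbuffered path would be a separate trip into native I/O, which is presumably where the time in the "before" numbers went.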

> I'm surprised that it isn't. I'll get that patch in the current refresh 
> although it looks like this helps more with the benchmark than with the build.
> 
> I changed make/CreateJmods.java to use -XX:TieredStopAtLevel=1 and it makes a big 
> difference in the build. The wall clock time to create the jmods on my local 
> machine drops from 46s to 22s. I also tried a remote Windows machine and the 
> time to create the jmods also dropped by about 20s.

Wow, this is a real win. Good find.

> I'm sure Erik will have advice on how to fit this in. As things stand, the VM 
> options for the jmod command are configured in spec.gmk.in to use 
> $(JAVA_TOOL_FLAGS_SMALL). Maybe it's time to change JAVA_TOOL_FLAGS_SMALL as 
> it seems to be -XX:+UseSerialGC and some heap settings at this time.

I would expect that a number of other tools could benefit from this
too.

-Chris.
