On 01/05/2016 03:02, Claes Redestad wrote:
Hi,

Alan asked me to take a look at jmod performance (also jlink, but saving that for another day), so I set
up a naive benchmark[1] and started profiling.

... and saw nothing really suspicious except that time is split between doing I/O and executing native code in libz.so, which I guess isn't surprising. Oddly enough, the only Java methods that even show up in profiles are related to writing, so I figured taking a closer look at the code for writing output from jmod wouldn't hurt. Turns out I was wrong, since I soon found that the output stream used by JmodTask is
unbuffered...
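Below is a minimal sketch of the kind of fix under discussion. It assumes a plain zip archive: the real jmod format and the JmodTask code differ, and the class and method names are hypothetical. The file stream is wrapped in a BufferedOutputStream so that the many small writes from the compressor and the entry headers reach the file in larger chunks.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; not the actual JmodTask code or the patch[2] referenced above.
class BufferedArchiveWriter {

    // Writes each (name -> bytes) pair as a deflated zip entry.
    // The BufferedOutputStream between ZipOutputStream and the file coalesces
    // the small writes made while deflating into larger writes to the file.
    static void writeArchive(Path target, Map<String, byte[]> entries) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } // closing the ZipOutputStream finishes the archive, flushes the buffer and closes the file
    }
}
```

Without the BufferedOutputStream layer the code produces the same archive, but each small chunk becomes a separate write to the file.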

Applied a trivial patch[2]; here are the results of running the microbenchmark with -f 10 -i 1 -bm ss (which is more or less like
running jmod standalone):

Benchmark                   Mode  Cnt  Score   Error  Units
JmodBenchmark.jmodJavaBase    ss   10  1.966 ± 0.297   s/op # before
JmodBenchmark.jmodJavaBase    ss   10  1.196 ± 0.142   s/op # after

Seems like a notable reduction right there. Timing standalone jmod runs gives analogous results for
real time, but user time is still almost as high.

Poking around further, it's obvious that JIT threads are eating a larger portion of my cycles now: C2 is likely ramping up but doesn't have time to get much done in the short lifetime of jmod, which is mostly spent in native code anyhow. Running short-lived apps with only C1 can be profitable, especially on machines with a lot of cores (like the 2x8x2 machine I'm running this on), so I ran the numbers:

Again, with time, first using the default tiered compilation (C1 and C2):

Benchmark                   Mode  Cnt  Score   Error  Units
JmodBenchmark.jmodJavaBase    ss   10  1.175 ± 0.147   s/op

real    0m17.140s
user    0m54.868s
sys    0m4.172s

-XX:TieredStopAtLevel=1

Benchmark                    Mode  Cnt  Score   Error  Units
JmodBenchmark.jmodJavaBase  thrpt   10  1.075 ± 0.194  ops/s

real    0m14.810s
user    0m15.556s
sys    0m1.584s
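Dividing user time by real time makes the JIT overhead visible. The default run burns about 3.2 CPU-seconds per wall-clock second, while the C1-only run stays close to 1. The arithmetic below uses only the numbers quoted above; nothing is re-measured.

```java
// Worked arithmetic on the `time` output of the two runs above.
class TimeRatios {

    // CPU time per second of wall-clock time; a value well above 1 means
    // several threads (e.g. JIT compiler threads) were busy in parallel.
    static double userPerReal(double userSeconds, double realSeconds) {
        return userSeconds / realSeconds;
    }

    public static void main(String[] args) {
        double tiered = userPerReal(54.868, 17.140); // default tiered (C1 + C2)
        double c1Only = userPerReal(15.556, 14.810); // -XX:TieredStopAtLevel=1
        System.out.printf("tiered: %.2f, C1 only: %.2f%n", tiered, c1Only); // tiered: 3.20, C1 only: 1.05
        System.out.printf("user time saved: %.3f s%n", 54.868 - 15.556);    // user time saved: 39.312 s
    }
}
```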

Yep, running with only C1 improves things a lot in this case, at least in my environment.
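One way to check how much of a short run goes to the JIT is a sketch built on the standard java.lang.management API. The class name and the toy workload are hypothetical stand-ins, not jmod code. Compile it with javac, then run it with and without -XX:TieredStopAtLevel=1 and compare the reported compilation time. getTotalCompilationTime() is documented as an approximate sum across compiler threads, so it tracks user time more closely than wall-clock time.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic, not part of jmod: reports how much time the
// JIT compiler threads have accumulated so far in this JVM.
class JitTimeReport {

    // Accumulated JIT compilation time in ms, or -1 if the JVM has no JIT
    // or does not support compilation-time monitoring.
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    // True if -XX:TieredStopAtLevel=1 (C1 only) was passed to this JVM.
    static boolean c1Only() {
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return args.contains("-XX:TieredStopAtLevel=1");
    }

    public static void main(String[] argv) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) { // stand-in for a short-lived tool's work
            sum += Integer.toString(i).hashCode();
        }
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("c1Only=%b wall=%dms jit=%dms (checksum %d)%n",
                c1Only(), wallMillis, totalCompilationMillis(), sum);
    }
}
```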

I suggest accepting the patch[2] as well as switching the jmod runner to run with -XX:TieredStopAtLevel=1 or similar. Both are likely needed for most people to see any effect on build times.

A long-term alternative to consider might be a server-based jmod, akin to the javac server.

Thanks Claes, this is good analysis!

The create method should be using a BufferedOutputStream; I'm surprised that it isn't. I'll get that patch into the current refresh, although it looks like this helps more with the benchmark than with the build.

I changed make/CreateJmods.java to use -XX:TieredStopAtLevel=1, and it makes a big difference in the build: the wall-clock time to create the jmods on my local machine drops from 46s to 22s. I also tried a remote Windows machine, and the time to create the jmods dropped by about 20s there too.

I'm sure Erik will have advice on how to fit this in. As things stand, the VM options for the jmod command are configured in spec.gmk.in to use $(JAVA_TOOL_FLAGS_SMALL). Maybe it's time to change JAVA_TOOL_FLAGS_SMALL, as it seems to be -XX:+UseSerialGC and some heap settings at this time.

As regards changing the jmod launcher, one concern is that the options might conflict with options specified via -J, so we would need to look at that more closely.
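Here is a hypothetical sketch that makes the -J concern concrete. It is not the actual launcher; the class, the method names and the merge policy are all assumptions. When the user passes the same -XX flag through -J, the built-in default is dropped, and the user's options are appended last.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of merging launcher defaults with user -J options.
class VmOptionMerger {

    // "-XX:Name=value", "-XX:+Name", "-XX:-Name" -> "Name";
    // any other option is keyed by its full text.
    static String flagKey(String option) {
        if (option.startsWith("-XX:")) {
            String rest = option.substring(4);
            if (rest.startsWith("+") || rest.startsWith("-")) {
                rest = rest.substring(1);
            }
            int eq = rest.indexOf('=');
            return eq >= 0 ? rest.substring(0, eq) : rest;
        }
        return option;
    }

    // launcherArgs are the tool's own arguments, e.g. ["-J-Xmx1g", "create", ...].
    // Returns defaults the user did not override, followed by the user's -J options.
    static List<String> mergeVmOptions(List<String> defaults, List<String> launcherArgs) {
        List<String> user = new ArrayList<>();
        for (String arg : launcherArgs) {
            if (arg.startsWith("-J")) {
                user.add(arg.substring(2)); // strip the -J prefix
            }
        }
        Set<String> userKeys = new HashSet<>();
        for (String opt : user) {
            userKeys.add(flagKey(opt));
        }
        List<String> merged = new ArrayList<>();
        for (String opt : defaults) {
            if (!userKeys.contains(flagKey(opt))) {
                merged.add(opt);
            }
        }
        merged.addAll(user);
        return merged;
    }
}
```

With defaults [-XX:TieredStopAtLevel=1, -XX:+UseSerialGC] and arguments [-J-XX:TieredStopAtLevel=4, create, ...], the result is [-XX:+UseSerialGC, -XX:TieredStopAtLevel=4]. Only -XX flags are matched by name. Options that conflict under different names, such as a second GC selection or a different -Xmx value, are not reconciled by this simple keying.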

-Alan
