> On 1 May 2016, at 12:53, Alan Bateman <alan.bate...@oracle.com> wrote:
>
> On 01/05/2016 03:02, Claes Redestad wrote:
>> Hi,
>>
>> Alan asked me to take a look at jmod performance (also jlink, but saving
>> that for another day), so I set up a naive benchmark[1] and started
>> profiling.
>>
>> ... and saw nothing really suspicious except that time is split between
>> doing I/O and executing native code in libz.so, which I guess isn't
>> surprising. Oddly enough the only java methods that even show up in
>> profiles are related to writing, so I figured taking a closer look at the
>> code for writing output from jmod wouldn't hurt. Turns out I was wrong,
>> since I soon found that the output stream used by JmodTask is
>> unbuffered...
>>
>> Applied a trivial patch[2] and results of running the micro with
>> -f 10 -i 1 -bm ss (which is more or less like running jmod standalone):
>>
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> JmodBenchmark.jmodJavaBase    ss   10  1.966 ± 0.297   s/op  # before
>> JmodBenchmark.jmodJavaBase    ss   10  1.196 ± 0.142   s/op  # after
>>
>> Seems like a notable reduction right there. Timing runs of jmod standalone
>> gives analogous results on real time, but user time is still almost as
>> high.
>>
>> Poking around further and it's obvious JIT threads are eating a larger
>> portion of my cycles now - likely C2 is ramping up but not having time to
>> get much done in the short life-time of jmod, which is mostly spent in
>> native code anyhow.
>> Switching to running short-running apps with only C1 can be profitable,
>> especially on machines with a lot of cores (like the 2x8x2 machine I'm
>> running this on), so I ran the numbers:
>>
>> Again, with time:
>>
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> JmodBenchmark.jmodJavaBase    ss   10  1.175 ± 0.147   s/op
>>
>> real 0m17.140s
>> user 0m54.868s
>> sys  0m4.172s
>>
>> -XX:TieredStopAtLevel=1
>>
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> JmodBenchmark.jmodJavaBase    ss   10  1.075 ± 0.194   s/op
>>
>> real 0m14.810s
>> user 0m15.556s
>> sys  0m1.584s
>>
>> Yep, only running "C1" improves things a lot in this case and on my
>> environment.
>>
>> I suggest accepting the patch[2] as well as switching the jmod runner to
>> run with -XX:TieredStopAtLevel=1 or similar. Both are likely needed for
>> most to see any effect on build times.
>>
>> A long term alternative to consider might be to implement a server-based
>> jmod akin to the javac server.
>
> Thanks Claes, this is good analysis!

Yes, this is great work. Thanks Claes.

> The create method should be using a BufferedOutputStream,

This was an oversight in the original implementation. The output should be
buffered.

> I'm surprised that it isn't. I'll get that patch in the current refresh
> although it looks like this helps more with the benchmark than with the
> build.
>
> I changed make/CreateJmods.gmk to use -XX:TieredStopAtLevel=1 and it makes
> a big difference in the build. The wall clock time to create the jmods on
> my local machine drops from 46s to 22s. I also tried a remote Windows
> machine and the time to create the jmods also dropped by about 20s.

Wow, this is a real win. Good find.

> I'm sure Erik will have advice on how to fit this in. As things stand, the
> VM options for the jmod command are configured in spec.gmk.in to use
> $(JAVA_TOOL_FLAGS_SMALL). Maybe it's time to change JAVA_TOOL_FLAGS_SMALL,
> as it seems to be -XX:+UseSerialGC and some heap settings at this time.

I would expect that a number of other tools could benefit from this too.

-Chris.
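
For anyone wanting to see why the missing buffer matters, here is a rough, self-contained sketch. It is not the JmodTask code or the patch[2]: the class name BufferedZipSketch, the CountingSink stand-in for the file stream, the entry names, the payload size and the 64 KB buffer are all made up for illustration. It writes the same entries through a ZipOutputStream twice, once straight to the sink and once through a BufferedOutputStream, and counts how many write calls reach the sink.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class BufferedZipSketch {

    /** Hypothetical stand-in for a file stream: discards bytes, counts write calls. */
    static final class CountingSink extends OutputStream {
        long calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    /** Writes a fixed set of zip entries and returns the number of writes seen by the sink. */
    static long countWrites(boolean buffered) {
        CountingSink sink = new CountingSink();
        // The only difference between the two runs is this wrapper.
        OutputStream target = buffered ? new BufferedOutputStream(sink, 64 * 1024) : sink;

        byte[] payload = new byte[256 * 1024];
        new Random(42).nextBytes(payload); // fixed seed so both runs write identical data

        try (ZipOutputStream zip = new ZipOutputStream(target)) {
            for (int i = 0; i < 20; i++) {
                zip.putNextEntry(new ZipEntry("classes/Entry" + i + ".class"));
                zip.write(payload);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered sink writes: " + countWrites(false));
        System.out.println("buffered sink writes:   " + countWrites(true));
    }
}
```

The deflater hands compressed data to the underlying stream in small chunks, so without the wrapper each chunk becomes its own write on the sink. With a real file stream each of those would be a separate system call, which would be consistent with the I/O time seen in the profile above.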