On 2018-11-02 17:21, Erik Joelsson wrote:
Nice work!
What exactly are you measuring, "make hotspot" or some other target?
Yes, "make hotspot".
If we can find a reasonable set of extra files for the windows pch
that restores all or most of the performance, that would of course be
preferable. I doubt we will find a significantly better selection on
Mac compared to Linux though.
It seems the best selection for Mac is more or less exactly the same as
for Linux, which is nice. For Windows, I was able to more or less precisely
match the original behaviour with the Linux set plus the four inline.hpp
files I removed for Linux:
# include "oops/oop.inline.hpp"
# include "memory/allocation.inline.hpp"
# include "oops/access.inline.hpp"
# include "runtime/handles.inline.hpp"
Then I got from the original:
real 6m39.035s
user 0m58.580s
sys 2m48.138s
hotspot with original pch
to:
real 6m18.645s
user 0m55.963s
sys 2m28.264s
hotspot with new pch, BKM (on and above 130), including inline
Quite good for just adding four more files, conditional on the Windows
platform.
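A rough sketch of what such a conditional addition could look like in
precompiled.hpp. The guard macro is an assumption: _WIN32 is predefined by
the Microsoft compiler, but HotSpot may prefer one of its own platform
macros. This is a header fragment, not a standalone program:

```cpp
// Fragment of precompiled.hpp (sketch only).
// The common include set shared by all platforms would come first.

// Windows-only additions: the four inline headers dropped from the Linux set.
// ASSUMPTION: _WIN32 is used as the guard here purely for illustration.
#ifdef _WIN32
# include "oops/oop.inline.hpp"
# include "memory/allocation.inline.hpp"
# include "oops/access.inline.hpp"
# include "runtime/handles.inline.hpp"
#endif // _WIN32
```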
By adding a few more include files (and keeping the inline files), I
was able to improve Windows compile time somewhat more:
real 6m7.355s
user 0m55.718s
sys 2m26.153s
hotspot with new pch on and above 110, including inline
For that run, I also added this set:
// 130-110
# include "runtime/thread.inline.hpp"
# include "utilities/bitMap.inline.hpp"
# include "oops/arrayOop.inline.hpp"
# include "gc/shared/gcId.hpp"
# include "runtime/mutexLocker.hpp"
# include "oops/objArrayOop.inline.hpp"
# include "classfile/javaClasses.inline.hpp"
# include "memory/referenceType.hpp"
# include "oops/weakHandle.hpp"
# include "oops/compressedOops.inline.hpp"
# include "gc/shared/barrierSet.hpp"
# include "utilities/stack.hpp"
# include "gc/g1/g1YCTypes.hpp"
# include "memory/padded.hpp"
# include "logging/logHandle.hpp"
This starts to look a bit specialized (the g1 files are likely to need
#ifdef guards etc), so maybe it's not worth it.
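For illustration, a guard around the g1 header might look roughly like
this. The macro name INCLUDE_G1GC and its home in utilities/macros.hpp are
assumptions about how the build exposes the G1 feature flag, so treat this
as a sketch rather than the actual patch:

```cpp
// Fragment of precompiled.hpp (sketch only).
// ASSUMPTION: INCLUDE_G1GC is defined to 0 or 1 by utilities/macros.hpp.
#include "utilities/macros.hpp"

#if INCLUDE_G1GC
# include "gc/g1/g1YCTypes.hpp"  // only meaningful when G1 is built in
#endif // INCLUDE_G1GC
```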
/Magnus
/Erik
On 2018-11-02 07:00, Magnus Ihse Bursie wrote:
On 2018-11-02 12:14, Magnus Ihse Bursie wrote:
Caveats: I have only run this on my local linux build with the
default server JVM configuration. Other machines will have different
sweet spots. Other JVM variants/feature combinations will have
different sweet spots. And, most importantly, I have not tested this
at all on Windows. Nevertheless, I'm almost prepared to suggest a
patch that uses this selection of files if running on gcc, just as
is, because of the speed improvements I measured.
I've started running tests on other platforms. Unfortunately, I don't
have access to quite as powerful machines, so everything takes much
longer. For the moment, I've only tested my "BKM" (best known method)
from linux, to see if it works.
For xcode/macos I got:
real 4m21,528s
user 27m28,623s
sys 2m18,244s
hotspot with original pch
real 4m28,867s
user 29m10,685s
sys 2m14,456s
hotspot without pch
real 3m6,322s
user 19m3,000s
sys 1m41,252s
hotspot with new BKM pch
So obviously this is a nice improvement even here. I could probably
try around a bit and see if there is an even better fit with a
different selection of header files, but even without that, I'd say
this patch is by itself as good for clang as it is for gcc.
For windows I got:
real 6m39.035s
user 0m58.580s
sys 2m48.138s
hotspot with original pch
real 10m29.227s
user 1m6.909s
sys 2m24.108s
hotspot without pch
real 6m56.262s
user 0m57.563s
sys 2m27.514s
hotspot with new BKM pch
I'm not sure what's going on with the user time numbers here.
Presumably cygwin cannot get to the real Windows time data. What I
can see is the huge difference in wall clock time between PCH and no
PCH. I can also see that the new trimmed BKM list retains most of
that improvement, but is actually somewhat slower than the original
list. I'm currently rerunning with a larger set on Windows, to see if
this helps improve things. I can certainly live with a
precompiled.hpp that includes some additional files on Windows.
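To put the wall clock numbers side by side, here is a small sketch that
converts `time`-style durations to seconds and reports the change relative
to a baseline. The helper names parse_duration and percent_change are made
up for illustration. Both '.' and ',' are accepted as decimal separator,
since the macOS figures use a comma:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert a `time`-style duration such as "6m39.035s" or "4m21,528s"
// into seconds. (Hypothetical helper, not part of any build tooling.)
double parse_duration(std::string text) {
    for (char& c : text) {
        if (c == ',') c = '.';  // normalize a locale-specific decimal comma
    }
    const std::size_t m = text.find('m');
    const std::size_t s = text.rfind('s');
    if (m == std::string::npos || s == std::string::npos || s < m) {
        throw std::invalid_argument("expected <min>m<sec>s, got: " + text);
    }
    const double minutes = std::stod(text.substr(0, m));
    const double seconds = std::stod(text.substr(m + 1, s - m - 1));
    return minutes * 60.0 + seconds;
}

// Percentage change of `candidate` relative to `baseline`.
// A negative result means the candidate build was faster.
double percent_change(const std::string& baseline,
                      const std::string& candidate) {
    const double base = parse_duration(baseline);
    return (parse_duration(candidate) - base) / base * 100.0;
}
```

With the Windows "real" figures above, percent_change("6m39.035s",
"10m29.227s") comes out at roughly +58% for the build without PCH, and
percent_change("6m39.035s", "6m56.262s") at roughly +4% for the new BKM
pch.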
/Magnus