On 2018-11-02 11:39, Magnus Ihse Bursie wrote:
On 2018-11-02 00:53, Ioi Lam wrote:
Maybe precompiled.hpp can be periodically (weekly?) updated by a
robot, which parses the dependencies files generated by gcc, and pick
the most popular N files?
I think that's tricky to implement automatically. However, I've done
more or less that, and I've got some wonderful results! :-)
Ok, I'm done running my tests.
TL;DR: I've managed to reduce wall-clock time from 2m 45s (with pch) or
2m 23s (without pch), to 1m 55s. The cpu time spent went from 52m 27s
(with pch) or 55m 30s (without pch) to 41m 10s. This is a huge gain for
our automated builds! And a clear improvement even for the ordinary
developer.
The list of included header files is reduced to just 37. The winning
combination was to include all header files that were included in more
than 130 different files, but to exclude all files with names matching
"*.inline.hpp". A hoped-for side benefit of not pulling in the
*.inline.hpp files is that the risk of pch/non-pch failures will diminish.
However, these 37 files in turn pull in an additional 201 header files.
Of these, three are *.inline.hpp:
share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdBits.inline.hpp,
os_cpu/linux_x86/bytes_linux_x86.inline.hpp and
os_cpu/linux_x86/copy_linux_x86.inline.hpp. This looks like a problem
with the header files to me.
With some exceptions (mostly related to JFR), these additional 201 files
have "generic" looking names (like share/gc/g1/g1_globals.hpp), which
indicates to me that it is reasonable to have them in this list, just as
the list of the original 37 tended to be quite general and high-level
includes. However, some files (like
share/jfr/instrumentation/jfrEventClassTransformer.hpp) have perhaps
leaked in where they do not really belong. It might be worth letting a
hotspot engineer spend some cycles checking these files to see if
anything can be improved.
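For those who want to try this at home, here is a minimal sketch of the
kind of selection script I describe above (not the exact one I used; the
build directory path and cutoff are assumptions you will need to adapt).
It assumes gcc has been run with -MMD or similar, so that .d dependency
files are available to parse:

import glob
from collections import Counter

CUTOFF = 130  # sweet spot will differ per machine and JVM configuration
counts = Counter()

# Each .d file lists one object file's prerequisites:
#   foo.o: foo.cpp bar.hpp baz.hpp \
#          qux.hpp
for dep_file in glob.glob("build/**/*.d", recursive=True):  # path assumed
    with open(dep_file) as f:
        tokens = f.read().replace("\\\n", " ").split()
    # Count each header at most once per translation unit.
    counts.update({t for t in tokens if t.endswith(".hpp")})

selected = sorted(h for h, n in counts.items()
                  if n >= CUTOFF and not h.endswith(".inline.hpp"))
for header in selected:
    print(f'# include "{header}"')

The output is then suitable for pasting into precompiled.hpp.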
Caveats: I have only run this on my local linux build with the default
server JVM configuration. Other machines will have different sweet
spots. Other JVM variants/feature combinations will have different sweet
spots. And, most importantly, I have not tested this at all on Windows.
Nevertheless, I'm almost prepared to suggest a patch that uses this
selection of files if running on gcc, just as is, because of the speed
improvements I measured.
And some data:
Here is my log from my runs. The "on or above" value is the cutoff I
used: a header file was selected if at least that many files included
it. As you can see, there is not much difference between cutoffs from
130 to 150, or (without the inline files) between 110 and 150. (There
were a lot of additional inline files in the positions below 130.) All
other things being equal, I'd prefer a solution with fewer files; it is
less likely to go bad.
real 2m45.623s
user 52m27.813s
sys 5m27.176s
hotspot with original pch
real 2m23.837s
user 55m30.448s
sys 3m39.739s
hotspot without pch
real 1m59.533s
user 42m50.019s
sys 3m0.893s
hotspot new pch on or above 250
real 1m58.937s
user 42m18.994s
sys 3m0.245s
hotspot new pch on or above 200
real 2m0.729s
user 42m16.636s
sys 2m57.125s
hotspot new pch on or above 170
real 1m58.064s
user 42m9.618s
sys 2m57.635s
hotspot new pch on or above 150
real 1m58.053s
user 42m9.796s
sys 2m58.732s
hotspot new pch on or above 130
real 2m3.364s
user 42m54.818s
sys 3m2.737s
hotspot new pch on or above 100
real 2m6.698s
user 44m30.434s
sys 3m12.015s
hotspot new pch on or above 70
real 2m0.598s
user 41m17.810s
sys 2m56.258s
hotspot new pch on or above 150 without inline
real 1m55.981s
user 41m10.076s
sys 2m51.983s
hotspot new pch on or above 130 without inline
real 1m56.449s
user 41m10.667s
sys 2m53.808s
hotspot new pch on or above 110 without inline
And here is the "winning" list (which I declared as "on or above 130,
without inline"). I encourage everyone to try this on their own system,
and report back the results!
#ifndef DONT_USE_PRECOMPILED_HEADER
# include "classfile/classLoaderData.hpp"
# include "classfile/javaClasses.hpp"
# include "classfile/systemDictionary.hpp"
# include "gc/shared/collectedHeap.hpp"
# include "gc/shared/gcCause.hpp"
# include "logging/log.hpp"
# include "memory/allocation.hpp"
# include "memory/iterator.hpp"
# include "memory/memRegion.hpp"
# include "memory/resourceArea.hpp"
# include "memory/universe.hpp"
# include "oops/instanceKlass.hpp"
# include "oops/klass.hpp"
# include "oops/method.hpp"
# include "oops/objArrayKlass.hpp"
# include "oops/objArrayOop.hpp"
# include "oops/oop.hpp"
# include "oops/oopsHierarchy.hpp"
# include "runtime/atomic.hpp"
# include "runtime/globals.hpp"
# include "runtime/handles.hpp"
# include "runtime/mutex.hpp"
# include "runtime/orderAccess.hpp"
# include "runtime/os.hpp"
# include "runtime/thread.hpp"
# include "runtime/timer.hpp"
# include "services/memTracker.hpp"
# include "utilities/align.hpp"
# include "utilities/bitMap.hpp"
# include "utilities/copy.hpp"
# include "utilities/debug.hpp"
# include "utilities/exceptions.hpp"
# include "utilities/globalDefinitions.hpp"
# include "utilities/growableArray.hpp"
# include "utilities/macros.hpp"
# include "utilities/ostream.hpp"
# include "utilities/ticks.hpp"
#endif // !DONT_USE_PRECOMPILED_HEADER
/Magnus
I'd still like to run some more tests, but preliminary data indicates
that there is much to be gained by having a more sensible list of
files in the precompiled header.
The fewer files we have on this list, the less likely it is to become
(drastically) outdated. So I don't think we need to do this
automatically, but perhaps manually every now and then when we feel
build times are increasing.
/Magnus
- Ioi
On 11/1/18 4:38 PM, David Holmes wrote:
It's not at all obvious to me that the way we use PCH is the
right/best way to use it. We dump every header we think it would be
good to precompile into precompiled.hpp and then just ask gcc to
precompile that. The result is a ~250MB file that has to be read in
and processed for every source file! That doesn't seem very
efficient to me.
Cheers,
David
On 2/11/2018 3:18 AM, Erik Joelsson wrote:
Hello,
My point here, which wasn't very clear, is that Mac and Linux seem
to lose just as much real compile time. The big difference in these
tests was rather the number of cpus in the machine (32 threads in
the linux box vs 8 on the mac). The total amount of work done
increased when PCH was disabled; that's the user time. Here is my
theory on why the real (wall clock) time was not consistent with
user time between these experiments:
With pch the time line (simplified) looks like this:
1. Single thread creating PCH
2. All cores compiling C++ files
When disabling pch it's just:
1. All cores compiling C++ files
To gain speed with PCH, the time spent in 1 must be less than the
time saved in 2. The potential time saved in 2 goes down as the
number of CPUs goes up. I'm pretty sure that if I repeated the
experiment on Linux on a smaller box (typically one we use in CI),
the results would look similar to Macosx, and similarly, if I had
access to a much bigger mac, it would behave like the big Linux
box. This is why I'm saying this should be done for both of these
platforms or neither.
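To make that concrete, here is a toy model of the timeline above (a
sketch; all figures are invented for illustration, not measurements
from our builds):

# With PCH: pay the serial PCH step up front, then compile slightly
# cheaper files in parallel. Without PCH: everything is parallel.
def wall_with_pch(t_pch, n_files, t_file, t_saved, cores):
    return t_pch + n_files * (t_file - t_saved) / cores

def wall_without_pch(n_files, t_file, cores):
    return n_files * t_file / cores

# Assume 1000 files at 4 s each, PCH saving 1 s per file and taking
# 60 s to build. PCH then wins only while cores < 1000 * 1 / 60 ~ 17.
for cores in (4, 8, 16, 32):
    pch = wall_with_pch(60, 1000, 4.0, 1.0, cores)
    npch = wall_without_pch(1000, 4.0, cores)
    print(f"{cores:2d} cores: with pch {pch:6.1f}s, without {npch:6.1f}s")

With these made-up numbers the break-even point is around 17 cores,
which matches the pattern of the 8-thread mac gaining from PCH and the
32-thread Linux box losing.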
In addition to this, the experiment only built hotspot. If we were to
instead build the whole JDK, then the time wasted in 1 in the
PCH case would be negated to a large extent by other build targets
running concurrently, so for a full build, PCH is still providing
value.
The question here is whether, if the value of PCH isn't very big,
it's worth keeping when it's also creating as much grief as
described here. There is no doubt that there is value, however. And
given the examination done by Magnus, it seems this value could be
increased.
The main reason we haven't disabled PCH in CI before is this: we
really, really want CI builds to be fast. We don't have a ton of
spare capacity to just throw at it. PCH made builds faster, so we
used it. My other reason is consistency between builds.
Supporting multiple different modes of building creates the
potential for inconsistencies. For that reason I would definitely
not support having PCH on by default but turned off in our
CI/dev-submit. We pick one or the other as the official build
configuration, and we stick with the official build configuration
for all builds of any official capacity (which includes CI).
In the current CI setup, we have a bunch of tiers that execute one
after the other. The jdk-submit currently only runs tier1. In tier2
I've put slowdebug builds with PCH disabled, just to help verify a
common developer configuration. These builds are not meant to be
used for testing or anything like that, they are just run for
verification, which is why this is ok. We could argue that it would
make sense to move the linux-x64-slowdebug without pch build to
tier1 so that it's included in dev-submit.
/Erik
On 2018-11-01 03:38, Magnus Ihse Bursie wrote:
On 2018-10-31 00:54, Erik Joelsson wrote:
Below are the corresponding numbers from a Mac (Mac Pro (Late
2013), 3.7 GHz, Quad-Core Intel Xeon E5, 16 GB). To be clear, the
-npch runs are without precompiled headers. Here we see a slight
degradation in both user time and wall clock time when disabling
PCH. My guess is that the user time increase is about the same as
on Linux, but because of the lower cpu count, the extra load is
not as easily covered.
These tests were run with just building hotspot. This means that
the precompiled header is generated alone on one core while
nothing else is happening, which would explain this degradation
in build speed. If we were instead building the whole product, we
would see a better correlation between user and real time.
Given the very small benefit here, it could make sense to disable
precompiled headers by default for Linux and Mac, just as we did
with ccache.
I do know that the benefit is huge on Windows though, so we
cannot remove the feature completely. Any other comments?
Well, if you show that disabling precompiled headers costs time on
macosx, and no-one (as far as I've seen) has complained about PCH
on mac, then why not keep them on as default there? That the gain
is small is no argument for losing it. (I remember a time when you
were hunting seconds in the build time ;-))
On linux, the story seems different, though. People experience PCH
as a problem, and there is a net loss of time, at least on
selected testing machines. It makes sense to turn it off by
default, then.
/Magnus
/Erik
macosx-x64
real 4m13.658s
user 27m17.595s
sys 2m11.306s
macosx-x64-npch
real 4m27.823s
user 30m0.434s
sys 2m18.669s
macosx-x64-debug
real 5m21.032s
user 35m57.347s
sys 2m20.588s
macosx-x64-debug-npch
real 5m33.728s
user 38m10.311s
sys 2m27.587s
macosx-x64-slowdebug
real 3m54.439s
user 25m32.197s
sys 2m8.750s
macosx-x64-slowdebug-npch
real 4m11.987s
user 27m59.857s
sys 2m18.093s
On 2018-10-30 14:00, Erik Joelsson wrote:
Hello,
On 2018-10-30 13:17, Aleksey Shipilev wrote:
On 10/30/2018 06:26 PM, Ioi Lam wrote:
Is there any advantage of using precompiled headers on Linux?
I have measured it recently on the shenandoah repositories, and
fastdebug/release build times were no better with PCH than
without. Actually, it gets worse when you touch a single header
that is in the PCH list, and you end up recompiling the entire
Hotspot. I would be in favor of disabling it by default.
I just did a measurement on my local workstation (2x8 cores with
2x HT, Ubuntu 18.04, using the Oracle devkit GCC 7.3.0). I ran
"time make hotspot" with clean build directories.
linux-x64:
real 4m6.657s
user 61m23.090s
sys 6m24.477s
linux-x64-npch
real 3m41.130s
user 66m11.824s
sys 4m19.224s
linux-x64-debug
real 4m47.117s
user 75m53.740s
sys 8m21.408s
linux-x64-debug-npch
real 4m42.877s
user 84m30.764s
sys 4m54.666s
linux-x64-slowdebug
real 3m54.564s
user 44m2.828s
sys 6m22.785s
linux-x64-slowdebug-npch
real 3m23.092s
user 55m3.142s
sys 4m10.172s
These numbers support your claim. Wall clock time is actually
increased with PCH enabled, but total user time is decreased.
Does not seem worth it to me.
It's on by default and we keep having
breakage where someone forgets to add an #include. The
latest instance is JDK-8213148.
Yes, we catch most of these breakages in CIs, which tells me that
adding it to jdk-submit would cover most of the breakage during
pre-integration testing.
jdk-submit currently runs what we call "tier1". We do have
builds of Linux slowdebug with precompiled headers disabled in
tier2. We also build solaris-sparcv9 in tier1, which does not
support precompiled headers at all, so to not be caught by
jdk-submit you would have to be in Linux-specific code. The
example bug does not seem to be that. Mach5/jdk-submit was down
over the weekend and yesterday, so my suspicion is the offending
code in this case was never tested.
That said, given that we get practically no benefit from PCH on
Linux/GCC, we should probably just turn it off by default for
Linux and/or GCC. I think we need to investigate macOS as well
here.
/Erik
-Aleksey