Re: Increases in build system time
Steffen Nurpmeso wrote:
> This thread reminds me of me turning off hyperthreading.
> Using the four cores i have with HT turned on results in a 40
> percent time penalty compared to when its off.  (For example,
> compiling the Linux kernel 4.19.X takes almost exactly 10 minutes
> when it is turned off, and about 14 minutes when it is turned
> on.  Just a thought.)

FWIW, these tests were run with hyperthreading disabled.
-- 
Andreas Gustafsson, g...@gson.org
Re: Increases in build system time
Andreas Gustafsson wrote in <24014.56871.750141.885...@guava.gson.org>:
 |Jaromír Doleček wrote:
 |> I wonder also if we could try enabling vm.ubc_direct on the build \
 |> machine?
 |
 |Using 2019.11.14.13.58.22 sources:
 |
 |with default settings:
 |4612.56 real 16896.10 user 9325.87 sys
 |
 |with vm.ubc_direct = 1:
 |4615.95 real 16819.96 user 9416.13 sys

This thread reminds me of me turning off hyperthreading.
Using the four cores i have with HT turned on results in a 40
percent time penalty compared to when its off.  (For example,
compiling the Linux kernel 4.19.X takes almost exactly 10 minutes
when it is turned off, and about 14 minutes when it is turned
on.  Just a thought.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: Increases in build system time
Mateusz Guzik wrote:
> > http://www.gson.org/netbsd/bugs/system-time/fg.svg
>
> First thing which jumps at me is DIAGNOSTIC being on (seen with e.g.,
> _vstate_assert). Did your older kernels have it? If you just compiled
> GENERIC from release branches it is presumably removed, so would be
> nice to retest without it.

All the versions tested were built from the CVS trunk, and all used
the GENERIC kernel.  The only thing from a release branch was the
build target (8.1), which was the same in all test runs.

> That said, can you rerun without DIAGNOSTIC but with lockstat?

I'd rather leave that to someone else, and to a separate thread.  All
the test results presented in this thread were produced with the same
options so that they can be meaningfully compared, and running new
tests with different options would only confuse things.
-- 
Andreas Gustafsson, g...@gson.org
Re: Increases in build system time
Jaromír Doleček wrote:
> I wonder also if we could try enabling vm.ubc_direct on the build machine?

Using 2019.11.14.13.58.22 sources:

with default settings:
4612.56 real 16896.10 user 9325.87 sys

with vm.ubc_direct = 1:
4615.95 real 16819.96 user 9416.13 sys
-- 
Andreas Gustafsson, g...@gson.org
Re: Increases in build system time
Please un-CC me from any threads.
Re: Increases in build system time
On 11/15/19, Andreas Gustafsson wrote:
> Mateusz Guzik wrote:
>> Can you get a kernel-side flamegraph?
>
> Done, using sources from 2019.11.14.13.58.22:
>
> http://www.gson.org/netbsd/bugs/system-time/fg.svg

Thanks.

First thing which jumps at me is DIAGNOSTIC being on (seen with e.g.,
_vstate_assert).  Did your older kernels have it?  If you just compiled
GENERIC from release branches it is presumably removed, so it would be
nice to retest without it.

Then there is very minor stuff which in isolation won't make a
difference, but which would be nice to take care of:

- pmap_page_copy uses memcpy, which performs a little bit of extra
  work on top of just copying - the size is known at compilation time
  and both addresses are guaranteed to be aligned to 4096.  Therefore
  it can just copy without trying to align.  iow this should use a
  dedicated routine.

- pmap_page_zero uses non-temporal stores, which are almost guaranteed
  to only add to cache misses later on.

- background page zeroing probably does not win anything and only adds
  to contention on uvm_fpageqlock.  I don't know if I'm reading this
  right, but it seems the lock itself is only a spinlock to
  accommodate its use from the idle loop.  Should the feature be
  eliminated on amd64, the lock can be converted to just a regular
  lock, which would be faster single-threaded (no interrupt crappery)
  and multi-threaded (no need to read off IPL from the lock).

Here I don't see what uvm_fault_internal is contending on; it's most
likely the aforementioned uvm_fpageqlock.  A couple of years back I
wrote a patch to batch ops using the lock; it can probably be
forward-ported reasonably easily.

That said, can you rerun without DIAGNOSTIC but with lockstat?
-- 
Mateusz Guzik
Re: Increases in build system time
Mateusz Guzik wrote:
> Can you get a kernel-side flamegraph?

Done, using sources from 2019.11.14.13.58.22:

http://www.gson.org/netbsd/bugs/system-time/fg.svg
-- 
Andreas Gustafsson, g...@gson.org
Re: Increases in build system time
Michael van Elst wrote:
> g...@gson.org (Andreas Gustafsson) writes:
>
> >mitigations, which I guess is not really surprising.  But the 12% net
> >increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
> >to call for closer investigation.
>
> Is this also reflected in real time?

Only partly.  With the jemalloc change, the system time increased by
938 seconds, but the real time only increased by 170 seconds, and the
user time decreased by 89 seconds:

2019.03.08.20.34.24/build_8.log.gz: 4229.21 real 16686.03 user 7932.08 sys
2019.03.08.20.35.10/build_8.log.gz: 4398.88 real 16597.25 user 8870.49 sys

With the vfs_vnode.c change, the system time increased by 305 seconds,
but the real time only increased by 35 seconds:

2016.12.14.15.48.55/build_8.log.gz: 3934.44 real 15707.68 user 4243.02 sys
2016.12.14.15.49.35/build_8.log.gz: 3969.58 real 15718.85 user 4548.50 sys
-- 
Andreas Gustafsson, g...@gson.org
Re: Increases in build system time
This is awesome - I know it took a lot of work (your section on
bisection briefly alluded to that), so thanks for doing all of this -
very, very useful.

On Thu, 14 Nov 2019 at 11:27, Andreas Gustafsson wrote:
>
> Hi all,
>
> Back in September, I wrote:
> > I'm trying to run a bisection to determine why builds hosted on recent
> > versions of NetBSD seem to be taking significantly more system time
> > than they used to, building the same thing.
>
> I finally have some results to report.  These are from builds of the
> NetBSD-8/amd64 release hosted on various versions of -current/amd64,
> on a HP DL360 G7 with dual Xeon L5630 CPUs (8 cores in all).  The
> amount of system time taken by each build was measured using time(1).
>
> Between a -current from September 2016 and one from October 2019, the
> system time more than doubled, from 4245 seconds to 9344 seconds.
> The time(1) output from the oldest and newest version was:
>
> 3930.86 real 15737.04 user 4245.26 sys
> 4461.47 real 16687.37 user 9344.68 sys
>
> This means that on the recent -current, on average, roughly four of
> the eight cores were executing the build tools (compilers, etc),
> roughly two were executing the kernel, and the remaining two were
> presumably idle.
>
> The increase did not happen all at once but in several smaller steps
> as shown in this graph:
>
> http://www.gson.org/netbsd/bugs/system-time/graph.png
>
> For each step, finding the commits that caused it required a separate
> bisection.  Each bisection took 1-2 days to run, so I have only
> bisected the largest steps, those of 5 percent or more.  They are
> listed below in order from largest to smallest, with CVS revisions
> and commit messages.
>
> 38% increase:
>
> 2018.04.04.12.59.49 maxv src/sys/arch/amd64/amd64/machdep.c 1.303
> 2018.04.04.12.59.49 maxv src/sys/arch/x86/include/cpu.h 1.91
> 2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/cpu.c 1.154
> 2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/spectre.c 1.8
>
> Enable the SpectreV2 mitigation by default at boot time.
>
> 12% increase:
>
> 2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
>
> Back to using jemalloc for x86_64; all problems have been resolved.
>
> 9% increase:
>
> 2018.02.26.05.52.50 maxv src/sys/arch/amd64/conf/GENERIC 1.485
>
> Enable SVS by default.
>
> 7% increase:
>
> 2016.12.14.15.49.35 hannken src/sys/kern/vfs_vnode.c 1.63
>
> Change the freelists to lrulists, all vnodes are always on one
> of the lists.  Speeds up namei on cached vnodes by ~3 percent.
>
> Merge "vrele_thread" into "vdrain_thread" so we have one thread
> working on the lrulists.  Adapt vfs_drainvnodes() to always wait
> for a complete cycle of vdrain_thread().
>
> 5% increase:
>
> 2018.04.07.22.39.31 christos src/external/Makefile 1.21
> 2018.04.07.22.39.31 christos src/external/README 1.16
> [302 more revisions by christos elided]
> 2018.04.07.22.39.53 christos src/external/bsd/Makefile 1.59
> 2018.04.07.22.41.55 christos src/doc/3RDPARTY 1.1515
> 2018.04.07.22.41.55 christos src/doc/CHANGES 1.2376
> 2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/ALL 1.85
> 2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/GENERIC 1.489
> 2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/ALL 1.437
> 2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/GENERIC 1.1177
> 2018.04.08.01.30.01 christos src/external/mpl/Makefile 1.1
>
> [Too many commit messages to list here, but the following from
> mrg's commit of src/sys/arch/amd64/conf/GENERIC 1.489 may
> be relevant]
>
> turn on GCC spectre v2 mitigation options.
>
> 5% increase:
>
> 2019.03.10.15.32.42 christos src/external/bsd/jemalloc/lib/Makefile.inc 1.5
>
> turn on debugging to help find problems
>
> 5% decrease:
>
> 2019.07.23.06.31.20 martin src/external/bsd/jemalloc/lib/Makefile.inc 1.10
>
> Disable JEMALLOC_DEBUG, it served us well, but now we want performance
> back.  Discussed with christos.
>
> To summarize, most of the increase was due to Spectre and Meltdown
> mitigations, which I guess is not really surprising.  But the 12% net
> increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
> to call for closer investigation.
> --
> Andreas Gustafsson, g...@gson.org
Re: Increases in build system time
On Thu, 14 Nov 2019 at 21:41, Christos Zoulas wrote:
> In article <24013.43646.552099.15...@guava.gson.org>,
> Andreas Gustafsson wrote:
> >
> >Hi all,
> >
> >Back in September, I wrote:
> > 12% increase:
> >
> >2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
> >
> >Back to using jemalloc for x86_64; all problems have been resolved.
>
> Indeed I would expect the new jemalloc to do the same or better, not
> so much worse.  Perhaps it has to do with TLS?  Or some poor
> tuning/default?  I will look into it.

I wonder also if we could try enabling vm.ubc_direct on the build machine?

Jaromir
Re: Increases in build system time
On 11/14/19, Andreas Gustafsson wrote:
>
> Hi all,
>
> Back in September, I wrote:
>> I'm trying to run a bisection to determine why builds hosted on recent
>> versions of NetBSD seem to be taking significantly more system time
>> than they used to, building the same thing.
>
> I finally have some results to report.  These are from builds of the
> NetBSD-8/amd64 release hosted on various versions of -current/amd64,
> on a HP DL360 G7 with dual Xeon L5630 CPUs (8 cores in all).  The
> amount of system time taken by each build was measured using time(1).
>
> Between a -current from September 2016 and one from October 2019, the
> system time more than doubled, from 4245 seconds to 9344 seconds.
> The time(1) output from the oldest and newest version was:
>
> 3930.86 real 15737.04 user 4245.26 sys
> 4461.47 real 16687.37 user 9344.68 sys

Can you get a kernel-side flamegraph?

# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); }' -o out.kern_stacks -c "your_build_command"

or so.

cat out.kern_stacks | perl stackcollapse.pl | perl flamegraph.pl > fg.svg

See https://github.com/brendangregg/FlameGraph.git

I know it used to work fine, but I tried it a few months back in a vm
and the profile- probes were not there.  I don't know if it was a
local wart.
-- 
Mateusz Guzik
Re: Increases in build system time
g...@gson.org (Andreas Gustafsson) writes:

>mitigations, which I guess is not really surprising.  But the 12% net
>increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
>to call for closer investigation.

Is this also reflected in real time?
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: Increases in build system time
In article <20191114200928.ga2...@homeworld.netbsd.org>, wrote:
>On Thu, Nov 14, 2019 at 09:26:54PM +0200, Andreas Gustafsson wrote:
>> 12% increase:
>>
>> 2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
>>
>> Back to using jemalloc for x86_64; all problems have been resolved.
>
>I wonder if enabling back MAP_ALIGNED in jemalloc can help.

We already have that (see pages.c), but we don't have the
os_overcommits = true;

christos
Re: Increases in build system time
In article <24013.43646.552099.15...@guava.gson.org>,
Andreas Gustafsson wrote:
>
>Hi all,
>
>Back in September, I wrote:
> 12% increase:
>
>2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
>
>Back to using jemalloc for x86_64; all problems have been resolved.

Indeed I would expect the new jemalloc to do the same or better, not
so much worse.  Perhaps it has to do with TLS?  Or some poor
tuning/default?  I will look into it.

christos
Re: Increases in build system time
On Thu, Nov 14, 2019 at 09:26:54PM +0200, Andreas Gustafsson wrote:
> 12% increase:
>
> 2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
>
> Back to using jemalloc for x86_64; all problems have been resolved.

I wonder if enabling back MAP_ALIGNED in jemalloc can help.

http://mail-index.netbsd.org/tech-userlevel/2019/07/11/msg011980.html
Increases in build system time
Hi all,

Back in September, I wrote:
> I'm trying to run a bisection to determine why builds hosted on recent
> versions of NetBSD seem to be taking significantly more system time
> than they used to, building the same thing.

I finally have some results to report.  These are from builds of the
NetBSD-8/amd64 release hosted on various versions of -current/amd64,
on a HP DL360 G7 with dual Xeon L5630 CPUs (8 cores in all).  The
amount of system time taken by each build was measured using time(1).

Between a -current from September 2016 and one from October 2019, the
system time more than doubled, from 4245 seconds to 9344 seconds.
The time(1) output from the oldest and newest version was:

3930.86 real 15737.04 user 4245.26 sys
4461.47 real 16687.37 user 9344.68 sys

This means that on the recent -current, on average, roughly four of
the eight cores were executing the build tools (compilers, etc),
roughly two were executing the kernel, and the remaining two were
presumably idle.

The increase did not happen all at once but in several smaller steps
as shown in this graph:

http://www.gson.org/netbsd/bugs/system-time/graph.png

For each step, finding the commits that caused it required a separate
bisection.  Each bisection took 1-2 days to run, so I have only
bisected the largest steps, those of 5 percent or more.  They are
listed below in order from largest to smallest, with CVS revisions
and commit messages.

38% increase:

2018.04.04.12.59.49 maxv src/sys/arch/amd64/amd64/machdep.c 1.303
2018.04.04.12.59.49 maxv src/sys/arch/x86/include/cpu.h 1.91
2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/cpu.c 1.154
2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/spectre.c 1.8

Enable the SpectreV2 mitigation by default at boot time.

12% increase:

2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108

Back to using jemalloc for x86_64; all problems have been resolved.

9% increase:

2018.02.26.05.52.50 maxv src/sys/arch/amd64/conf/GENERIC 1.485

Enable SVS by default.

7% increase:

2016.12.14.15.49.35 hannken src/sys/kern/vfs_vnode.c 1.63

Change the freelists to lrulists, all vnodes are always on one
of the lists.  Speeds up namei on cached vnodes by ~3 percent.

Merge "vrele_thread" into "vdrain_thread" so we have one thread
working on the lrulists.  Adapt vfs_drainvnodes() to always wait
for a complete cycle of vdrain_thread().

5% increase:

2018.04.07.22.39.31 christos src/external/Makefile 1.21
2018.04.07.22.39.31 christos src/external/README 1.16
[302 more revisions by christos elided]
2018.04.07.22.39.53 christos src/external/bsd/Makefile 1.59
2018.04.07.22.41.55 christos src/doc/3RDPARTY 1.1515
2018.04.07.22.41.55 christos src/doc/CHANGES 1.2376
2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/ALL 1.85
2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/GENERIC 1.489
2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/ALL 1.437
2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/GENERIC 1.1177
2018.04.08.01.30.01 christos src/external/mpl/Makefile 1.1

[Too many commit messages to list here, but the following from
mrg's commit of src/sys/arch/amd64/conf/GENERIC 1.489 may
be relevant]

turn on GCC spectre v2 mitigation options.

5% increase:

2019.03.10.15.32.42 christos src/external/bsd/jemalloc/lib/Makefile.inc 1.5

turn on debugging to help find problems

5% decrease:

2019.07.23.06.31.20 martin src/external/bsd/jemalloc/lib/Makefile.inc 1.10

Disable JEMALLOC_DEBUG, it served us well, but now we want performance
back.  Discussed with christos.

To summarize, most of the increase was due to Spectre and Meltdown
mitigations, which I guess is not really surprising.  But the 12% net
increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
to call for closer investigation.
-- 
Andreas Gustafsson, g...@gson.org