Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Steffen Nurpmeso wrote:
> This thread reminds me of when I turned off hyperthreading.
> Using the four cores I have with HT turned on results in a 40
> percent time penalty compared to when it's off.  (For example,
> compiling the Linux kernel 4.19.x takes almost exactly 10 minutes
> when it is turned off, and about 14 minutes when it is turned
> on.  Just a thought.)

FWIW, these tests were run with hyperthreading disabled.
-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-15 Thread Steffen Nurpmeso
Andreas Gustafsson wrote in <24014.56871.750141.885...@guava.gson.org>:
 |Jaromír Doleček wrote:
 |> I wonder also if we could try enabling vm.ubc_direct on the build machine?
 |
 |Using 2019.11.14.13.58.22 sources:
 |
 |with default settings:
 |4612.56 real 16896.10 user  9325.87 sys
 |
 |with vm.ubc_direct = 1:
 |4615.95 real 16819.96 user  9416.13 sys

This thread reminds me of when I turned off hyperthreading.
Using the four cores I have with HT turned on results in a 40
percent time penalty compared to when it's off.  (For example,
compiling the Linux kernel 4.19.x takes almost exactly 10 minutes
when it is turned off, and about 14 minutes when it is turned
on.  Just a thought.)

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Mateusz Guzik wrote:
> >   http://www.gson.org/netbsd/bugs/system-time/fg.svg
> 
> The first thing that jumps out at me is DIAGNOSTIC being on (seen
> with, e.g., _vstate_assert).  Did your older kernels have it?  If you
> just compiled GENERIC from the release branches it is presumably off
> there, so it would be nice to retest without it.

All the versions tested were built from the CVS trunk, and all used
the GENERIC kernel.  The only thing from a release branch was the
build target (8.1), which was the same in all test runs.

> That said, can you rerun without DIAGNOSTIC but with lockstat?

I'd rather leave that to someone else, and to a separate thread.  All
the test results presented in this thread were produced with the same
options so that they can be meaningfully compared, and running new
tests with different options would only confuse things.
-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Jaromír Doleček wrote:
> I wonder also if we could try enabling vm.ubc_direct on the build machine?

Using 2019.11.14.13.58.22 sources:

with default settings:
4612.56 real 16896.10 user  9325.87 sys

with vm.ubc_direct = 1:
4615.95 real 16819.96 user  9416.13 sys

-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-15 Thread maya
Please un-CC me from any threads.


Re: Increases in build system time

2019-11-15 Thread Mateusz Guzik
On 11/15/19, Andreas Gustafsson  wrote:
> Mateusz Guzik wrote:
>> Can you get a kernel-side flamegraph?
>
> Done, using sources from 2019.11.14.13.58.22:
>
>   http://www.gson.org/netbsd/bugs/system-time/fg.svg
>

Thanks.

The first thing that jumps out at me is DIAGNOSTIC being on (seen
with, e.g., _vstate_assert).  Did your older kernels have it?  If you
just compiled GENERIC from the release branches it is presumably off
there, so it would be nice to retest without it.

Then there is some very minor stuff which in isolation won't make a
difference, but which would be nice to take care of:
- pmap_page_copy uses memcpy, which performs a little extra work on
top of just copying: the size is known at compilation time and both
addresses are guaranteed to be aligned to 4096, so it can just copy
without trying to align.  IOW, this should use a dedicated routine
(see the sketch after this list).
- pmap_page_zero uses non-temporal stores, which are almost guaranteed
to only add cache misses later on.
- background page zeroing probably does not win anything and only adds
contention on uvm_fpageqlock.  I don't know if I'm reading this right,
but it seems the lock itself is only a spinlock to accommodate its use
from the idle loop.  Should the feature be eliminated on amd64, the
lock could be converted to a regular lock, which would be faster both
single-threaded (no interrupt crappery) and multi-threaded (no need to
read the IPL off the lock); see the mutex(9) sketch further below.
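
As a sketch of the dedicated-routine idea (hypothetical code, not the
actual pmap implementation; a production version would likely be
hand-tuned assembly):

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/*
 * Hypothetical page copy: both pointers are known to be page-aligned
 * and the length is a compile-time constant, so no alignment fixup or
 * length dispatch is needed, unlike in a general-purpose memcpy.
 */
static void
pmap_copy_page_aligned(uint64_t *restrict d, const uint64_t *restrict s)
{
	size_t i;

	for (i = 0; i < PAGE_SIZE / sizeof(uint64_t); i++)
		d[i] = s[i];
}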

Here I don't see what uvm_fault_internal is contending on; it's most
likely the aforementioned uvm_fpageqlock.  A couple of years back I
wrote a patch to batch operations under that lock; it can probably be
forward-ported reasonably easily.
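
For reference, the conversion would look roughly like this, per
mutex(9) (a sketch, not a proposed patch; the actual initialization
site in uvm differs):

#include <sys/mutex.h>

static kmutex_t fpageq_lock;

static void
fpageq_lock_init_current(void)
{
	/* An IPL above IPL_NONE makes this a spin mutex, so it can
	 * be taken from the idle loop. */
	mutex_init(&fpageq_lock, MUTEX_DEFAULT, IPL_VM);
}

static void
fpageq_lock_init_converted(void)
{
	/* With idle-loop zeroing gone, IPL_NONE gives a regular
	 * adaptive mutex instead. */
	mutex_init(&fpageq_lock, MUTEX_DEFAULT, IPL_NONE);
}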

That said, can you rerun without DIAGNOSTIC but with lockstat?
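
For example (an illustrative invocation; in practice a shorter
representative workload may be preferable to a full build):

# lockstat -o lockstat.out sh ./build.sh -m amd64 release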

-- 
Mateusz Guzik 


Re: Increases in build system time

2019-11-15 Thread Andreas Gustafsson
Mateusz Guzik wrote:
> Can you get a kernel-side flamegraph?

Done, using sources from 2019.11.14.13.58.22:

  http://www.gson.org/netbsd/bugs/system-time/fg.svg

-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-14 Thread Andreas Gustafsson
Michael van Elst wrote:
> g...@gson.org (Andreas Gustafsson) writes:
> 
> >mitigations, which I guess is not really surprising.  But the 12% net
> >increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
> >to call for closer investigation.
> 
> Is this also reflected in real time?

Only partly.  With the jemalloc change, the system time increased by
938 seconds, but the real time only increased by 170 seconds, and the
user time decreased by 89 seconds:

  2019.03.08.20.34.24/build_8.log.gz: 4229.21 real 16686.03 user  7932.08 sys
  2019.03.08.20.35.10/build_8.log.gz: 4398.88 real 16597.25 user  8870.49 sys

With the vfs_vnode.c change, the system time increased by 305 seconds,
but the real time only increased by 35 seconds:

  2016.12.14.15.48.55/build_8.log.gz: 3934.44 real 15707.68 user  4243.02 sys
  2016.12.14.15.49.35/build_8.log.gz: 3969.58 real 15718.85 user  4548.50 sys

-- 
Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-14 Thread Alistair Crooks
This is awesome - I know it took a lot of work (your section on
bisection briefly alluded to that), so thanks for doing all of this -
very, very useful.

On Thu, 14 Nov 2019 at 11:27, Andreas Gustafsson  wrote:

>
> Hi all,
>
> Back in September, I wrote:
> > I'm trying to run a bisection to determine why builds hosted on recent
> > versions of NetBSD seem to be taking significantly more system time
> > than they used to, building the same thing.
>
> [...]
>
> --
> Andreas Gustafsson, g...@gson.org


Re: Increases in build system time

2019-11-14 Thread Jaromír Doleček
On Thu, 14 Nov 2019 at 21:41, Christos Zoulas  wrote:

> In article <24013.43646.552099.15...@guava.gson.org>,
> Andreas Gustafsson   wrote:
> >
> >Hi all,
> >
> >Back in September, I wrote:
>
> >  12% increase:
> >
> >2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
> >
> >Back to using jemalloc for x86_64; all problems have been resolved.
>
> Indeed, I would expect the new jemalloc to perform the same or better,
> not so much worse.  Perhaps it has to do with TLS?  Or some poor
> tuning or defaults?  I will look into it.
>

I wonder also if we could try enabling vm.ubc_direct on the build machine?
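
For reference, the knob can be flipped at runtime with sysctl(8) and
persisted via sysctl.conf(5):

# sysctl -w vm.ubc_direct=1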

Jaromir


Re: Increases in build system time

2019-11-14 Thread Mateusz Guzik
On 11/14/19, Andreas Gustafsson  wrote:
>
> Hi all,
>
> Back in September, I wrote:
>> I'm trying to run a bisection to determine why builds hosted on recent
>> versions of NetBSD seem to be taking significantly more system time
>> than they used to, building the same thing.
>
> I finally have some results to report.  These are from builds of the
> NetBSD-8/amd64 release hosted on various versions of -current/amd64,
> on a HP DL360 G7 with dual Xeon L5630 CPUs (8 cores in all).  The
> amount of system time taken by each build was measured using time(1).
>
> Between a -current from September 2016 and one from October 2019, the
> system time more than doubled, from 4245 seconds to 9344 seconds.
> The time(1) output from the oldest and newest version was:
>
> 3930.86 real 15737.04 user  4245.26 sys
> 4461.47 real 16687.37 user  9344.68 sys
>

Can you get a kernel-side flamegraph?

Something along these lines:

# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); }' \
    -o out.kern_stacks -c "your_build_command"

cat out.kern_stacks | perl stackcollapse.pl | perl flamegraph.pl > fg.svg

See https://github.com/brendangregg/FlameGraph.git

I know it used to work fine, but when I tried it a few months back in
a VM the profile- probes were not there.  I don't know if that was a
local wart.

-- 
Mateusz Guzik 


Re: Increases in build system time

2019-11-14 Thread Michael van Elst
g...@gson.org (Andreas Gustafsson) writes:

>mitigations, which I guess is not really surprising.  But the 12% net
>increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
>to call for closer investigation.

Is this also reflected in real time?


-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Increases in build system time

2019-11-14 Thread Christos Zoulas
In article <20191114200928.ga2...@homeworld.netbsd.org>,
  wrote:
>On Thu, Nov 14, 2019 at 09:26:54PM +0200, Andreas Gustafsson wrote:
>>   12% increase:
>> 
>> 2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
>> 
>> Back to using jemalloc for x86_64; all problems have been resolved.
>
>I wonder if enabling back MAP_ALIGNED in jemalloc can help.

We already have that (see pages.c),
but we don't set os_overcommits = true.

christos



Re: Increases in build system time

2019-11-14 Thread Christos Zoulas
In article <24013.43646.552099.15...@guava.gson.org>,
Andreas Gustafsson   wrote:
>
>Hi all,
>
>Back in September, I wrote:

>  12% increase:
>
>2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
>
>Back to using jemalloc for x86_64; all problems have been resolved.

Indeed, I would expect the new jemalloc to perform the same or better,
not so much worse.  Perhaps it has to do with TLS?  Or some poor
tuning or defaults?  I will look into it.

christos



Re: Increases in build system time

2019-11-14 Thread maya
On Thu, Nov 14, 2019 at 09:26:54PM +0200, Andreas Gustafsson wrote:
>   12% increase:
> 
> 2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108
> 
> Back to using jemalloc for x86_64; all problems have been resolved.

I wonder if re-enabling MAP_ALIGNED in jemalloc could help.
http://mail-index.netbsd.org/tech-userlevel/2019/07/11/msg011980.html
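
For context, MAP_ALIGNED() is the NetBSD-specific mmap(2) extension
that lets the caller request alignment from the kernel directly,
instead of over-allocating and trimming in userland.  A minimal
sketch (the 2 MB chunk size here is illustrative, not jemalloc's
actual configuration):

#include <sys/mman.h>

/* Request a 2 MB mapping aligned to 2 MB; MAP_ALIGNED() takes the
 * base-2 logarithm of the desired alignment (2^21 = 2 MB). */
static void *
alloc_aligned_chunk(void)
{
	return mmap(NULL, 2UL * 1024 * 1024, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_ALIGNED(21), -1, 0);
}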


Increases in build system time

2019-11-14 Thread Andreas Gustafsson


Hi all,

Back in September, I wrote:
> I'm trying to run a bisection to determine why builds hosted on recent
> versions of NetBSD seem to be taking significantly more system time
> than they used to, building the same thing.

I finally have some results to report.  These are from builds of the
NetBSD-8/amd64 release hosted on various versions of -current/amd64,
on a HP DL360 G7 with dual Xeon L5630 CPUs (8 cores in all).  The
amount of system time taken by each build was measured using time(1).
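
For illustration, such a measurement amounts to something like the
following, where the build.sh flags shown are a sketch rather than the
exact invocation used:

  /usr/bin/time sh ./build.sh -U -m amd64 -j 8 release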

Between a -current from September 2016 and one from October 2019, the
system time more than doubled, from 4245 seconds to 9344 seconds.
The time(1) output from the oldest and newest version was:

3930.86 real 15737.04 user  4245.26 sys
4461.47 real 16687.37 user  9344.68 sys

This means that on the recent -current, on average, roughly four of
the eight cores were executing the build tools (compilers, etc),
roughly two were executing the kernel, and the remaining two were
presumably idle.
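
(Arithmetic, for reference: 16687.37 user / 4461.47 real ~= 3.7 cores
busy in userland, and 9344.68 sys / 4461.47 real ~= 2.1 cores in the
kernel, leaving about 2.2 cores' worth of idle time.)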

The increase did not happen all at once but in several smaller steps
as shown in this graph:

  http://www.gson.org/netbsd/bugs/system-time/graph.png

For each step, finding the commits that caused it required a separate
bisection.  Each bisection took 1-2 days to run, so I have only
bisected the largest steps, those of 5 percent or more.  They are
listed below in order from largest to smallest, with CVS revisions
and commit messages.

  38% increase:

2018.04.04.12.59.49 maxv src/sys/arch/amd64/amd64/machdep.c 1.303
2018.04.04.12.59.49 maxv src/sys/arch/x86/include/cpu.h 1.91
2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/cpu.c 1.154
2018.04.04.12.59.49 maxv src/sys/arch/x86/x86/spectre.c 1.8

Enable the SpectreV2 mitigation by default at boot time.

  12% increase:

2019.03.08.20.35.10 christos src/share/mk/bsd.own.mk 1.1108

Back to using jemalloc for x86_64; all problems have been resolved.

  9% increase:

2018.02.26.05.52.50 maxv src/sys/arch/amd64/conf/GENERIC 1.485

Enable SVS by default.

  7% increase:

2016.12.14.15.49.35 hannken src/sys/kern/vfs_vnode.c 1.63

Change the freelists to lrulists, all vnodes are always on one
of the lists.  Speeds up namei on cached vnodes by ~3 percent.

Merge "vrele_thread" into "vdrain_thread" so we have one thread
working on the lrulists.  Adapt vfs_drainvnodes() to always wait
for a complete cycle of vdrain_thread().

  5% increase:

2018.04.07.22.39.31 christos src/external/Makefile 1.21
2018.04.07.22.39.31 christos src/external/README 1.16
[302 more revisions by christos elided]
2018.04.07.22.39.53 christos src/external/bsd/Makefile 1.59
2018.04.07.22.41.55 christos src/doc/3RDPARTY 1.1515
2018.04.07.22.41.55 christos src/doc/CHANGES 1.2376
2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/ALL 1.85
2018.04.08.00.52.38 mrg src/sys/arch/amd64/conf/GENERIC 1.489
2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/ALL 1.437
2018.04.08.00.52.38 mrg src/sys/arch/i386/conf/GENERIC 1.1177
2018.04.08.01.30.01 christos src/external/mpl/Makefile 1.1

[Too many commit messages to list here, but the following from
mrg's commit of src/sys/arch/amd64/conf/GENERIC 1.489 may
be relevant]

turn on GCC spectre v2 mitigation options.

  5% increase:

2019.03.10.15.32.42 christos src/external/bsd/jemalloc/lib/Makefile.inc 1.5

turn on debugging to help find problems

  5% decrease:

2019.07.23.06.31.20 martin src/external/bsd/jemalloc/lib/Makefile.inc 1.10

Disable JEMALLOC_DEBUG, it served us well, but now we want performance
back. Discussed with christos.

To summarize, most of the increase was due to Spectre and Meltdown
mitigations, which I guess is not really surprising.  But the 12% net
increase from jemalloc and the 7% increase from vfs_vnode.c 1.63 seem
to call for closer investigation.
-- 
Andreas Gustafsson, g...@gson.org