re: 8.0 performance issue when running build.sh?
> > >> 100.00  2054  14.18 kernel_lock
> > >> 47.43    846   6.72 kernel_lock  fileassoc_file_delete+20
> > >> 23.73    188   3.36 kernel_lock  intr_biglock_wrapper+16
> > >> 16.01    203   2.27 kernel_lock  scsipi_adapter_request+63

also note that this is 14.18ms of locked time across 26 minutes. this isn't a contention point.

.mrg.
Re: 8.0 performance issue when running build.sh?
> On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote: > > Actually, I wonder if we could kill off the time spent by fileassoc. Is > it still used only by veriexec? We can easily option that out of the build > box kernels. Indeed. And there are better ways to do what veriexec does, in any case. -- thorpej
Re: 8.0 performance issue when running build.sh?
I would be interested to finish that off; I need to make some time to get to doing it, though. I have been sitting on some changes to veriexec for ~ years that change it from locking everything to using reference counts and condition variables, which removes some nasty hacks I did. I have not committed the changes because the kernel would sometimes deadlock and I was having trouble tracking down why. Perhaps I was looking in the wrong spot for the error and it was fileassoc all along that was causing the deadlock.

- Original Message -
From: "Mindaugas Rasiukevicius"
To: "Jason Thorpe"
Cc:
Sent: Fri, 10 Aug 2018 00:12:23 +0100
Subject: Re: 8.0 performance issue when running build.sh?

Jason Thorpe wrote: > > > > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote: > > > > Actually, I wonder if we could kill off the time spent by fileassoc. Is > > it still used only by veriexec? We can easily option that out of the > > build box kernels. > > Indeed. And there are better ways to do what veriexec does, in any case. > Many years ago I wrote a diff to make fileassoc MP-safe: http://www.netbsd.org/~rmind/fileassoc.diff If somebody wants to finish -- I am glad to help. -- Mindaugas
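As background, the pattern being described here (a reference count plus a condition variable that teardown waits on, instead of holding a big lock around everything) looks roughly like the sketch below. It uses NetBSD's kmutex/kcondvar primitives, but the structure and function names (vx_table, vx_table_acquire, and so on) are illustrative assumptions, not the actual veriexec code.

/*
 * Illustrative sketch only: a table protected by a reference count and a
 * condition variable instead of a big lock.  Not the actual veriexec code.
 */
#include <sys/param.h>
#include <sys/mutex.h>
#include <sys/condvar.h>

struct vx_table {
	kmutex_t	vt_lock;	/* protects vt_refcnt and vt_dying */
	kcondvar_t	vt_cv;		/* signalled when vt_refcnt drops */
	unsigned int	vt_refcnt;
	bool		vt_dying;
};

static void
vx_table_init(struct vx_table *vt)
{
	mutex_init(&vt->vt_lock, MUTEX_DEFAULT, IPL_NONE);
	cv_init(&vt->vt_cv, "vxtbl");
	vt->vt_refcnt = 0;
	vt->vt_dying = false;
}

/* Take a reference before using the table; fail if it is being torn down. */
static bool
vx_table_acquire(struct vx_table *vt)
{
	mutex_enter(&vt->vt_lock);
	if (vt->vt_dying) {
		mutex_exit(&vt->vt_lock);
		return false;
	}
	vt->vt_refcnt++;
	mutex_exit(&vt->vt_lock);
	return true;
}

static void
vx_table_release(struct vx_table *vt)
{
	mutex_enter(&vt->vt_lock);
	if (--vt->vt_refcnt == 0)
		cv_broadcast(&vt->vt_cv);
	mutex_exit(&vt->vt_lock);
}

/* Teardown: mark dying, then sleep until the last user drops its reference. */
static void
vx_table_drain(struct vx_table *vt)
{
	mutex_enter(&vt->vt_lock);
	vt->vt_dying = true;
	while (vt->vt_refcnt != 0)
		cv_wait(&vt->vt_cv, &vt->vt_lock);
	mutex_exit(&vt->vt_lock);
}

The deadlock hazard mentioned above is the classic one for this scheme: the drain path must never run while the draining thread itself still holds a reference, and every acquire must be paired with exactly one release.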
Re: 8.0 performance issue when running build.sh?
t...@panix.com (Thor Lancelot Simon) writes:

>> 47.43  846  6.72 kernel_lock  fileassoc_file_delete+20

>Actually, I wonder if we could kill off the time spent by fileassoc.

Would be a marginal improvement only. The real scaling problems are in UVM.

--
--
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: 8.0 performance issue when running build.sh?
Jason Thorpe wrote: > > > > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote: > > > > Actually, I wonder if we could kill off the time spent by fileassoc. Is > > it still used only by veriexec? We can easily option that out of the > > build box kernels. > > Indeed. And there are better ways to do what veriexec does, in any case. > Many years ago I wrote a diff to make fileassoc MP-safe: http://www.netbsd.org/~rmind/fileassoc.diff If somebody wants to finish -- I am glad to help. -- Mindaugas
Re: 8.0 performance issue when running build.sh?
On Fri, Aug 10, 2018 at 12:29:49AM +0200, Joerg Sonnenberger wrote:
> On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> > 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> > >> 100.00  2054  14.18 kernel_lock
> > >> 47.43    846   6.72 kernel_lock  fileassoc_file_delete+20
> > >> 23.73    188   3.36 kernel_lock  intr_biglock_wrapper+16
> > >> 16.01    203   2.27 kernel_lock  scsipi_adapter_request+63
> > > Actually, I wonder if we could kill off the time spent by fileassoc. Is
> > > it still used only by veriexec? We can easily option that out of the build
> > > box kernels.
> >
> > Or even better, make it less heavy?
> >
> > It's not really intuitive that you could improve filesystem
> > performance by removing this obscure component.
>
> If it is not in use, fileassoc_file_delete will short cut already.

...and of course, the check seems to be just useless. So yes, it should be possible to make it much less heavy.

Joerg
Re: 8.0 performance issue when running build.sh?
On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> >> 100.00  2054  14.18 kernel_lock
> >> 47.43    846   6.72 kernel_lock  fileassoc_file_delete+20
> >> 23.73    188   3.36 kernel_lock  intr_biglock_wrapper+16
> >> 16.01    203   2.27 kernel_lock  scsipi_adapter_request+63
> > Actually, I wonder if we could kill off the time spent by fileassoc. Is
> > it still used only by veriexec? We can easily option that out of the build
> > box kernels.
>
> Or even better, make it less heavy?
>
> It's not really intuitive that you could improve filesystem
> performance by removing this obscure component.

If it is not in use, fileassoc_file_delete will short cut already.

Joerg
Re: 8.0 performance issue when running build.sh?
2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
>> 100.00  2054  14.18 kernel_lock
>> 47.43    846   6.72 kernel_lock  fileassoc_file_delete+20
>> 23.73    188   3.36 kernel_lock  intr_biglock_wrapper+16
>> 16.01    203   2.27 kernel_lock  scsipi_adapter_request+63
> Actually, I wonder if we could kill off the time spent by fileassoc. Is
> it still used only by veriexec? We can easily option that out of the build
> box kernels.

Or even better, make it less heavy?

It's not really intuitive that you could improve filesystem performance by removing this obscure component.

Jaromir
Re: 8.0 performance issue when running build.sh?
On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> With the patch applied:
>
> Elapsed time: 1564.93 seconds.
>
> -- Kernel lock spin
>
> Total%  Count  Time/ms        Lock                  Caller
> ------ ------ -------- -------------------- --------------------------
> 100.00   2054    14.18 kernel_lock
>  47.43    846     6.72 kernel_lock          fileassoc_file_delete+20
>  23.73    188     3.36 kernel_lock          intr_biglock_wrapper+16
>  16.01    203     2.27 kernel_lock          scsipi_adapter_request+63
>   5.29    662     0.75 kernel_lock          VOP_POLL+93
>   5.29     95     0.75 kernel_lock          biodone2+81
>   0.91     15     0.13 kernel_lock          sleepq_block+1c5
>   0.60     21     0.08 kernel_lock          frag6_fasttimo+1a
>   0.29      9     0.04 kernel_lock          ip_slowtimo+1a
>   0.27      2     0.04 kernel_lock          VFS_SYNC+65
>   0.07      2     0.01 kernel_lock          callout_softclock+42c
>   0.06      3     0.01 kernel_lock          nd6_timer_work+49
>   0.05      4     0.01 kernel_lock          frag6_slowtimo+1f
>   0.01      4     0.00 kernel_lock          kevent1+698
>
> so .. no need to worry about kernel_lock for this load any more.

Actually, I wonder if we could kill off the time spent by fileassoc. Is it still used only by veriexec? We can easily option that out of the build box kernels.

--
Thor Lancelot Simon  t...@panix.com

"Whether or not there's hope for change is not the question. If you want to be a free person, you don't stand up for human rights because it will work, but because it is right." --Andrei Sakharov
Re: 8.0 performance issue when running build.sh?
With the patch applied:

Elapsed time: 1564.93 seconds.

-- Kernel lock spin

Total%  Count  Time/ms        Lock                  Caller
------ ------ -------- -------------------- --------------------------
100.00   2054    14.18 kernel_lock
 47.43    846     6.72 kernel_lock          fileassoc_file_delete+20
 23.73    188     3.36 kernel_lock          intr_biglock_wrapper+16
 16.01    203     2.27 kernel_lock          scsipi_adapter_request+63
  5.29    662     0.75 kernel_lock          VOP_POLL+93
  5.29     95     0.75 kernel_lock          biodone2+81
  0.91     15     0.13 kernel_lock          sleepq_block+1c5
  0.60     21     0.08 kernel_lock          frag6_fasttimo+1a
  0.29      9     0.04 kernel_lock          ip_slowtimo+1a
  0.27      2     0.04 kernel_lock          VFS_SYNC+65
  0.07      2     0.01 kernel_lock          callout_softclock+42c
  0.06      3     0.01 kernel_lock          nd6_timer_work+49
  0.05      4     0.01 kernel_lock          frag6_slowtimo+1f
  0.01      4     0.00 kernel_lock          kevent1+698

so .. no need to worry about kernel_lock for this load any more.

Mindaugas, can you please commit your patch and request pullup?

Martin
Re: 8.0 performance issue when running build.sh?
> On 6. Aug 2018, at 23:18, Mindaugas Rasiukevicius wrote: > > Martin Husemann wrote: >> So here is a more detailed analysis using flamegraphs: >> >> https://netbsd.org/~martin/bc-perf/ >> >> >> All operations happen on tmpfs, and the locking there is a significant >> part of the game - however, lots of mostly idle time is wasted and it is >> not clear to me what is going on. > > Just from a very quick look, it seems like a regression introduced with > the vcache changes: the MP-safe flag is set too late and not inherited > by the root vnode. > > http://www.netbsd.org/~rmind/tmpfs_mount_fix.diff Very good catch. @martin, could you try this diff on an autobuilder? Looks like it speeds up tmpfs lookups by a factor of ~40 on -8.0. -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: 8.0 performance issue when running build.sh?
> On Aug 6, 2018, at 12:17 PM, Rhialto wrote:
>
> 21.96  24626  82447.93 kernel_lock  ip_slowtimo+1a

The work that's happening here looks like a scalability nightmare, regardless of holding the kernel lock or not. A couple of things:

1- It should do a better job of determining if there's any work that it actually has to do. As it is, it's going to lock the kernel_lock and traverse all of the buckets in a hash table even if there aren't any entries in it. Even in the NET_MPSAFE scenario, that's pointless work.

2- The kernel_lock, by its nature, is the lock-of-last-resort, and code needs to be doing everything it can to defer taking that lock until it is deemed absolutely necessary. Even in the not-NET_MPSAFE scenario, there should be a way to determine that IP fragment expiration processing needs to actually occur before taking the kernel_lock.

-- thorpej
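As a minimal sketch of the early exit being asked for here, assuming the fragment code keeps (or could keep) a count of queued fragments: the counter name, bucket constant, and function below are hypothetical stand-ins rather than the actual ip_reass/ip_slowtimo code, although KERNEL_LOCK()/KERNEL_UNLOCK_ONE() are the usual NetBSD macros.

/*
 * Hypothetical sketch of a slow-timer that checks for pending work before
 * taking kernel_lock.  The counter and bucket names are illustrative only.
 */
#include <sys/param.h>
#include <sys/systm.h>

#define	FRAG_NBUCKETS	64		/* stand-in for the real hash size */

static volatile u_int frag_npackets;	/* assumed: maintained on enqueue/free */

void
frag_slowtimo_sketch(void)
{

	/* Cheap, lock-free early exit: nothing queued means nothing to expire. */
	if (frag_npackets == 0)
		return;

	/* Only now pay for the lock-of-last-resort. */
	KERNEL_LOCK(1, NULL);
	for (u_int i = 0; i < FRAG_NBUCKETS; i++) {
		/* walk bucket i and free reassembly queues whose TTL expired */
	}
	KERNEL_UNLOCK_ONE(NULL);
}

The same unlocked check still pays off in the NET_MPSAFE case, since it avoids a pointless walk of an empty hash table from the callout.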
Re: 8.0 performance issue when running build.sh?
rhia...@falu.nl (Rhialto) writes:

>From the caller list, I get the impression that the statistics include
>(potentially) all system activity, not just work being done by/on behalf
>of the specified program. Otherwise I can't explain the network related
>names. All distfiles should have been there already (it was a rebuild of
>existing packages but for 8.0).

Yes, lockstat data includes all system activity.

> 29.41  59522  110425.74 kernel_lock  cdev_poll+72
> 21.96  24626   82447.93 kernel_lock  ip_slowtimo+1a
> 20.04  23020   75230.65 kernel_lock  frag6_slowtimo+1f
> 10.09  26272   37891.06 kernel_lock  frag6_fasttimo+1a

The network stack isn't MPSAFE yet, so it naturally generates contention on a busy system.

--
--
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: 8.0 performance issue when running build.sh?
Martin Husemann wrote: > So here is a more detailed analysis using flamegraphs: > > https://netbsd.org/~martin/bc-perf/ > > > All operations happen on tmpfs, and the locking there is a significant > part of the game - however, lots of mostly idle time is wasted and it is > not clear to me what is going on. Just from a very quick look, it seems like a regression introduced with the vcache changes: the MP-safe flag is set too late and not inherited by the root vnode. http://www.netbsd.org/~rmind/tmpfs_mount_fix.diff -- Mindaugas
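The linked diff is the authoritative fix; purely as an illustration of the ordering problem described here, a file system's mount routine has to mark the mount MP-safe before any vnode (in particular the root vnode) is created, otherwise those vnodes are set up to take kernel_lock around every VOP. The sketch below uses the NetBSD IMNT_MPSAFE flag, but the function itself is a simplified assumption, not the actual tmpfs_mount() or the committed change.

/*
 * Simplified illustration only -- not the actual tmpfs_mount() or the
 * committed fix.  The point is the ordering: IMNT_MPSAFE must be set
 * before the root vnode exists so that the vnode inherits MP-safety.
 */
#include <sys/param.h>
#include <sys/mount.h>

static int
example_fs_mount(struct mount *mp, const char *path, void *data,
    size_t *data_len)
{

	/* Mark the mount MP-safe up front, before any vnode is created. */
	mp->mnt_flag |= MNT_LOCAL;
	mp->mnt_iflag |= IMNT_MPSAFE;

	/* ... allocate the fs-specific mount structure and hang it off mp ... */

	/*
	 * The root vnode is obtained later (VFS_ROOT -> vcache); because the
	 * flag is already set, lookups through it no longer need kernel_lock.
	 */
	return 0;
}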
Re: 8.0 performance issue when running build.sh?
I happened to be rebuilding some packages on amd64/8.0 while I was reading your article, so I thought I might as well measure something myself.

My results are not directly comparable to yours. I build from local rotating disk, not a ram disk. But I have 32 GB of RAM so there should be a lot of caching.

From the caller list, I get the impression that the statistics include (potentially) all system activity, not just work being done by/on behalf of the specified program. Otherwise I can't explain the network related names. All distfiles should have been there already (it was a rebuild of existing packages but for 8.0).

NetBSD murthe.falu.nl 8.0 NetBSD 8.0 (GENERIC) #0: Tue Jul 17 14:59:51 UTC 2018 mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

pkg_comp:default.conf# lockstat -L kernel_lock pkg_chk -ak
...
Elapsed time: 47954.90 seconds.

-- Kernel lock spin

Total%   Count   Time/ms        Lock                  Caller
------ -------- --------- -------------------- --------------------------
100.00   618791 375442.52 kernel_lock
 29.41    59522 110425.74 kernel_lock          cdev_poll+72
 21.96    24626  82447.93 kernel_lock          ip_slowtimo+1a
 20.04    23020  75230.65 kernel_lock          frag6_slowtimo+1f
 10.09    26272  37891.06 kernel_lock          frag6_fasttimo+1a
  7.50    58970  28150.68 kernel_lock          VOP_POLL+93
  3.16    11940  11860.16 kernel_lock          sleepq_block+1b5
  2.99    56296  11207.05 kernel_lock          VOP_LOCK+b2
  1.69   104084   6338.54 kernel_lock          bdev_strategy+7f
  0.88     3939   3311.60 kernel_lock          fileassoc_file_delete+20
  0.72     5289   2700.97 kernel_lock          VOP_IOCTL+65
  0.45      703   1680.33 kernel_lock          nd6_timer_work+49
  0.40     3102   1500.71 kernel_lock          VOP_LOOKUP+5e
  0.26    17808    967.32 kernel_lock          VOP_UNLOCK+70
  0.14     6354    514.59 kernel_lock          VOP_WRITE+61
  0.06      115    236.77 kernel_lock          tcp_timer_keep+48
  0.04     7082    168.23 kernel_lock          tcp_rcvd_wrapper+18
  0.04    85504    144.38 kernel_lock          callout_softclock+2b0
  0.03     4775    103.15 kernel_lock          kevent1+692
  0.02     3725     79.60 kernel_lock          kqueue_register+419
  0.01    41908     56.07 kernel_lock          intr_biglock_wrapper+16
  0.01    56054     50.86 kernel_lock          biodone2+6d
  0.01       94     43.05 kernel_lock          cdev_read+72
  0.01       23     40.65 kernel_lock          udp_send_wrapper+2b
  0.01     6199     38.81 kernel_lock          VOP_READ+61
  0.01       20     32.44 kernel_lock          cdev_ioctl+88
  0.01      178     30.20 kernel_lock          VOP_GETATTR+5e
  0.01     1050     27.36 kernel_lock          knote_detach+134
  0.01       60     25.10 kernel_lock          VFS_SYNC+65
  0.01       43     21.64 kernel_lock          VOP_READDIR+6d
  0.01      635     19.13 kernel_lock          tcp_send_wrapper+22
  0.00      101     16.98 kernel_lock          cdev_open+be
  0.00        2     13.49 kernel_lock          ip_drain+e
  0.00       23     11.60 kernel_lock          udp6_send_wrapper+2b
  0.00     6764      9.57 kernel_lock          softint_dispatch+dc
  0.00        9      9.09 kernel_lock          tcp_timer_rexmt+52
  0.00        3      5.78 kernel_lock          tcp_attach_wrapper+18
  0.00        2      5.37 kernel_lock          mld_timeo+16
  0.00       21      5.10 kernel_lock          tcp_detach_wrapper+16
  0.00       28      5.07 kernel_lock          VOP_FCNTL+64
  0.00        1      5.03 kernel_lock          tcp_drain+1d
  0.00       31      3.28 kernel_lock          bdev_ioctl+88
  0.00      484      2.70 kernel_lock          doifioctl+8f3
  0.00      107      1.22 kernel_lock          VOP_ACCESS+5d
  0.00      809      0.91 kernel_lock          ip6intr+1f
  0.00        1      0.65 kernel_lock          rip6_send_wrapper+34
  0.00      101      0.47 kernel_lock          udp6_detach_wrapper+14
  0.00       62      0.37 kernel_lock          udp_attach_wrapper+16
  0.00       55      0.31 kernel_lock          udp_detach_wrapper+16
  0.00      261      0.18 kernel_lock          kqueue_register+25b
  0.00       37      0.13 kernel_lock          udp6_attach_wrapper+1a
  0.00      144      0.13 kernel_lock          ipintr+37
  0.00      216      0.11 kernel_lock          VOP_SEEK+9c
  0.00       32      0.08 kernel_lock          udp6_ctloutput_wrapper+1f
  0.00       12      0.03 kernel_lock          udp_ctloutput_wrapper+1f
  0.00        5      0.02 kernel_lock          tcp_ctloutput_wrapper+1f
  0.00       33      0.02 kernel_lock          usb_transfer_complete+122
  0.00        6      0.02 kernel_lock          tcp_peeraddr_wrapper+1b
  0.00       10      0.02 kernel_lock          tcp_disconnect_wrapper+18
  0.00        5      0.02 kernel_lock          udp6_bind_wrapper+1e
  0.00       10      0.01 kernel_lock
Re: 8.0 performance issue when running build.sh?
So here is a more detailed analysis using flamegraphs:

https://netbsd.org/~martin/bc-perf/

All operations happen on tmpfs, and the locking there is a significant part of the game - however, lots of mostly idle time is wasted and it is not clear to me what is going on.

Initial tests hint at -current having the same issue.

Any ideas?

Martin
Re: 8.0 performance issue when running build.sh?
So after more investigation here is an update:

The effect depends highly on "too high" -j args to build.sh. As a stopgap hack I reduced them on the build cluster and now the netbsd-7 builds are slow, but not super slow (and all other builds are slightly slower).

I tested building of netbsd-7 on a netbsd-8 build cluster slave a lot and the ideal -j value is 11 or 12 (only!) for a single build. Builds for other branches apparently do better with higher -j values, but I have not done extensive testing there (yet).

With a bit of help (tx everyone involved!) I got lockstat output, and the single *really* hot lock on 8.0 is kernel_lock, while everything else looks quite normal. So for kernel_lock we traced the callers, and got this:

On a build cluster slave running 7.1_STABLE, building netbsd-7 with "build.sh -m alpha -j 24" and source, obj, dest, release and tooldirs all on tmpfs, tools already build, we get:

Elapsed time: 1475.74 seconds.

-- Kernel lock spin

Total%   Count  Time/ms        Lock                  Caller
------ ------- -------- -------------------- --------------------------
100.00  159874   366.69 kernel_lock
 41.76   65404   153.12 kernel_lock          VOP_POLL+43
 19.29   36948    70.74 kernel_lock          VOP_LOCK+43
 13.25   20066    48.59 kernel_lock          tcp_send_wrapper+22
  7.45    8042    27.30 kernel_lock          intr_biglock_wrapper+12
  4.94    6220    18.11 kernel_lock          sleepq_block+1c7
  4.88    5654    17.91 kernel_lock          fileassoc_file_delete+1c
  3.49    7681    12.80 kernel_lock          VOP_UNLOCK+40
  2.59    2558     9.50 kernel_lock          bdev_strategy+52
  1.05    5626     3.85 kernel_lock          VOP_WRITE+48
  0.38     904     1.38 kernel_lock          VOP_READ+48
  0.29     254     1.05 kernel_lock          biodone2+6f
  0.23     103     0.84 kernel_lock          frag6_fasttimo+1a
  0.15     217     0.57 kernel_lock          tcp_input+e27
  0.15     102     0.54 kernel_lock          tcp_input+1366
  0.02      17     0.08 kernel_lock          callout_softclock+38d
  0.02       6     0.06 kernel_lock          ip6flow_slowtimo+e
  0.02      13     0.06 kernel_lock          ip_slowtimo+1a
  0.02      15     0.06 kernel_lock          ip6flow_slowtimo_work+1f
  0.02      17     0.06 kernel_lock          frag6_slowtimo+1b
  0.01      16     0.04 kernel_lock          ipflow_slowtimo_work+1f
  0.01      10     0.02 kernel_lock          ipflow_slowtimo+e
  0.00       1     0.01 kernel_lock          udp_send_wrapper+2b

Same setup, on a build cluster slave running 8.0_RC2 we get:

Elapsed time: 3857.33 seconds.

-- Kernel lock spin

Total%      Count      Time/ms        Lock                  Caller
------ ---------- ------------ -------------------- --------------------------
100.00  838202502  25353589.88 kernel_lock
 86.70  662529722  21982378.01 kernel_lock          sleepq_block+1b5
  6.27   62286987   1590912.88 kernel_lock          VOP_LOCK+b2
  4.05   54443880   1027059.31 kernel_lock          VOP_LOOKUP+5e
  2.38   51325126    603535.23 kernel_lock          VOP_UNLOCK+70
  0.42    5623585    105948.06 kernel_lock          VOP_GETATTR+5e
  0.05     741051     13894.97 kernel_lock          VOP_POLL+93
  0.02     290699      5546.12 kernel_lock          tcp_send_wrapper+22
  0.02      91076      5167.14 kernel_lock          intr_biglock_wrapper+16
  0.02     206972      4178.13 kernel_lock          VOP_INACTIVE+5d
  0.01     133766      3472.19 kernel_lock          scsipi_adapter_request+63
  0.01     161192      3109.39 kernel_lock          fileassoc_file_delete+20
  0.01     101838      1833.00 kernel_lock          VOP_CREATE+62
  0.01      83442      1812.25 kernel_lock          VOP_WRITE+61
  0.01      64171      1342.68 kernel_lock          VOP_READ+61
  0.00      28336       846.55 kernel_lock          softint_dispatch+dc
  0.00      37851       772.09 kernel_lock          VOP_REMOVE+61
  0.00      10945       606.49 kernel_lock          ipintr+37
  0.00       7435       353.17 kernel_lock          biodone2+6d
  0.00      17025       284.61 kernel_lock          VOP_ACCESS+5d
  0.00       6108       172.84 kernel_lock          frag6_fasttimo+1a
  0.00       1200       103.23 kernel_lock          callout_softclock+2b1
  0.00       2789        75.00 kernel_lock          nd6_timer_work+49
  0.00       2426        70.32 kernel_lock          ip_slowtimo+1a
  0.00       2384        57.25 kernel_lock          frag6_slowtimo+1f
  0.00        818        16.57 kernel_lock          VOP_ABORTOP+94
  0.00        567        14.15 kernel_lock          VFS_SYNC+65
  0.00        352        10.43 kernel_lock          kevent1+692
  0.00        428         8.72 kernel_lock          VOP_LINK+61
  0.00        129         4.86 kernel_lock          udp_send_wrapper+2b
  0.00         58         1.73 kernel_lock          kqueue_register+419
  0.00         56         1.46 kernel_lock          knote_detach+135
  0.00          5         0.27 kernel_lock          arpintr+21
  0.00
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 06:50:02AM -0400, Thor Lancelot Simon wrote: > Only the master does much if any disk (actually SSD) I/O. It must be > either the mpt driver or the scsi subsystem. At a _guess_, mpt? I > don't have time to look right now. I couldn't spot any special dma map allocations in mpt. > This is annoying, but seems unlikely to be the cause of the performance > regression; I agree, it is just .. strange. > I don't think the master runs builds any more, does it? So > it shouldn't be doing much and should have tons of CPU and memory > bandwidth available to move that data. It does run builds (but only a single instance). spz compared dmesg output with the old kernel and there was nothing obvious in there either. Martin
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 09:37:26AM +0200, Martin Husemann wrote: > > Do you happen to know why > > vmstat -e | fgrep "bus_dma bounces" > > shows a > 500 rate e.g. on b45? I never see a single bounce on any of my > amd64 machines. The build slaves seem to do only a few of them though. Only the master does much if any disk (actually SSD) I/O. It must be either the mpt driver or the scsi subsystem. At a _guess_, mpt? I don't have time to look right now. This is annoying, but seems unlikely to be the cause of the performance regression; I don't think the master runs builds any more, does it? So it shouldn't be doing much and should have tons of CPU and memory bandwidth available to move that data. -- Thor Lancelot Simon  t...@panix.com "The two most common variations translate as follows: illegitimi non carborundum = the unlawful are not silicon carbide illegitimis non carborundum = the unlawful don't have silicon carbide."
Re: 8.0 performance issue when running build.sh?
On Fri, 13 Jul 2018, Martin Husemann wrote: On Fri, Jul 13, 2018 at 05:55:17AM +0800, Paul Goyette wrote: I have a system with (probably) enough RAM - amd64, 8/16 core/threads, with 128GB, running 8.99.18 - to test if someone wants to provide an explicit test scenario that can run on top of an existing "production" environment. If your production environment is based on -7, it might work. Steps to reproduce (I hope, as I couldn't): - run a netbsd-7 system Ooops - that would be a show-stopper. My one-and-only environment is 8.99.18 and I have no other bootable system (and no painless way to create one). - check out netbsd-7 src and xsrc - run something (with proper -j) : build.sh -m alpha -j N build release iso-image" - save the last ~20 lines of the log somewhere - clean up all build results, tools, obj, (if on tmpfs this is not needed) - reboot to a netbsd-8 kernel - (if on tmpfs, redo the checkout) - run the same build.sh again +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 05:55:17AM +0800, Paul Goyette wrote:
> I have a system with (probably) enough RAM - amd64, 8/16 core/threads,
> with 128GB, running 8.99.18 - to test if someone wants to provide an
> explicit test scenario that can run on top of an existing "production"
> environment.

If your production environment is based on -7, it might work.

Steps to reproduce (I hope, as I couldn't):

- run a netbsd-7 system
- check out netbsd-7 src and xsrc
- run something (with proper -j): "build.sh -m alpha -j N build release iso-image"
- save the last ~20 lines of the log somewhere
- clean up all build results, tools, obj, (if on tmpfs this is not needed)
- reboot to a netbsd-8 kernel
- (if on tmpfs, redo the checkout)
- run the same build.sh again

Martin
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 04:18:39AM +0700, Robert Elz wrote: > But it is only really serious on the netbsd-7(-*) builds, -6 -8 and HEAD > are not affected (or not nearly as much). They are, but only by ~20%, while the -7 builds are > 400%. My test builds on local hardware show +/-3% (I'd call that noise). And they are not always slower when running -8. Martin
Re: 8.0 performance issue when running build.sh?
On Thu, Jul 12, 2018 at 03:18:08PM -0400, Thor Lancelot Simon wrote: > On Thu, Jul 12, 2018 at 07:39:10PM +0200, Martin Husemann wrote: > > On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote: > > > Are you running the builds from tmpfs to tmpfs, like the build cluster > > > does? > > > > No, I don't have enough ram to test it that way. > > So if 8.0 has a serious tmpfs regression... we don't yet know. I *did* use tmpfs, but not for everything. Do you happen to know why vmstat -e | fgrep "bus_dma bounces" shows a > 500 rate e.g. on b45? I never see a single bounce on any of my amd64 machines. The build slaves seem to do only a few of them though. Martin
Re: 8.0 performance issue when running build.sh?
On Thu, 12 Jul 2018, Thor Lancelot Simon wrote: On Thu, Jul 12, 2018 at 07:39:10PM +0200, Martin Husemann wrote: On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote: Are you running the builds from tmpfs to tmpfs, like the build cluster does? No, I don't have enough ram to test it that way. So if 8.0 has a serious tmpfs regression... we don't yet know. I have a system with (probably) enough RAM - amd64, 8/16 core/threads, with 128GB, running 8.99.18 - to test if someone wants to provide an explicit test scenario that can run on top of an existing "production" environment. +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++
Re: 8.0 performance issue when running build.sh?
Date: Thu, 12 Jul 2018 15:18:08 -0400
From: Thor Lancelot Simon
Message-ID: <20180712191808.ga2...@panix.com>

| So if 8.0 has a serious tmpfs regression... we don't yet know.

If the most serious problem is something in the basic infrastructure then it ought to be affecting all of the builds. But it is only really serious on the netbsd-7(-*) builds, -6 -8 and HEAD are not affected (or not nearly as much). (That is, the -7 builds end up taking about twice as long as HEAD and -8 builds, and much longer than -6 (which is smaller, and uses a much simpler gcc).)

I believe the most likely cause is something peculiar to -7, when it runs on a -8 base, itself, perhaps related to the gcc version, or make, or binutils, or ... and might be something that only affects some of the architectures (I don't have enough info to know). It could be almost anything given that scope limitation, but I don't see how something basic like tmpfs on -8 can be a possible cause.

kre
Re: 8.0 performance issue when running build.sh?
On Thu, Jul 12, 2018 at 07:39:10PM +0200, Martin Husemann wrote: > On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote: > > Are you running the builds from tmpfs to tmpfs, like the build cluster > > does? > > No, I don't have enough ram to test it that way. So if 8.0 has a serious tmpfs regression... we don't yet know. -- Thor Lancelot Simon  t...@panix.com "The two most common variations translate as follows: illegitimi non carborundum = the unlawful are not silicon carbide illegitimis non carborundum = the unlawful don't have silicon carbide."
Re: 8.0 performance issue when running build.sh?
On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote: > Are you running the builds from tmpfs to tmpfs, like the build cluster > does? No, I don't have enough ram to test it that way. Martin
Re: 8.0 performance issue when running build.sh?
On Thu, Jul 12, 2018 at 05:29:57PM +0200, Martin Husemann wrote: > On Tue, Jul 10, 2018 at 11:01:01AM +0200, Martin Husemann wrote: > > So stay tuned, maybe only Intel to blame ;-) > > Nope, that wasn't it. > > I failed to reproduce *any* slowdown on an AMD CPU locally, maya@ kindly > repeated the same test on an affected Intel CPU and also could see no > slowdown. Are you running the builds from tmpfs to tmpfs, like the build cluster does? Thor
Re: 8.0 performance issue when running build.sh?
On Tue, Jul 10, 2018 at 11:01:01AM +0200, Martin Husemann wrote: > So stay tuned, maybe only Intel to blame ;-) Nope, that wasn't it. I failed to reproduce *any* slowdown on an AMD CPU locally, maya@ kindly repeated the same test on an affected Intel CPU and also could see no slowdown. So this seems to not be a generic NetBSD 8.0 issue, but something strange with the concrete hardware (maybe a driver bug). It is never easy - but at least this should not block the remaining 8.0 release cycle. Martin
Re: 8.0 performance issue when running build.sh?
On 11.07.2018 11:47, Takeshi Nakayama wrote: Martin Husemann wrote > >>> Another observation is that grep(1) on one NetBSD server is >>> significantly slower between the switch from -7 to 8RC1. >> >> Please file separate PRs for each (and maybe provide some input files >> to reproduce the issue). > > Already filed: > http://gnats.netbsd.org/53241 > > -- Takeshi Nakayama >

Time of "LC_ALL=C grep" query:       0.18 real  0.08 user  0.09 sys
Time of "LC_ALL=pl_PL.UTF-8 grep":  15,94 real 15,74 user  0,18 sys

Good catch! It's 200 times slower!
Re: 8.0 performance issue when running build.sh?
>>> Martin Husemann wrote > > Another observation is that grep(1) on one NetBSD server is > > significantly slower between the switch from -7 to 8RC1. > > Please file separate PRs for each (and maybe provide some input files > to reproduce the issue). Already filed: http://gnats.netbsd.org/53241 -- Takeshi Nakayama
Re: 8.0 performance issue when running build.sh?
On 11.07.2018 09:09, Simon Burge wrote: > Hi folks, > > Martin Husemann wrote: > >> On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote: >>> After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2, >>> the ld(1) linker has serious issues with linking Clang/LLVM single >>> libraries within 20 minutes. This causes frequent timeouts on the NetBSD >>> buildbot in the LLVM buildfarm. Timeouts were never observed in the >>> past, today there might be few of them daily. >> >> Sounds like a binutils issue (or something like too little RAM available >> on the machine). > > Probably only tangentially related, but I wasn't able to link an amd64 > GENERIC kernel on a machine with 512MB of RAM, and even using "MKCTF=no > NOCTF=yes" at least got something built but it was extraordinarily slow to > do so. I never worried about raising a PR for that as I figured the > world had moved on... > > Cheers, > Simon. > There is certainly a contribution from excessive RAM usage.. but I must defer the profiling tasks and focus on other things now. It's better for me to focus on porting lld (a 10x boost compared to ld on a larger C++ codebase) before profiling. Additionally, that LLVM build machine is a remote machine and is serving a public service.
Re: 8.0 performance issue when running build.sh?
Hi folks, Martin Husemann wrote: > On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote: > > After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2, > > the ld(1) linker has serious issues with linking Clang/LLVM single > > libraries within 20 minutes. This causes frequent timeouts on the NetBSD > > buildbot in the LLVM buildfarm. Timeouts were never observed in the > > past, today there might be few of them daily. > > Sounds like a binutils issue (or something like too little RAM available > on the machine). Probably only tangentially related, but I wasn't able to link an amd64 GENERIC kernel on a machine with 512MB of RAM, and even using "MKCTF=no NOCTF=yes" at least got something built but it was extraordinarily slow to do so. I never worried about raising a PR for that as I figured the world had moved on... Cheers, Simon.
Re: 8.0 performance issue when running build.sh?
On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote: > After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2, > the ld(1) linker has serious issues with linking Clang/LLVM single > libraries within 20 minutes. This causes frequent timeouts on the NetBSD > buildbot in the LLVM buildfarm. Timeouts were never observed in the > past, today there might be few of them daily. Sounds like a binutils issue (or something like too little RAM available on the machine). > Another observation is that grep(1) on one NetBSD server is > significantly slower between the switch from -7 to 8RC1. Please file separate PRs for each (and maybe provide some input files to reproduce the issue). Martin
Re: 8.0 performance issue when running build.sh?
On 10.07.2018 11:01, Martin Husemann wrote: > On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote: >> I have no scientific data yet, but just noticed that build times on the >> auto build cluster did rise very dramatically since it has been updated >> to run NetBSD 8.0 RC2. >> >> Since builds move around build slaves sometimes (not exactly randomly, >> but anyway) I picked the alpha port as an example (the first few >> architectures in the alphabetical list get build slaves assigned pretty >> consistently). > > Here is an intermediate result from further experiments and statistics: > > - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current >and not what will be in the final 8.0 release) has a measurable performance >impact - but it is not the big issue here. > > - if we ignore netbsd-7* branches, the performance loss is reasonable >explainable by the SVS penalty - we are going to check that theory soon. > > - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad >interaction with SVS, making those build times sky rocket - if turning >off SVS does not solve this, we will need to dig deeper. > > So stay tuned, maybe only Intel to blame ;-) > > If anyone has concrete pointers for the last issue (or ideas what to change/ > measure) please speak up. > > Martin > After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2, the ld(1) linker has serious issues with linking Clang/LLVM single libraries within 20 minutes. This causes frequent timeouts on the NetBSD buildbot in the LLVM buildfarm. Timeouts were never observed in the past, today there might be few of them daily. We were experimenting with disabled SVS, but it didn't help. Another observation is that grep(1) on one NetBSD server is significantly slower between the switch from -7 to 8RC1.
Re: 8.0 performance issue when running build.sh?
On 10/07/2018 11:01, Martin Husemann wrote:

On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote: I have no scientific data yet, but just noticed that build times on the auto build cluster did rise very dramatically since it has been updated to run NetBSD 8.0 RC2. Since builds move around build slaves sometimes (not exactly randomly, but anyway) I picked the alpha port as an example (the first few architectures in the alphabetical list get build slaves assigned pretty consistently). Here is an intermediate result from further experiments and statistics: - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current and not what will be in the final 8.0 release) has a measurable performance impact - but it is not the big issue here.

For the record: EagerFPU has a fixed performance cost, which is saving+restoring the FPU context during each context switch. LazyFPU, however, had a variable performance cost: during context switches the FPU state was kept on the CPU, in the hope that if we switched back to the owner lwp we would not have to do a save+restore. If we did have to, however, we needed to send an IPI, and cost(IPI+save+restore) > cost(save+restore). So LazyFPU may have been less/more expensive than EagerFPU, depending on the workload/scheduling.

The reason it is more expensive for you is maybe because on your machine each lwp ("make" thread) stays on the same CPU, and the kpreemptions cause a save+restore that is not actually necessary since each CPU always comes back to the owner lwp. (As you said, also, you have the old version of EagerFPU in RC2, which is more expensive than that of the current -current, so it's part of the problem too.)

I've already said it, but XSAVEOPT actually eliminates this problem, since it performs the equivalent of LazyFPU (not saving+restoring when not needed) without requiring an IPI.

- if we ignore netbsd-7* branches, the performance loss is reasonable explainable by the SVS penalty - we are going to check that theory soon. - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad interaction with SVS, making those build times sky rocket - if turning off SVS does not solve this, we will need to dig deeper. So stay tuned, maybe only Intel to blame ;-) If anyone has concrete pointers for the last issue (or ideas what to change/ measure) please speak up.

Not sure this is related, but it seems that the performance of build.sh on netbsd-current is oscillating as fuck. Maybe I just hadn't noticed that it was oscillating this much before, but right now a build.sh -j 4 kernel=GENERIC is in [5min; 5min35s]. Even with SpectreV2/SpectreV4/SVS/EagerFPU all disabled. It seems to me that at one time doing two or three builds was enough and you would get an oscillation that was <5s. Now it looks like I have to do more than 5 builds to get a relevant average.

Maxime
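To make the cost argument concrete, here is a toy, self-contained model of the trade-off; all cycle numbers are invented placeholders, and only the structure of the comparison (eager pays a fixed save+restore per switch, lazy pays IPI+save+restore only when the state has to move between CPUs) mirrors the explanation above.

/*
 * Toy cost model for the EagerFPU vs. LazyFPU trade-off described above.
 * All cycle counts are made-up placeholders.
 */
#include <stdio.h>

int
main(void)
{
	const double save_restore = 200.0;	/* assumed cycles for XSAVE+XRSTOR */
	const double ipi_overhead = 2000.0;	/* assumed cycles for a cross-CPU IPI */
	const double nswitches = 1e6;		/* context switches in the interval */

	/* Eager: fixed cost, paid on every context switch. */
	const double eager = nswitches * save_restore;

	/* Lazy: cost only when the state must migrate, but then the IPI is added. */
	for (double foreign = 0.0; foreign <= 1.0; foreign += 0.25) {
		double lazy = nswitches * foreign * (ipi_overhead + save_restore);
		printf("foreign-switch ratio %.2f: eager %.0f cycles, lazy %.0f cycles\n",
		    foreign, eager, lazy);
	}
	return 0;
}

With these placeholder numbers the break-even point is roughly one migrating switch in eleven: below that lazy wins, above it eager wins, which is the workload/scheduling dependence described above, and the case of make threads pinned to their CPUs is exactly where the unconditional eager save+restore is pure overhead.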
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
> I have no scientific data yet, but just noticed that build times on the
> auto build cluster did rise very dramatically since it has been updated
> to run NetBSD 8.0 RC2.
>
> Since builds move around build slaves sometimes (not exactly randomly,
> but anyway) I picked the alpha port as an example (the first few
> architectures in the alphabetical list get build slaves assigned pretty
> consistently).

Here is an intermediate result from further experiments and statistics:

- fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current and not what will be in the final 8.0 release) has a measurable performance impact - but it is not the big issue here.

- if we ignore netbsd-7* branches, the performance loss is reasonably explainable by the SVS penalty - we are going to check that theory soon.

- maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad interaction with SVS, making those build times skyrocket - if turning off SVS does not solve this, we will need to dig deeper.

So stay tuned, maybe only Intel to blame ;-)

If anyone has concrete pointers for the last issue (or ideas what to change/measure) please speak up.

Martin
Re: 8.0 performance issue when running build.sh?
On 06/07/2018 16:48, Martin Husemann wrote:

On Fri, Jul 06, 2018 at 04:40:48PM +0200, Maxime Villard wrote: This are all successfull builds of HEAD for alpha that happened after June 1: What does that mean, are you building something *on* an Alpha CPU, or are you building the Alpha port on another CPU? It is all about the releng auto build cluster, which is running amd64.

Then it is likely caused by two things:

* in EagerFPU I made a mistake initially, and it caused the FPU state to be restored when the kernel was leaving a softint. I sent a pullup already for netbsd-8, but it hasn't yet been applied. The fix removes a save+restore, which improves performance.

* the XSTATE_BV bitmap is not zeroed at execve time in NetBSD-8. It probably causes some performance loss, because XRSTOR always restores each FPU state instead of just the ones we used. In recent CPUs there are many FPU states and we use only a few in the base system, so the extra restoring costs us. Even more so with EagerFPU, I guess, because we do save+restore unconditionally, rather than on-demand. I fixed that in October 2017 in -current, but it didn't make it to -8. I guess it will have to, now.

Beyond that we need to use XSAVEOPT, too.

Maxime
8.0 performance issue when running build.sh?
I have no scientific data yet, but just noticed that build times on the auto build cluster did rise very dramatically since it has been updated to run NetBSD 8.0 RC2.

Since builds move around build slaves sometimes (not exactly randomly, but anyway) I picked the alpha port as an example (the first few architectures in the alphabetical list get build slaves assigned pretty consistently).

These are all successful builds of HEAD for alpha that happened after June 1:

Build                   Slave   Seconds
HEAD:201806011030Z      b43     3807
HEAD:201806020940Z      b43     3785
HEAD:201806021830Z      b43     3797
HEAD:201806040210Z      b43     3814
HEAD:201806050210Z      b43     3814
HEAD:201806050840Z      b43     3766
HEAD:201806052050Z      b43     3835
HEAD:201806060330Z      b43     3817
HEAD:201806061300Z      b43     3771
HEAD:201806062340Z      b43     3825
HEAD:201806071500Z      b43     3780
HEAD:201806081600Z      b43     3800
HEAD:201806090120Z      b43     3814
HEAD:201806091820Z      b43     3816
HEAD:201806100640Z      b43     3803
HEAD:201806101910Z      b43     3806
HEAD:201806110430Z      b43     3814
HEAD:201806112100Z      b43     3822
HEAD:201806120750Z      b43     3770
HEAD:201806122120Z      b43     3817
HEAD:201806130400Z      b43     3794
HEAD:201806131340Z      b43     3820
HEAD:201806140020Z      b43     3803
HEAD:201806151330Z      b43     3806
HEAD:201806152250Z      b43     3817
HEAD:201806161530Z      b43     3810
HEAD:201806170310Z      b43     3826
HEAD:201806171540Z      b43     3788
HEAD:201806180110Z      b43     3794
HEAD:201806181740Z      b43     3827
HEAD:201806190430Z      b43     3805
HEAD:201806191750Z      b43     3763
HEAD:201806200020Z      b43     3777
HEAD:201806200950Z      b43     3838
HEAD:201806202030Z      b43     3815
HEAD:201806211150Z      b43     3775
HEAD:201806221250Z      b43     3804
HEAD:20180610Z          b43     3840
HEAD:201806231440Z      b43     3800
HEAD:201806240250Z      b43     3826
HEAD:201806242110Z      b43     3797
HEAD:201806250640Z      b43     3763
HEAD:201806260450Z      b43     3812
HEAD:201806261520Z      b43     3801
HEAD:201806270340Z      b43     3656
HEAD:201806270810Z      b43     3820
HEAD:201806271700Z      b43     3763
HEAD:201806280320Z      b43     3787
HEAD:201806281810Z      b43     3790
HEAD:201806301810Z      b43     5583
HEAD:201807012030Z      b43     5617
HEAD:201807021930Z      b43     5569
HEAD:201807030940Z      b43     4246
HEAD:201807031950Z      b43     4192
HEAD:201807042110Z      b43     4256
HEAD:201807051620Z      b43     4212

You will easily guess when the system was upgraded. Average "alpha" build time was 3800s before and 4811s after the update.

I would not worry a lot about this, as other changes have been made to HEAD so the things being built are not the same, but overall we see a very similar pattern even for builds of branches that did not significantly change. For some reason the netbsd-7* builds seem to be the worst affected, they went from ~4h to > 12h.

As I said, no scientific data here - but ideas or explanations welcome!

Martin

P.S.: the CPUs are affected by all the recentish issues, we have svs and fpu_eager enabled, but also mprotect, aslr,

P.P.S.: you can see overall build times here: http://releng.netbsd.org/cgi-bin/builds.cgi