Re: performance issues during build.sh -j 40 kernel
On Sun, Sep 10, 2017 at 06:51:31PM +0100, Mindaugas Rasiukevicius wrote: > Mateusz Guzik wrote: > > 1. exclusive vnode locking (genfs_lock) > > > > ... > > > > 2. uvm_fault_internal > > > > ... > > > > 4. vm locks in general > > > > We know these points of lock contention, but they are not really that > trivial to fix. Breaking down the UVM pagequeue locks would generally > be a major project, as it would be the first step towards NUMA support. > In any case, patches are welcome. :) > Breaking locks is of course the preferred long term solution, but also time consuming. On the other hand there are most likely reasonably easy fixes consisting of collapsing lock/unlock cycles into just one lock/unlock etc. FreeBSD is no saint here either with one global lock for free pages, yet it manages to work OK-ish with 80 hardware threads and is quite nice with 40. That said, I had enough problems $elsewhere to not be interested in looking too hard here. :> > > 3. pmap > > > > It seems most issues stem from slow pmap handling. Chances are there are > > perfectly avoidable shootdowns and in fact cases where there is no need > > to alter KVA in the first place. > > At least x86 pmap already performs batching and has quite efficient > synchronisation logic. You are right that there are some key places > where avoiding KVA map/unmap would have a major performance improvement, > e.g. UBC and mbuf zero-copy mechanisms (it could operate on physical > pages for I/O). However, these changes are not really related to pmap. > Some subsystems just need an alternative to temporary KVA mappings. > I was predominantly looking at teardown of ubc mappings. The flamegraph suggests overly high cost there. > > > > I would like to add a remark about locking primitives. > > > > Today the rage is with MCS locks, which are fine but not trivial to > > integrate with sleepable locks like your mutexes. Even so, the current > > implementation is significantly slower than it has to be. > > > > ... 
> > > > Spinning mutexes should probably be handled by a different routine. > > > > ... > > > > I disagree, because this is a wrong approach to the problem. Instead of > marginally optimising the slow-path (and the more contended is the lock, > the less impact these micro-optimisations have), the subsystems should be > refactored to eliminate the lock contention in the first place. Yes, it > is much more work, but it is the long term fix. Having said that, I can > see some use cases where MCS locks could be useful, but it is really a low > priority in the big picture. > Locks are fundamentally about damage control. As noted earlier, spurious bus transactions caused by an avoidable read make performance a tad worse for no good reason. That was minor anyway; the more important bit was the backoff. Even on systems that are modest by today's standards, the quality of the locking primitives can make the difference between a system which is slower than ideal but perfectly usable and one which is just dog slow. That said, making the backoff parameters autoscale with the number of CPUs, with some kind of upper cap, is definitely warranted. -- Mateusz Guzik Swearing Maintenance Engineer
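Mateusz's closing suggestion (backoff limits that grow with the CPU count but stop at a cap) can be sketched in portable C11. This is a hypothetical illustration, not NetBSD's mutex code: the constants `BACKOFF_MIN`, `BACKOFF_PER_CPU` and `BACKOFF_CAP` and all function names are invented for the example, and the empty busy loop stands in for whatever CPU pause hint a real spin loop would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tunables, chosen only for illustration. */
#define BACKOFF_MIN	4u	/* floor for tiny machines */
#define BACKOFF_PER_CPU	16u	/* weight applied to the CPU count */
#define BACKOFF_CAP	1024u	/* upper cap, whatever the CPU count */

/* Maximum backoff: scaled by the number of CPUs, then clamped. */
static unsigned backoff_limit(unsigned ncpu)
{
	unsigned long lim = (unsigned long)BACKOFF_PER_CPU * ncpu;

	if (lim < BACKOFF_MIN)
		lim = BACKOFF_MIN;
	if (lim > BACKOFF_CAP)
		lim = BACKOFF_CAP;
	return (unsigned)lim;
}

/* Exponential growth of the current delay, never exceeding the limit. */
static unsigned backoff_next(unsigned cur, unsigned limit)
{
	assert(cur <= limit);
	unsigned next = cur * 2;

	return next > limit ? limit : next;
}

/* Toy spin lock: test-and-test-and-set with capped exponential backoff. */
static void spin_acquire(atomic_bool *lock, unsigned ncpu)
{
	unsigned limit = backoff_limit(ncpu);
	unsigned delay = BACKOFF_MIN;

	for (;;) {
		/* Read first; only try the atomic exchange if it looks free. */
		if (!atomic_load_explicit(lock, memory_order_relaxed) &&
		    !atomic_exchange_explicit(lock, true, memory_order_acquire))
			return;
		for (volatile unsigned i = 0; i < delay; i++)
			continue;	/* placeholder for a pause hint */
		delay = backoff_next(delay, limit);
	}
}

static void spin_release(atomic_bool *lock)
{
	atomic_store_explicit(lock, false, memory_order_release);
}
```

With these made-up numbers a 40-CPU machine backs off for at most 640 iterations, and machines with 64 or more CPUs hit the 1024 cap.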
Re: performance issues during build.sh -j 40 kernel
On Sun, Sep 10, 2017 at 07:29:11PM +0200, Maxime Villard wrote: > On 09/09/2017 at 20:48, Mateusz Guzik wrote: > > [...] > > I installed the 7.1 release, downloaded recent git snapshot and built the > > trunk kernel while using config stolen from the release (had to edit out > > something about 3g modems to make it compile). I presume this is enough > > to not have debug of any sort enabled. > > Not sure I understand; did you test a kernel from the netbsd-7.1 branch, or > from netbsd-current? You might want to test netbsd-current, I know that several > performance-related improvements were made. > I noted that it's a -current kernel. The 7.1 release bits were there to ensure I don't run into userspace/kernel debug. > > 3. pmap > > > > It seems most issues stem from slow pmap handling. Chances are there are > > perfectly avoidable shootdowns and in fact cases where there is no need > > to alter KVA in the first place. > > This seems rather surprising to me. I tried to reduce the number of shootdowns > some time ago, but they were already optimized, and my attempts just made them > slower to process. The only related thing I fixed was making sure there is no > kernel page that gets flushed under a local shootdown, but as far as I > remember, it didn't significantly improve performance (on somewhat old > hardware, I must admit). > Note this was tested on kvm, where shootdowns are more expensive than on bare metal, so the result is probably worse than it would be on bare metal (still, kvm is a perfectly fine production vm deployment, so I don't feel bad for testing on it). I did not investigate in detail (I'll have to), but I believe DragonFly BSD went to great lengths to reduce/eliminate IPIs in general. Most definitely worth looking at. -- Mateusz Guzik Swearing Maintenance Engineer
re: how to tell if a process is 64-bit
Thor Lancelot Simon writes: > On Sun, Sep 10, 2017 at 03:29:22PM +, paul.kon...@dell.com wrote: > > > > MIPS has four ABIs, if you include "O64". Whether a particular OS allows > > all four concurrently is another matter; it isn't clear that would make > > sense. Mixing "O" and "N" ABIs is rather messy. > > > > Would you call N32 a 64-bit ABI? It has 64 bit registers, so if a value > > is passed to the kernel in a register it comes across as 64 bits. But it > > has 32 bit addresses. > > I wouldn't, because if an address is passed to the kernel, it comes across > as 32 bits. But what _do_ we do on modern, 32-bit MIPS? Are we still O32? > It does kind of look like it -- all our 32-bit MIPS ports' sets files seem > to be linked to ../../../shared/mipsel/ which must be O32 since it is also > used for the pmax sets. as i mentioned earlier in this thread, our mips64 defaults to n32 userland for everything except the utilities that only work via kvm(3) (which all need to be fixed). o32, n32 and n64 are all supported, though dynamically linked n64 is currently broken for some reason i haven't looked closely at. any mips port without "64" in its name is o32-only, because it's built to only support a CPU with 32-bit registers. .mrg.
Re: how to tell if a process is 64-bit
On Sun, Sep 10, 2017 at 03:29:22PM +, paul.kon...@dell.com wrote: > > MIPS has four ABIs, if you include "O64". Whether a particular OS allows > all four concurrently is another matter; it isn't clear that would make > sense. Mixing "O" and "N" ABIs is rather messy. > > Would you call N32 a 64-bit ABI? It has 64 bit registers, so if a value > is passed to the kernel in a register it comes across as 64 bits. But it > has 32 bit addresses. I wouldn't, because if an address is passed to the kernel, it comes across as 32 bits. But what _do_ we do on modern, 32-bit MIPS? Are we still O32? It does kind of look like it -- all our 32-bit MIPS ports' sets files seem to be linked to ../../../shared/mipsel/ which must be O32 since it is also used for the pmax sets. -- Thor Lancelot Simon t...@panix.com "We cannot usually in social life pursue a single value or a single moral aim, untroubled by the need to compromise with others." - H.L.A. Hart
Re: Proposal: Disable autoload of compat_xyz modules
On Sun, Sep 10, 2017 at 09:03:27PM +0200, Maxime Villard wrote: > If you have a fix to untangle this mess, be my guest. I proposed to > reimplement > the 43* functions separately into compat_linux, people disagreed. Others have proposed to move it to a compat_common module, and this is the way to go I guess. But I won't do it as I'm happy with COMPAT_LINUX being enabled by default (despite potential bugs). -- Manuel Bouyer NetBSD: 26 years of experience will always make the difference --
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 20:51, Manuel Bouyer wrote: On Sun, Sep 10, 2017 at 08:46:52PM +0200, Maxime Villard wrote: On 10/09/2017 at 19:59, Manuel Bouyer wrote: There's something I don't understand in this thread: what is the point of having the code in kernel if you still have to use modload to make it available? Why not comment it out in kernel and have users modload it if they want to? said earlier, but on a different list, see http://mail-index.netbsd.org/source-changes-d/2017/08/04/msg009366.html OK. So you want this because (some?) compat modules can't be dynamically loaded. This problem should be fixed, instead of being worked around in such an ugly way. If you have a fix to untangle this mess, be my guest. I proposed to reimplement the 43* functions separately into compat_linux, people disagreed.
Re: Proposal: Disable autoload of compat_xyz modules
On Sun, Sep 10, 2017 at 08:46:52PM +0200, Maxime Villard wrote: > On 10/09/2017 at 19:59, Manuel Bouyer wrote: > > There's something I don't understand in this thread: what is the point > > of having the code in kernel if you still have to use modload to make it > > available? Why not comment it out in kernel and have users modload it > > if they want to? > > said earlier, but on a different list, see > > http://mail-index.netbsd.org/source-changes-d/2017/08/04/msg009366.html OK. So you want this because (some?) compat modules can't be dynamically loaded. This problem should be fixed, instead of being worked around in such an ugly way. -- Manuel Bouyer NetBSD: 26 years of experience will always make the difference --
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 19:59, Manuel Bouyer wrote: There's something I don't understand in this thread: what is the point of having the code in kernel if you still have to use modload to make it available? Why not comment it out in kernel and have users modload it if they want to? said earlier, but on a different list, see http://mail-index.netbsd.org/source-changes-d/2017/08/04/msg009366.html
Re: performance issues during build.sh -j 40 kernel
On Sun, Sep 10, 2017 at 07:56:11PM +0200, Maxime Villard wrote: > On 10/09/2017 at 19:50, Joerg Sonnenberger wrote: > > On Sun, Sep 10, 2017 at 07:17:51PM +0200, Joerg Sonnenberger wrote: > > > That's true, but changing this also has quite a significant downside on > > > some workloads for second order effects. I don't think it is a good idea > > > to change this right now, as it doesn't even fix the real problem. > > > > Just to quantify this part, for a current release build on tmpfs, I see: > > > > After: > > 4267 > > 4280 > > 4261 > > 4247 > > 4300 > > > > Before: > > 3915 > > 3951 > > 3991 > > 3961 > > 3968 > > That's the cacheline alignment on the uvm locks, right? In that case, what do > you think are the "second order effects"? Yes, it is adding the alignment in uvm_init.c. So an isolated build of GENERIC on tmpfs gives: https://www.netbsd.org/~joerg/lockstat-generic.txt (that's without DIAGNOSTIC; hannken recently added a very heavy assert in genfs, which needs to be investigated separately). What I strongly suspect is that the major factor for the lock contention in uvm_fault_internal is still the uvm_fpageqlock contention. While reducing the contention on that lock might be locally positive, it can just as well increase the contention on the vmobjlock. Joerg
Re: Proposal: Disable autoload of compat_xyz modules
There's something I don't understand in this thread: what is the point of having the code in kernel if you still have to use modload to make it available? Why not comment it out in kernel and have users modload it if they want to? -- Manuel Bouyer NetBSD: 26 years of experience will always make the difference --
Re: performance issues during build.sh -j 40 kernel
On 10/09/2017 at 19:50, Joerg Sonnenberger wrote: On Sun, Sep 10, 2017 at 07:17:51PM +0200, Joerg Sonnenberger wrote: That's true, but changing this also has quite a significant downside on some workloads for second order effects. I don't think it is a good idea to change this right now, as it doesn't even fix the real problem. Just to quantify this part, for a current release build on tmpfs, I see: After: 4267 4280 4261 4247 4300 Before: 3915 3951 3991 3961 3968 That's the cacheline alignment on the uvm locks, right? In that case, what do you think are the "second order effects"? Maxime
Re: performance issues during build.sh -j 40 kernel
Mateusz Guzik wrote: > ... > > 1) #define UBC_NWINS 1024 > > The parameter was set in 2001 and is used on amd64 to this very day. > > lockstat says: > 51.63 585505 321201.06 e4011d8304c0 > 40.39 291550 251302.17 e4011d8304c0 ubc_alloc+69 > 9.13 255967 56776.26 e4011d8304c0 ubc_release+a5 > 1.72 35632 10680.06 e4011d8304c0 uvm_fault_internal+532 > [snip] > > The contention is on the global ubc vmobj lock just prior to hash lookup. > I recompiled the kernel with a randomly slapped value of 65536 and the > problem cleared itself with ubc_alloc going way down. > > I made no attempts to check what value makes sense or how to autoscale it. > ... Yes, ubc_nwins should be auto-tuned, I'd say depending on the physical memory size and the number of CPUs (as some weighted multiplier). > 2. uvm_pageidlezero > > Idle zeroing these days definitely makes no sense on amd64. Any amount of > pages possibly prepared is quickly shredded and the vast majority of all > allocations end up zeroing in place. With rep stosb this is even less of > a problem. My feeling is the same: on heavily loaded systems the pressures are too high and for idling systems it's not worth the hassle. However, I guess others might have a different feeling. More benchmarks and analysis could settle this. > 3. false sharing > > Following the issue noted earlier, I __cacheline_aligned the aforementioned > locks. But also moved atomically updated counters out of uvmexp. > > uvmexp is full of counters updated with mere increments possibly by > multiple threads, thus the issue of this obj was not resolved. > > Nonetheless, said annotations applied combined with the rest give the > improvement mentioned earlier. Yes, although if they get significantly contended, they should be moved out to struct uvm_cpu and/or the percpu(9) API and aggregated on collection. It depends on the counter, of course. > 1. exclusive vnode locking (genfs_lock) > > ... > > 2. uvm_fault_internal > > ... > > 4.
vm locks in general > We know these points of lock contention, but they are not really that trivial to fix. Breaking down the UVM pagequeue locks would generally be a major project, as it would be the first step towards NUMA support. In any case, patches are welcome. :) > 3. pmap > > It seems most issues stem from slow pmap handling. Chances are there are > perfectly avoidable shootdowns and in fact cases where there is no need > to alter KVA in the first place. At least x86 pmap already performs batching and has quite efficient synchronisation logic. You are right that there are some key places where avoiding KVA map/unmap would have a major performance improvement, e.g. UBC and mbuf zero-copy mechanisms (it could operate on physical pages for I/O). However, these changes are not really related to pmap. Some subsystems just need an alternative to temporary KVA mappings. > > I would like to add a remark about locking primitives. > > Today the rage is with MCS locks, which are fine but not trivial to > integrate with sleepable locks like your mutexes. Even so, the current > implementation is significantly slower than it has to be. > > ... > > Spinning mutexes should probably be handled by a different routine. > > ... > I disagree, because this is a wrong approach to the problem. Instead of marginally optimising the slow-path (and the more contended is the lock, the less impact these micro-optimisations have), the subsystems should be refactored to eliminate the lock contention in the first place. Yes, it is much more work, but it is the long term fix. Having said that, I can see some use cases where MCS locks could be useful, but it is really a low priority in the big picture. -- Mindaugas
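Mindaugas's suggestion earlier in this reply, auto-tuning ubc_nwins from the physical memory size and the number of CPUs, might look roughly like the function below. Everything here is hypothetical: the weights `UBC_WINS_PER_CPU` and `UBC_MEM_DIVISOR` are made up, the window size is passed in rather than read from the kernel, and only the floor of 1024 comes from the thread (the old fixed `UBC_NWINS`). Since UBC windows are kernel virtual address mappings, a real implementation might prefer to bound the count by available KVA rather than by RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Historical fixed value quoted in the thread, used here as a floor. */
#define UBC_NWINS_MIN	1024u
/* Hypothetical weights, not taken from NetBSD. */
#define UBC_WINS_PER_CPU	1024u	/* windows per CPU */
#define UBC_MEM_DIVISOR	64u	/* cover at most 1/64 of RAM with windows */

/*
 * Pick a window count that grows with the number of CPUs, is bounded by
 * physical memory, and never drops below the old fixed default.
 */
static uint64_t ubc_nwins_autoscale(uint64_t physmem_bytes, unsigned ncpu,
    uint64_t win_bytes)
{
	assert(win_bytes > 0);

	uint64_t want = (uint64_t)ncpu * UBC_WINS_PER_CPU;
	uint64_t mem_cap = physmem_bytes / UBC_MEM_DIVISOR / win_bytes;

	if (want > mem_cap)
		want = mem_cap;
	if (want < UBC_NWINS_MIN)
		want = UBC_NWINS_MIN;
	return want;
}
```

With these invented weights, a 40-CPU guest with 16 GiB of RAM and (assumed) 8 KiB windows gets 32768 windows, a 4-CPU box with the same RAM gets 4096, and a 256 MiB machine stays at the old 1024. Mateusz's ad-hoc 65536 was not derived this way.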
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 17:24, Greg Troxel wrote: [...] Reading maxv@'s suggestion, I wondered about autoload of non-built-in modules (but maybe that is already disabled). My quick reaction is that it would be nice if the "don't autoload" flag had the same behavior for builtin and non-builtin modules, so that builtin/not is just a linking style thing, and not more. Modules can be autoloaded from the filesystem, in exec_autoload(). In such a case, we want the kernel to do a MODULE_CMD_INIT on them, regardless of whether they have the MODINFO_BUILTIN_NOLOAD flag set or not. This flag must be parsed exclusively for the builtin modules, and not the rest. [...] expand config(8) to be able to set "noautoload", so that if a module is included as part of a kernel, it will be marked noautoload if and only if the flag is on the line, regardless of defaults. This would not affect the modules in stand; they'd still have the default value of the noautoload flag from the default

This would be good. But I guess it entails introducing a new "module" keyword, as opposed to the current "options" used for a certain number of drivers. Another short-term alternative would be to add options that set MODINFO_BUILTIN_NOLOAD. Something like:

    #ifdef COMPAT_LINUX_BUILTIN_NOLOAD
    MD1 MD2 MD3, MODINFO_BUILTIN_NOLOAD);
    #else
    MD1 MD2 MD3, 0);
    #endif

    options COMPAT_LINUX
    options COMPAT_LINUX_BUILTIN_NOLOAD

People who want the module builtin+loaded would comment out the second line. Note that this is similar to the notion that shipping functions for a kernel module and dynamically registering them for use are two different unrelated options - which is more or less what was suggested earlier in this thread. But it indeed becomes a bit more complicated to understand and use... Maxime
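The rule Maxime states (honour the no-load bit only for builtin modules, keep initialising filesystem-loaded modules, and let an explicit modload proceed) can be sketched as a small decision function. This is a hypothetical model, not the kernel's module code: `struct modinfo_sketch`, `enum mod_src`, `COMPAT_LINUX_MODFLAGS` and the numeric value of `MODINFO_BUILTIN_NOLOAD` are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value; the actual patch may encode the flag differently. */
#define MODINFO_BUILTIN_NOLOAD	0x01u

/* Mirrors the proposed option: set the bit only if the option is given. */
#ifdef COMPAT_LINUX_BUILTIN_NOLOAD
#define COMPAT_LINUX_MODFLAGS	MODINFO_BUILTIN_NOLOAD
#else
#define COMPAT_LINUX_MODFLAGS	0u
#endif

/* Hypothetical stand-ins for the kernel's module source and modinfo. */
enum mod_src { MOD_SRC_BUILTIN, MOD_SRC_FILESYS };

struct modinfo_sketch {
	const char *name;
	unsigned flags;
};

static const struct modinfo_sketch compat_linux_info = {
	"compat_linux", COMPAT_LINUX_MODFLAGS
};

/*
 * Autoload path: the NOLOAD bit is honoured only for builtin modules;
 * modules found on the filesystem are initialised as before.
 */
static bool may_autoload(const struct modinfo_sketch *mi, enum mod_src src)
{
	assert(mi != NULL);
	if (src == MOD_SRC_BUILTIN && (mi->flags & MODINFO_BUILTIN_NOLOAD))
		return false;
	return true;
}

/* An explicit modload(8) request ignores the flag entirely. */
static bool may_load_explicitly(const struct modinfo_sketch *mi)
{
	(void)mi;
	return true;
}
```

Keeping the check in one place means the builtin/non-builtin distinction only affects autoload, which is the asymmetry Maxime describes.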
Re: performance issues during build.sh -j 40 kernel
On Sun, Sep 10, 2017 at 07:17:51PM +0200, Joerg Sonnenberger wrote: > That's true, but changing this also has quite a significant downside on > some workloads for second order effects. I don't think it is a good idea > to change this right now, as it doesn't even fix the real problem. Just to quantify this part, for a current release build on tmpfs, I see: After: 4267 4280 4261 4247 4300 Before: 3915 3951 3991 3961 3968 After with longer spin off: 4327 4333 4343 4331 4312 Time is in seconds. So adding the cacheline alignment slows the system down by 8% on average. That's a -j32 release on a dual Xeon. Joerg
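The "8% on average" figure can be checked from the five runs per configuration. The short C program below just averages Joerg's numbers (the arrays are copied from his message; the helper names are illustrative): the means come out to about 3957.2 s before, 4271.0 s with the alignment (about 7.9% slower), and 4329.2 s with the alignment and longer spin off (about 9.4% slower). Five runs per configuration say nothing about variance, so the percentages are indicative only.

```c
#include <assert.h>
#include <stddef.h>

/* Release-build wall-clock times in seconds, from the message above. */
static const double before[]         = { 3915, 3951, 3991, 3961, 3968 };
static const double after[]          = { 4267, 4280, 4261, 4247, 4300 };
static const double after_spin_off[] = { 4327, 4333, 4343, 4331, 4312 };

#define NRUNS (sizeof(before) / sizeof(before[0]))
static_assert(sizeof(after) == sizeof(before), "same number of runs");
static_assert(sizeof(after_spin_off) == sizeof(before), "same number of runs");

static double mean(const double *v, size_t n)
{
	assert(n > 0);
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative slowdown of `slow` versus `base`, in percent. */
static double pct_slower(double slow, double base)
{
	return (slow - base) / base * 100.0;
}
```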
Re: performance issues during build.sh -j 40 kernel
Thanks for this analysis. I have three remarks: On 09/09/2017 at 20:48, Mateusz Guzik wrote: [...] I installed the 7.1 release, downloaded recent git snapshot and built the trunk kernel while using config stolen from the release (had to edit out something about 3g modems to make it compile). I presume this is enough to not have debug of any sort enabled. Not sure I understand; did you test a kernel from the netbsd-7.1 branch, or from netbsd-current? You might want to test netbsd-current, I know that several performance-related improvements were made. [...] Here it turned out to be harmful by inducing avoidable cacheline traffic. Look at nm kernel | sort -nk 1: 810b8fc0 B uvm_swap_data_lock 810b8fc8 B uvm_kentry_lock 810b8fd0 B uvm_fpageqlock 810b8fd8 B uvm_pageqlock 810b8fe0 B uvm_kernel_object I saw exactly this too a few months ago. In fact, there are a number of places that generate huge false sharing; a typical example is the xpq_idx_array[MAXCPUS] array in Xen. I've fixed only a few of them, but it is clear that they should all be taken care of. [...] 3. pmap It seems most issues stem from slow pmap handling. Chances are there are perfectly avoidable shootdowns and in fact cases where there is no need to alter KVA in the first place. This seems rather surprising to me. I tried to reduce the number of shootdowns some time ago, but they were already optimized, and my attempts just made them slower to process. The only related thing I fixed was making sure there is no kernel page that gets flushed under a local shootdown, but as far as I remember, it didn't significantly improve performance (on somewhat old hardware, I must admit). I'll take care of some of the false sharing soon. Maxime
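The false sharing visible in the nm output (four 8-byte lock words at 0x...fc0 through 0x...fd8, all inside one 64-byte line) can be reproduced with a C11 layout check. This sketch assumes a 64-byte coherency unit and uses `alignas` as a portable stand-in for `__cacheline_aligned`; the struct names are hypothetical. It only shows the layout; as Joerg's measurements in this thread indicate, padding these particular locks made a release build slower, so layout alone is not the fix.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte coherency unit, typical of current x86-64 parts. */
#define CACHE_LINE 64

/* Packed layout, as in the nm output: lock words back to back. */
struct locks_packed {
	long swap_data_lock;
	long kentry_lock;
	long fpageqlock;
	long pageqlock;
};

/* Each lock on a cache line of its own. */
struct padded_lock {
	alignas(CACHE_LINE) long lock;
};
static_assert(sizeof(struct padded_lock) % CACHE_LINE == 0,
    "each padded lock occupies whole cache lines");

struct locks_padded {
	struct padded_lock swap_data_lock;
	struct padded_lock kentry_lock;
	struct padded_lock fpageqlock;
	struct padded_lock pageqlock;
};

/* True if two offsets from a line-aligned base share a cache line. */
static int same_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}
```

The same reasoning applies to per-CPU arrays such as xpq_idx_array[MAXCPUS]: adjacent elements written by different CPUs share a line unless each element is padded.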
Re: performance issues during build.sh -j 40 kernel
On Sat, Sep 09, 2017 at 08:48:19PM +0200, Mateusz Guzik wrote: > 1) #define UBC_NWINS 1024 Yes, this one should scale automatically. It needs a bit of thought about what a good scaling would be. > 2. uvm_pageidlezero I disagree on this, a lot. At best it is a band-aid unless the uvm_f?pageqlock handling is fixed. Note that unlike FreeBSD, this has been using non-temporal stores forever, so it has very little additional cacheline traffic beyond the free queue interaction. While it doesn't help on a completely busy system, it does provide value for any system that is even occasionally idle. > > 810b8fc0 B uvm_swap_data_lock > 810b8fc8 B uvm_kentry_lock > 810b8fd0 B uvm_fpageqlock > 810b8fd8 B uvm_pageqlock > 810b8fe0 B uvm_kernel_object > > > All these locks false-share a cacheline. In particular fpageqlock is > obstructing uvm_pageqlock. That's true, but changing this also has quite a significant downside on some workloads for second order effects. I don't think it is a good idea to change this right now, as it doesn't even fix the real problem. > #if 0'ing the uvm_pageidlezero call in the idle func shaved about 2 > seconds real time: > 589.02s user 792.62s system 2541% cpu 54.365 total There is a sysctl for it, you know? > Following the issue noted earlier, I __cacheline_aligned the aforementioned > locks. But also moved atomically updated counters out of uvmexp. Actually, most of them should be switched to localcounter. Joerg
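The alternative to shared atomically updated counters that both Joerg and Mindaugas point at (per-CPU storage, aggregated on collection) can be sketched in portable C11. This is not the percpu(9) API or any NetBSD counter facility; `struct pcpu_counter`, `NCPU_MAX` and the 64-byte line size are assumptions made for the example. Each CPU owns one padded slot and is its only writer, so an increment is a relaxed load plus relaxed store (no locked read-modify-write, no shared line); readers sum all slots.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_LINE	64	/* assumption: 64-byte coherency unit */
#define NCPU_MAX	64	/* hypothetical compile-time CPU bound */

/* One counter slot per CPU, each on its own cache line. */
struct pcpu_slot {
	alignas(CACHE_LINE) _Atomic uint64_t count;
};

struct pcpu_counter {
	struct pcpu_slot slot[NCPU_MAX];
};

static void pcpu_counter_init(struct pcpu_counter *c)
{
	for (unsigned i = 0; i < NCPU_MAX; i++)
		atomic_init(&c->slot[i].count, 0);
}

/*
 * The caller must stay on `cpu` while incrementing (in a kernel: with
 * preemption disabled), making it the slot's only writer. A relaxed
 * load+store is then enough and avoids a locked RMW instruction.
 */
static void pcpu_counter_inc(struct pcpu_counter *c, unsigned cpu)
{
	assert(cpu < NCPU_MAX);
	_Atomic uint64_t *p = &c->slot[cpu].count;

	atomic_store_explicit(p,
	    atomic_load_explicit(p, memory_order_relaxed) + 1,
	    memory_order_relaxed);
}

/* Aggregate on collection; a snapshot while increments are in flight. */
static uint64_t pcpu_counter_sum(struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < NCPU_MAX; i++)
		sum += atomic_load_explicit(&c->slot[i].count,
		    memory_order_relaxed);
	return sum;
}
```

The trade-off is memory (one cache line per CPU per counter) and a more expensive read, which suits statistics counters that are bumped often and read rarely.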
Re: how to tell if a process is 64-bit
> On Sep 10, 2017, at 10:31 AM, Thor Lancelot Simon wrote: > > On Fri, Sep 08, 2017 at 07:38:24AM -0400, Mouse wrote: >>> In a cross-platform process utility tool the question came up how to >>> decide if a process is 64-bit. >> >> First, I have to ask: what does it mean to say that a particular >> process is - or isn't - 64-bit? > > I think the only simple answer is "it is 64-bit in the relevant sense if > it uses the platform's 64-bit ABI for interaction with the kernel". > > This actually raises a question for me about MIPS: do we have another > process flag to indicate O32 vs. N32, or can we simply not run O32 > executables on 64-bit or N32 kernels (surely we don't use the O32 ABI > for all kernel interaction by 32-bit processes)? MIPS has four ABIs, if you include "O64". Whether a particular OS allows all four concurrently is another matter; it isn't clear that would make sense. Mixing "O" and "N" ABIs is rather messy. Would you call N32 a 64-bit ABI? It has 64 bit registers, so if a value is passed to the kernel in a register it comes across as 64 bits. But it has 32 bit addresses. paul
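Paul's N32 point can be made concrete by looking at what an executable's ELF header says. The sketch below classifies an on-disk ELF identification block; it does not inspect a running process or any kernel process flag, which is what Thor's definition is really about. The constant values follow the ELF specification (class byte at offset 4; 1 = 32-bit, 2 = 64-bit) and the MIPS `EF_MIPS_ABI2` bit (0x20) that marks N32 objects; the local macro names, `enum abi_guess` and `classify_elf` are hypothetical. Because `e_flags` sits at different offsets in ELF32 and ELF64 headers, the caller passes it in along with whether `e_machine` is MIPS. O64 is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values from the ELF specification (local names to avoid <elf.h> clashes). */
#define IDENT_CLASS_OFF	4	/* EI_CLASS */
#define CLASS_32	1	/* ELFCLASS32 */
#define CLASS_64	2	/* ELFCLASS64 */
/* MIPS e_flags bit that marks the N32 ABI (EF_MIPS_ABI2). */
#define MIPS_FLAG_ABI2	0x00000020u

enum abi_guess { ABI_UNKNOWN, ABI_32, ABI_MIPS_N32, ABI_64 };

static enum abi_guess classify_elf(const unsigned char *ident,
    uint32_t e_flags, int is_mips)
{
	assert(ident != NULL);
	if (memcmp(ident, "\177ELF", 4) != 0)
		return ABI_UNKNOWN;

	switch (ident[IDENT_CLASS_OFF]) {
	case CLASS_64:
		return ABI_64;
	case CLASS_32:
		/* N32: ELF32 file (32-bit addresses) but 64-bit registers. */
		if (is_mips && (e_flags & MIPS_FLAG_ABI2))
			return ABI_MIPS_N32;
		return ABI_32;
	default:
		return ABI_UNKNOWN;
	}
}
```

Under Thor's "addresses passed to the kernel" criterion an N32 binary would count as 32-bit, even though the sketch reports it separately from plain O32.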
Re: Proposal: Disable autoload of compat_xyz modules
> On 10.09.2017 at 12:35, Maxime Villard wrote: > > On 10/09/2017 at 12:24, Paul Goyette wrote: >> On Sun, 10 Sep 2017, Maxime Villard wrote: >>> Re-thinking about this again, it seems to me we could simply add a flags >>> field in modinfo_t, with a bit that says "if this module is builtin, then >>> don't load it". To use compat_xyz, you'll have to type modload, and the >>> kernel will load the module from the builtin list. >>> >>> Something like [1] (from memory, not tested at all). Obviously this patch >>> is not complete, since we need to update each MODULE(). >>> >>> While it is clear that it does not solve the cross-dependency issue we're >>> having, it does reduce the attack surface almost as much as if the module >>> was not builtin, with very little effort. Cheap, but relevant. >>> >>> [1] http://m00nbsd.net/garbage/module/noload.diff >> Well, probably not quite what you wanted, but if a module is built-in >> you can disable it by using modunload(8). Any built-in module which has >> been disabled in this manner needs to be explicitly reloaded manually, and >> you'll need to additionally specify the -f option to modload(8). > > I know. > >> Perhaps /etc/rc.d/modules can be updated to have both a load and an >> unload phase, with appropriate syntax for the associated config file. > > Thought about this too, but it seemed bizarre to me to have the kernel load > modules, then rc.d/modules unload them, and then the user reload them. > >> This would be a lot cleaner IMHO than updating individual modules. > > I believe per-module flags can be useful in the future, and not just in the > noload case; a module could want to tell the kernel how it wants to be loaded. I think "how a module should be loaded" should be left to the sysadmin's discretion, not to the module itself. Besides that, I don't like the whole idea of built-in modules not being activated by default; after all, being activated is how they have behaved for many releases. > > Maxime
performance issues during build.sh -j 40 kernel
Hello, I have been playing a little bit with a NetBSD vm running on CentOS 7 + kvm. I ran into severe performance issues which I partially investigated. A bunch of total hacks was written to confirm a few problems, but there is nothing committable without doing actual work, and major problems remain. I think the kernel is in dire need of someone sitting on the issues reported below and seeing them through. I'm happy to test patches, although I won't necessarily have access to the same hardware used for the current tests.

Hardware specs:
Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz
2 sockets * 10 cores * 2 hardware threads
32GB of RAM

I assigned all 40 threads to the vm and gave it 16GB of RAM. The host is otherwise idle. I installed the 7.1 release, downloaded a recent git snapshot and built the trunk kernel using a config stolen from the release (I had to edit out something about 3g modems to make it compile). I presume this is enough to not have debug of any sort enabled. The filesystem is just ufs mounted with noatime. Attempts to use virtio for storage resulted in abysmal performance, which I did not investigate. Using SATA gave read errors and the vm failed to boot multiuser. I settled for IDE, which works reasonably fine but inherently makes the test worse. All tests were performed with the trunk kernel booted.

Here is a bunch of runs of "./build.sh -j 40 kernel=MYCONF > /dev/null" on the stock kernel:

618.65s user 1097.80s system 2502% cpu 1:08.60 total
628.73s user 1128.71s system 2540% cpu 1:09.18 total
629.05s user 1082.58s system 2517% cpu 1:07.99 total
641.11s user 1081.05s system 2545% cpu 1:07.65 total
641.18s user 1079.89s system 2522% cpu 1:08.24 total

And on the kernel with total hacks:

594.08s user 693.11s system 2459% cpu 52.331 total
594.81s user 711.90s system 2498% cpu 52.292 total
600.34s user 676.39s system 2486% cpu 51.336 total
597.33s user 725.78s system 2536% cpu 52.157 total
597.13s user 708.79s system 2510% cpu 52.011 total

i.e.
it's still pretty bad, with system time being above user time. However, real time dropped from ~68s to ~52s and system time from ~1100s to ~700s. Hacks can be seen here (wear gloves and something to protect your eyes): https://people.freebsd.org/~mjg/netbsd/hacks.diff

1) #define UBC_NWINS 1024

The parameter was set in 2001 and is used on amd64 to this very day. lockstat says:

51.63 585505 321201.06 e4011d8304c0
40.39 291550 251302.17 e4011d8304c0 ubc_alloc+69
 9.13 255967  56776.26 e4011d8304c0 ubc_release+a5
 1.72  35632  10680.06 e4011d8304c0 uvm_fault_internal+532
[snip]

The contention is on the global ubc vmobj lock just prior to the hash lookup. I recompiled the kernel with a randomly slapped value of 65536 and the problem cleared itself, with ubc_alloc going way down. I made no attempt to check what value makes sense or how to autoscale it. This change alone accounts for most of the speedup, giving:

586.87s user 919.99s system 2612% cpu 57.676 total

2. uvm_pageidlezero

Idle zeroing these days definitely makes no sense on amd64. Any pages prepared in advance are quickly used up, and the vast majority of allocations end up zeroing in place. With rep stosb this is even less of a problem. Here it turned out to be harmful by inducing avoidable cacheline traffic. Look at nm kernel | sort -nk 1:

810b8fc0 B uvm_swap_data_lock
810b8fc8 B uvm_kentry_lock
810b8fd0 B uvm_fpageqlock
810b8fd8 B uvm_pageqlock
810b8fe0 B uvm_kernel_object

All these locks false-share a cacheline. In particular, uvm_fpageqlock is obstructing uvm_pageqlock. An attempt to run zeroing performs mutex_tryenter. It unconditionally does a lock cmpxchg, which dirties the cacheline, so even if zeroing ends up not being performed, the damage is already done. Chances are successful zeroing is also a problem, but I did not investigate that.
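The mutex_tryenter point above can be illustrated with a toy owner-word mutex in C11. The names (`toy_mutex`, `tryenter_cas_only`, `tryenter_read_first`) are hypothetical and this is not NetBSD's mutex implementation; it only contrasts an unconditional compare-and-swap with a variant that first does a plain load and gives up early when the lock is visibly held. On x86 a locked cmpxchg is treated as a write even when the comparison fails, so the first variant pulls the cache line into exclusive state on every attempt, while the second only reads it when the lock is busy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy owner word: 0 means unowned, otherwise a non-zero owner id. */
typedef struct {
	_Atomic uintptr_t owner;
} toy_mutex;

static void toy_init(toy_mutex *m)
{
	atomic_init(&m->owner, 0);
}

/* Always issues the CAS, even when the lock is obviously held. */
static bool tryenter_cas_only(toy_mutex *m, uintptr_t self)
{
	assert(self != 0);
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&m->owner, &expected,
	    self, memory_order_acquire, memory_order_relaxed);
}

/* Peeks with a plain load first; a held lock is reported without a write. */
static bool tryenter_read_first(toy_mutex *m, uintptr_t self)
{
	if (atomic_load_explicit(&m->owner, memory_order_relaxed) != 0)
		return false;
	return tryenter_cas_only(m, self);
}

static void toy_exit(toy_mutex *m, uintptr_t self)
{
	assert(atomic_load_explicit(&m->owner, memory_order_relaxed) == self);
	atomic_store_explicit(&m->owner, 0, memory_order_release);
}
```

The read-first check only saves the write when the lock is already held; when it looks free, the CAS is still needed and can still fail under a race.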
#if 0'ing the uvm_pageidlezero call in the idle function shaved about 2 seconds of real time:

589.02s user 792.62s system 2541% cpu 54.365 total

This should definitely be disabled for amd64 altogether and probably removed in general.

3. false sharing

Following the issue noted earlier, I __cacheline_aligned the aforementioned locks. I also moved atomically updated counters out of uvmexp. uvmexp is full of counters updated with mere increments, possibly by multiple threads, so the issue with this object was not resolved. Nonetheless, these annotations combined with the rest give the improvement mentioned earlier.

==

Here is a flamegraph from a fully patched kernel: https://people.freebsd.org/~mjg/netbsd/build-kernel-j40.svg

And here are the top mutex spinners:

59.42 1560022 184255.00 e40138351180
57.52 1538978 178356.84 e40138351180 uvm_fault_internal+7e0
 1.23    8884   3819.43 e40138351180 uvm_unmap_remove+101
 0.67   12159   2078.61 e40138351180 cach
Re: Proposal: Disable autoload of compat_xyz modules
Manuel Bouyer writes: > On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote: >> Re-thinking about this again, it seems to me we could simply add a flags >> field in modinfo_t, with a bit that says "if this module is builtin, then >> don't load it". To use compat_xyz, you'll have to type modload, and the >> kernel will load the module from the builtin list. > > If I compile a kernel with a built-in module, I expect this module to > be active. Otherwise I don't compile it. But maxv@ is not talking about you deciding to compile a kernel and putting in a line for a module. The question is about compat modules that are in GENERIC, and how to choose defaults so that users who want to use them aren't inconvenienced and users who don't want to use them don't have reduced security. Reading maxv@'s suggestion, I wondered about autoload of non-built-in modules (but maybe that is already disabled). My quick reaction is that it would be nice if the "don't autoload" flag had the same behavior for builtin and non-builtin modules, so that builtin/not is just a linking style thing, and not more. But I see your point about respecting explicit configuration. So I wonder about (without providing a patch of course):

  - having a per-compiled-module flag to disable autoload, as suggested (in builtin and not, unless I'm confused)
  - setting the noautoload flag to true in modules that are deemed an unnecessary risk to people who have not made a choice to use them [so far this is maxv's proposal, I think]
  - expanding config(8) to be able to set "noautoload", so that if a module is included as part of a kernel, it will be marked noautoload if and only if the flag is on the line, regardless of defaults. This would not affect the modules in stand; they'd still have the default value of the noautoload flag from the default
  - adding the noautoload flag to in-tree kernel configs for the above modules

which means that in Manuel's custom kernel he can just leave out the noautoload flag and then that kernel will behave as always. People trying to run a MODULAR kernel would still need to either edit their module sources to change the flag (which, if you are a MODULAR type, is more or less like editing GENERIC) or do a manual modload. Overall I find this disabling of things by default while leaving them in far preferable to not building them or removing them from sources, in terms of getting to a better place in the security/usability trade space.
Re: how to tell if a process is 64-bit
On 10.09.2017 16:31, Thor Lancelot Simon wrote:
> On Fri, Sep 08, 2017 at 07:38:24AM -0400, Mouse wrote:
>>> In a cross-platform process utility tool the question came up how to
>>> decide if a process is 64-bit.
>>
>> First, I have to ask: what does it mean to say that a particular
>> process is - or isn't - 64-bit?
>
> I think the only simple answer is "it is 64-bit in the relevant sense if
> it uses the platform's 64-bit ABI for interaction with the kernel".
>
> This actually raises a question for me about MIPS: do we have another
> process flag to indicate O32 vs. N32, or can we simply not run O32
> executables on 64-bit or N32 kernels (surely we don't use the O32 ABI
> for all kernel interaction by 32-bit processes)?
>
> Thor

From a debugger's point of view, it's useful to know the ABI and emulation name of a running application. This is also useful in core(5) files. On Linux, the ABI of a process is hard to determine, and tools resort to guessing (mostly assuming the host's ABI).

However, any changes in this area would be premature on my side. I need to get the elementary features working properly first.
Re: how to tell if a process is 64-bit
On Fri, Sep 08, 2017 at 07:38:24AM -0400, Mouse wrote:
> > In a cross-platform process utility tool the question came up how to
> > decide if a process is 64-bit.
>
> First, I have to ask: what does it mean to say that a particular
> process is - or isn't - 64-bit?

I think the only simple answer is "it is 64-bit in the relevant sense if it uses the platform's 64-bit ABI for interaction with the kernel".

This actually raises a question for me about MIPS: do we have another process flag to indicate O32 vs. N32, or can we simply not run O32 executables on 64-bit or N32 kernels (surely we don't use the O32 ABI for all kernel interaction by 32-bit processes)?

Thor
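One userland heuristic for the MIPS question is to inspect the ELF header of the executable a process was started from. The sketch below assumes you already have the first bytes of that file; it is not how the kernel records a process's ABI, and a tool should prefer whatever the kernel exports. O32 and N32 objects are both ELFCLASS32 and are told apart by the EF_MIPS_ABI2 bit (0x20) in e_flags, while N64 is ELFCLASS64. The `SK_` constants mirror the standard ELF values (normally taken from <elf.h> or <sys/exec_elf.h>) and are redefined only to keep the example self-contained; `guess_abi` and `enum abi_guess` are hypothetical names, and the O64 and EABI variants are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF values, redefined here so the sketch stands alone. */
#define SK_EI_CLASS     4
#define SK_EI_DATA      5
#define SK_ELFCLASS32   1
#define SK_ELFCLASS64   2
#define SK_ELFDATA2LSB  1
#define SK_ELFDATA2MSB  2
#define SK_EM_MIPS      8
#define SK_EF_MIPS_ABI2 0x00000020u  /* set in e_flags for N32 objects */

enum abi_guess { ABI_UNKNOWN, ABI_32, ABI_64, ABI_MIPS_O32, ABI_MIPS_N32, ABI_MIPS_N64 };

/* Read an n-byte unsigned field in the file's byte order. */
static uint32_t
rd(const unsigned char *p, size_t n, int msb)
{
	uint32_t v = 0;
	for (size_t i = 0; i < n; i++)
		v |= (uint32_t)p[msb ? i : n - 1 - i] << (8 * (n - 1 - i));
	return v;
}

static enum abi_guess
guess_abi(const unsigned char *hdr, size_t len)
{
	/* Need the magic, EI_CLASS, EI_DATA and e_machine (offset 18). */
	if (len < 20 || memcmp(hdr, "\177ELF", 4) != 0)
		return ABI_UNKNOWN;
	int msb;
	if (hdr[SK_EI_DATA] == SK_ELFDATA2MSB)
		msb = 1;
	else if (hdr[SK_EI_DATA] == SK_ELFDATA2LSB)
		msb = 0;
	else
		return ABI_UNKNOWN;
	uint32_t machine = rd(hdr + 18, 2, msb);
	switch (hdr[SK_EI_CLASS]) {
	case SK_ELFCLASS64:
		return machine == SK_EM_MIPS ? ABI_MIPS_N64 : ABI_64;
	case SK_ELFCLASS32:
		if (machine != SK_EM_MIPS)
			return ABI_32;
		if (len < 40)   /* e_flags lives at offset 36 in Elf32_Ehdr */
			return ABI_UNKNOWN;
		return (rd(hdr + 36, 4, msb) & SK_EF_MIPS_ABI2) ?
		    ABI_MIPS_N32 : ABI_MIPS_O32;
	default:
		return ABI_UNKNOWN;
	}
}
```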
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 12:43, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 12:38:52PM +0200, Maxime Villard wrote:
>> On 10/09/2017 at 12:22, Manuel Bouyer wrote:
>>> On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
>>>> Re-thinking about this again, it seems to me we could simply add a flags
>>>> field in modinfo_t, with a bit that says "if this module is builtin, then
>>>> don't load it". To use compat_xyz, you'll have to type modload, and the
>>>> kernel will load the module from the builtin list.
>>>
>>> If I compile a kernel with a built-in module, I expect this module to
>>> be active. Otherwise I don't compile it.
>>
>> This kind of all-or-nothing mindset just does not work if we want to reduce
>> the attack surface but still have features nearby. A level of indirection is
>> needed, and it didn't seem to me that having per-module flags was a really bad
>> idea.
>
> A secure system is also a system which is simple. Adding indirections
> doesn't keep the system simple.

True enough; but in this particular case, leaving compat features enabled just for the sake of simplicity produces a system that is much more vulnerable than if it had one level of indirection.
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 13:37, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 01:32:27PM +0200, Maxime Villard wrote:
>> On 10/09/2017 at 13:16, Manuel Bouyer wrote:
>>> On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
>>>> True enough; but in this particular case, leaving compat features enabled
>>>> just for the sake of simplicity produces a system that is much more
>>>> vulnerable than if it had one level of indirection.
>>>
>>> If you know it's vulnerable then fix it, do not spend time trying to
>>> work around it.
>>
>> Yes, compat_linux/linux32/svr4/svr4_32/ibcs2/etc are probably still
>> vulnerable,
>
> as is the native exec path or compat_netbsd32 ...

Yes, but these are critical to the functioning of the system, unlike the ones I'm talking about.
Re: Proposal: Disable autoload of compat_xyz modules
On Sun, Sep 10, 2017 at 01:32:27PM +0200, Maxime Villard wrote:
> On 10/09/2017 at 13:16, Manuel Bouyer wrote:
> > On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
> > > True enough; but in this particular case, leaving compat features enabled
> > > just for the sake of simplicity produces a system that is much more
> > > vulnerable than if it had one level of indirection.
> >
> > If you know it's vulnerable then fix it, do not spend time trying to
> > work around it.
>
> Yes, compat_linux/linux32/svr4/svr4_32/ibcs2/etc are probably still
> vulnerable,

as is the native exec path or compat_netbsd32 ...

--
Manuel Bouyer
NetBSD: 26 years of experience will always make the difference
--
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 13:16, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
>> True enough; but in this particular case, leaving compat features enabled
>> just for the sake of simplicity produces a system that is much more
>> vulnerable than if it had one level of indirection.
>
> If you know it's vulnerable then fix it, do not spend time trying to
> work around it.

Yes, compat_linux/linux32/svr4/svr4_32/ibcs2/etc are probably still vulnerable, but in ways that are far from obvious. Just look at the vulnerability I fixed in linux32 a few days ago.

It was agreed here that there needs to be some way to reduce the attack surface by default without totally "disabling" the features that have a common use case. What I'm discussing now is how to achieve that, not whether to do it.

Having said that, I can understand that my noload proposal may not be the best.

Maxime
Re: Proposal: Disable autoload of compat_xyz modules
On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
> True enough; but in this particular case, leaving compat features enabled just
> for the sake of simplicity produces a system that is much more vulnerable than
> if it had one level of indirection.

If you know it's vulnerable then fix it, do not spend time trying to work around it.
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 12:22, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
>> Re-thinking about this again, it seems to me we could simply add a flags
>> field in modinfo_t, with a bit that says "if this module is builtin, then
>> don't load it". To use compat_xyz, you'll have to type modload, and the
>> kernel will load the module from the builtin list.
>
> If I compile a kernel with a built-in module, I expect this module to
> be active. Otherwise I don't compile it.

This kind of all-or-nothing mindset just does not work if we want to reduce the attack surface but still have features nearby. A level of indirection is needed, and it didn't seem to me that having per-module flags was a really bad idea.
Re: Proposal: Disable autoload of compat_xyz modules
On Sun, Sep 10, 2017 at 12:38:52PM +0200, Maxime Villard wrote:
> On 10/09/2017 at 12:22, Manuel Bouyer wrote:
> > On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
> > > Re-thinking about this again, it seems to me we could simply add a flags
> > > field in modinfo_t, with a bit that says "if this module is builtin, then
> > > don't load it". To use compat_xyz, you'll have to type modload, and the
> > > kernel will load the module from the builtin list.
> >
> > If I compile a kernel with a built-in module, I expect this module to
> > be active. Otherwise I don't compile it.
>
> This kind of all-or-nothing mindset just does not work if we want to reduce
> the attack surface but still have features nearby. A level of indirection is
> needed, and it didn't seem to me that having per-module flags was a really bad
> idea.

A secure system is also a system which is simple. Adding indirections doesn't keep the system simple.
Re: Proposal: Disable autoload of compat_xyz modules
On 10/09/2017 at 12:24, Paul Goyette wrote:
> On Sun, 10 Sep 2017, Maxime Villard wrote:
>
>> Re-thinking about this again, it seems to me we could simply add a flags
>> field in modinfo_t, with a bit that says "if this module is builtin, then
>> don't load it". To use compat_xyz, you'll have to type modload, and the
>> kernel will load the module from the builtin list.
>>
>> Something like [1] (from memory, not tested at all). Obviously this patch
>> is not complete, since we need to update each MODULE().
>>
>> While it is clear that it does not solve the cross-dependency issue we're
>> having, it does reduce the attack surface almost as much as if the module
>> was not builtin, with very little effort. Cheap, but relevant.
>>
>> [1] http://m00nbsd.net/garbage/module/noload.diff
>
> Well, probably not quite what you wanted, but if a module is built-in
> you can disable it by using modunload(8). Any built-in module which
> has been disabled in this manner needs to be explicitly reloaded
> manually, and you'll need to additionally specify the -f option to
> modload(8).

I know.

> Perhaps /etc/rc.d/modules can be updated to have both a load and an
> unload phase, with appropriate syntax for the associated config file.

I thought about this too, but it seemed bizarre to me to have the kernel load modules, then rc.d/modules unload them, and then the user reload them.

> This would be a lot cleaner IMHO than updating individual modules.

I believe per-module flags can be useful in the future, and not just in the noload case; a module could want to tell the kernel how it wants to be loaded.

Maxime
Re: Proposal: Disable autoload of compat_xyz modules
On Sun, 10 Sep 2017, Maxime Villard wrote:

> Re-thinking about this again, it seems to me we could simply add a flags
> field in modinfo_t, with a bit that says "if this module is builtin, then
> don't load it". To use compat_xyz, you'll have to type modload, and the
> kernel will load the module from the builtin list.
>
> Something like [1] (from memory, not tested at all). Obviously this patch
> is not complete, since we need to update each MODULE().
>
> While it is clear that it does not solve the cross-dependency issue we're
> having, it does reduce the attack surface almost as much as if the module
> was not builtin, with very little effort. Cheap, but relevant.
>
> [1] http://m00nbsd.net/garbage/module/noload.diff

Well, probably not quite what you wanted, but if a module is built-in you can disable it by using modunload(8). Any built-in module which has been disabled in this manner needs to be explicitly reloaded manually, and you'll need to additionally specify the -f option to modload(8).

Perhaps /etc/rc.d/modules can be updated to have both a load and an unload phase, with appropriate syntax for the associated config file.

This would be a lot cleaner IMHO than updating individual modules.

+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
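Paul's alternative can be modelled as a small state machine. This C sketch is an assumption-laden illustration, not NetBSD code: `struct builtin_mod`, the `sk_` functions, `rc_unload_phase`, the errno values (EPERM for a refused unforced load, EEXIST for an already active module) and the idea of an unload list read from the rc.d/modules config file are all hypothetical. It shows the round trip Maxime objects to: the built-in starts active at boot, the unload phase disables it, and only a forced load brings it back.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state of a built-in module; errno choices are illustrative. */
enum bstate { B_ACTIVE, B_DISABLED };

struct builtin_mod {
	const char *name;
	enum bstate state;
};

/* modunload(8) on a built-in: it stays linked in but becomes disabled. */
static void
sk_unload(struct builtin_mod *m)
{
	m->state = B_DISABLED;
}

/* modload(8) on a disabled built-in succeeds only when forced (-f). */
static int
sk_load(struct builtin_mod *m, bool force)
{
	if (m->state == B_ACTIVE)
		return EEXIST;
	if (!force)
		return EPERM;
	m->state = B_ACTIVE;
	return 0;
}

/*
 * Proposed rc.d/modules "unload phase": disable every built-in named in
 * the (hypothetical) unload list from the config file.  The load phase
 * is omitted from this sketch.
 */
static void
rc_unload_phase(struct builtin_mod *mods, size_t nmods,
    const char *const *unload, size_t nunload)
{
	for (size_t i = 0; i < nunload; i++)
		for (size_t j = 0; j < nmods; j++)
			if (strcmp(mods[j].name, unload[i]) == 0)
				sk_unload(&mods[j]);
}
```

The design trade-off is visible in the states: no per-module flag is needed, at the cost of every listed module being active between kernel boot and the rc.d unload phase.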
Re: Proposal: Disable autoload of compat_xyz modules
On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
> Re-thinking about this again, it seems to me we could simply add a flags
> field in modinfo_t, with a bit that says "if this module is builtin, then
> don't load it". To use compat_xyz, you'll have to type modload, and the
> kernel will load the module from the builtin list.

If I compile a kernel with a built-in module, I expect this module to be active. Otherwise I don't compile it.
Re: Proposal: Disable autoload of compat_xyz modules
Re-thinking about this again, it seems to me we could simply add a flags field in modinfo_t, with a bit that says "if this module is builtin, then don't load it". To use compat_xyz, you'll have to type modload, and the kernel will load the module from the builtin list.

Something like [1] (from memory, not tested at all). Obviously this patch is not complete, since we need to update each MODULE().

While it is clear that it does not solve the cross-dependency issue we're having, it does reduce the attack surface almost as much as if the module was not builtin, with very little effort. Cheap, but relevant.

[1] http://m00nbsd.net/garbage/module/noload.diff
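A minimal C sketch of the proposed semantics, assuming a flags field is added to the module descriptor. `struct sk_modinfo`, `SK_MODFLAG_NOLOAD` and the three functions are hypothetical names, not the contents of noload.diff or the real modinfo_t. Built-ins marked NOLOAD are skipped at boot and refused by autoload, while an explicit modload finds them in the builtin list and activates them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: built in, but not initialised unless modload'ed. */
#define SK_MODFLAG_NOLOAD 0x01u

struct sk_modinfo {
	const char *mi_name;
	unsigned    mi_flags;  /* the proposed new field */
	bool        active;    /* runtime state, tracked only for this sketch */
};

/* Boot: initialise every built-in except those marked NOLOAD. */
static void
sk_init_builtins(struct sk_modinfo *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		list[i].active = !(list[i].mi_flags & SK_MODFLAG_NOLOAD);
}

/* Explicit modload: look the name up in the builtin list. */
static bool
sk_modload_builtin(struct sk_modinfo *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i].mi_name, name) == 0) {
			list[i].active = true;
			return true;
		}
	}
	return false;  /* a real kernel would fall back to loading from disk */
}

/* Autoload (e.g. on exec of a foreign binary): refused for NOLOAD modules. */
static bool
sk_autoload(struct sk_modinfo *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i].mi_name, name) != 0)
			continue;
		if (list[i].mi_flags & SK_MODFLAG_NOLOAD)
			return list[i].active;  /* usable only after explicit modload */
		list[i].active = true;
		return true;
	}
	return false;
}
```

For example, with compat_linux flagged NOLOAD, an exec of a Linux binary fails until the administrator runs modload once; after that the module behaves like any other built-in.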