Re: Proposal: Disable autoload of compat_xyz modules

2017-09-11 Thread Mouse
> A secure system is also a system which is simple.

That ship sailed long ago, back around "options LKM" time.  Indeed,
security is most of why I turn that off in my kernels (MODULAR too, for
OS revs recent enough to have it).

/~\ The ASCII				Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!		7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: performance issues during build.sh -j 40 kernel

2017-09-11 Thread Mateusz Guzik
On Sat, Sep 09, 2017 at 08:48:19PM +0200, Mateusz Guzik wrote:
>
> Here is a bunch of "./build.sh -j 40 kernel=MYCONF > /dev/null" on stock
> kernel:
>   618.65s user 1097.80s system 2502% cpu 1:08.60 total
[..]
>
> And on kernel with total hacks:
>   594.08s user 693.11s system 2459% cpu 52.331 total
[..]
>
> ==
>
> Here is a flamegraph from a fully patched kernel:
> https://people.freebsd.org/~mjg/netbsd/build-kernel-j40.svg
>
> And here are top mutex spinners:
>  59.42 1560022 184255.00 e40138351180
>  57.52 1538978 178356.84 e40138351180   uvm_fault_internal+7e0
>   1.23    8884   3819.43 e40138351180   uvm_unmap_remove+101
>   0.67   12159   2078.61 e40138351180   cache_lookup+97
>
> (see https://people.freebsd.org/~mjg/netbsd/build-kernel-j40-lockstat.txt)
>

So I added PoC batching to uvm_fault_lower_lookup and uvm_anon_dispose.

While real time barely moved and %sys still floats around 630, I'm
happy to report that wait time on global locks dropped significantly:

 46.03 1162651  85410.88 e40127167040   
 43.80 1146153  81273.38 e40127167040   uvm_fault_internal+7c0
  1.52    7112   2827.06 e40127167040   uvm_unmap_remove+101
  0.71    9385   1310.42 e40127167040   cache_lookup+a5
  0.00   1  0.01 e40127167040   vfs_vnode_iterator_next1+87

https://people.freebsd.org/~mjg/netbsd/build-kernel-j40-hacks2.svg

https://people.freebsd.org/~mjg/netbsd/build-kernel-j40-hacks2-lockstat.txt

You can see in the flamegraph that total time spent in the page fault
handler dropped and that the non-user time shifted to syscall handling.

Specifically, genfs_lock is now a more significant player, accounting
for about 8.7% of total time (6.6% previously).

Batching can be enabled with:
sysctl -w use_anon_dispose_pagelocked=1
sysctl -w uvm_fault_batch_requeue=1

A mix of total hackery is here:
https://people.freebsd.org/~mjg/netbsd/hacks2.diff
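For illustration, the general batching pattern boils down to taking the
global lock once per batch instead of once per page. This is only a
sketch with hypothetical names (requeue_batch, BATCH_SIZE, etc. are not
from the actual diff); the lock is modeled as an acquisition counter:

```c
/* Sketch of lock batching: amortize one global lock acquisition
 * over BATCH_SIZE items instead of locking per item.
 * All names are hypothetical, not taken from the real patch. */
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 16

static unsigned long lock_acquisitions;	/* stand-in for contention cost */

static void global_lock(void)   { lock_acquisitions++; }
static void global_unlock(void) { }

/* Process one page while the global lock is held (placeholder). */
static void requeue_page_locked(int page) { (void)page; }

static void
requeue_batch(const int *pages, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i += BATCH_SIZE) {
		size_t lim = (i + BATCH_SIZE < n) ? i + BATCH_SIZE : n;

		global_lock();		/* one acquisition per batch */
		for (j = i; j < lim; j++)
			requeue_page_locked(pages[j]);
		global_unlock();
	}
}
```

With 100 pages and a batch size of 16 the lock is taken 7 times instead
of 100, which is where the drop in spin time comes from.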

I'm quite certain there are other trivial wins in the handler.

I also noted that the mutex_spin_retry routine never lowers the spl. I
added total-crap support for changing that, but did not measure any
difference.

There is also a currently-wrong hack for the namecache: instead of
taking the interlock, first check whether the usecount is 0 and, if it
is not, try to bump it by 1. This races with possible transitions to
VS_BLOCKED.

I think the general idea will work fine if the prohibited state gets
embedded into the top bits of v_usecount. Regular bumps will be
unaffected, while a cmpxchg like the one here will automagically fail.
The only remaining problem is code that reads the count "by hand",
which would have to be updated to mask the bit.
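A minimal sketch of the embedded-flag idea, using C11 atomics. All
names here are hypothetical (VC_BLOCKED is a stand-in flag bit, not the
real vnode state constant): the lock-free bump compares-and-swaps the
count, so it fails automatically the moment the flag shows up:

```c
/* Sketch: prohibited state embedded in the top bit of the usecount.
 * VC_BLOCKED and vcount_bump are hypothetical names for illustration. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define VC_BLOCKED	0x80000000u	/* flag bit in the top of the count */

static bool
vcount_bump(atomic_uint *usecount)
{
	unsigned int old = atomic_load(usecount);

	for (;;) {
		/*
		 * Fail if the count is 0 (caller must fall back to the
		 * interlock) or the prohibited-state flag is set.
		 */
		if ((old & ~VC_BLOCKED) == 0 || (old & VC_BLOCKED) != 0)
			return false;
		/*
		 * The cmpxchg fails (and reloads old) if the flag was
		 * set concurrently, so no race window remains.
		 */
		if (atomic_compare_exchange_weak(usecount, &old, old + 1))
			return true;
	}
}
```

Regular incrementers go through this path unchanged; only code that
reads the raw count would need to mask off VC_BLOCKED.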

The reason for the hack is that the interlock is in fact the vm object
lock, and it adds a tad bit of contention.

Probably, with a few more fixes of this sort, cranking up backoff will
be beneficial.

-- 
Mateusz Guzik
Swearing Maintenance Engineer