On 11/15/19, Andreas Gustafsson <g...@gson.org> wrote:
> Mateusz Guzik wrote:
>> Can you get a kernel-side flamegraph?
>
> Done, using sources from 2019.11.14.13.58.22:
>
> http://www.gson.org/netbsd/bugs/system-time/fg.svg
>

Thanks.

The first thing that jumps out at me is DIAGNOSTIC being on (seen with
e.g. _vstate_assert). Did your older kernels have it? If you just compiled
GENERIC from release branches it is presumably removed, so it would be nice
to retest without it.

Then there is some very minor stuff which in isolation won't make a
difference but would be nice to take care of:

- pmap_page_copy uses memcpy, which performs a little bit of extra work on
  top of just copying. The size is known at compilation time and both
  addresses are guaranteed to be aligned to 4096, so it can just copy
  without trying to align. IOW this should use a dedicated routine.
- pmap_page_zero uses non-temporal stores, which are almost guaranteed to
  only add to cache misses later on.
- background page zeroing probably does not win anything and only adds to
  contention on uvm_fpageqlock. I don't know if I'm reading this right, but
  it seems the lock itself is only a spinlock to accommodate its use from
  the idle loop. Should the feature be eliminated on amd64, the lock can be
  converted to just a regular lock, which would be faster both
  single-threaded (no interrupt crappery) and multi-threaded (no need to
  read off IPL from the lock).

Here I don't see what uvm_fault_internal is contending on; it's most likely
the aforementioned uvm_fpageqlock. A couple of years back I wrote a patch to
batch ops using the lock, which can probably be forward-ported reasonably
easily.

That said, can you rerun without DIAGNOSTIC but with lockstat?

-- 
Mateusz Guzik <mjguzik gmail.com>
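
To illustrate the pmap_page_copy point, here is a minimal userland sketch of
what a "dedicated routine" could look like when the size is a compile-time
constant and both pointers are page-aligned. This is not NetBSD code: the
names sketch_page_copy and SKETCH_PAGE_SIZE are hypothetical, the 4096-byte
page size is an assumption taken from the alignment mentioned above, and a
real amd64 implementation would probably be written in assembly rather than
C. The only point is that no alignment fix-up or tail handling is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096-byte pages, matching the alignment stated above. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Copy exactly one page from src to dst (hypothetical helper).
 * Both pointers must be page-aligned and must not overlap, so the loop
 * can move whole 64-bit words with a fixed trip count and never has to
 * deal with unaligned heads or tails the way a general memcpy does.
 */
static void
sketch_page_copy(void *restrict dst, const void *restrict src)
{
	uint64_t *restrict d = dst;
	const uint64_t *restrict s = src;
	size_t i;

	/* Sanity-check the alignment guarantee the caller is relying on. */
	assert(((uintptr_t)dst & (SKETCH_PAGE_SIZE - 1)) == 0);
	assert(((uintptr_t)src & (SKETCH_PAGE_SIZE - 1)) == 0);

	/* 4096 / 8 = 512 words, copied four at a time. */
	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(uint64_t); i += 4) {
		d[i + 0] = s[i + 0];
		d[i + 1] = s[i + 1];
		d[i + 2] = s[i + 2];
		d[i + 3] = s[i + 3];
	}
}
```

Whether this actually beats memcpy for page-sized copies would have to be
measured on the hardware in question; the sketch only shows the shape of the
idea.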