On Fri, Dec 23, 2016 at 07:42:40AM +1100, Dave Chinner wrote:
> On Thu, Dec 22, 2016 at 09:24:12AM -0800, Linus Torvalds wrote:
> > On Wed, Dec 21, 2016 at 10:28 PM, Dave Chinner <da...@fromorbit.com> wrote:
> > >
> > > This sort of thing is normally indicative of a memory reclaim or
> > > lock contention problem. Profile showed unusual spinlock contention,
> > > but then I realised there was only one kswapd thread running.
> > > Yup, sure enough, it's caused by a major change in memory reclaim
> > > behaviour:
> > >
> > > [ 0.000000] Zone ranges:
> > > [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
> > > [ 0.000000] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
> > > [ 0.000000] Normal [mem 0x0000000100000000-0x000000083fffffff]
> > > [ 0.000000] Movable zone start for each node
> > > [ 0.000000] Early memory node ranges
> > > [ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
> > > [ 0.000000] node 0: [mem 0x0000000000100000-0x00000000bffdefff]
> > > [ 0.000000] node 0: [mem 0x0000000100000000-0x00000003bfffffff]
> > > [ 0.000000] node 0: [mem 0x00000005c0000000-0x00000005ffffffff]
> > > [ 0.000000] node 0: [mem 0x0000000800000000-0x000000083fffffff]
> > > [ 0.000000] Initmem setup node 0 [mem
> > > 0x0000000000001000-0x000000083fffffff]
> > >
> > > the numa=fake=4 CLI option is broken.
> >
> > Ok, I think that is independent of anything else. Removing block
> > people and adding the x86 people.
> >
> > I'm not seeing anything at all that would change the fake numa stuff,
> > but maybe the cpu hotplug changes?
> >
> > Thomas/Ingo/Peter - Dave is going away for several months, so you
> > won't get feedback from him, but can you look at this? Or maybe point
> > me towards the right people - I'm seeing no possible relevant changes
> > at all for x86 numa since 4.9, so it must be some indirect breakage.
> >
> > Dave is using fake-numa to do performance testing in a VM, and it's a
> > big deal for the node optimizations for writeback etc. Do you have any
> > ideas?
> >
> > Dave, if you're still around, can you send out the kernel config file
> > you used...
>
> Looking at this fresh this morning (i.e. not pissed off by having
> everything I tried to do fail in different ways all afternoon) I
> found this:
>
> $ grep NUMA .config
> CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
> # CONFIG_NUMA is not set
> $
>
> The .config I was using for 4.9 got 'make oldconfig' upgraded, and
> looking at it there's a bunch of stuff that has been turned off that
> I know was set:
>
> # CONFIG_EXPERT is not set
> # CONFIG_PARAVIRT_SPINLOCKS is not set
> # CONFIG_COMPACTION is not set
>
> and stuff I never use so don't set was set, like kernel crash dump,
> a bunch of stuff for AMD CPUs, susp/resume and power management
> debug, every partition type and filesystem under the sun was
> selected, heaps of network devices enabled, etc.
>
> So it looks like the problem has occurred during oldconfig, meaning
> I have no idea exactly WTF I was testing. Rebuilding now with a
> saner config, see what happens.
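
For anyone wanting to check whether 'make oldconfig' silently flipped
options like this, here is a rough sketch that compares a known-good
.config against the upgraded one and lists every symbol whose value
changed. The helper names (parse_config, diff_configs) are made up for
illustration and are not part of any kernel tooling. It also assumes a
symbol that is absent from a file can be treated the same as "is not
set", which is only an approximation, because symbols hidden by
unmet dependencies are absent too. The kernel tree's scripts/diffconfig
is the proper tool for this job.

```python
import re

# "CONFIG_FOO=y", "CONFIG_FOO=m", "CONFIG_FOO=123", CONFIG_FOO="str"
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
# "# CONFIG_FOO is not set"
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol in a .config file's text to its value."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(old_text, new_text):
    """Return {symbol: (old_value, new_value)} for every changed symbol.

    Assumption: a symbol missing from one file counts as "n".
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "n")
        after = new.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes
```

Feeding it the text of the 4.9 .config and the text of the upgraded one
would flag things like CONFIG_NUMA going from y to n before any
benchmark gets run against the wrong kernel.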

Better, but still bad. average files/s is not up to 200k files/s, so
still a good 10-15% off where it should be. xfs_repair is back down to
10-15% off where it should be, too. bulkstat still fires off a bad
page reference count warning, iscsi still panics immediately.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com