> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggem...@arm.com]
> Sent: Friday, January 22, 2021 7:54 AM
> To: Valentin Schneider <valentin.schnei...@arm.com>; Meelis Roos
> <mr...@linux.ee>; LKML <linux-kernel@vger.kernel.org>
> Cc: Peter Zijlstra <pet...@infradead.org>; Vincent Guittot
> <vincent.guit...@linaro.org>; Song Bao Hua (Barry Song)
> <song.bao....@hisilicon.com>; Mel Gorman <mgor...@suse.de>
> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
>
> On 21/01/2021 19:21, Valentin Schneider wrote:
> > On 21/01/21 19:39, Meelis Roos wrote:
> >>> Could you paste the output of the below?
> >>>
> >>> $ cat /sys/devices/system/node/node*/distance
> >>
> >> 10 12 12 14 14 14 14 16
> >> 12 10 14 12 14 14 12 14
> >> 12 14 10 14 12 12 14 14
> >> 14 12 14 10 12 12 14 14
> >> 14 14 12 12 10 14 12 14
> >> 14 14 12 12 14 10 14 12
> >> 14 12 14 14 12 14 10 12
> >> 16 14 14 14 14 12 12 10
> >>
> >
> > Thanks!
> >
> >>
> >>> Additionally, booting your system with CONFIG_SCHED_DEBUG=y and
> >>> appending 'sched_debug' to your cmdline should yield some extra data.
> >>
> >> [    0.000000] Linux version 5.11.0-rc4-00015-g45dfb8a5659a (mroos@x4600m2) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #55 SMP Thu Jan 21 19:23:10 EET 2021
> >> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-rc4-00015-g45dfb8a5659a root=/dev/sda1 ro quiet
> >
> > This is missing 'sched_debug' to get the extra topology debug prints (yes
> > it needs an extra cmdline argument on top of having CONFIG_SCHED_DEBUG=y),
> > but I should be able to generate those locally by feeding QEMU the above
> > distance table.
>
> Can be recreated with (simplified with only 1 CPU per node):
>
> $ qemu-system-aarch64 -kernel /opt/git/kernel_org/arch/arm64/boot/Image -hda
> /opt/git/tools/qemu-imgs-manipulator/images/qemu-image-aarch64.img -append
> 'root=/dev/vda console=ttyAMA0 loglevel=8 sched_debug' -nographic -machine
> virt,gic-version=max -smp cores=8 -m 512 -cpu cortex-a57 -numa
> node,cpus=0,nodeid=0 -numa node,cpus=1,nodeid=1, -numa node,cpus=2,nodeid=2,
> -numa node,cpus=3,nodeid=3, -numa node,cpus=4,nodeid=4, -numa
> node,cpus=5,nodeid=5, -numa node,cpus=6,nodeid=6, -numa node,cpus=7,nodeid=7,
> -numa dist,src=0,dst=1,val=12, -numa dist,src=0,dst=2,val=12, -numa
> dist,src=0,dst=3,val=14, -numa dist,src=0,dst=4,val=14, -numa
> dist,src=0,dst=5,val=14, -numa dist,src=0,dst=6,val=14, -numa
> dist,src=0,dst=7,val=16, -numa dist,src=1,dst=2,val=14, -numa
> dist,src=1,dst=3,val=12, -numa dist,src=1,dst=4,val=14, -numa
> dist,src=1,dst=5,val=14, -numa dist,src=1,dst=6,val=12, -numa
> dist,src=1,dst=7,val=14, -numa dist,src=2,dst=3,val=14, -numa
> dist,src=2,dst=4,val=12, -numa dist,src=2,dst=5,val=12, -numa
> dist,src=2,dst=6,val=14, -numa dist,src=2,dst=7,val=14, -numa
> dist,src=3,dst=4,val=12, -numa dist,src=3,dst=5,val=12, -numa
> dist,src=3,dst=6,val=14, -numa dist,src=3,dst=7,val=14, -numa
> dist,src=4,dst=5,val=14, -numa dist,src=4,dst=6,val=12, -numa
> dist,src=4,dst=7,val=14, -numa dist,src=5,dst=6,val=14, -numa
> dist,src=5,dst=7,val=12, -numa dist,src=6,dst=7,val=12
>
> [    0.206628] ------------[ cut here ]------------
> [    0.206698] Shortest NUMA path spans too many nodes
> [    0.207119] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:753 cpu_attach_domain+0x42c/0x87c
> [    0.207176] Modules linked in:
> [    0.207373] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc2-00010-g65bcf072e20e-dirty #81
> [    0.207458] Hardware name: linux,dummy-virt (DT)
> [    0.207584] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> [    0.207618] pc : cpu_attach_domain+0x42c/0x87c
> [    0.207646] lr : cpu_attach_domain+0x42c/0x87c
> [    0.207665] sp : ffff800011fcbbf0
> [    0.207679] x29: ffff800011fcbbf0 x28: ffff0000024d8200
> [    0.207735] x27: 0000000000001fef x26: 0000000000001917
> [    0.207755] x25: ffff0000024d8000 x24: 0000000000001917
> [    0.207772] x23: 0000000000000000 x22: ffff800011b69a40
> [    0.207789] x21: ffff0000024d8320 x20: ffff8000116fda80
> [    0.207806] x19: ffff0000024d8000 x18: 0000000000000000
> [    0.207822] x17: 0000000000000000 x16: 00000000bd30d762
> [    0.207838] x15: 0000000000000030 x14: ffffffffffffffff
> [    0.207855] x13: ffff800011b82e08 x12: 00000000000001b9
> [    0.207871] x11: 0000000000000093 x10: ffff800011bdae08
> [    0.207887] x9 : 00000000fffff000 x8 : ffff800011b82e08
> [    0.207922] x7 : ffff800011bdae08 x6 : 0000000000000000
> [    0.207939] x5 : 0000000000000000 x4 : 0000000000000000
> [    0.207955] x3 : 00000000ffffffff x2 : 0000000000000000
> [    0.207972] x1 : 0000000000000000 x0 : ffff000018020000
> [    0.208125] Call trace:
> [    0.208230]  cpu_attach_domain+0x42c/0x87c
> [    0.208256]  build_sched_domains+0x1238/0x12f4
> [    0.208271]  sched_init_domains+0x80/0xb0
> [    0.208283]  sched_init_smp+0x30/0x80
> [    0.208299]  kernel_init_freeable+0xf4/0x238
> [    0.208313]  kernel_init+0x14/0x118
> [    0.208328]  ret_from_fork+0x10/0x34
> [    0.208507] ---[ end trace 75cafa7c7d1a3d7e ]---
> [    0.208706] CPU0 attaching sched-domain(s):
> [    0.208756]  domain-0: span=0-2 level=NUMA
> [    0.209001]   groups: 0:{ span=0 cap=1017 }, 1:{ span=1 cap=1016 }, 2:{ span=2 cap=1015 }
> [    0.209247]   domain-1: span=0-6 level=NUMA
> [    0.209280]    groups: 0:{ span=0-2 mask=0 cap=3048 }, 3:{ span=1,3-5 mask=3 cap=4073 }, 6:{ span=1,4,6-7 mask=6 cap=4084 }
> [    0.209693] ERROR: groups don't span domain->span
> [    0.209703]    domain-2: span=0-7 level=NUMA
> [    0.209722]     groups: 0:{ span=0-6 mask=0 cap=7114 }, 7:{ span=1-7 mask=7 cap=7163 }
> [    0.210361] CPU1 attaching sched-domain(s):
> [    0.210376]  domain-0: span=0-1,3,6 level=NUMA
> [    0.210411]   groups: 1:{ span=1 cap=1016 }, 3:{ span=3 cap=1018 }, 6:{ span=6 cap=1017 }, 0:{ span=0 cap=1017 }
> [    0.210493]   domain-1: span=0-7 level=NUMA
> [    0.210511]    groups: 1:{ span=0-1,3,6 mask=1 cap=4075 }, 2:{ span=0,2,4-5 mask=2 cap=4070 }, 7:{ span=5-7 mask=7 cap=3067 }
> [    0.210641] CPU2 attaching sched-domain(s):
> [    0.210653]  domain-0: span=0,2,4-5 level=NUMA
> [    0.210672]   groups: 2:{ span=2 cap=1015 }, 4:{ span=4 cap=1016 }, 5:{ span=5 cap=1015 }, 0:{ span=0 cap=1017 }
> [    0.210752]   domain-1: span=0-7 level=NUMA
> [    0.210769]    groups: 2:{ span=0,2,4-5 mask=2 cap=4070 }, 3:{ span=1,3-5 mask=3 cap=4073 }, 6:{ span=1,4,6-7 mask=6 cap=4084 }
> [    0.210860] CPU3 attaching sched-domain(s):
> [    0.210870]  domain-0: span=1,3-5 level=NUMA
> [    0.210887]   groups: 3:{ span=3 cap=1018 }, 4:{ span=4 cap=1016 }, 5:{ span=5 cap=1015 }, 1:{ span=1 cap=1016 }
> [    0.210965]   domain-1: span=0-7 level=NUMA
> [    0.210981]    groups: 3:{ span=1,3-5 mask=3 cap=4073 }, 6:{ span=1,4,6-7 mask=6 cap=4084 }, 0:{ span=0-2 mask=0 cap=3048 }
> [    0.211109] CPU4 attaching sched-domain(s):
> [    0.211134]  domain-0: span=2-4,6 level=NUMA
> [    0.211151]   groups: 4:{ span=4 cap=1016 }, 6:{ span=6 cap=1017 }, 2:{ span=2 cap=1015 }, 3:{ span=3 cap=1018 }
> [    0.211229]   domain-1: span=0-7 level=NUMA
> [    0.211245]    groups: 4:{ span=2-4,6 mask=4 cap=4081 }, 5:{ span=2-3,5,7 mask=5 cap=4082 }, 0:{ span=0-2 mask=0 cap=3048 }
> [    0.211383] CPU5 attaching sched-domain(s):
> [    0.211393]  domain-0: span=2-3,5,7 level=NUMA
> [    0.211425]   groups: 5:{ span=5 cap=1015 }, 7:{ span=7 cap=1019 }, 2:{ span=2 cap=1015 }, 3:{ span=3 cap=1018 }
> [    0.211506]   domain-1: span=0-7 level=NUMA
> [    0.211524]    groups: 5:{ span=2-3,5,7 mask=5 cap=4082 }, 6:{ span=1,4,6-7 mask=6 cap=4084 }, 0:{ span=0-2 mask=0 cap=3048 }
> [    0.211618] CPU6 attaching sched-domain(s):
> [    0.211628]  domain-0: span=1,4,6-7 level=NUMA
> [    0.211645]   groups: 6:{ span=6 cap=1017 }, 7:{ span=7 cap=1019 }, 1:{ span=1 cap=1016 }, 4:{ span=4 cap=1016 }
> [    0.211728]   domain-1: span=0-7 level=NUMA
> [    0.211745]    groups: 6:{ span=1,4,6-7 mask=6 cap=4084 }, 0:{ span=0-2 mask=0 cap=3048 }, 3:{ span=1,3-5 mask=3 cap=4073 }
> [    0.211855] CPU7 attaching sched-domain(s):
> [    0.211866]  domain-0: span=5-7 level=NUMA
> [    0.211884]   groups: 7:{ span=7 cap=1019 }, 5:{ span=5 cap=1015 }, 6:{ span=6 cap=1017 }
> [    0.211949]   domain-1: span=1-7 level=NUMA
> [    0.211966]    groups: 7:{ span=5-7 mask=7 cap=3067 }, 1:{ span=0-1,3,6 mask=1 cap=4075 }, 2:{ span=0,2,4-5 mask=2 cap=4070 }
> [    0.212047] ERROR: groups don't span domain->span
> [    0.212055]    domain-2: span=0-7 level=NUMA
> [    0.212072]     groups: 7:{ span=1-7 mask=7 cap=7163 }, 0:{ span=0-6 mask=0 cap=7114 }
>
> # cat /sys/devices/system/node/node*/distance
> 10 12 12 14 14 14 14 16
> 12 10 14 12 14 14 12 14
> 12 14 10 14 12 12 14 14
> 14 12 14 10 12 12 14 14
> 14 14 12 12 10 14 12 14
> 14 14 12 12 14 10 14 12
> 14 12 14 14 12 14 10 12
> 16 14 14 14 14 12 12 10
>
> The '16' seems to be the culprit. How does such a topo look like?
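
In case it helps with trying other tables: the -numa part of the quoted QEMU command line can be generated from a distance table with something like the sketch below. It is only a sketch. numa_args, cpus_per_node and DIST_8_NODES are names made up here, and it assumes CPUs are numbered contiguously per node and that passing only the src < dst half of a symmetric table is enough, as the quoted command does.

```python
# Distance table from the quoted /sys/devices/system/node/node*/distance output.
DIST_8_NODES = [
    [10, 12, 12, 14, 14, 14, 14, 16],
    [12, 10, 14, 12, 14, 14, 12, 14],
    [12, 14, 10, 14, 12, 12, 14, 14],
    [14, 12, 14, 10, 12, 12, 14, 14],
    [14, 14, 12, 12, 10, 14, 12, 14],
    [14, 14, 12, 12, 14, 10, 14, 12],
    [14, 12, 14, 14, 12, 14, 10, 12],
    [16, 14, 14, 14, 14, 12, 12, 10],
]


def numa_args(dist, cpus_per_node=1):
    """Return QEMU '-numa' arguments for a symmetric NUMA distance table.

    Hypothetical helper: assumes node N owns CPUs
    N*cpus_per_node .. (N+1)*cpus_per_node - 1.
    """
    n = len(dist)
    args = []
    # One '-numa node,...' entry per node.
    for node in range(n):
        first = node * cpus_per_node
        last = first + cpus_per_node - 1
        cpus = str(first) if first == last else f"{first}-{last}"
        args += ["-numa", f"node,cpus={cpus},nodeid={node}"]
    # One '-numa dist,...' entry per unordered node pair (src < dst only).
    for src in range(n):
        for dst in range(src + 1, n):
            args += ["-numa", f"dist,src={src},dst={dst},val={dist[src][dst]}"]
    return args


if __name__ == "__main__":
    print(" ".join(numa_args(DIST_8_NODES)))
```

The remaining options (-kernel, -append with sched_debug, -smp and so on) still have to be added by hand as in the quoted command.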

With a topology like this:

+------+         +------+        +-------+       +------+
| node |         |node  |        | node  |       |node  |
|      +---------+      +--------+       +-------+      |
+------+         +------+        +-------+       +------+

we can reproduce this issue.
For example, with the numa_distance table below, every CPU reports
"groups don't span domain->span":

node   0   1   2   3
  0:  10  12  20  22
  1:  12  10  22  24
  2:  20  22  10  12
  3:  22  24  12  10

QEMU:
qemu-system-aarch64 -M virt -nographic \
 -smp cpus=8 \
 -numa node,cpus=0-1,nodeid=0 \
 -numa node,cpus=2-3,nodeid=1 \
 -numa node,cpus=4-5,nodeid=2 \
 -numa node,cpus=6-7,nodeid=3 \
 -numa dist,src=0,dst=1,val=12 \
 -numa dist,src=0,dst=2,val=20 \
 -numa dist,src=0,dst=3,val=22 \
 -numa dist,src=1,dst=2,val=22 \
 -numa dist,src=2,dst=3,val=12 \
 -numa dist,src=1,dst=3,val=24 \

Boot log:

[    0.834496] CPU0 attaching sched-domain(s):
[    0.834546]  domain-0: span=0-1 level=MC
[    0.834754]   groups: 0:{ span=0 cap=1011 }, 1:{ span=1 cap=970 }
[    0.835018]   domain-1: span=0-3 level=NUMA
[    0.835052]    groups: 0:{ span=0-1 cap=1981 }, 2:{ span=2-3 cap=1997 }
[    0.835128]    domain-2: span=0-5 level=NUMA
[    0.835144]     groups: 0:{ span=0-3 cap=3978 }, 4:{ span=4-7 cap=3864 }
[    0.835195] ERROR: groups don't span domain->span
[    0.835206]     domain-3: span=0-7 level=NUMA
[    0.835222]      groups: 0:{ span=0-5 mask=0-1 cap=5933 }, 6:{ span=4-7 mask=6-7 cap=3957 }
[    0.835959] CPU1 attaching sched-domain(s):
[    0.835974]  domain-0: span=0-1 level=MC
[    0.835996]   groups: 1:{ span=1 cap=970 }, 0:{ span=0 cap=1011 }
[    0.836049]   domain-1: span=0-3 level=NUMA
[    0.836065]    groups: 0:{ span=0-1 cap=1981 }, 2:{ span=2-3 cap=1997 }
[    0.836114]    domain-2: span=0-5 level=NUMA
[    0.836130]     groups: 0:{ span=0-3 cap=3978 }, 4:{ span=4-7 cap=3864 }
[    0.836178] ERROR: groups don't span domain->span
[    0.836188]     domain-3: span=0-7 level=NUMA
[    0.836204]      groups: 0:{ span=0-5 mask=0-1 cap=5933 }, 6:{ span=4-7 mask=6-7 cap=3957 }
[    0.836290] CPU2 attaching sched-domain(s):
[    0.836299]  domain-0: span=2-3 level=MC
[    0.836316]   groups: 2:{ span=2 cap=983 }, 3:{ span=3 cap=1014 }
[    0.836364]   domain-1: span=0-3 level=NUMA
[    0.836379]    groups: 2:{ span=2-3 cap=1997 }, 0:{ span=0-1 cap=1981 }
[    0.836427]    domain-2: span=0-5 level=NUMA
[    0.836442]     groups: 2:{ span=0-3 mask=2-3 cap=4045 }, 4:{ span=0-1,4-7 mask=4-5 cap=5912 }
[    0.836538] ERROR: groups don't span domain->span
[    0.836549]     domain-3: span=0-7 level=NUMA
[    0.836580]      groups: 2:{ span=0-5 mask=2-3 cap=6000 }, 6:{ span=0-1,4-7 mask=6-7 cap=6005 }
[    0.836667] CPU3 attaching sched-domain(s):
[    0.836675]  domain-0: span=2-3 level=MC
[    0.836690]   groups: 3:{ span=3 cap=1014 }, 2:{ span=2 cap=983 }
[    0.836734]   domain-1: span=0-3 level=NUMA
[    0.836749]    groups: 2:{ span=2-3 cap=1997 }, 0:{ span=0-1 cap=1981 }
[    0.836793]    domain-2: span=0-5 level=NUMA
[    0.836822]     groups: 2:{ span=0-3 mask=2-3 cap=4045 }, 4:{ span=0-1,4-7 mask=4-5 cap=5912 }
[    0.836879] ERROR: groups don't span domain->span
[    0.836888]     domain-3: span=0-7 level=NUMA
[    0.836903]      groups: 2:{ span=0-5 mask=2-3 cap=6000 }, 6:{ span=0-1,4-7 mask=6-7 cap=6005 }
[    0.836975] CPU4 attaching sched-domain(s):
[    0.836982]  domain-0: span=4-5 level=MC
[    0.836997]   groups: 4:{ span=4 cap=945 }, 5:{ span=5 cap=1010 }
[    0.837041]   domain-1: span=4-7 level=NUMA
[    0.837057]    groups: 4:{ span=4-5 cap=1955 }, 6:{ span=6-7 cap=1909 }
[    0.837102]    domain-2: span=0-1,4-7 level=NUMA
[    0.837117]     groups: 4:{ span=4-7 cap=3864 }, 0:{ span=0-3 cap=3978 }
[    0.837161] ERROR: groups don't span domain->span
[    0.837170]     domain-3: span=0-7 level=NUMA
[    0.837185]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=5912 }, 2:{ span=0-3 mask=2-3 cap=4045 }
[    0.837252] CPU5 attaching sched-domain(s):
[    0.837260]  domain-0: span=4-5 level=MC
[    0.837275]   groups: 5:{ span=5 cap=1010 }, 4:{ span=4 cap=945 }
[    0.837320]   domain-1: span=4-7 level=NUMA
[    0.837334]    groups: 4:{ span=4-5 cap=1955 }, 6:{ span=6-7 cap=1909 }
[    0.837378]    domain-2: span=0-1,4-7 level=NUMA
[    0.837393]     groups: 4:{ span=4-7 cap=3864 }, 0:{ span=0-3 cap=3978 }
[    0.837437] ERROR: groups don't span domain->span
[    0.837445]     domain-3: span=0-7 level=NUMA
[    0.837460]      groups: 4:{ span=0-1,4-7 mask=4-5 cap=5912 }, 2:{ span=0-3 mask=2-3 cap=4045 }
[    0.837552] CPU6 attaching sched-domain(s):
[    0.837560]  domain-0: span=6-7 level=MC
[    0.837576]   groups: 6:{ span=6 cap=1002 }, 7:{ span=7 cap=907 }
[    0.837621]   domain-1: span=4-7 level=NUMA
[    0.837635]    groups: 6:{ span=6-7 cap=1909 }, 4:{ span=4-5 cap=1955 }
[    0.837679]    domain-2: span=0-1,4-7 level=NUMA
[    0.837695]     groups: 6:{ span=4-7 mask=6-7 cap=3957 }, 0:{ span=0-5 mask=0-1 cap=5933 }
[    0.837749] ERROR: groups don't span domain->span
[    0.837758]     domain-3: span=0-7 level=NUMA
[    0.837774]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=6005 }, 2:{ span=0-5 mask=2-3 cap=6000 }
[    0.838055] CPU7 attaching sched-domain(s):
[    0.838066]  domain-0: span=6-7 level=MC
[    0.838086]   groups: 7:{ span=7 cap=907 }, 6:{ span=6 cap=1002 }
[    0.838135]   domain-1: span=4-7 level=NUMA
[    0.838151]    groups: 6:{ span=6-7 cap=1909 }, 4:{ span=4-5 cap=1955 }
[    0.838198]    domain-2: span=0-1,4-7 level=NUMA
[    0.838214]     groups: 6:{ span=4-7 mask=6-7 cap=3957 }, 0:{ span=0-5 mask=0-1 cap=5933 }
[    0.838272] ERROR: groups don't span domain->span
[    0.838282]     domain-3: span=0-7 level=NUMA
[    0.838298]      groups: 6:{ span=0-1,4-7 mask=6-7 cap=6005 }, 2:{ span=0-5 mask=2-3 cap=6000 }
[    0.838414] root domain span: 0-7 (max cpu_capacity = 1024)

Thanks
Barry
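
P.S. A rough way to check a distance table offline, without booting anything. This is a simplified model based on my reading of the logs above, not the kernel's code: it assumes each unique distance is one NUMA level, that a node's span at a level is every node within that distance, and that a group built from a member node covers that member's span one level down. If any such group reaches outside the domain span, the node is flagged. All function and table names here are made up for the sketch, and the real group construction visits fewer members than this does, so it may over-report on other tables.

```python
DIST_4_NODES = [
    [10, 12, 20, 22],
    [12, 10, 22, 24],
    [20, 22, 10, 12],
    [22, 24, 12, 10],
]

DIST_8_NODES = [
    [10, 12, 12, 14, 14, 14, 14, 16],
    [12, 10, 14, 12, 14, 14, 12, 14],
    [12, 14, 10, 14, 12, 12, 14, 14],
    [14, 12, 14, 10, 12, 12, 14, 14],
    [14, 14, 12, 12, 10, 14, 12, 14],
    [14, 14, 12, 12, 14, 10, 14, 12],
    [14, 12, 14, 14, 12, 14, 10, 12],
    [16, 14, 14, 14, 14, 12, 12, 10],
]


def numa_levels(dist):
    """Sorted unique distances; each is treated as one NUMA level."""
    return sorted({d for row in dist for d in row})


def span(dist, node, limit):
    """Set of nodes whose distance from `node` is at most `limit`."""
    return {m for m, d in enumerate(dist[node]) if d <= limit}


def nodes_with_escaping_groups(dist):
    """Nodes that have a level where some member's lower-level span
    is not contained in the node's own span at that level."""
    levels = numa_levels(dist)
    flagged = []
    for node in range(len(dist)):
        for lower, upper in zip(levels, levels[1:]):
            domain = span(dist, node, upper)
            # A group seeded from member m is modelled as m's span one level down.
            if any(not span(dist, m, lower) <= domain for m in domain):
                flagged.append(node)
                break
    return flagged


if __name__ == "__main__":
    print("4-node table:", nodes_with_escaping_groups(DIST_4_NODES))
    print("8-node table:", nodes_with_escaping_groups(DIST_8_NODES))
```

With the two tables in this thread it flags all four nodes for the 4-node table and only nodes 0 and 7 for the 8-node one, which is where the ERROR lines show up in the two boot logs.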