Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-13 Thread Satheesh Rajendran
On Fri, Sep 11, 2020 at 09:55:23PM +1000, Michael Ellerman wrote:
> Srikar Dronamraju  writes:
> > Current code assumes that the cpumask of cpus sharing an l2-cache will
> > always be a superset of cpu_sibling_mask.
> >
> > Let's stop that assumption. cpu_l2_cache_mask is a superset of
> > cpu_sibling_mask if and only if shared_caches is set.
> 
> I'm seeing oopses with this:
> 
> [0.117392][T1] smp: Bringing up secondary CPUs ...
> [0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
> [0.158265][T1] numa: Node 0 CPUs: 0
> [0.158520][T1] numa: Node 1 CPUs: 1
> [0.167453][T1] BUG: Unable to handle kernel data access on read at 
> 0x800041228298
> [0.167992][T1] Faulting instruction address: 0xc018c128
> [0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [0.169417][T1] Modules linked in:
> [0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #209
> [0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
> c004dce0
> [0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
> (5.9.0-rc2-00095-g7430ad5aa700)
> [0.170602][T1] MSR:  82009033   CR: 
> 4400  XER: 
> [0.170985][T1] CFAR: c018c288 IRQMASK: 0
> [0.170985][T1] GPR00:  c0007e343b10 
> c173e400 4000
> [0.170985][T1] GPR04:  0800 
> 0800 
> [0.170985][T1] GPR08:  c122c298 
> c0003fffc000 c0007fd05ce8
> [0.170985][T1] GPR12: c0007e0119f8 c193 
> 8ade 
> [0.170985][T1] GPR16: c0007e3c0640 0917 
> c0007e3c0658 0008
> [0.170985][T1] GPR20: c15d0bb8 8ade 
> c0f57400 c1817c28
> [0.170985][T1] GPR24: c176dc80 c0007e3c0890 
> c0007e3cfe00 
> [0.170985][T1] GPR28: c1772310 c0007e011900 
> c0007e3c0800 0001
> [0.172750][T1] NIP [c018c128] build_sched_domains+0x808/0x14b0
> [0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
> [0.173186][T1] Call Trace:
> [0.173484][T1] [c0007e343b10] [c018bfe8] 
> build_sched_domains+0x6c8/0x14b0 (unreliable)
> [0.173821][T1] [c0007e343c50] [c018dcdc] 
> sched_init_domains+0xec/0x130
> [0.174037][T1] [c0007e343ca0] [c10d59d8] 
> sched_init_smp+0x50/0xc4
> [0.174207][T1] [c0007e343cd0] [c10b45c4] 
> kernel_init_freeable+0x1b4/0x378
> [0.174378][T1] [c0007e343db0] [c00129fc] 
> kernel_init+0x24/0x158
> [0.174740][T1] [c0007e343e20] [c000d9d0] 
> ret_from_kernel_thread+0x5c/0x6c
> [0.175050][T1] Instruction dump:
> [0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
> 913e002c 41820034
> [0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> 
> f93e0080 7d404828 314a0001
> [0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
> [0.178512][T1]
> [1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x000b
> 
> That's qemu:
> 
> qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
>   -kernel build~/vmlinux \
>   -m 2G,slots=2,maxmem=4G \
>   -object memory-backend-ram,size=1G,id=m0 \
>   -object memory-backend-ram,size=1G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 2,sockets=2,maxcpus=2  \

PowerKVM guest vCPUs do not yet have L2 and L3 cache elements in the
device tree. I raised a bug about this some time ago; probably related?
https://bugs.launchpad.net/qemu/+bug/1774605
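
For context, the kernel discovers the shared L2 by walking the device
tree, which is why such guests take the !l2_cache path. A rough sketch of
the lookup, modelled on cpu_to_l2cache() in arch/powerpc/kernel/smp.c
(abridged):

/*
 * Sketch, modelled on cpu_to_l2cache() in arch/powerpc/kernel/smp.c
 * (abridged): follow the cpu node's "l2-cache"/"next-level-cache"
 * phandle to the cache node. On guests that expose no cache nodes
 * this returns NULL, and update_mask_by_l2() bails out early.
 */
static struct device_node *cpu_to_l2cache(int cpu)
{
        struct device_node *np, *cache;

        if (!cpu_present(cpu))
                return NULL;

        np = of_get_cpu_node(cpu, NULL);
        if (!np)
                return NULL;

        cache = of_find_next_cache_node(np);
        of_node_put(np);

        return cache;
}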


Regards,
-Satheesh.
> 
> 
> On mambo I get:
> 
> [0.005069][T1] smp: Bringing up secondary CPUs ...
> [0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
> [0.011682][T1] numa: Node 0 CPUs: 0-3
> [0.011709][T1] numa: Node 1 CPUs: 4-7
> [0.012015][T1] BUG: arch topology borken
> [0.012040][T1]  the SMT domain not a subset of the CACHE domain
> [0.012107][T1] BUG: Unable to handle kernel data access on read at 
> 0x8001012e7398
> [0.012142][T1] Faulting instruction address: 0xc01aa4f0
> [0.012174][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.012206][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [0.012236][T1] Modules linked in:
> [0.012264][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #1
> [0.012304][T1] NIP:  c01aa4f0 LR: c01aa498 CTR: 
> 
> [0.012341][T1] REGS: c000ef583880 TRAP: 0380   Not tainted  
> 

Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-13 Thread Srikar Dronamraju
Fix to make it work where CPUs don't have an l2-cache element.

--->8-----8<---

From b25d47b01b7195b1df19083a4043fa6a87a901a3 Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju 
Date: Thu, 9 Jul 2020 13:33:38 +0530
Subject: [PATCH v5.2 05/10] powerpc/smp: Dont assume l2-cache to be superset of
 sibling

Current code assumes that the cpumask of cpus sharing an l2-cache will
always be a superset of cpu_sibling_mask.

Let's stop that assumption. cpu_l2_cache_mask is a superset of
cpu_sibling_mask if and only if shared_caches is set.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
Set cpumask after verifying l2-cache. (Gautham)

Changelog v5 -> v5.2:
If cpu has no l2-cache, set cpumask as per its sibling mask. (Michael Ellerman)

 arch/powerpc/kernel/smp.c | 43 +--
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 9f4333d..168532e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1186,9 +1186,23 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
int i;
 
l2_cache = cpu_to_l2cache(cpu);
-   if (!l2_cache)
+   if (!l2_cache) {
+   struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
+
+   /*
+* If no l2cache for this CPU, assume all siblings to share
+* cache with this CPU.
+*/
+   if (has_big_cores)
+   sibling_mask = cpu_smallcore_mask;
+
+   for_each_cpu(i, sibling_mask(cpu))
+   set_cpus_related(cpu, i, cpu_l2_cache_mask);
+
return false;
+   }
 
+   cpumask_set_cpu(cpu, mask_fn(cpu));
for_each_cpu(i, cpu_online_mask) {
/*
 * when updating the marks the current CPU has not been marked
@@ -1271,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
 * add it to it's own thread sibling mask.
 */
cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
+   cpumask_set_cpu(cpu, cpu_core_mask(cpu));
 
for (i = first_thread; i < first_thread + threads_per_core; i++)
if (cpu_online(i))
set_cpus_related(i, cpu, cpu_sibling_mask);
 
add_cpu_to_smallcore_masks(cpu);
-   /*
-* Copy the thread sibling mask into the cache sibling mask
-* and mark any CPUs that share an L2 with this CPU.
-*/
-   for_each_cpu(i, cpu_sibling_mask(cpu))
-   set_cpus_related(cpu, i, cpu_l2_cache_mask);
update_mask_by_l2(cpu, cpu_l2_cache_mask);
 
-   /*
-* Copy the cache sibling mask into core sibling mask and mark
-* any CPUs on the same chip as this CPU.
-*/
-   for_each_cpu(i, cpu_l2_cache_mask(cpu))
-   set_cpus_related(cpu, i, cpu_core_mask);
+   if (pkg_id == -1) {
+   struct cpumask *(*mask)(int) = cpu_sibling_mask;
+
+   /*
+* Copy the sibling mask into core sibling mask and
+* mark any CPUs on the same chip as this CPU.
+*/
+   if (shared_caches)
+   mask = cpu_l2_cache_mask;
+
+   for_each_cpu(i, mask(cpu))
+   set_cpus_related(cpu, i, cpu_core_mask);
 
-   if (pkg_id == -1)
return;
+   }
 
for_each_cpu(i, cpu_online_mask)
if (get_physical_package_id(i) == pkg_id)
-- 
2.17.1
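
A note for readers following the diffs in this thread: set_cpus_related()
keeps the masks symmetric by marking each CPU in the other's mask.
Roughly (a sketch abridged from arch/powerpc/kernel/smp.c):

/*
 * Sketch, abridged from arch/powerpc/kernel/smp.c: mark @i and @j in
 * each other's mask, so the "shares a core/cache with" relation stays
 * symmetric regardless of which CPU is onlined first.
 */
static void set_cpus_related(int i, int j, struct cpumask *(*get_cpumask)(int))
{
        cpumask_set_cpu(i, get_cpumask(j));
        cpumask_set_cpu(j, get_cpumask(i));
}

This is why the !l2_cache fallback above only needs one pass over the
sibling mask: each set_cpus_related() call updates both sides.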



Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-13 Thread Srikar Dronamraju
* Michael Ellerman  [2020-09-13 11:46:41]:

> Srikar Dronamraju  writes:
> > * Michael Ellerman  [2020-09-11 21:55:23]:
> >
> >> Srikar Dronamraju  writes:
> >> > Current code assumes that the cpumask of cpus sharing an l2-cache will
> >> > always be a superset of cpu_sibling_mask.
> >> >
> >> > Let's stop that assumption. cpu_l2_cache_mask is a superset of
> >> > cpu_sibling_mask if and only if shared_caches is set.
> >> 
> >> I'm seeing oopses with this:
> >> 
> 
> The patch fixes qemu, and I don't see the crash on mambo, but I still
> see:
>   [0.010536] smp: Bringing up secondary CPUs ...
>   [0.019189] smp: Brought up 2 nodes, 8 CPUs
>   [0.019210] numa: Node 0 CPUs: 0-3
>   [0.019235] numa: Node 1 CPUs: 4-7
>   [0.02]  the CACHE domain not a subset of the MC domain
>   [0.024505] BUG: arch topology borken
>   [0.024527]  the SMT domain not a subset of the CACHE domain
>   [0.024563] BUG: arch topology borken
>   [0.024584]  the CACHE domain not a subset of the MC domain
>   [0.024645] BUG: arch topology borken
>   [0.024666]  the SMT domain not a subset of the CACHE domain
>   [0.024702] BUG: arch topology borken
>   [0.024723]  the CACHE domain not a subset of the MC domain
> 
> That's the p9 mambo model, using skiboot.tcl from skiboot, with CPUS=2,
> THREADS=4 and MAMBO_NUMA=1.
> 

I was able to reproduce with
 qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
   -kernel build~/vmlinux \
   -m 2G,slots=2,maxmem=4G \
   -object memory-backend-ram,size=1G,id=m0 \
   -object memory-backend-ram,size=1G,id=m1 \
   -numa node,nodeid=0,memdev=m0 \
   -numa node,nodeid=1,memdev=m1 \
   -smp 8,threads=4,sockets=2,maxcpus=8  \


If the CPU doesn't have an l2-cache element, then the CPU has to set not
only itself but also its siblings in cpu_l2_cache_mask. Otherwise it can
happen that the sibling mask has 4 CPUs set while the CACHE domain has
just one CPU set, leading to this BUG message.

Patch follows this mail.
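
For reference, the "BUG: arch topology borken" / "not a subset" lines come
from the scheduler's sanity check while stacking domains: each child
domain's span must be a subset of its parent's. A sketch of that check,
abridged from build_sched_domain() in kernel/sched/topology.c:

/*
 * Sketch, abridged from build_sched_domain() (kernel/sched/topology.c):
 * a child level (e.g. SMT) spanning CPUs its parent (e.g. CACHE) does
 * not cover is reported, then papered over by widening the parent.
 */
if (child) {
        child->parent = sd;

        if (!cpumask_subset(sched_domain_span(child),
                            sched_domain_span(sd))) {
                pr_err("BUG: arch topology borken\n");
                pr_err("     the %s domain not a subset of the %s domain\n",
                       child->name, sd->name);

                /* Fixup, ensure @sd has at least @child CPUs. */
                cpumask_or(sched_domain_span(sd),
                           sched_domain_span(sd),
                           sched_domain_span(child));
        }
}

So a CACHE mask holding one CPU underneath an SMT mask holding four
produces exactly the messages quoted above.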

> Node layout is:
> 
> [0.00] Early memory node ranges
> [0.00]   node   0: [mem 0x-0x]
> [0.00]   node   1: [mem 0x2000-0x2000]
> [0.00] Initmem setup node 0 [mem 
> 0x-0x]
> [0.00] On node 0 totalpages: 65536
> [0.00] Initmem setup node 1 [mem 
> 0x2000-0x2000]
> [0.00] On node 1 totalpages: 65536
> 
> 
> There aren't any l2-cache properties in the device-tree under cpus.
> 
> I'll try and have a closer look tonight.
> 
> cheers

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-12 Thread Michael Ellerman
Srikar Dronamraju  writes:
> * Michael Ellerman  [2020-09-11 21:55:23]:
>
>> Srikar Dronamraju  writes:
>> > Current code assumes that the cpumask of cpus sharing an l2-cache will
>> > always be a superset of cpu_sibling_mask.
>> >
>> > Let's stop that assumption. cpu_l2_cache_mask is a superset of
>> > cpu_sibling_mask if and only if shared_caches is set.
>> 
>> I'm seeing oopses with this:
>> 
>> [0.117392][T1] smp: Bringing up secondary CPUs ...
>> [0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
>> [0.158265][T1] numa: Node 0 CPUs: 0
>> [0.158520][T1] numa: Node 1 CPUs: 1
>> [0.167453][T1] BUG: Unable to handle kernel data access on read at 
>> 0x800041228298
>> [0.167992][T1] Faulting instruction address: 0xc018c128
>> [0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
>> [0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA 
>> pSeries
>> [0.169417][T1] Modules linked in:
>> [0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
>> 5.9.0-rc2-00095-g7430ad5aa700 #209
>> [0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
>> c004dce0
>> [0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
>> (5.9.0-rc2-00095-g7430ad5aa700)
>> [0.170602][T1] MSR:  82009033   
>> CR: 4400  XER: 
>> [0.170985][T1] CFAR: c018c288 IRQMASK: 0
>> [0.170985][T1] GPR00:  c0007e343b10 
>> c173e400 4000
>> [0.170985][T1] GPR04:  0800 
>> 0800 
>> [0.170985][T1] GPR08:  c122c298 
>> c0003fffc000 c0007fd05ce8
>> [0.170985][T1] GPR12: c0007e0119f8 c193 
>> 8ade 
>> [0.170985][T1] GPR16: c0007e3c0640 0917 
>> c0007e3c0658 0008
>> [0.170985][T1] GPR20: c15d0bb8 8ade 
>> c0f57400 c1817c28
>> [0.170985][T1] GPR24: c176dc80 c0007e3c0890 
>> c0007e3cfe00 
>> [0.170985][T1] GPR28: c1772310 c0007e011900 
>> c0007e3c0800 0001
>> [0.172750][T1] NIP [c018c128] 
>> build_sched_domains+0x808/0x14b0
>> [0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
>> [0.173186][T1] Call Trace:
>> [0.173484][T1] [c0007e343b10] [c018bfe8] 
>> build_sched_domains+0x6c8/0x14b0 (unreliable)
>> [0.173821][T1] [c0007e343c50] [c018dcdc] 
>> sched_init_domains+0xec/0x130
>> [0.174037][T1] [c0007e343ca0] [c10d59d8] 
>> sched_init_smp+0x50/0xc4
>> [0.174207][T1] [c0007e343cd0] [c10b45c4] 
>> kernel_init_freeable+0x1b4/0x378
>> [0.174378][T1] [c0007e343db0] [c00129fc] 
>> kernel_init+0x24/0x158
>> [0.174740][T1] [c0007e343e20] [c000d9d0] 
>> ret_from_kernel_thread+0x5c/0x6c
>> [0.175050][T1] Instruction dump:
>> [0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
>> 913e002c 41820034
>> [0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> 
>> f93e0080 7d404828 314a0001
>> [0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
>> [0.178512][T1]
>> [1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
>> exitcode=0x000b
>> 
>> That's qemu:
>> 
>> qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
>>   -kernel build~/vmlinux \
>>   -m 2G,slots=2,maxmem=4G \
>>   -object memory-backend-ram,size=1G,id=m0 \
>>   -object memory-backend-ram,size=1G,id=m1 \
>>   -numa node,nodeid=0,memdev=m0 \
>>   -numa node,nodeid=1,memdev=m1 \
>>   -smp 2,sockets=2,maxcpus=2  \
>> 
>
> Thanks Michael for the report, for identifying the patch, and for giving
> an easy reproducer. That made my task easy. (My only problem was that all
> my PowerKVM hosts had an old compiler that refused to compile newer
> kernels.)
>
> So in this setup, the CPU doesn't have an l2-cache. And in that scenario,
> we miss updating the l2-cache domain. Actually the initial patch had this
> exact code; however, it was my mistake. I should have reassessed it before
> making the changes suggested by Gautham.
>
> Patch below. Do let me know if you want me to send the patch separately.

>> On mambo I get:
>> 
>> [0.005069][T1] smp: Bringing up secondary CPUs ...
>> [0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
>> [0.011682][T1] numa: Node 0 CPUs: 0-3
>> [0.011709][T1] numa: Node 1 CPUs: 4-7
>> [0.012015][T1] BUG: arch topology borken
>> [0.012040][T1]  the SMT domain not a subset of the CACHE domain
>> [0.012107][T1] BUG: Unable to handle kernel data access on read at 
>> 0x8001012e7398
>> [0.012142][T1] Faulting instruction address: 

Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-11 Thread Srikar Dronamraju
* Michael Ellerman  [2020-09-11 21:55:23]:

> Srikar Dronamraju  writes:
> > Current code assumes that the cpumask of cpus sharing an l2-cache will
> > always be a superset of cpu_sibling_mask.
> >
> > Let's stop that assumption. cpu_l2_cache_mask is a superset of
> > cpu_sibling_mask if and only if shared_caches is set.
> 
> I'm seeing oopses with this:
> 
> [0.117392][T1] smp: Bringing up secondary CPUs ...
> [0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
> [0.158265][T1] numa: Node 0 CPUs: 0
> [0.158520][T1] numa: Node 1 CPUs: 1
> [0.167453][T1] BUG: Unable to handle kernel data access on read at 
> 0x800041228298
> [0.167992][T1] Faulting instruction address: 0xc018c128
> [0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [0.169417][T1] Modules linked in:
> [0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #209
> [0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
> c004dce0
> [0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
> (5.9.0-rc2-00095-g7430ad5aa700)
> [0.170602][T1] MSR:  82009033   CR: 
> 4400  XER: 
> [0.170985][T1] CFAR: c018c288 IRQMASK: 0
> [0.170985][T1] GPR00:  c0007e343b10 
> c173e400 4000
> [0.170985][T1] GPR04:  0800 
> 0800 
> [0.170985][T1] GPR08:  c122c298 
> c0003fffc000 c0007fd05ce8
> [0.170985][T1] GPR12: c0007e0119f8 c193 
> 8ade 
> [0.170985][T1] GPR16: c0007e3c0640 0917 
> c0007e3c0658 0008
> [0.170985][T1] GPR20: c15d0bb8 8ade 
> c0f57400 c1817c28
> [0.170985][T1] GPR24: c176dc80 c0007e3c0890 
> c0007e3cfe00 
> [0.170985][T1] GPR28: c1772310 c0007e011900 
> c0007e3c0800 0001
> [0.172750][T1] NIP [c018c128] build_sched_domains+0x808/0x14b0
> [0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
> [0.173186][T1] Call Trace:
> [0.173484][T1] [c0007e343b10] [c018bfe8] 
> build_sched_domains+0x6c8/0x14b0 (unreliable)
> [0.173821][T1] [c0007e343c50] [c018dcdc] 
> sched_init_domains+0xec/0x130
> [0.174037][T1] [c0007e343ca0] [c10d59d8] 
> sched_init_smp+0x50/0xc4
> [0.174207][T1] [c0007e343cd0] [c10b45c4] 
> kernel_init_freeable+0x1b4/0x378
> [0.174378][T1] [c0007e343db0] [c00129fc] 
> kernel_init+0x24/0x158
> [0.174740][T1] [c0007e343e20] [c000d9d0] 
> ret_from_kernel_thread+0x5c/0x6c
> [0.175050][T1] Instruction dump:
> [0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
> 913e002c 41820034
> [0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> 
> f93e0080 7d404828 314a0001
> [0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
> [0.178512][T1]
> [1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x000b
> 
> That's qemu:
> 
> qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
>   -kernel build~/vmlinux \
>   -m 2G,slots=2,maxmem=4G \
>   -object memory-backend-ram,size=1G,id=m0 \
>   -object memory-backend-ram,size=1G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 2,sockets=2,maxcpus=2  \
> 

Thanks Michael for the report, for identifying the patch, and for giving an
easy reproducer. That made my task easy. (My only problem was that all my
PowerKVM hosts had an old compiler that refused to compile newer kernels.)

So in this setup, the CPU doesn't have an l2-cache. And in that scenario, we
miss updating the l2-cache domain. Actually the initial patch had this exact
code; however, it was my mistake. I should have reassessed it before making
the changes suggested by Gautham.

Patch below. Do let me know if you want me to send the patch separately.
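
(The patch itself is truncated from this archive excerpt. To spell out the
failure mode, reading the v5 diff at the end of this thread: when there is
no l2-cache node, update_mask_by_l2() returns early and nothing at all
lands in cpu_l2_cache_mask, not even the CPU itself, so build_sched_domains()
is handed an empty CACHE span and oopses:

/*
 * Annotated reading of the v5 code path when the device tree has no
 * l2-cache node (see the v5 diff at the end of this thread):
 */
l2_cache = cpu_to_l2cache(cpu);
if (!l2_cache)
        return false;   /* bails with cpu_l2_cache_mask(cpu) still empty */

cpumask_set_cpu(cpu, mask_fn(cpu));     /* only reached when a node exists */

The v5.2 respin earlier in this thread closes this by falling back to the
sibling mask in the !l2_cache case.)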

> 
> On mambo I get:
> 
> [0.005069][T1] smp: Bringing up secondary CPUs ...
> [0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
> [0.011682][T1] numa: Node 0 CPUs: 0-3
> [0.011709][T1] numa: Node 1 CPUs: 4-7
> [0.012015][T1] BUG: arch topology borken
> [0.012040][T1]  the SMT domain not a subset of the CACHE domain
> [0.012107][T1] BUG: Unable to handle kernel data access on read at 
> 0x8001012e7398
> [0.012142][T1] Faulting instruction address: 0xc01aa4f0
> [0.012174][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.012206][T1] LE PAGE_SIZE=64K MMU=Hash SMP 

Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-11 Thread Michael Ellerman
Srikar Dronamraju  writes:
> Current code assumes that the cpumask of cpus sharing an l2-cache will
> always be a superset of cpu_sibling_mask.
>
> Let's stop that assumption. cpu_l2_cache_mask is a superset of
> cpu_sibling_mask if and only if shared_caches is set.

I'm seeing oopses with this:

[0.117392][T1] smp: Bringing up secondary CPUs ...
[0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
[0.158265][T1] numa: Node 0 CPUs: 0
[0.158520][T1] numa: Node 1 CPUs: 1
[0.167453][T1] BUG: Unable to handle kernel data access on read at 
0x800041228298
[0.167992][T1] Faulting instruction address: 0xc018c128
[0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
[0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[0.169417][T1] Modules linked in:
[0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.9.0-rc2-00095-g7430ad5aa700 #209
[0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
c004dce0
[0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
(5.9.0-rc2-00095-g7430ad5aa700)
[0.170602][T1] MSR:  82009033   CR: 
4400  XER: 
[0.170985][T1] CFAR: c018c288 IRQMASK: 0
[0.170985][T1] GPR00:  c0007e343b10 
c173e400 4000
[0.170985][T1] GPR04:  0800 
0800 
[0.170985][T1] GPR08:  c122c298 
c0003fffc000 c0007fd05ce8
[0.170985][T1] GPR12: c0007e0119f8 c193 
8ade 
[0.170985][T1] GPR16: c0007e3c0640 0917 
c0007e3c0658 0008
[0.170985][T1] GPR20: c15d0bb8 8ade 
c0f57400 c1817c28
[0.170985][T1] GPR24: c176dc80 c0007e3c0890 
c0007e3cfe00 
[0.170985][T1] GPR28: c1772310 c0007e011900 
c0007e3c0800 0001
[0.172750][T1] NIP [c018c128] build_sched_domains+0x808/0x14b0
[0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
[0.173186][T1] Call Trace:
[0.173484][T1] [c0007e343b10] [c018bfe8] 
build_sched_domains+0x6c8/0x14b0 (unreliable)
[0.173821][T1] [c0007e343c50] [c018dcdc] 
sched_init_domains+0xec/0x130
[0.174037][T1] [c0007e343ca0] [c10d59d8] 
sched_init_smp+0x50/0xc4
[0.174207][T1] [c0007e343cd0] [c10b45c4] 
kernel_init_freeable+0x1b4/0x378
[0.174378][T1] [c0007e343db0] [c00129fc] 
kernel_init+0x24/0x158
[0.174740][T1] [c0007e343e20] [c000d9d0] 
ret_from_kernel_thread+0x5c/0x6c
[0.175050][T1] Instruction dump:
[0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
913e002c 41820034
[0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> f93e0080 
7d404828 314a0001
[0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
[0.178512][T1]
[1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b

That's qemu:

qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
  -kernel build~/vmlinux \
  -m 2G,slots=2,maxmem=4G \
  -object memory-backend-ram,size=1G,id=m0 \
  -object memory-backend-ram,size=1G,id=m1 \
  -numa node,nodeid=0,memdev=m0 \
  -numa node,nodeid=1,memdev=m1 \
  -smp 2,sockets=2,maxcpus=2  \


On mambo I get:

[0.005069][T1] smp: Bringing up secondary CPUs ...
[0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
[0.011682][T1] numa: Node 0 CPUs: 0-3
[0.011709][T1] numa: Node 1 CPUs: 4-7
[0.012015][T1] BUG: arch topology borken
[0.012040][T1]  the SMT domain not a subset of the CACHE domain
[0.012107][T1] BUG: Unable to handle kernel data access on read at 
0x8001012e7398
[0.012142][T1] Faulting instruction address: 0xc01aa4f0
[0.012174][T1] Oops: Kernel access of bad area, sig: 11 [#1]
[0.012206][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.012236][T1] Modules linked in:
[0.012264][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.9.0-rc2-00095-g7430ad5aa700 #1
[0.012304][T1] NIP:  c01aa4f0 LR: c01aa498 CTR: 

[0.012341][T1] REGS: c000ef583880 TRAP: 0380   Not tainted  
(5.9.0-rc2-00095-g7430ad5aa700)
[0.012379][T1] MSR:  92009033   
CR: 4400  XER: 0004
[0.012439][T1] CFAR: c00101b0 IRQMASK: 0
[0.012439][T1] GPR00:  c000ef583b10 
c17fd000 4000
[0.012439][T1] GPR04:  0800 
 
[0.012439][T1] GPR08:  c12eb398 
c000c000 

[PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-08-10 Thread Srikar Dronamraju
Current code assumes that the cpumask of cpus sharing an l2-cache will
always be a superset of cpu_sibling_mask.

Let's stop that assumption. cpu_l2_cache_mask is a superset of
cpu_sibling_mask if and only if shared_caches is set.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
Set cpumask after verifying l2-cache. (Gautham)

 arch/powerpc/kernel/smp.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index b13161a5ffc3..0c960ce3be42 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1188,6 +1188,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
if (!l2_cache)
return false;
 
+   cpumask_set_cpu(cpu, mask_fn(cpu));
for_each_cpu(i, cpu_online_mask) {
/*
 * when updating the marks the current CPU has not been marked
@@ -1270,29 +1271,30 @@ static void add_cpu_to_masks(int cpu)
 * add it to it's own thread sibling mask.
 */
cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
+   cpumask_set_cpu(cpu, cpu_core_mask(cpu));
 
for (i = first_thread; i < first_thread + threads_per_core; i++)
if (cpu_online(i))
set_cpus_related(i, cpu, cpu_sibling_mask);
 
add_cpu_to_smallcore_masks(cpu);
-   /*
-* Copy the thread sibling mask into the cache sibling mask
-* and mark any CPUs that share an L2 with this CPU.
-*/
-   for_each_cpu(i, cpu_sibling_mask(cpu))
-   set_cpus_related(cpu, i, cpu_l2_cache_mask);
update_mask_by_l2(cpu, cpu_l2_cache_mask);
 
-   /*
-* Copy the cache sibling mask into core sibling mask and mark
-* any CPUs on the same chip as this CPU.
-*/
-   for_each_cpu(i, cpu_l2_cache_mask(cpu))
-   set_cpus_related(cpu, i, cpu_core_mask);
+   if (pkg_id == -1) {
+   struct cpumask *(*mask)(int) = cpu_sibling_mask;
+
+   /*
+* Copy the sibling mask into core sibling mask and
+* mark any CPUs on the same chip as this CPU.
+*/
+   if (shared_caches)
+   mask = cpu_l2_cache_mask;
+
+   for_each_cpu(i, mask(cpu))
+   set_cpus_related(cpu, i, cpu_core_mask);
 
-   if (pkg_id == -1)
return;
+   }
 
for_each_cpu(i, cpu_online_mask)
if (get_physical_package_id(i) == pkg_id)
-- 
2.18.2