Hello Thiago, Wish you a happy 2019!
On Sat, Dec 08, 2018 at 12:40:52AM -0200, Thiago Jung Bauermann wrote: > > Gautham R Shenoy <e...@linux.vnet.ibm.com> writes: > > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: > >> Sure. I will test the patch and report back. > > > > I added the following debug patch on top of your patch, and after an > > hour's run, the system crashed. Appending the log at the end. > > Thank you very much for testing! Your debug patch was very helpful as > well. > > > I suppose we still need to increase the number of tries since we wait > > only for 2.5ms looping before giving up. > > Do you think it would have helped? In the debug output you posted I > would have expected the following message to show up if the loop > finished too early, and it didn't: > > "Querying DEAD? cpu %i (%i) shows %i\n" > > So I don't think increasing the loop length would have made a > difference. In fact, the call to smp_query_cpu_stopped() always > succeeded on the first iteration. I did some testing during the holidays. Here are the observations: 1) With just your patch (without any additional debug patch), if I run DLPAR on /off operations on a system that has SMT=off, I am able to see a crash involving RTAS stack corruption within an hour's time. 2) With the debug patch (appended below) which has additional debug to capture the callers of stop-self, start-cpu, set-power-levels, the system is able to perform DLPAR on/off operations on a system with SMT=off for three days. And then, it crashed with the dead CPU showing a "Bad kernel stack pointer". From this log, I can clearly see that there were no concurrent calls to stop-self, start-cpu, set-power-levels. The only concurrent RTAS calls were the dying CPU calling "stop-self", and the CPU running the DLPAR operation calling "query-cpu-stopped-state". The crash signature is appended below as well. 3) Modifying your patch to remove the udelay and increase the loop count from 25 to 1000 doesn't improve the situation. We are still able to see the crash. 4) With my patch, even without any additional debug, I was able to observe the system run the tests successfully for over a week (I started the tests before the Christmas weekend, and forgot to turn it off!) It appears that there is a narrow race window involving rtas-stop-self and rtas-query-cpu-stopped-state calls that can be observed with your patch. Adding any printk's seems to reduce the probability of hitting this race window. It might be worth the while to check with RTAS folks, if they suspect something here. The Crash log with this debug patch is as follows [DEBUG] CPU 32 waiting for CPU 130 to enter rtas cpu 130 (hwid 130) Ready to die... [DEBUG] CPU 32 noticed CPU 130 enter rtas: tries=0, time=539 [DEBUG] CPU 32 waiting for CPU 131 to enter rtas cpu 131 (hwid 131) Ready to die... [DEBUG] CPU 32 noticed CPU 131 enter rtas: tries=0, time=137 [DEBUG] CPU 32 waiting for CPU 132 to enter rtas cpu 132 (hwid 132) Ready to die... [DEBUG] CPU 32 noticed CPU 132 enter rtas: tries=0, time=1238 [DEBUG] CPU 32 waiting for CPU 133 to enter rtas cpu 133 (hwid 133) Ready to die... [DEBUG] CPU 32 noticed CPU 133 enter rtas: tries=1, time=1259 [DEBUG] CPU 32 waiting for CPU 134 to enter rtas cpu 134 (hwid 134) Ready to die... [DEBUG] CPU 32 noticed CPU 134 enter rtas: tries=0, time=1141 [DEBUG] CPU 32 waiting for CPU 135 to enter rtas cpu 135 (hwid 135) Ready to die... [DEBUG] CPU 32 noticed CPU 135 enter rtas: tries=0, time=636 cpu 120 (hwid 120) Ready to die... [DEBUG] CPU 32 waiting for CPU 120 to enter rtas [DEBUG] CPU 32 noticed CPU 120 enter rtas: tries=0, time=53 [DEBUG] CPU 32 waiting for CPU 121 to enter rtas cpu 121 (hwid 121) Ready to die... Bad kernel stack pointer 1fafb400 at 1fafb4b0 Bad kernel stack pointer 1fafb4b0 at 1ec15270 Oops: Bad kernel stack pointer, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 121 PID: 0 Comm: swapper/121 Not tainted 4.20.0-thiago-original-with-debug+ #57 NIP: 000000001ec15270 LR: 000000001ec1526c CTR: 000000001ecd26c4 REGS: c00000001e577d30 TRAP: 0300 Not tainted (4.20.0-thiago-original-with-debug+) MSR: 8000000000001000 <SF,ME> CR: 48000000 XER: 00000020 CFAR: 000000001ecd27a8 DAR: ffffffffffff8108 DSISR: 40000000 IRQMASK: 8000000000009033 GPR00: 000000001ec1526c 000000001fafb4b0 0000000000000000 0000000000000000 GPR04: 0000000001a40968 00000000000000e0 000000000000dfe8 0000000000000018 GPR08: 000000001f82ae00 000000001ed025d4 000000001f827850 0000000000000000 GPR12: 0000000001b6d998 c00000001eb4e080 c0000003a1bdbf90 000000001eea3020 GPR16: 0000000000000000 c0000006fdc45100 c00000000004c8b0 c0000000013d5300 GPR20: c0000006fdc45300 0000000000000008 c0000000019d2cf8 c0000000013d6888 GPR24: 0000000000000079 c0000000013d688c 0000000000000002 c0000000013d688c GPR28: c0000000019cecf0 00000000000003c8 0000000000000000 000000001fafb4b0 NIP [000000001ec15270] 0x1ec15270 LR [000000001ec1526c] 0x1ec1526c Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace dd774c4912df6c89 ]--- The debug patch over your patch is as follows: -----------------------x8------------------------------------------- >From e5cad5ff5b61e003810723e549f44a3cc0cef356 Mon Sep 17 00:00:00 2001 From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com> Date: Fri, 7 Dec 2018 16:47:25 +0530 Subject: [PATCH] Add debug to Thiago's patch Also, add debug for callers of rtas-start-cpu and rtas-set-power-level Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com> --- arch/powerpc/kernel/rtas.c | 1 + arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 ++++++++++ arch/powerpc/platforms/pseries/smp.c | 2 ++ 3 files changed, 13 insertions(+) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index de35bd8f..9ecf957 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -576,6 +576,7 @@ int rtas_set_power_level(int powerdomain, int level, int *setlevel) return -ENOENT; do { + printk("CPU %d calling rtas-set-power-level\n", smp_processor_id()); rc = rtas_call(token, 2, 2, setlevel, powerdomain, level); } while (rtas_busy_delay(rc)); diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index fad3ceb..3dab7c3 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -214,6 +214,10 @@ static void pseries_cpu_die(unsigned int cpu) msleep(1); } } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { + u64 tb_before = mftb(); + u64 tb_after; + + printk("[DEBUG] CPU %d waiting for CPU %d to enter rtas\n", smp_processor_id(), cpu); /* * If the current state is not offline yet, it means that the * dying CPU (which is in pseries_mach_cpu_die) didn't have a @@ -230,6 +234,12 @@ static void pseries_cpu_die(unsigned int cpu) break; udelay(100); } + + tb_after = mftb(); + + printk("[DEBUG] CPU %d noticed CPU %d enter rtas: tries=%d, time=%llu\n", + smp_processor_id(), cpu, tries, + div_u64(tb_after - tb_before, tb_ticks_per_usec)); } if (cpu_status != 0) { diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 3df4612..d4624c4 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -123,6 +123,8 @@ static inline int smp_startup_cpu(unsigned int lcpu) if (start_cpu == RTAS_UNKNOWN_SERVICE) return 1; + printk("CPU %d calling start-cpu for CPU %d\n", + smp_processor_id(), lcpu); status = rtas_call(start_cpu, 3, 1, NULL, pcpu, start_here, pcpu); if (status != 0) { printk(KERN_ERR "start-cpu failed: %i\n", status); -- 1.9.4 -- Thanks and Regards gautham.