Hari Bathini <hbath...@linux.ibm.com> writes: > In panic path, fadump is triggered via a panic notifier function. > Before calling panic notifier functions, smp_send_stop() gets called, > which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc > ("powerpc: stop_this_cpu: remove the cpu from the online map.") and > again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") > started marking CPUs as offline while stopping them. So, if a kernel > has either of the above commits, vmcore captured with fadump via panic > path would show only the panic'ing CPU as online. Sample output of > crash-utility with such vmcore: > > # crash vmlinux vmcore > ... > KERNEL: vmlinux > DUMPFILE: vmcore [PARTIAL DUMP] > CPUS: 1 > DATE: Wed Nov 10 09:56:34 EST 2021 > UPTIME: 00:00:42 > LOAD AVERAGE: 2.27, 0.69, 0.24 > TASKS: 183 > NODENAME: XXXXXXXXX > RELEASE: 5.15.0+ > VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021 > MACHINE: ppc64le (2500 Mhz) > MEMORY: 8 GB > PANIC: "Kernel panic - not syncing: sysrq triggered crash" > PID: 3394 > COMMAND: "bash" > TASK: c0000000150a5f80 [THREAD_INFO: c0000000150a5f80] > CPU: 1 > STATE: TASK_RUNNING (PANIC) > > crash> p -x __cpu_online_mask > __cpu_online_mask = $1 = { > bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} > } > crash> > crash> > crash> p -x __cpu_active_mask > __cpu_active_mask = $2 = { > bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} > } > crash> > > While this has been the case since fadump was introduced, the issue > was not identified for two probable reasons: > > - In general, the bulk of the vmcores analyzed were from crash > due to exception. > > - The above did change since commit 8341f2f222d7 ("sysrq: Use > panic() to force a crash") started using panic() instead of > deferencing NULL pointer to force a kernel crash. But then > commit de6e5d38417e ("powerpc: smp_send_stop do not offline > stopped CPUs") stopped marking CPUs as offline till kernel > commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") > reverted that change. > > To avoid vmcore from showing only one CPU as online in panic path, > skip marking non panic'ing CPUs as offline while stopping them > during fadump crash.
Is this really worth the added complexity/bug surface? Why does it matter if the vmcore shows only one CPU online? > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c > index c23ee842c4c3..20555d5d5966 100644 > --- a/arch/powerpc/kernel/smp.c > +++ b/arch/powerpc/kernel/smp.c > @@ -61,6 +61,7 @@ > #include <asm/cpu_has_feature.h> > #include <asm/ftrace.h> > #include <asm/kup.h> > +#include <asm/fadump.h> > > #ifdef DEBUG > #include <asm/udbg.h> > @@ -626,7 +627,8 @@ static void nmi_stop_this_cpu(struct pt_regs *regs) > /* > * IRQs are already hard disabled by the smp_handle_nmi_ipi. > */ > - set_cpu_online(smp_processor_id(), false); > + if (!(oops_in_progress && should_fadump_crash())) > + set_cpu_online(smp_processor_id(), false); > > spin_begin(); > while (1) > @@ -650,7 +652,8 @@ static void stop_this_cpu(void *dummy) > * to know other CPUs are offline before it breaks locks to flush > * printk buffers, in case we panic()ed while holding the lock. > */ > - set_cpu_online(smp_processor_id(), false); > + if (!(oops_in_progress && should_fadump_crash())) > + set_cpu_online(smp_processor_id(), false); The comment talks about printk_safe_flush_on_panic(), and this change would presumably break that. Except that printk_safe_flush_on_panic() no longer exists. So do we need to set the CPU online here at all? ie. could we revert bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") now that printk_safe_flush_on_panic() no longer exists? cheers