On (07/16/19 09:28), Petr Mladek wrote:
> Kernel tries hard to store and show printk messages when panicking. Even
> logbuf_lock gets re-initialized when only one CPU is running after
> smp_send_stop().
>
> Unfortunately, smp_send_stop() might fail on architectures that do not
> use NMI as a fallback. Then printk log buffer might stay locked and
> a deadlock is almost inevitable.

I'd say that deadlock is still almost inevitable.

panic-CPU syncs with the printing-CPU before it attempts to SMP_STOP.
If there is an active printing-CPU, which is looping in console_unlock(),
taking logbuf_lock in order to msg_print_text() and stuff, then panic-CPU
will spin on console_owner waiting for that printing-CPU to hand over
printing duties.

        pr_emerg("Kernel panic - not syncing");
        smp_send_stop();

If printing-CPU goes nuts under logbuf_lock, has corrupted IDT or anything
else, then we will not progress with panic(). panic-CPU will deadlock. If
not on pr_emerg("Kernel panic - not syncing") then on another pr_emerg(),
right before the NMI-fallback.

        static void native_stop_other_cpus()
        {
        ...
                pr_emerg("Shutting down cpus with NMI\n");
                ^^ deadlock here
                apic->send_IPI_allbutself(NMI_VECTOR);
                ^^ not going to happen
        ...
        }

And it's not only x86. In many cases if we fail to SMP_STOP other CPUs,
and one of them is holding logbuf_lock then we are done with panic().
We will not return from smp_send_stop().

arm/kernel/smp.c

        void smp_send_stop(void)
        {
        ...
                if (num_online_cpus() > 1)
                        pr_warn("SMP: failed to stop secondary CPUs\n");
        }

arm64/kernel/smp.c

        void crash_smp_send_stop(void)
        {
        ...
                pr_crit("SMP: stopping secondary CPUs\n");
                smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
        ...
                if (atomic_read(&waiting_for_crash_ipi) > 0)
                        pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
                                   cpumask_pr_args(&mask));
        ...
        }

arm64/kernel/smp.c

        void smp_send_stop(void)
        {
        ...
                if (num_online_cpus() > 1)
                        pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
                                   cpumask_pr_args(cpu_online_mask));
        ...
        }

riscv/kernel/smp.c

        void smp_send_stop(void)
        {
        ...
                if (num_online_cpus() > 1)
                        pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
                                cpumask_pr_args(cpu_online_mask));
        ...
        }

And so on.

        -ss
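
P.S. To make the ordering problem above concrete, here is a rough
user-space sketch. It is only a model, not kernel code: log_model,
model_lock(), model_panic() and the spin bound are all made up for
illustration, and a single atomic flag stands in for logbuf_lock and
console_owner. The only thing it tries to show is that the banner is
printed before the stop request, so a lock held by a wedged CPU blocks
panic() before smp_send_stop() is ever reached.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum panic_result {
    PANIC_STUCK_AT_BANNER,  /* gave up before the stop request was sent */
    PANIC_REACHED_STOP      /* banner printed, stop request would be sent */
};

struct log_model {
    atomic_bool lock_held;  /* stands in for logbuf_lock / console_owner */
};

static void model_init(struct log_model *m, bool held_by_other_cpu)
{
    atomic_init(&m->lock_held, held_by_other_cpu);
}

/*
 * Try to take the lock, giving up after max_spins attempts. The kernel
 * spin has no such bound; the bound exists only so the model terminates.
 */
static bool model_lock(struct log_model *m, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        bool expected = false;

        if (atomic_compare_exchange_strong(&m->lock_held, &expected, true))
            return true;
    }
    return false;
}

static void model_unlock(struct log_model *m)
{
    atomic_store(&m->lock_held, false);
}

/* Mirrors the order in panic(): print the banner first, stop CPUs second. */
static enum panic_result model_panic(struct log_model *m,
                                     unsigned long max_spins)
{
    /* Models pr_emerg("Kernel panic - not syncing"): needs the lock. */
    if (!model_lock(m, max_spins))
        return PANIC_STUCK_AT_BANNER;
    model_unlock(m);

    /* Models smp_send_stop(): only reachable once the banner got out. */
    return PANIC_REACHED_STOP;
}

/* Small self-check covering both cases. */
static void model_demo(void)
{
    struct log_model free_lock, wedged_lock;

    model_init(&free_lock, false);
    assert(model_panic(&free_lock, 1000) == PANIC_REACHED_STOP);

    model_init(&wedged_lock, true);  /* owner never lets go */
    assert(model_panic(&wedged_lock, 1000) == PANIC_STUCK_AT_BANNER);
}
```

Calling model_demo() exercises both cases. In the real kernel the second
case has no spin bound, so the panic-CPU would simply never come back.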