On Wed 2019-07-17 18:56:15, Sergey Senozhatsky wrote:
> On (07/16/19 09:28), Petr Mladek wrote:
> > Kernel tries hard to store and show printk messages when panicking. Even
> > logbuf_lock gets re-initialized when only one CPU is running after
> > smp_send_stop().
> >
> > Unfortunately, smp_send_stop() might fail on architectures that do not
> > use NMI as a fallback. Then printk log buffer might stay locked and
> > a deadlock is almost inevitable.
>
> I'd say that deadlock is still almost inevitable.
>
> panic-CPU syncs with the printing-CPU before it attempts to SMP_STOP.
> If there is an active printing-CPU, which is looping in console_unlock(),
> taking logbuf_lock in order to msg_print_text() and stuff, then panic-CPU
> will spin on console_owner waiting for that printing-CPU to handover
> printing duties.
>
>     pr_emerg("Kernel panic - not syncing");
>     smp_send_stop();

Good point. I forgot the handover logic. Well, it is enabled only around
call_console_drivers(). Therefore it is not under logbuf_lock.

I had in mind some infinite loop or deadlock in vprintk_store(). There
was at least one a long time ago (a warning triggered by a leap second).

> If printing-CPU goes nuts under logbuf_lock, has corrupted IDT or anything
> else, then we will not progress with panic(). panic-CPU will deadlock. If
> not on
>
>     pr_emerg("Kernel panic - not syncing")
>
> then on another pr_emerg(), right before the NMI-fallback.

Nested printk() should not be a problem thanks to printk_safe.
Also printk_safe_flush_on_panic() is safe because it checks whether
the lock is available.

The problem is kmsg_dump() and console_unlock() called from
console_unblank() and console_flush_on_panic(). They do not check
whether the lock is available.

This patch does not help in all possible scenarios. But I still
believe that it will help in some.

Well, I am primarily interested in the 2nd patch. It fixes a real-life
bug report.

Best Regards,
Petr
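
To illustrate the difference between a panic-time flush that checks whether the lock is available and one that takes it unconditionally, here is a minimal userspace sketch in C. It is not the kernel code: demo_logbuf_lock, demo_online_cpus and demo_flush_on_panic() are hypothetical stand-ins, a C11 atomic int replaces the raw spinlock, and the re-initialization branch only approximates the "re-initialize logbuf_lock when only one CPU is running" logic quoted above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for logbuf_lock: 0 = free, 1 = held. */
static atomic_int demo_logbuf_lock = 0;
/* Hypothetical stand-in for num_online_cpus(); set by the caller. */
static int demo_online_cpus = 1;

static bool demo_lock_is_locked(void)
{
    return atomic_load(&demo_logbuf_lock) != 0;
}

static bool demo_trylock(void)
{
    int expected = 0;
    /* Take the lock only if it is currently free. */
    return atomic_compare_exchange_strong(&demo_logbuf_lock, &expected, 1);
}

static void demo_unlock(void)
{
    atomic_store(&demo_logbuf_lock, 0);
}

/* Returns true if flushing proceeded, false if it was skipped. */
static bool demo_flush_on_panic(void)
{
    if (demo_lock_is_locked()) {
        /* Another CPU may still own the lock: skip instead of spinning. */
        if (demo_online_cpus > 1)
            return false;
        /* Only this CPU runs, so the owner can never release it: reset. */
        atomic_store(&demo_logbuf_lock, 0);
    }
    if (!demo_trylock())
        return false;
    /* ... copy buffered messages into the main log here ... */
    demo_unlock();
    return true;
}
```

In this model, a path that spun on the lock without the check would hang forever when the lock is held and its owner never runs again, which is the kmsg_dump()/console_unlock() concern above.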