On (07/16/19 09:28), Petr Mladek wrote:
> Kernel tries hard to store and show printk messages when panicking. Even
> logbuf_lock gets re-initialized when only one CPU is running after
> smp_send_stop().
> 
> Unfortunately, smp_send_stop() might fail on architectures that do not
> use NMI as a fallback. Then printk log buffer might stay locked and
> a deadlock is almost inevitable.

I'd say that deadlock is still almost inevitable.

panic-CPU syncs with the printing-CPU before it attempts to SMP_STOP.
If there is an active printing-CPU, which is looping in console_unlock(),
taking logbuf_lock in order to msg_print_text() and stuff, then panic-CPU
will spin on console_owner waiting for that printing-CPU to hand over
printing duties.

        pr_emerg("Kernel panic - not syncing");
        smp_send_stop();
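
To illustrate the ordering problem, here is a rough user-space sketch. It
is not kernel code: the struct, the function names and the max_spins bound
are all made up for the example (the real spin has no bound). It only
models "panic-CPU must get through printk() before smp_send_stop() runs".

```c
#include <stdbool.h>

/* Hypothetical user-space model of the situation, not kernel code. */
struct console_state {
        bool owner_active;      /* a CPU is looping in console_unlock() */
        bool owner_responsive;  /* that CPU can still hand over printing */
};

/*
 * Model of printk() on the panic-CPU: spin until the current console
 * owner hands over. Returns false if the handover never happens within
 * max_spins iterations (assumption: a stand-in for "spins forever").
 */
static bool model_printk_from_panic(struct console_state *con, int max_spins)
{
        for (int i = 0; con->owner_active; i++) {
                if (i >= max_spins)
                        return false;   /* the real kernel has no such bound */
                if (con->owner_responsive)
                        con->owner_active = false;  /* handover done */
        }
        return true;
}

/* Model of panic(): pr_emerg() first, smp_send_stop() only afterwards. */
static bool model_panic_reaches_smp_send_stop(struct console_state con)
{
        if (!model_printk_from_panic(&con, 1000))
                return false;   /* stuck in pr_emerg("Kernel panic ...") */
        return true;            /* smp_send_stop() would run here */
}
```

With an unresponsive console owner the model never gets to the point
where smp_send_stop() would be called, which is the case described above.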


If printing-CPU goes nuts under logbuf_lock, has a corrupted IDT or anything
else, then we will not progress with panic(). panic-CPU will deadlock, if
not on

        pr_emerg("Kernel panic - not syncing")

then on another pr_emerg(), right before the NMI-fallback.

        static void native_stop_other_cpus()
        {
        ...
                pr_emerg("Shutting down cpus with NMI\n");
                           ^^ deadlock here
                apic->send_IPI_allbutself(NMI_VECTOR);
                           ^^ not going to happen
        ...
        }

And it's not only x86. In many cases, if we fail to SMP_STOP other
CPUs and one of them is holding logbuf_lock, then we are done with
panic(). We will not return from smp_send_stop().

arm/kernel/smp.c

void smp_send_stop(void)
{
        ...
        if (num_online_cpus() > 1)
                pr_warn("SMP: failed to stop secondary CPUs\n");
}

arm64/kernel/smp.c

void crash_smp_send_stop(void)
{
        ...
        pr_crit("SMP: stopping secondary CPUs\n");
        smp_cross_call(&mask, IPI_CPU_CRASH_STOP);

        ...
        if (atomic_read(&waiting_for_crash_ipi) > 0)
                pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
                                cpumask_pr_args(&mask));
        ...
}

arm64/kernel/smp.c

void smp_send_stop(void)
{
        ...
        if (num_online_cpus() > 1)
                pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
                                cpumask_pr_args(cpu_online_mask));
        ...
}


riscv/kernel/smp.c

void smp_send_stop(void)
{
        ...
        if (num_online_cpus() > 1)
                pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
                        cpumask_pr_args(cpu_online_mask));
        ...
}

And so on.

        -ss
