On Fri 2025-08-29 16:18:28, John Ogness wrote:
> On 2025-08-29, Petr Mladek <[email protected]> wrote:
> >      c) kdb_msg_write() also writes the message to all other consoles
> >     registered by printk. I guess that this is what John meant
> >     by mirroring.
> 
> Yes.
> 
> >> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> >> index 79d8c74378061..2c168eaf378ed 100644
> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -10,6 +10,7 @@
> >>  #include <linux/export.h>
> >>  #include <linux/init.h>
> >>  #include <linux/irqflags.h>
> >> +#include <linux/kgdb.h>
> >>  #include <linux/kthread.h>
> >>  #include <linux/minmax.h>
> >>  #include <linux/percpu.h>
> >> @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >>             * Panic does not imply that the console is owned. However,
> >>             * since all non-panic CPUs are stopped during panic(), it
> >>             * is safer to have them avoid gaining console ownership.
> >> +           * The only exception is if kgdb is active, which may print
> >> +           * from multiple CPUs during a panic.
> >>             *
> >>             * If this acquire is a reacquire (and an unsafe takeover
> >>             * has not previously occurred) then it is allowed to attempt
> >> @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >>             * interrupted by the panic CPU while printing.
> >>             */
> >>            if (other_cpu_in_panic() &&
> >> +              atomic_read(&kgdb_active) == -1 &&
> >
> > This would likely work for most kgdb_printk() calls. But what about
> > the one called from kgdb_panic()?
> 
> Nice catch.
> 
> > An alternative solution would be to allow it only for the CPU locked
> > by kdb, something like:
> >
> >                 READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
> 
> Yes, I like this.
>
> > Note that I used READ_ONCE() to guarantee an atomic read. The
> > condition will fail only when we are inside the code section
> > serialized by kdb_printf_cpu.
> 
> Neither the READ_ONCE() nor any memory barriers are needed because the
> only interesting case is when the CPU sees that it is the one stored in
> @kdb_printf_cpu. In which case it was the one that did the storing and
> the value is always correctly loaded.

Let me play devil's advocate for a bit.
What about the following race?

kdb_printf_cpu = -1  (0xffffffff)

CPU 0xff                                CPU 0x1

                                        panic()

printk()
  nbcon_atomic_flush_pending()
     nbcon_context_try_acquire_direct()
        # load low byte of kdb_printf_cpu
        val = 0xff

                                        vkdb_printf()
                                          cmpxchg(&kdb_printf_cpu, ...)
                                          kdb_printf_cpu == 0x1

        # load upper bytes of kdb_printf_cpu (now 0x00)
        val = 0x000000ff == 0xff

Result: CPU 0xff would be allowed to acquire the nbcon context
        because it thinks that vkdb_printf() is locked by this CPU,
        even though CPU 0x1 holds it.

        This is not entirely artificial; see
        https://lwn.net/Articles/793253/#Load%20Tearing

The above race is not critical. CPU 0x1 could still wait for CPU 0xff
and acquire the nbcon context later.

But it is unexpected behavior. I would feel more comfortable if
we used READ_ONCE() to be on the safe side.

> >> [0] https://lore.kernel.org/lkml/[email protected]
> >
> > Sigh, I have already forgotten that we discussed this in the past.
> 
> After so many years, I do not think there is a printk scenario we have
> not discussed. ;-)

;-)

Best Regards,
Petr


_______________________________________________
Kgdb-bugreport mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport
