Re: [Kgdb-bugreport] [PATCH] kgdb: Flush console before entering kgdb on panic

2023-08-25 Thread Daniel Thompson
On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> When entering kdb/kgdb on a kernel panic, it was be observed that the
> console isn't flushed before the `kdb` prompt came up. Specifically,
> when using the buddy lockup detector on arm64 and running:
>   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
>
> I could see:
>   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
>   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
>   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
>   [   32.557359] NMI backtrace for cpu 6
>   ... [backtrace for cpu 6] ...
>   [   32.558353] NMI backtrace for cpu 5
>   ... [backtrace for cpu 5] ...
>   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
>   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
>
>   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
>   [5]kdb>
>
> As you can see, backtraces for the other CPUs start printing and get
> interleaved with the kdb PANIC print.
>
> Let's replicate the commands to flush the console in the kdb panic
> entry point to avoid this.
>
> Signed-off-by: Douglas Anderson 
> ---
>
>  kernel/debug/debug_core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index d5e9ccde3ab8..3a904d8697c8 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
>   if (panic_timeout)
>   return;
>
> + debug_locks_off();
> + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
>   if (dbg_kdb_mode)
>   kdb_printf("PANIC: %s\n", msg);

I'm somewhat included to say *this* (calling kdb_printf() when not
actually in the debugger) is the cause of the problem. kdb_printf()
does some pretty horid things to the console and isn't intended to
run while the system is active.

I'd therefore be more tempted to defer the print to the b.p. trap
handler itself and make this part of kgdb_panic() look more like:

kgdb_panic_msg = msg;
kgdb_breakpoint();
kgdb_panic_msg = NULL;


Daniel.


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH v11 6/6] arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup

2023-08-25 Thread Daniel Thompson
On Thu, Aug 24, 2023 at 08:30:32AM -0700, Douglas Anderson wrote:
> Up until now we've been using the generic (weak) implementation for
> kgdb_roundup_cpus() when using kgdb on arm64. Let's move to a custom
> one. The advantage here is that, when pseudo-NMI is enabled on a
> device, we'll be able to round up CPUs using pseudo-NMI. This allows
> us to debug CPUs that are stuck with interrupts disabled. If
> pseudo-NMIs are not enabled then we'll fallback to just using an IPI,
> which is still slightly better than the generic implementation since
> it avoids the potential situation described in the generic
> kgdb_call_nmi_hook().
>
> Co-developed-by: Sumit Garg 
> Signed-off-by: Sumit Garg 
> Signed-off-by: Douglas Anderson 

>From a kgdb point of view:
Reviewed-by: Daniel Thompson 


Daniel.


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH] kgdb: Flush console before entering kgdb on panic

2023-08-25 Thread Doug Anderson
Hi,

On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
 wrote:
>
> On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > console isn't flushed before the `kdb` prompt came up. Specifically,
> > when using the buddy lockup detector on arm64 and running:
> >   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> >
> > I could see:
> >   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
> >   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> >   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
> >   [   32.557359] NMI backtrace for cpu 6
> >   ... [backtrace for cpu 6] ...
> >   [   32.558353] NMI backtrace for cpu 5
> >   ... [backtrace for cpu 5] ...
> >   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> >   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> >
> >   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> >   [5]kdb>
> >
> > As you can see, backtraces for the other CPUs start printing and get
> > interleaved with the kdb PANIC print.
> >
> > Let's replicate the commands to flush the console in the kdb panic
> > entry point to avoid this.
> >
> > Signed-off-by: Douglas Anderson 
> > ---
> >
> >  kernel/debug/debug_core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index d5e9ccde3ab8..3a904d8697c8 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> >   if (panic_timeout)
> >   return;
> >
> > + debug_locks_off();
> > + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > +
> >   if (dbg_kdb_mode)
> >   kdb_printf("PANIC: %s\n", msg);
>
> I'm somewhat included to say *this* (calling kdb_printf() when not
> actually in the debugger) is the cause of the problem. kdb_printf()
> does some pretty horid things to the console and isn't intended to
> run while the system is active.
>
> I'd therefore be more tempted to defer the print to the b.p. trap
> handler itself and make this part of kgdb_panic() look more like:
>
> kgdb_panic_msg = msg;
> kgdb_breakpoint();
> kgdb_panic_msg = NULL;

Unfortunately I think that only solves half the problem. As a quick
test, I tried simply commenting out the "kdb_printf" line in
kgdb_panic(). While that avoids the interleaved panic message and
backtrace, it does nothing to actually get the backtraces printed out
before you end up in kdb. As an example, this is what happened when I
used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
had the "kdb_printf" in kgdb_panic() commented out:

[   72.658424] lkdtm: Performing direct entry HARDLOCKUP
[   82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
...
[   82.234801] Sending NMI from CPU 5 to CPUs 6:
[   82.239296] NMI backtrace for cpu 6
... [ stack trace for CPU 6 ] ...
[   82.240294] NMI backtrace for cpu 5
... [ stack trace for CPU 5 ] ...
[   82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
[   82.581291] NMI backtrace
Entering kdb (current=0xff80da5a1080, pid 6978) on processor 5 due
to Keyboard Entry
[5]kdb>

As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
show up if I use the "dmesg" command but it's a bit of a hassle to run
"dmesg" to look for any extra debug messages every time I drop in kdb.

I guess perhaps that part isn't obvious from the commit message?
Should I send a new version with an updated commit message indicating
that it's not just the jumbled text that's a problem but also the lack
of stack traces?

Thanks!

-Doug


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH v11 3/6] arm64: smp: Remove dedicated wakeup IPI

2023-08-25 Thread Stephen Boyd
Quoting Douglas Anderson (2023-08-24 08:30:29)
> From: Mark Rutland 
>
> To enable NMI backtrace and KGDB's NMI cpu roundup, we need to free up
> at least one dedicated IPI.
>
> On arm64 the IPI_WAKEUP IPI is only used for the ACPI parking protocol,
> which itself is only used on some very early ARMv8 systems which
> couldn't implement PSCI.
>
> Remove the IPI_WAKEUP IPI, and rely on the IPI_RESCHEDULE IPI to wake
> CPUs from the parked state. This will cause a tiny amonut of redundant
> work to check the thread flags, but this is miniscule in relation to the
> cost of taking and handling the IPI in the first place. We can safely
> handle redundant IPI_RESCHEDULE IPIs, so there should be no functional
> impact as a result of this change.
>
> Signed-off-by: Mark Rutland 
> Signed-off-by: Douglas Anderson 
> Cc: Catalin Marinas 
> Cc: Marc Zyngier 
> Cc: Sumit Garg 
> Cc: Will Deacon 
> ---

Reviewed-by: Stephen Boyd 


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH v11 4/6] arm64: smp: Add arch support for backtrace using pseudo-NMI

2023-08-25 Thread Stephen Boyd
Quoting Douglas Anderson (2023-08-24 08:30:30)
> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> index fac08e18bcd5..50ce8b697ff3 100644
> --- a/arch/arm64/include/asm/irq.h
> +++ b/arch/arm64/include/asm/irq.h
> @@ -6,6 +6,9 @@
>
>  #include 
>
> +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu);

Some nits, but otherwise

Reviewed-by: Stephen Boyd 

> +#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
> +
>  struct pt_regs;
>
>  int set_handle_irq(void (*handle_irq)(struct pt_regs *));
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index a5848f1ef817..c8896cbc5327 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -72,12 +73,18 @@ enum ipi_msg_type {
> IPI_CPU_CRASH_STOP,
> IPI_TIMER,
> IPI_IRQ_WORK,
> -   NR_IPI
> +   NR_IPI,
> +   /*
> +* Any enum >= NR_IPI and < MAX_IPI is special and not tracable
> +* with trace_ipi_*
> +*/
> +   IPI_CPU_BACKTRACE = NR_IPI,
> +   MAX_IPI
>  };
>
>  static int ipi_irq_base __read_mostly;
>  static int nr_ipi __read_mostly = NR_IPI;
> -static struct irq_desc *ipi_desc[NR_IPI] __read_mostly;
> +static struct irq_desc *ipi_desc[MAX_IPI] __read_mostly;

Side note: it would be nice to mark ipi_desc as __ro_after_init. Same
for nr_ipi and ipi_irq_base.

>
>  static void ipi_setup(int cpu);
>
> @@ -845,6 +852,22 @@ static void __noreturn ipi_cpu_crash_stop(unsigned int 
> cpu, struct pt_regs *regs
>  #endif
>  }
>
> +static void arm64_backtrace_ipi(cpumask_t *mask)
> +{
> +   __ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
> +}
> +
> +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)

Can this be 'bool exclude_self' instead of int? That matches all other
implementations from what I can tell.

> +{
> +   /*
> +* NOTE: though nmi_trigger_cpumask_backtrace has "nmi_" in the name,

USe nmi_trigger_cpumask_backtrace() to indicate function.

> +* nothing about it truly needs to be implemented using an NMI, it's
> +* just that it's _allowed_ to work with NMIs. If ipi_should_be_nmi()
> +* returned false our backtrace attempt will just use a regular IPI.
> +*/
> +   nmi_trigger_cpumask_backtrace(mask, exclude_cpu, arm64_backtrace_ipi);


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH v11 6/6] arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup

2023-08-25 Thread Stephen Boyd
Quoting Douglas Anderson (2023-08-24 08:30:32)
> Up until now we've been using the generic (weak) implementation for
> kgdb_roundup_cpus() when using kgdb on arm64. Let's move to a custom
> one. The advantage here is that, when pseudo-NMI is enabled on a
> device, we'll be able to round up CPUs using pseudo-NMI. This allows
> us to debug CPUs that are stuck with interrupts disabled. If
> pseudo-NMIs are not enabled then we'll fallback to just using an IPI,
> which is still slightly better than the generic implementation since
> it avoids the potential situation described in the generic
> kgdb_call_nmi_hook().
>
> Co-developed-by: Sumit Garg 
> Signed-off-by: Sumit Garg 
> Signed-off-by: Douglas Anderson 
> ---

Reviewed-by: Stephen Boyd 


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH v11 4/6] arm64: smp: Add arch support for backtrace using pseudo-NMI

2023-08-25 Thread Stephen Boyd
Quoting Doug Anderson (2023-08-25 16:02:46)
> On Fri, Aug 25, 2023 at 3:27 PM Stephen Boyd  wrote:
> >
> > Quoting Douglas Anderson (2023-08-24 08:30:30)
> > > diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> > >
> > >  static int ipi_irq_base __read_mostly;
> > >  static int nr_ipi __read_mostly = NR_IPI;
> > > -static struct irq_desc *ipi_desc[NR_IPI] __read_mostly;
> > > +static struct irq_desc *ipi_desc[MAX_IPI] __read_mostly;
> >
> > Side note: it would be nice to mark ipi_desc as __ro_after_init. Same
> > for nr_ipi and ipi_irq_base.
>
> I'd rather not change it in this patch since it's a pre-existing and
> separate issue, but I can add a patch to the end of the series for
> that if I end up spinning it. Otherwise I can send a follow-up patch
> for it.

Of course. Don't change it in this patch.

>
>
> > >  static void ipi_setup(int cpu);
> > >
> > > @@ -845,6 +852,22 @@ static void __noreturn ipi_cpu_crash_stop(unsigned 
> > > int cpu, struct pt_regs *regs
> > >  #endif
> > >  }
> > >
> > > +static void arm64_backtrace_ipi(cpumask_t *mask)
> > > +{
> > > +   __ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
> > > +}
> > > +
> > > +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int 
> > > exclude_cpu)
> >
> > Can this be 'bool exclude_self' instead of int? That matches all other
> > implementations from what I can tell.
>
> Nope. See the part of the commit message that says:
>
> This patch depends on commit 36759e343ff9 ("nmi_backtrace: allow
> excluding an arbitrary CPU") since that commit changed the prototype
> of arch_trigger_cpumask_backtrace(), which this patch implements.

Ah, ok. Sounds fine then.


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


Re: [Kgdb-bugreport] [PATCH v11 4/6] arm64: smp: Add arch support for backtrace using pseudo-NMI

2023-08-25 Thread Doug Anderson
Hi,

On Fri, Aug 25, 2023 at 3:27 PM Stephen Boyd  wrote:
>
> Quoting Douglas Anderson (2023-08-24 08:30:30)
> > diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> > index fac08e18bcd5..50ce8b697ff3 100644
> > --- a/arch/arm64/include/asm/irq.h
> > +++ b/arch/arm64/include/asm/irq.h
> > @@ -6,6 +6,9 @@
> >
> >  #include 
> >
> > +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int 
> > exclude_cpu);
>
> Some nits, but otherwise
>
> Reviewed-by: Stephen Boyd 
>
> > +#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
> > +
> >  struct pt_regs;
> >
> >  int set_handle_irq(void (*handle_irq)(struct pt_regs *));
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index a5848f1ef817..c8896cbc5327 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -72,12 +73,18 @@ enum ipi_msg_type {
> > IPI_CPU_CRASH_STOP,
> > IPI_TIMER,
> > IPI_IRQ_WORK,
> > -   NR_IPI
> > +   NR_IPI,
> > +   /*
> > +* Any enum >= NR_IPI and < MAX_IPI is special and not tracable
> > +* with trace_ipi_*
> > +*/
> > +   IPI_CPU_BACKTRACE = NR_IPI,
> > +   MAX_IPI
> >  };
> >
> >  static int ipi_irq_base __read_mostly;
> >  static int nr_ipi __read_mostly = NR_IPI;
> > -static struct irq_desc *ipi_desc[NR_IPI] __read_mostly;
> > +static struct irq_desc *ipi_desc[MAX_IPI] __read_mostly;
>
> Side note: it would be nice to mark ipi_desc as __ro_after_init. Same
> for nr_ipi and ipi_irq_base.

I'd rather not change it in this patch since it's a pre-existing and
separate issue, but I can add a patch to the end of the series for
that if I end up spinning it. Otherwise I can send a follow-up patch
for it.


> >  static void ipi_setup(int cpu);
> >
> > @@ -845,6 +852,22 @@ static void __noreturn ipi_cpu_crash_stop(unsigned int 
> > cpu, struct pt_regs *regs
> >  #endif
> >  }
> >
> > +static void arm64_backtrace_ipi(cpumask_t *mask)
> > +{
> > +   __ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
> > +}
> > +
> > +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
>
> Can this be 'bool exclude_self' instead of int? That matches all other
> implementations from what I can tell.

Nope. See the part of the commit message that says:

This patch depends on commit 36759e343ff9 ("nmi_backtrace: allow
excluding an arbitrary CPU") since that commit changed the prototype
of arch_trigger_cpumask_backtrace(), which this patch implements.

> > +{
> > +   /*
> > +* NOTE: though nmi_trigger_cpumask_backtrace has "nmi_" in the 
> > name,
>
> USe nmi_trigger_cpumask_backtrace() to indicate function.

I won't plan on doing an immediate spin for this and for now I'll wait
for additional feedback. If a maintainer is getting ready to land
this, I'm happy to post a new version with this fix or also happy if a
maintainer wants to add the "()" while applying.

-Doug


___
Kgdb-bugreport mailing list
Kgdb-bugreport@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport