On 2020-05-16 13:52, Anup Patel wrote:
-----Original Message-----
From: Marc Zyngier <[email protected]>
Sent: 16 May 2020 17:42
To: Anup Patel <[email protected]>
Cc: Palmer Dabbelt <[email protected]>; Paul Walmsley
<[email protected]>; Thomas Gleixner <[email protected]>; Jason Cooper <[email protected]>; Atish Patra <[email protected]>; Alistair Francis <[email protected]>; Anup Patel <[email protected]>; linux-
[email protected]; [email protected]
Subject: Re: [PATCH 1/4] irqchip/sifive-plic: Setup cpuhp once after current
handler is present

Hi Anup,

On 2020-05-16 07:38, Anup Patel wrote:
> For multiple PLIC instances, the plic_init() is called once for each
> PLIC instance. Due to this we have two issues:
> 1. cpuhp_setup_state() is called multiple times 2. plic_starting_cpu()
> can crash for boot CPU if cpuhp_setup_state()
>    is called before boot CPU PLIC handler is available.
>
> This patch fixes both above issues.
>
> Signed-off-by: Anup Patel <[email protected]>
> ---
>  drivers/irqchip/irq-sifive-plic.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/irqchip/irq-sifive-plic.c
> b/drivers/irqchip/irq-sifive-plic.c
> index 822e074c0600..7dc23edb3267 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -76,6 +76,7 @@ struct plic_handler {
>    void __iomem            *enable_base;
>    struct plic_priv        *priv;
>  };
> +static bool plic_cpuhp_setup_done;
>  static DEFINE_PER_CPU(struct plic_handler, plic_handlers);
>
>  static inline void plic_toggle(struct plic_handler *handler, @@
> -282,6 +283,7 @@ static int __init plic_init(struct device_node *node,
>    int error = 0, nr_contexts, nr_handlers = 0, i;
>    u32 nr_irqs;
>    struct plic_priv *priv;
> +  struct plic_handler *handler;
>
>    priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>    if (!priv)
> @@ -310,7 +312,6 @@ static int __init plic_init(struct device_node
> *node,
>
>    for (i = 0; i < nr_contexts; i++) {
>            struct of_phandle_args parent;
> -          struct plic_handler *handler;
>            irq_hw_number_t hwirq;
>            int cpu, hartid;
>
> @@ -364,9 +365,18 @@ static int __init plic_init(struct device_node
> *node,
>            nr_handlers++;
>    }
>
> -  cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
> +  /*
> +   * We can have multiple PLIC instances so setup cpuhp state only
> +   * when context handler for current/boot CPU is present.
> +   */
> +  handler = this_cpu_ptr(&plic_handlers);
> +  if (handler->present && !plic_cpuhp_setup_done) {

If there is no context handler for the boot CPU, the system is doomed, right? It isn't able to get any interrupt, and you don't register the hotplug notifier that
could allow secondary CPUs to boot.

So what is the point? It feels like you should just give up here.

Also, the boot CPU is always CPU 0. So checking that you only register the
hotplug notifier from CPU 0 should be enough.

The boot CPU is not fixed in RISC-V, the logical id of the boot CPU will always be zero but physical id (or HART id) can be something totally different.

So on riscv, smp_processor_id() can return a non-zero value on the
the boot CPU? Interesting... :-/


On RISC-V NUMA system, we will have a separate PLIC in each NUMA node.

Let's say we have a system with 2 NUMA nodes, each NUMA node having
4 CPUs (or 4 HARTs). In this case, the DTB passed to Linux will have two PLIC
DT nodes where each PLIC device targets only 4 CPUs (or 4 HARTs). The
plic_init() functions will setup handlers for only 4 CPUs (or 4 HARTs). In other words, plic_init() for "PLIC0" will setup handler for HART id 0 to 3 and plic_init() for "PLIC1" will setup handler for HART id 4 to 7. Now, any CPU can be boot CPU so it is possible that CPU with HART id 4 is boot CPU and when plic_init() is first called for "PLIC0" the handler for HART id 4 is not setup because it will be setup later when plic_init() is called for "PLIC1". This cause plic_starting_cpu() to crash when plic_init() is called for "PLIC0".

I hope above example helps understanding the issue.

It does, thanks. This pseudo NUMA thing really is a terrible hack...


I encounter this issue randomly when booting Linux on QEMU RISC-V
with multiple NUMA nodes.

Then why don't you defer the probing of the PLIC you can't initialize
from this CPU? If you're on CPU4-7, only initialize the PLIC that
matters to you, and not the the others. It would certainly make a lot
more sense, and be more robust.

        M.
--
Jazz is not dead. It just smells funny...

Reply via email to