In message <4c511216.30...@ozlabs.org> you wrote:
> When CPU hotplug is used, some CPUs may be offline at the time a kexec is
> performed.  The subsequent kernel may expect these CPUs to be already running
,
> and will declare them stuck.  On pseries, there's also a soft-offline (cede)
> state that CPUs may be in; this can also cause problems as the kexeced kernel
> may ask RTAS if they're online -- and RTAS would say they are.  Again, stuck.
> 
> This patch kicks each present offline CPU awake before the kexec, so that
> none are lost to these assumptions in the subsequent kernel.

There are a lot of cleanups in this patch.  The change you are making
would be a lot clearer without all the additional cleanups in there.  I
think I'd like to see this as two patches.  One for cleanups and one for
the addition of wake_offline_cpus().

Other than that, I'm not completely convinced this is the functionality
we want.  Do we really want to online these cpus?  Why where they
offlined in the first place?  I understand the stuck problem, but is the
solution to online them, or to change the device tree so that the second
kernel doesn't detect them as stuck?  

Mikey

> 
> Signed-off-by: Matt Evans <m...@ozlabs.org>
> ---
> v2:   Added FIXME comment noting a possible problem with incorrectly
>       started secondary CPUs, following feedback from Milton.
> 
>  arch/powerpc/kernel/machine_kexec_64.c |   55 ++++++++++++++++++++++++++++--
-
>  1 files changed, 49 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/mac
hine_kexec_64.c
> index 4fbb3be..37f805e 100644
> --- a/arch/powerpc/kernel/machine_kexec_64.c
> +++ b/arch/powerpc/kernel/machine_kexec_64.c
> @@ -15,6 +15,8 @@
>  #include <linux/thread_info.h>
>  #include <linux/init_task.h>
>  #include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/cpu.h>
>  
>  #include <asm/page.h>
>  #include <asm/current.h>
> @@ -181,7 +183,20 @@ static void kexec_prepare_cpus_wait(int wait_state)
>       int my_cpu, i, notified=-1;
>  
>       my_cpu = get_cpu();
> -     /* Make sure each CPU has atleast made it to the state we need */
> +     /* Make sure each CPU has at least made it to the state we need.
> +      *
> +      * FIXME: There is a (slim) chance of a problem if not all of the CPUs
> +      * are correctly onlined.  If somehow we start a CPU on boot with RTAS
> +      * start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in
> +      * time, the boot CPU will timeout.  If it does eventually execute
> +      * stuff, the secondary will start up (paca[].cpu_start was written) an
d
> +      * get into a peculiar state.  If the platform supports
> +      * smp_ops->take_timebase(), the secondary CPU will probably be spinnin
g
> +      * in there.  If not (i.e. pseries), the secondary will continue on and
> +      * try to online itself/idle/etc. If it survives that, we need to find
> +      * these possible-but-not-online-but-should-be CPUs and chaperone them
> +      * into kexec_smp_wait().
> +      */
>       for_each_online_cpu(i) {
>               if (i == my_cpu)
>                       continue;
> @@ -189,9 +204,9 @@ static void kexec_prepare_cpus_wait(int wait_state)
>               while (paca[i].kexec_state < wait_state) {
>                       barrier();
>                       if (i != notified) {
> -                             printk( "kexec: waiting for cpu %d (physical"
> -                                             " %d) to enter %i state\n",
> -                                     i, paca[i].hw_cpu_id, wait_state);
> +                             printk(KERN_INFO "kexec: waiting for cpu %d "
> +                                    "(physical %d) to enter %i state\n",
> +                                    i, paca[i].hw_cpu_id, wait_state);
>                               notified = i;
>                       }
>               }
> @@ -199,9 +214,32 @@ static void kexec_prepare_cpus_wait(int wait_state)
>       mb();
>  }
>  
> -static void kexec_prepare_cpus(void)
> +/*
> + * We need to make sure each present CPU is online.  The next kernel will sc
an
> + * the device tree and assume primary threads are online and query secondary
> + * threads via RTAS to online them if required.  If we don't online primary
> + * threads, they will be stuck.  However, we also online secondary threads a
s we
> + * may be using 'cede offline'.  In this case RTAS doesn't see the secondary
> + * threads as offline -- and again, these CPUs will be stuck.
> + *
> + * So, we online all CPUs that should be running, including secondary thread
s.
> + */
> +static void wake_offline_cpus(void)
>  {
> +     int cpu = 0;
>  
> +     for_each_present_cpu(cpu) {
> +             if (!cpu_online(cpu)) {
> +                     printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
> +                            cpu);
> +                     cpu_up(cpu);
> +             }
> +     }
> +}
> +
> +static void kexec_prepare_cpus(void)
> +{
> +     wake_offline_cpus();
>       smp_call_function(kexec_smp_down, NULL, /* wait */0);
>       local_irq_disable();
>       mb(); /* make sure IRQs are disabled before we say they are */
> @@ -215,7 +253,10 @@ static void kexec_prepare_cpus(void)
>       if (ppc_md.kexec_cpu_down)
>               ppc_md.kexec_cpu_down(0, 0);
>  
> -     /* Before removing MMU mapings make sure all CPUs have entered real mod
e */
> +     /*
> +      * Before removing MMU mappings make sure all CPUs have entered real
> +      * mode:
> +      */
>       kexec_prepare_cpus_wait(KEXEC_STATE_REAL_MODE);
>  
>       put_cpu();
> @@ -284,6 +325,8 @@ void default_machine_kexec(struct kimage *image)
>       if (crashing_cpu == -1)
>               kexec_prepare_cpus();
>  
> +     pr_debug("kexec: Starting switchover sequence.\n");
> +
>       /* switch to a staticly allocated stack.  Based on irq stack code.
>        * XXX: the task struct will likely be invalid once we do the copy!
>        */
> -- 
> 1.6.3.3
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to