On Thu, Nov 10, 2016 at 06:44:45PM +0100, Sebastian Andrzej Siewior wrote:
> Initially I wanted to remove mcheck_cpu_init() from identify_cpu() and let it
> become an independent early hotplug callback. The main problem here was that
> the init on the boot CPU may happen too late
> (device_initcall_sync(mcheck_init_device)) and nobody wanted to risk receiving
> and MCE event at boot time leading to a shutdown (if the MCE feature is not 
> yet
> enabled).
> 
> Here is attempt two: the timming stays as-is but the ordering of the functions

                           timing

> is changed:
> - mcheck_cpu_init() (which is run from identify_cpu()) will setup the timer
>   struct but won't fire the timer. This is moved to CPU_ONLINE since its
>   cleanup part is in CPU_DOWN_PREPARE. So if it is okay to stop the timer 
> early
>   in the shutdown phase, it should be okay to start it late in the bring up 
> phase.
> 
> - CPU_DOWN_PREPARE disables the MCE feature flags for !INTEL CPUs in

                     disables the MCE error reporting...

>   mce_disable_cpu(). If a failure occures it would be re-enabled on all vendor

                                    occurs

>   CPUs (including Intel where it was not disabled during shutdown). To keep 
> this
>   working I am moving it to CPU_ONLINE. smp_call_function_single() is dropped
>   beause the notifier runs nowdays on the target CPU.

"... because the notifier runs on the target CPU now."

Please run your commit messages text through a spellchecker.

> - CPU_ONLINE is invoking mce_device_create() + mce_threshold_create_device()
>   but its cleanup part is in CPU_DEAD (mce_threshold_remove_device() and
>   mce_device_remove()). In order to keep this symmetrical I am moving the 
> clean
>   up from CPU_DEAD to CPU_DOWN_PREPARE.

   cleanup

> 
> Cc: Tony Luck <tony.l...@intel.com>
> Cc: Borislav Petkov <b...@alien8.de>
> Cc: linux-e...@vger.kernel.org
> Cc: x...@kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 31 +++++++++++++++----------------
>  1 file changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
> b/arch/x86/kernel/cpu/mcheck/mce.c
> index 052b5e05c3c4..3da6fd94fa2e 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1771,6 +1771,9 @@ void (*machine_check_vector)(struct pt_regs *, long 
> error_code) =
>   */
>  void mcheck_cpu_init(struct cpuinfo_x86 *c)
>  {
> +     struct timer_list *t = this_cpu_ptr(&mce_timer);
> +     unsigned int cpu = smp_processor_id();
> +
>       if (mca_cfg.disabled)
>               return;
>  
> @@ -1796,7 +1799,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
>       __mcheck_cpu_init_generic();
>       __mcheck_cpu_init_vendor(c);
>       __mcheck_cpu_init_clear_banks();
> -     __mcheck_cpu_init_timer();
> +     setup_pinned_timer(t, mce_timer_fn, cpu);

Why not leave all that setup stuff in __mcheck_cpu_init_timer() ?

...

> @@ -2517,11 +2518,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>                       mce_device_remove(cpu);
>                       return NOTIFY_BAD;
>               }
> -
> +             mce_reenable_cpu();
> +             mce_start_timer(cpu, t);
>               break;
>       case CPU_DEAD:
> -             mce_threshold_remove_device(cpu);
> -             mce_device_remove(cpu);
>               mce_intel_hcpu_update(cpu);
>  
>               /* intentionally ignoring frozen here */

There's another place for cpuhp_tasks_frozen replacement here:

                /* intentionally ignoring frozen here */
                if (!(action & CPU_TASKS_FROZEN))
                        cmci_rediscover();

into

                if (!cpuhp_tasks_frozen)
                        cmci_rediscover();

> @@ -2529,12 +2529,11 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>                       cmci_rediscover();
>               break;
>       case CPU_DOWN_PREPARE:
> -             smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> +             mce_disable_cpu();
>               del_timer_sync(t);
> -             break;
> -     case CPU_DOWN_FAILED:
> -             smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
> -             mce_start_timer(cpu, t);
> +
> +             mce_threshold_remove_device(cpu);
> +             mce_device_remove(cpu);
>               break;
>       }
>  
> -- 
> 2.10.2
> 
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Reply via email to