On Thu, Nov 10, 2016 at 06:44:45PM +0100, Sebastian Andrzej Siewior wrote: > Initially I wanted to remove mcheck_cpu_init() from identify_cpu() and let it > become an independent early hotplug callback. The main problem here was that > the init on the boot CPU may happen too late > (device_initcall_sync(mcheck_init_device)) and nobody wanted to risk receiving > and MCE event at boot time leading to a shutdown (if the MCE feature is not > yet > enabled). > > Here is attempt two: the timming stays as-is but the ordering of the functions
timing > is changed: > - mcheck_cpu_init() (which is run from identify_cpu()) will setup the timer > struct but won't fire the timer. This is moved to CPU_ONLINE since its > cleanup part is in CPU_DOWN_PREPARE. So if it is okay to stop the timer > early > in the shutdown phase, it should be okay to start it late in the bring up > phase. > > - CPU_DOWN_PREPARE disables the MCE feature flags for !INTEL CPUs in disables the MCE error reporting... > mce_disable_cpu(). If a failure occures it would be re-enabled on all vendor occurs > CPUs (including Intel where it was not disabled during shutdown). To keep > this > working I am moving it to CPU_ONLINE. smp_call_function_single() is dropped > beause the notifier runs nowdays on the target CPU. "... because the notifier runs on the target CPU now." Please run your commit messages text through a spellchecker. > - CPU_ONLINE is invoking mce_device_create() + mce_threshold_create_device() > but its cleanup part is in CPU_DEAD (mce_threshold_remove_device() and > mce_device_remove()). In order to keep this symmetrical I am moving the > clean > up from CPU_DEAD to CPU_DOWN_PREPARE. cleanup > > Cc: Tony Luck <tony.l...@intel.com> > Cc: Borislav Petkov <b...@alien8.de> > Cc: linux-e...@vger.kernel.org > Cc: x...@kernel.org > Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de> > --- > arch/x86/kernel/cpu/mcheck/mce.c | 31 +++++++++++++++---------------- > 1 file changed, 15 insertions(+), 16 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c > b/arch/x86/kernel/cpu/mcheck/mce.c > index 052b5e05c3c4..3da6fd94fa2e 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce.c > +++ b/arch/x86/kernel/cpu/mcheck/mce.c > @@ -1771,6 +1771,9 @@ void (*machine_check_vector)(struct pt_regs *, long > error_code) = > */ > void mcheck_cpu_init(struct cpuinfo_x86 *c) > { > + struct timer_list *t = this_cpu_ptr(&mce_timer); > + unsigned int cpu = smp_processor_id(); > + > if (mca_cfg.disabled) > return; > > @@ -1796,7 +1799,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c) > __mcheck_cpu_init_generic(); > __mcheck_cpu_init_vendor(c); > __mcheck_cpu_init_clear_banks(); > - __mcheck_cpu_init_timer(); > + setup_pinned_timer(t, mce_timer_fn, cpu); Why not leave all that setup stuff in __mcheck_cpu_init_timer() ? ... > @@ -2517,11 +2518,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > mce_device_remove(cpu); > return NOTIFY_BAD; > } > - > + mce_reenable_cpu(); > + mce_start_timer(cpu, t); > break; > case CPU_DEAD: > - mce_threshold_remove_device(cpu); > - mce_device_remove(cpu); > mce_intel_hcpu_update(cpu); > > /* intentionally ignoring frozen here */ There's another place for cpuhp_tasks_frozen replacement here: /* intentionally ignoring frozen here */ if (!(action & CPU_TASKS_FROZEN)) cmci_rediscover(); into if (!cpuhp_tasks_frozen) cmci_rediscover(); > @@ -2529,12 +2529,11 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > cmci_rediscover(); > break; > case CPU_DOWN_PREPARE: > - smp_call_function_single(cpu, mce_disable_cpu, &action, 1); > + mce_disable_cpu(); > del_timer_sync(t); > - break; > - case CPU_DOWN_FAILED: > - smp_call_function_single(cpu, mce_reenable_cpu, &action, 1); > - mce_start_timer(cpu, t); > + > + mce_threshold_remove_device(cpu); > + mce_device_remove(cpu); > break; > } > > -- > 2.10.2 > > -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply.