> On Aug 23, 2019, at 3:41 PM, Nadav Amit <na...@vmware.com> wrote:
>
> Currently, on_each_cpu() and similar functions do not exploit the
> potential of concurrency: the function is first executed remotely and
> only then it is executed locally. Functions such as TLB flush can take
> considerable time, so this provides an opportunity for performance
> optimization.
>
> To do so, introduce __smp_call_function_many(), which allows the callers
> to provide local and remote functions that should be executed, and run
> them concurrently. Keep smp_call_function_many() semantic as it is today
> for backward compatibility: the called function is not executed in this
> case locally.
>
> __smp_call_function_many() does not use the optimized version for a
> single remote target that smp_call_function_single() implements. For
> synchronous function call, smp_call_function_single() keeps a
> call_single_data (which is used for synchronization) on the stack.
> Interestingly, it seems that not using this optimization provides
> greater performance improvements (greater speedup with a single remote
> target than with multiple ones). Presumably, holding data structures
> that are intended for synchronization on the stack can introduce
> overheads due to TLB misses and false-sharing when the stack is used for
> other purposes.
>
> Adding support to run the functions concurrently required to remove a
> micro-optimization in on_each_cpu() that disabled/enabled IRQs instead
> of saving/restoring them. The benefit of running the local and remote
> code concurrently is expected to be greater.
>
> Reviewed-by: Dave Hansen <dave.han...@linux.intel.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Rik van Riel <r...@surriel.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Andy Lutomirski <l...@kernel.org>
> Cc: Josh Poimboeuf <jpoim...@redhat.com>
> Signed-off-by: Nadav Amit <na...@vmware.com>
> ---
>  include/linux/smp.h |  34 ++++++++---
>  kernel/smp.c        | 138 +++++++++++++++++++++-----------------------
>  2 files changed, 92 insertions(+), 80 deletions(-)
>
> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 6fc856c9eda5..d18d54199635 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -32,11 +32,6 @@ extern unsigned int total_cpus;
>  int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
>  			     int wait);
>
> -/*
> - * Call a function on all processors
> - */
> -void on_each_cpu(smp_call_func_t func, void *info, int wait);
> -
>  /*
>   * Call a function on processors specified by mask, which might include
>   * the local one.
> @@ -44,6 +39,17 @@ void on_each_cpu(smp_call_func_t func, void *info, int wait);
>  void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
>  		void *info, bool wait);
>
> +/*
> + * Call a function on all processors. May be used during early boot while
> + * early_boot_irqs_disabled is set.
> + */
> +static inline void on_each_cpu(smp_call_func_t func, void *info, int wait)
> +{
> +	preempt_disable();
> +	on_each_cpu_mask(cpu_online_mask, func, info, wait);
> +	preempt_enable();
> +}
Err... I made this change at the last minute before sending and apparently forgot to build it, because it does not build. Let me know if you see any other problems with this version, though.
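
For anyone reading along, the ordering change that the quoted commit message describes (start the remote calls, run the local function while they are in flight, then wait) can be sketched in user space. This is only an analogy, assuming pthreads stand in for remote CPUs. call_many_concurrent(), struct remote_call, count_call() and MAX_REMOTE are made-up names, not kernel APIs, and the sketch does not model IPIs, IRQ state or preemption.

```c
/*
 * User-space analogy only: worker threads stand in for remote CPUs.
 * Every name below is hypothetical and none of it is kernel code.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_REMOTE 8

typedef void (*call_func_t)(void *info);

struct remote_call {
	call_func_t func;
	void *info;
};

/* Body of one "remote CPU": run the requested function once. */
static void *remote_thread(void *arg)
{
	struct remote_call *rc = arg;

	rc->func(rc->info);
	return NULL;
}

/*
 * Start func on nr_remote worker threads, run it on the calling thread
 * while the workers are in flight, then wait for all of them.
 * Returns 0 on success, -1 if nr_remote is too large or a thread could
 * not be created.
 */
int call_many_concurrent(call_func_t func, void *info, size_t nr_remote)
{
	pthread_t threads[MAX_REMOTE];
	struct remote_call rc = { .func = func, .info = info };
	size_t i, started = 0;

	assert(func != NULL);
	if (nr_remote > MAX_REMOTE)
		return -1;

	/* Step 1: kick off the "remote" work without waiting for it. */
	for (i = 0; i < nr_remote; i++) {
		if (pthread_create(&threads[i], NULL, remote_thread, &rc) != 0)
			break;
		started++;
	}

	/* Step 2: the local call overlaps with the remote ones. */
	func(info);

	/* Step 3: only now wait for the remote calls to finish. */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	return started == nr_remote ? 0 : -1;
}

/* Example callback: count how many times it was invoked. */
void count_call(void *info)
{
	atomic_fetch_add((atomic_int *)info, 1);
}
```

Calling call_many_concurrent(count_call, &counter, 3) with an atomic_int counter initialized to zero should leave the counter at 4, assuming thread creation succeeds: three worker invocations plus the local one. The older ordering described in the commit message would correspond to moving step 2 after step 3.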