On Tue, Feb 09, 2021 at 02:16:48PM -0800, Nadav Amit wrote:
> +	/*
> +	 * Although we could have used on_each_cpu_cond_mask(),
> +	 * open-coding it has performance advantages, as it eliminates
> +	 * the need for indirect calls or retpolines. In addition, it
> +	 * allows to use a designated cpumask for evaluating the
> +	 * condition, instead of allocating one.
> +	 *
> +	 * This code works under the assumption that there are no nested
> +	 * TLB flushes, an assumption that is already made in
> +	 * flush_tlb_mm_range().
> +	 *
> +	 * cond_cpumask is logically a stack-local variable, but it is
> +	 * more efficient to have it off the stack and not to allocate
> +	 * it on demand. Preemption is disabled and this code is
> +	 * non-reentrant.
> +	 */
> +	struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask);
> +	int cpu;
> +
> +	cpumask_clear(cond_cpumask);
> +
> +	for_each_cpu(cpu, cpumask) {
> +		if (tlb_is_not_lazy(cpu))
> +			__cpumask_set_cpu(cpu, cond_cpumask);
> +	}
> +	smp_call_function_many(cond_cpumask, flush_tlb_func, (void *)info, 1);
No need for the cast here; dropping it would also avoid the pointlessly long line.