----- On Jul 28, 2017, at 1:15 PM, Andrew Hunter a...@google.com wrote:

> On Thu, Jul 27, 2017 at 12:43 PM, Paul E. McKenney
> <paul...@linux.vnet.ibm.com> wrote:
>> On Thu, Jul 27, 2017 at 10:20:14PM +0300, Avi Kivity wrote:
>>> IPIing only running threads of my process would be perfect. In fact
>>> I might even be able to make use of "membarrier these threads
>>> please" to reduce IPIs, when I change the topology from fully
>>> connected to something more sparse, on larger machines.
>>>
>
> We do this as well--sometimes we only need RSEQ fences against
> specific CPU(s), and thus pass a subset.
>
>> +static void membarrier_private_expedited_ipi_each(void)
>> +{
>> +	int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		struct task_struct *p;
>> +
>> +		rcu_read_lock();
>> +		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> +		if (p && p->mm == current->mm)
>> +			smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> +		rcu_read_unlock();
>> +	}
>> +}
>> +
>
> We have the (simpler imho)
>
> const struct cpumask *mask = mm_cpumask(mm);
> /* possibly AND it with a user requested mask */
> smp_call_function_many(mask, ipi_func, ....);
>
> which I think will be faster on some archs (that support broadcast)
> and have fewer problems with out of sync values (though we do have to
> check in our IPI function that we haven't context switched out).
>
> Am I missing why this won't work?
The mm cpumask is not populated on all architectures, unfortunately, so we
need a generic implementation that does not rely on it. Moreover, I recall
that using it in addition to the rq->curr checks adds extra complexity wrt
memory barriers vs updates of the mm_cpumask.

The ipi_each loop you refer to here is only the fallback case. The common
case allocates a cpumask, populates it by looking at each rq->curr, and
uses smp_call_function_many() on that cpumask.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
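[Editor's note: for readers following along, below is a minimal sketch of the
common-case path described above: allocate a cpumask, fill it from each
runqueue's ->curr, then send one batched IPI. The helper names ipi_mb and
membarrier_private_expedited_ipi_each come from the quoted hunk; the GFP flag,
the skip of the local CPU, and the preempt_disable() around the batched call
are assumptions for illustration, and CPU hotplug serialization is omitted.
This is not necessarily the exact patch that was merged.]

static void ipi_mb(void *info)
{
	/* Issue a full memory barrier on the interrupted CPU. */
	smp_mb();
}

static void membarrier_private_expedited(void)
{
	int cpu;
	cpumask_var_t tmpmask;

	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
		/* Fallback: IPI each matching CPU one by one. */
		membarrier_private_expedited_ipi_each();
		return;
	}

	for_each_online_cpu(cpu) {
		struct task_struct *p;

		/*
		 * Skipping the current CPU is fine: the issuing thread
		 * is ordered by program order and the system call itself.
		 */
		if (cpu == raw_smp_processor_id())
			continue;
		rcu_read_lock();
		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
		if (p && p->mm == current->mm)
			__cpumask_set_cpu(cpu, tmpmask);
		rcu_read_unlock();
	}

	/* One batched IPI to every CPU currently running this mm. */
	preempt_disable();
	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
	preempt_enable();

	free_cpumask_var(tmpmask);
}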