----- On Jul 28, 2017, at 1:31 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
> On Fri, Jul 28, 2017 at 10:15:49AM -0700, Andrew Hunter wrote:
>> On Thu, Jul 27, 2017 at 12:43 PM, Paul E. McKenney
>> <paul...@linux.vnet.ibm.com> wrote:
>> > On Thu, Jul 27, 2017 at 10:20:14PM +0300, Avi Kivity wrote:
>> >> IPIing only running threads of my process would be perfect. In fact
>> >> I might even be able to make use of "membarrier these threads
>> >> please" to reduce IPIs, when I change the topology from fully
>> >> connected to something more sparse, on larger machines.
>>
>> We do this as well--sometimes we only need RSEQ fences against
>> specific CPU(s), and thus pass a subset.
>
> Sounds like a good future enhancement, probably requiring a new syscall
> to accommodate the cpumask.
>
>> > +static void membarrier_private_expedited_ipi_each(void)
>> > +{
>> > +	int cpu;
>> > +
>> > +	for_each_online_cpu(cpu) {
>> > +		struct task_struct *p;
>> > +
>> > +		rcu_read_lock();
>> > +		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> > +		if (p && p->mm == current->mm)
>> > +			smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> > +		rcu_read_unlock();
>> > +	}
>> > +}
>> > +
>>
>> We have the (simpler imho)
>>
>> const struct cpumask *mask = mm_cpumask(mm);
>> /* possibly AND it with a user requested mask */
>> smp_call_function_many(mask, ipi_func, ....);
>>
>> which I think will be faster on some archs (that support broadcast)
>> and have fewer problems with out of sync values (though we do have to
>> check in our IPI function that we haven't context switched out).
>>
>> Am I missing why this won't work?
>
> My impression is that some architectures don't provide the needed
> ordering in this case, and also that some architectures support ASIDs
> and would thus IPI CPUs that weren't actually running threads in the
> process at the current time.
>
> Mathieu, anything I am missing?

As per my other email, it's pretty much it, yes.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
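
[For context, the cpumask-based variant Andrew describes could be sketched roughly as below. This is a hedged sketch only, not the actual patch: it assumes the `ipi_mb()` handler from the quoted series and 2017-era kernel APIs (`mm_cpumask()`, `smp_call_function_many()`), and it carries exactly the two caveats raised in the thread.]

```c
/*
 * Sketch only -- not the posted patch. Assumes the ipi_mb() handler
 * from the series quoted above, which simply executes smp_mb().
 */
static void membarrier_private_expedited_cpumask(void)
{
	struct mm_struct *mm = current->mm;

	/*
	 * Caveat 1 (Paul): on architectures that manage mm_cpumask()
	 * lazily (e.g. via ASIDs), bits can remain set for CPUs that
	 * are no longer running a thread of this mm, so those CPUs
	 * would take a spurious IPI.
	 *
	 * Caveat 2 (Andrew): the IPI handler must therefore tolerate
	 * being run on a CPU that has context-switched away; a bare
	 * smp_mb() in the handler is harmless in that case.
	 */
	preempt_disable();
	smp_call_function_many(mm_cpumask(mm), ipi_mb, NULL, 1);
	preempt_enable();
}
```

Whether this ordering is sufficient on all architectures is precisely the open question in the thread; the merged implementation iterated over runqueues instead.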