----- On Sep 3, 2019, at 4:27 PM, Linus Torvalds [email protected] wrote:
> On Tue, Sep 3, 2019 at 1:11 PM Mathieu Desnoyers
> <[email protected]> wrote:
>>
>> +	cpus_read_lock();
>> +	for_each_online_cpu(cpu) {
>
> This would likely be better off using mm_cpumask(mm) instead of all
> online CPU's.

I've considered using mm_cpumask(mm) in the original implementation of
the membarrier expedited private command, and chose to stick to the
online cpu mask instead. Here was my off-list justification to Peter
Zijlstra and Paul E. McKenney:

If we have an iteration on mm_cpumask in the membarrier code, then we
additionally need to document that memory barriers are required before
and/or after all updates to the mm_cpumask, otherwise I think we end up
in the same situation as with the rq->curr update.

[...]

So we'd be sprinkling even more memory barrier comments all over.

Considering the amount of comments that needed to be added around the
scheduler rq->curr update for membarrier, I'm concerned that the amount
of additional analysis, documentation, and design constraints required
to safely use mm_cpumask() from membarrier is not really worth it
compared to iterating on online cpus with cpu hotplug read lock held.

>
> Plus doing the rcu_read_lock() inside the loop seems pointless. Even
> with a lot of cores, it's not going to loop _that_ many times for RCU
> latency to be an issue.

Good point! I'll keep that in mind for next round if we don't choose an
entirely different way forward.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

