----- On Sep 4, 2019, at 6:53 AM, Oleg Nesterov [email protected] wrote:

> On 09/03, Mathieu Desnoyers wrote:
>>
>> @@ -1130,6 +1130,10 @@ struct task_struct {
>>  	unsigned long numa_pages_migrated;
>>  #endif /* CONFIG_NUMA_BALANCING */
>>
>> +#ifdef CONFIG_MEMBARRIER
>> +	atomic_t membarrier_state;
>> +#endif
>
> ...
>
>> +static inline void membarrier_prepare_task_switch(struct task_struct *t)
>> +{
>> +	if (!t->mm)
>> +		return;
>> +	atomic_set(&t->membarrier_state,
>> +		   atomic_read(&t->mm->membarrier_state));
>> +}
>
> Why not
>
> 	rq->membarrier_state = next->mm ? t->mm->membarrier_state : 0;
>
> and
>
> 	if (cpu_rq(cpu)->membarrier_state & MEMBARRIER_STATE_GLOBAL_EXPEDITED) {
> 		...
> 	}
>
> in membarrier_global_expedited() ? (I removed atomic_ to simplify)
>
> IOW, why this new member has to live in task_struct, not in rq?
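Just so we are looking at the same thing, here is a much-simplified,
untested sketch of what the membarrier_global_expedited() side would look
like with the suggested per-rq field (rq->membarrier_state is assumed here,
it does not exist today, and the real function has more to it than this):

static void ipi_mb(void *info)
{
	smp_mb();	/* Issue a full barrier on the target CPU. */
}

static int membarrier_global_expedited(void)
{
	int cpu;

	cpus_read_lock();
	for_each_online_cpu(cpu) {
		/*
		 * Read the per-rq state without holding the runqueue
		 * lock; only IPI CPUs running tasks belonging to
		 * registered processes.
		 */
		if (!(READ_ONCE(cpu_rq(cpu)->membarrier_state) &
		      MEMBARRIER_STATE_GLOBAL_EXPEDITED))
			continue;
		smp_call_function_single(cpu, ipi_mb, NULL, 1);
	}
	cpus_read_unlock();
	return 0;
}

The property that matters for the discussion below is that the per-rq
state is read without taking any runqueue lock.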
As replied to Linus, if we copy the membarrier_state into the rq, we'd
need to ensure we have full memory barriers between:

  prior user-space memory accesses / setting the runqueue membarrier state

and

  setting the runqueue membarrier state / following user-space memory accesses

because membarrier does not take any runqueue lock when it iterates over
runqueues (see the sketch below).

I try to avoid putting too many memory-barrier constraints on the scheduler
for membarrier, but if it's really the way forward, it could be done.

And the basic question remains: is it acceptable performance-wise to load
mm->membarrier_state from sched switch prepare?
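Concretely, the barrier placement I have in mind would be something like
the following (again an untested sketch, not actual code; the rq argument
and the non-atomic rq->membarrier_state field are assumed):

static inline void membarrier_prepare_task_switch(struct rq *rq,
						  struct task_struct *next)
{
	int state = next->mm ? atomic_read(&next->mm->membarrier_state) : 0;

	/*
	 * Order the previous task's user-space memory accesses before
	 * the rq->membarrier_state update.
	 */
	smp_mb();
	WRITE_ONCE(rq->membarrier_state, state);
	/*
	 * Order the rq->membarrier_state update before the next task's
	 * user-space memory accesses.
	 */
	smp_mb();
}

That would be two extra full barriers on every context switch, unless they
can be merged into ordering the scheduler already provides, which is
exactly the kind of constraint I would prefer not to add.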
Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
