----- On Sep 4, 2019, at 2:26 PM, Peter Zijlstra pet...@infradead.org wrote:
> On Wed, Sep 04, 2019 at 01:12:53PM -0400, Mathieu Desnoyers wrote: >> ----- On Sep 4, 2019, at 12:09 PM, Peter Zijlstra pet...@infradead.org wrote: >> >> > On Wed, Sep 04, 2019 at 11:19:00AM -0400, Mathieu Desnoyers wrote: >> >> ----- On Sep 3, 2019, at 4:36 PM, Linus Torvalds >> >> torva...@linux-foundation.org >> >> wrote: >> > >> >> > I wonder if the easiest model might be to just use a percpu variable >> >> > instead for the membarrier stuff? It's not like it has to be in >> >> > 'struct task_struct' at all, I think. We only care about the current >> >> > runqueues, and those are percpu anyway. >> >> >> >> One issue here is that membarrier iterates over all runqueues without >> >> grabbing any runqueue lock. If we copy that state from mm to rq on >> >> sched switch prepare, we would need to ensure we have the proper >> >> memory barriers between: >> >> >> >> prior user-space memory accesses / setting the runqueue membarrier state >> >> >> >> and >> >> >> >> setting the runqueue membarrier state / following user-space memory >> >> accesses >> >> >> >> Copying the membarrier state into the task struct leverages the fact that >> >> we have documented and guaranteed those barriers around the rq->curr >> >> update >> >> in the scheduler. >> > >> > Should be the same as the barriers we already rely on for rq->curr, no? >> > That is, if we put this before switch_mm() then we have >> > smp_mb__after_spinlock() and switch_mm() itself. >> >> Yes, I think we can piggy-back on the already documented barriers documented >> around >> rq->curr store. >> >> > Also, if we place mm->membarrier_state in the same cacheline as mm->pgd >> > (which switch_mm() is bound to load) then we should be fine, I think. >> >> Yes, if we make sure membarrier_prepare_task_switch only updates the >> rq->membarrier_state if prev->mm != next->mm, we should be able to avoid >> loading next->mm->membarrier_state when switch_mm() is not invoked. >> >> I'll prepare RFC patch implementing this approach. > > Thinking about this a bit; switching it 'on' still requires some > thinking. Consider register on an already threaded process of which > multiple threads are 'current' on multiple CPUs. In that case none of > the rq bits will be set. > > Not even synchronize_rcu() is sufficient to force it either, since we > only update on switch_mm() and nothing guarantees we pass that. > > One possible approach would be to IPI broadcast (after setting the > ->mm->membarrier_State) and having the IPI update the rq state from > 'current->mm'. > > But possible I'm just confusing evryone again. I'm not having a good day > today. Indeed, switching 'on' requires some care. I implemented the IPI-based approach as per your suggestion, Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com