Hi Mathieu,

On Fri, Aug 14, 2020 at 12:43:56PM -0400, Mathieu Desnoyers wrote:
> exit_mm should issue memory barriers after user-space memory accesses,
> before clearing current->mm, to order user-space memory accesses
> performed prior to exit_mm before clearing tsk->mm, which has the
> effect of skipping the membarrier private expedited IPIs.
>
> The membarrier system call can be issued concurrently with do_exit
> if we have thread groups created with CLONE_VM but not CLONE_THREAD.
>
> Here is the scenario I have in mind:
>
> Two thread groups are created, A and B. Thread group B is created by
> issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
> Let's assume we have a single thread within each thread group (Thread A
> and Thread B).
>
> Then AFAIU we can have:
>
> Userspace variables:
>
> int x = 0, y = 0;
>
> CPU 0                                   CPU 1
> Thread A                                Thread B
> (in thread group A)                     (in thread group B)
>
> x = 1
> barrier()
> y = 1
> exit()
> exit_mm()
> current->mm = NULL;
>                                         r1 = load y
>                                         membarrier()
>                                           skips CPU 0 (no IPI) because its
>                                           current mm is NULL
>                                         r2 = load x
>                                         BUG_ON(r1 == 1 && r2 == 0)
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
> Cc: Peter Zijlstra (Intel) <pet...@infradead.org>
> Cc: Will Deacon <w...@kernel.org>
> Cc: Paul E. McKenney <paul...@kernel.org>
> Cc: Nicholas Piggin <npig...@gmail.com>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Alan Stern <st...@rowland.harvard.edu>
> Cc: linux...@kvack.org
> ---
> Changes since v1:
> - Use smp_mb__after_spinlock rather than smp_mb.
> - Document race scenario in commit message.
> ---
>  kernel/exit.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 733e80f334e7..fe64e6e28dd5 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -475,6 +475,14 @@ static void exit_mm(void)
>  	BUG_ON(mm != current->active_mm);
>  	/* more a memory barrier than a real lock */
>  	task_lock(current);
> +	/*
> +	 * When a thread stops operating on an address space, the loop
> +	 * in membarrier_{private,global}_expedited() may not observe
Is it accurate to say that the correctness of membarrier_global_expedited()
relies on the observation of ->mm? Because IIUC the
membarrier_global_expedited() loop doesn't check ->mm.

Regards,
Boqun

> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> +	 * memory barrier after accessing user-space memory, before
> +	 * clearing tsk->mm.
> +	 */
> +	smp_mb__after_spinlock();
>  	current->mm = NULL;
>  	mmap_read_unlock(mm);
>  	enter_lazy_tlb(mm, current);
> --
> 2.11.0
>
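For readers following along, here is roughly what the skip referenced in the
comment looks like on the private expedited side. This is a paraphrased sketch
of the CPU-selection step, not the verbatim kernel/sched/membarrier.c code;
ipi_mb() stands in for the kernel's IPI callback that executes a full memory
barrier on the target CPU.

	/*
	 * Paraphrased sketch of the CPU selection in
	 * membarrier_private_expedited(); see kernel/sched/membarrier.c for
	 * the real code.  'mm' is the caller's current->mm, 'tmpmask' an
	 * already-allocated cpumask, ipi_mb() the IPI handler that issues a
	 * full memory barrier on the target CPU.
	 */
	static void membarrier_private_expedited_sketch(struct mm_struct *mm,
							struct cpumask *tmpmask)
	{
		int cpu;

		for_each_online_cpu(cpu) {
			struct task_struct *p;

			/* The calling CPU is ordered by syscall entry/exit itself. */
			if (cpu == raw_smp_processor_id())
				continue;

			rcu_read_lock();
			p = rcu_dereference(cpu_rq(cpu)->curr);
			/*
			 * If exit_mm() on that CPU has already stored
			 * current->mm = NULL, this comparison fails and the
			 * CPU receives no IPI -- hence the requirement that
			 * the exiting thread's user-space accesses be ordered
			 * before that store (the smp_mb__after_spinlock()
			 * added by the patch).
			 */
			if (p && p->mm == mm)
				__cpumask_set_cpu(cpu, tmpmask);
			rcu_read_unlock();
		}

		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
	}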
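And for completeness, a minimal user-space illustration of the setup from the
commit message: thread group B is created with clone(CLONE_VM) but without
CLONE_THREAD, so both groups share one mm while exiting independently. This is
only a sketch of the arrangement (names like thread_b are mine, and it is not
a reliable reproducer of the race); it assumes that registering
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED on the shared mm covers both thread
groups.

	/* Sketch only: shows the CLONE_VM-without-CLONE_THREAD setup from the
	 * commit message; it does not reliably reproduce the race. */
	#define _GNU_SOURCE
	#include <linux/membarrier.h>
	#include <sched.h>
	#include <signal.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#define barrier() asm volatile("" ::: "memory")

	static volatile int x, y;

	static int membarrier(int cmd, unsigned int flags)
	{
		return syscall(__NR_membarrier, cmd, flags);
	}

	/* Thread B: its own thread group, but shares the mm with group A. */
	static int thread_b(void *arg)
	{
		int r1, r2;

		r1 = y;
		/* Should IPI every CPU currently running a thread on this mm. */
		membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
		r2 = x;
		if (r1 == 1 && r2 == 0)
			fprintf(stderr, "ordering violation observed\n");
		return 0;
	}

	int main(void)
	{
		static char stack[64 * 1024];

		/* Registration lives in the shared mm, so it covers B too. */
		membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);

		/* CLONE_VM without CLONE_THREAD: new thread group, same mm. */
		if (clone(thread_b, stack + sizeof(stack),
			  CLONE_VM | SIGCHLD, NULL) < 0)
			return 1;

		/* Thread A from the scenario: stores, then exits. */
		x = 1;
		barrier();
		y = 1;
		return 0;	/* exit() -> exit_mm() clears current->mm */
	}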