Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
- On Sep 25, 2017, at 8:25 AM, Peter Zijlstra pet...@infradead.org wrote:

> On Mon, Sep 25, 2017 at 08:10:54PM +0800, Boqun Feng wrote:
>> > static void membarrier_register_private_expedited(void)
>> > {
>> > 	struct task_struct *p = current;
>> >
>> > 	if (READ_ONCE(p->mm->membarrier_private_expedited))
>> > 		return;
>> > 	membarrier_arch_register_private_expedited(p);
>
> Should we not then also do:
>
> 	barrier();
>
>> > 	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
>> > }
>
> to avoid the compiler lifting that store?

membarrier_arch_register_private_expedited() being a function call, I
recall compilers cannot move loads/stores across those.

Moreover, even if that function were eventually inlined,
synchronize_sched() is needed at the end of it to ensure the scheduler
will observe the thread flags before it returns. That would then also
act as a compiler barrier if the function is ever inlined in the
future.

So do you think we should still add the barrier() as documentation, or
is having synchronize_sched() in the callee enough?

By the way, I think I should add a READ_ONCE() in
membarrier_private_expedited() to pair with the WRITE_ONCE() in
registration, such as:

	if (!READ_ONCE(current->mm->membarrier_private_expedited))
		return -EPERM;

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
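A minimal sketch of the pattern this exchange converges on (function and
field names taken from the patch series; the barrier() placement is
Peter's suggestion, kept purely as documentation since
synchronize_sched() in the arch callee already provides the ordering):

	/*
	 * Sketch only: publish the flag last, so a concurrent
	 * MEMBARRIER_CMD_PRIVATE_EXPEDITED caller either observes the
	 * flag with the TIF bits already visible to the scheduler, or
	 * gets -EPERM.
	 */
	static void membarrier_register_private_expedited(void)
	{
		struct task_struct *p = current;

		if (READ_ONCE(p->mm->membarrier_private_expedited))
			return;
		/* Sets TIF flags on all threads, then synchronize_sched(). */
		membarrier_arch_register_private_expedited(p);
		barrier();	/* documentation: keep the store below, even if inlined */
		WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
	}

	/* The paired check at the start of the expedited command: */
	static int membarrier_private_expedited(void)
	{
		/* Pairs with the WRITE_ONCE() in registration above. */
		if (!READ_ONCE(current->mm->membarrier_private_expedited))
			return -EPERM;
		/* ... IPI the CPUs currently running threads of this process ... */
		return 0;
	}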
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
On Mon, Sep 25, 2017 at 08:10:54PM +0800, Boqun Feng wrote:
> > static void membarrier_register_private_expedited(void)
> > {
> > 	struct task_struct *p = current;
> >
> > 	if (READ_ONCE(p->mm->membarrier_private_expedited))
> > 		return;
> > 	membarrier_arch_register_private_expedited(p);

Should we not then also do:

	barrier();

> > 	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
> > }

to avoid the compiler lifting that store?
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
On Sun, Sep 24, 2017 at 02:23:04PM +0000, Mathieu Desnoyers wrote:
[...]
> >>
> >> copy_mm() is performed without holding current->sighand->siglock, so
> >> it appears to be racing with concurrent membarrier register cmd.
> >
> > Speaking of racing, I think we currently have a problem if we do a
> > register_private_expedited in one thread and do a
> > membarrier_private_expedited in another thread (sharing the same mm),
> > as follows:
> >
> > {t1, t2, t3 sharing the same ->mm}
> >
> > CPU 0                                   CPU 1                                   CPU 2
> > =====                                   =====                                   =====
> > {in thread t1}
> > membarrier_register_private_expedited():
> >   ...
> >   WRITE_ONCE(->mm->membarrier_private_expedited, 1);
> >   membarrier_arch_register_private_expedited():
> >     ...
> >     <TIF flag of t3 not set yet>
> >
> >                                         {in thread t2}
> >                                         membarrier_private_expedited():
> >                                           READ_ONCE(->mm->membarrier_private_expedited); // == 1
> >                                           ...
> >                                           for_each_online_cpu()
> >                                             ...
> >                                             <p is rq->curr of CPU 2, not t3 yet>
> >                                             if (p && p->mm == current->mm) // false
> >
> >                                                                                 {about to switch to t3}
> >                                                                                 rq->curr = t3;
> >                                                                                 context_switch():
> >                                                                                   ...
> >                                                                                   finish_task_switch():
> >                                                                                     membarrier_sched_in():
> >                                                                                       <TIF flag of t3 not set>
> >                                                                                       // no smp_mb() here.
> >
> > , and we will miss the smp_mb() on CPU 2, right? And this could even
> > happen if t2 has a membarrier_register_private_expedited() preceding
> > the membarrier_private_expedited().
> >
> > Am I missing something subtle here?
>
> I think the problem sits in this function:
>
> static void membarrier_register_private_expedited(void)
> {
> 	struct task_struct *p = current;
>
> 	if (READ_ONCE(p->mm->membarrier_private_expedited))
> 		return;
> 	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
> 	membarrier_arch_register_private_expedited(p);
> }
>
> I need to change the order between the WRITE_ONCE() and invoking
> membarrier_arch_register_private_expedited(). If I issue the
> WRITE_ONCE() after the arch code (which sets the TIF flags), then
> concurrent membarrier priv exped commands will simply return an -EPERM
> error:
>
> static void membarrier_register_private_expedited(void)
> {
> 	struct task_struct *p = current;
>
> 	if (READ_ONCE(p->mm->membarrier_private_expedited))
> 		return;
> 	membarrier_arch_register_private_expedited(p);
> 	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
> }
>
> Do you agree that this would fix the race you identified?
>

Yep, that will do the trick ;-)

Regards,
Boqun

> Thanks,
>
> Mathieu
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
- On Sep 24, 2017, at 9:30 AM, Boqun Feng boqun.f...@gmail.com wrote:

> On Fri, Sep 22, 2017 at 03:10:10PM +0000, Mathieu Desnoyers wrote:
>> - On Sep 22, 2017, at 4:59 AM, Boqun Feng boqun.f...@gmail.com wrote:
>>
>> > On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
>> > [...]
>> >> +static inline void membarrier_arch_sched_in(struct task_struct *prev,
>> >> +		struct task_struct *next)
>> >> +{
>> >> +	/*
>> >> +	 * Only need the full barrier when switching between processes.
>> >> +	 */
>> >> +	if (likely(!test_ti_thread_flag(task_thread_info(next),
>> >> +			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
>> >> +			|| prev->mm == next->mm))
>> >
>> > And we also don't need the smp_mb() if !prev->mm, because switching from
>> > kernel to user will have an smp_mb() implied by mmdrop()?
>>
>> Right. And we also don't need it when switching from userspace to a kernel
>
> Yep, but this case is covered already, as I think we don't allow kernel
> threads to have TIF_MEMBARRIER_PRIVATE_EXPEDITED set, right?

Good point.

>
>> thread either. Something like this:
>>
>> static inline void membarrier_arch_sched_in(struct task_struct *prev,
>> 		struct task_struct *next)
>> {
>> 	/*
>> 	 * Only need the full barrier when switching between processes.
>> 	 * Barrier when switching from kernel to userspace is not
>> 	 * required here, given that it is implied by mmdrop(). Barrier
>> 	 * when switching from userspace to kernel is not needed after
>> 	 * store to rq->curr.
>> 	 */
>> 	if (likely(!test_ti_thread_flag(task_thread_info(next),
>> 			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
>> 			|| !prev->mm || !next->mm || prev->mm == next->mm))
>
> , so no need to test next->mm here.
>

Right, it's redundant wrt testing the thread flag.

>> 		return;
>>
>> 	/*
>> 	 * The membarrier system call requires a full memory barrier
>> 	 * after storing to rq->curr, before going back to user-space.
>> 	 */
>> 	smp_mb();
>> }
>>
>> >
>> >> +	return;
>> >> +
>> >> +	/*
>> >> +	 * The membarrier system call requires a full memory barrier
>> >> +	 * after storing to rq->curr, before going back to user-space.
>> >> +	 */
>> >> +	smp_mb();
>> >> +}
>> >
>> > [...]
>> >
>> >> +static inline void membarrier_fork(struct task_struct *t,
>> >> +		unsigned long clone_flags)
>> >> +{
>> >> +	if (!current->mm || !t->mm)
>> >> +		return;
>> >> +	t->mm->membarrier_private_expedited =
>> >> +		current->mm->membarrier_private_expedited;
>> >
>> > Have we already done the copy of ->membarrier_private_expedited in
>> > copy_mm()?
>>
>> copy_mm() is performed without holding current->sighand->siglock, so
>> it appears to be racing with concurrent membarrier register cmd.
>
> Speaking of racing, I think we currently have a problem if we do a
> register_private_expedited in one thread and do a
> membarrier_private_expedited in another thread (sharing the same mm),
> as follows:
>
> {t1, t2, t3 sharing the same ->mm}
>
> CPU 0                                   CPU 1                                   CPU 2
> =====                                   =====                                   =====
> {in thread t1}
> membarrier_register_private_expedited():
>   ...
>   WRITE_ONCE(->mm->membarrier_private_expedited, 1);
>   membarrier_arch_register_private_expedited():
>     ...
>     <TIF flag of t3 not set yet>
>
>                                         {in thread t2}
>                                         membarrier_private_expedited():
>                                           READ_ONCE(->mm->membarrier_private_expedited); // == 1
>                                           ...
>                                           for_each_online_cpu()
>                                             ...
>                                             <p is rq->curr of CPU 2, not t3 yet>
>                                             if (p && p->mm == current->mm) // false
>
>                                                                                 {about to switch to t3}
>                                                                                 rq->curr = t3;
>                                                                                 context_switch():
>                                                                                   ...
>                                                                                   finish_task_switch():
>                                                                                     membarrier_sched_in():
>                                                                                       <TIF flag of t3 not set>
>                                                                                       // no smp_mb() here.
>
> , and we will miss the smp_mb() on CPU 2, right? And this could even
> happen if t2 has a membarrier_register_private_expedited() preceding
> the membarrier_private_expedited().
>
> Am I missing something subtle here?

I think the problem sits in this function:

static void membarrier_register_private_expedited(void)
{
	struct task_struct *p = current;

	if (READ_ONCE(p->mm->membarrier_private_expedited))
		return;
	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
	membarrier_arch_register_private_expedited(p);
}

I need to change the order between the WRITE_ONCE() and invoking
membarrier_arch_register_private_expedited(). If I issue the
WRITE_ONCE() after the arch code (which sets the TIF flags), then
concurrent membarrier priv exped commands will simply return an -EPERM
error:

static void membarrier_register_private_expedited(void)
{
	struct task_struct *p = current;

	if (READ_ONCE(p->mm->membarrier_private_expedited))
		return;
	membarrier_arch_register_private_expedited(p);
	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
}

Do you agree that this would fix the race you identified?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
On Fri, Sep 22, 2017 at 03:10:10PM +0000, Mathieu Desnoyers wrote:
> - On Sep 22, 2017, at 4:59 AM, Boqun Feng boqun.f...@gmail.com wrote:
>
> > On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
> > [...]
> >> +static inline void membarrier_arch_sched_in(struct task_struct *prev,
> >> +		struct task_struct *next)
> >> +{
> >> +	/*
> >> +	 * Only need the full barrier when switching between processes.
> >> +	 */
> >> +	if (likely(!test_ti_thread_flag(task_thread_info(next),
> >> +			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
> >> +			|| prev->mm == next->mm))
> >
> > And we also don't need the smp_mb() if !prev->mm, because switching from
> > kernel to user will have an smp_mb() implied by mmdrop()?
>
> Right. And we also don't need it when switching from userspace to a kernel

Yep, but this case is covered already, as I think we don't allow kernel
threads to have TIF_MEMBARRIER_PRIVATE_EXPEDITED set, right?

> thread either. Something like this:
>
> static inline void membarrier_arch_sched_in(struct task_struct *prev,
> 		struct task_struct *next)
> {
> 	/*
> 	 * Only need the full barrier when switching between processes.
> 	 * Barrier when switching from kernel to userspace is not
> 	 * required here, given that it is implied by mmdrop(). Barrier
> 	 * when switching from userspace to kernel is not needed after
> 	 * store to rq->curr.
> 	 */
> 	if (likely(!test_ti_thread_flag(task_thread_info(next),
> 			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
> 			|| !prev->mm || !next->mm || prev->mm == next->mm))

, so no need to test next->mm here.

> 		return;
>
> 	/*
> 	 * The membarrier system call requires a full memory barrier
> 	 * after storing to rq->curr, before going back to user-space.
> 	 */
> 	smp_mb();
> }
>
> >
> >> +	return;
> >> +
> >> +	/*
> >> +	 * The membarrier system call requires a full memory barrier
> >> +	 * after storing to rq->curr, before going back to user-space.
> >> +	 */
> >> +	smp_mb();
> >> +}
> >
> > [...]
> >
> >> +static inline void membarrier_fork(struct task_struct *t,
> >> +		unsigned long clone_flags)
> >> +{
> >> +	if (!current->mm || !t->mm)
> >> +		return;
> >> +	t->mm->membarrier_private_expedited =
> >> +		current->mm->membarrier_private_expedited;
> >
> > Have we already done the copy of ->membarrier_private_expedited in
> > copy_mm()?
>
> copy_mm() is performed without holding current->sighand->siglock, so
> it appears to be racing with concurrent membarrier register cmd.

Speaking of racing, I think we currently have a problem if we do a
register_private_expedited in one thread and do a
membarrier_private_expedited in another thread (sharing the same mm), as
follows:

{t1, t2, t3 sharing the same ->mm}

CPU 0                                   CPU 1                                   CPU 2
=====                                   =====                                   =====
{in thread t1}
membarrier_register_private_expedited():
  ...
  WRITE_ONCE(->mm->membarrier_private_expedited, 1);
  membarrier_arch_register_private_expedited():
    ...
    <TIF flag of t3 not set yet>

                                        {in thread t2}
                                        membarrier_private_expedited():
                                          READ_ONCE(->mm->membarrier_private_expedited); // == 1
                                          ...
                                          for_each_online_cpu()
                                            ...
                                            <p is rq->curr of CPU 2, not t3 yet>
                                            if (p && p->mm == current->mm) // false

                                                                                {about to switch to t3}
                                                                                rq->curr = t3;
                                                                                context_switch():
                                                                                  ...
                                                                                  finish_task_switch():
                                                                                    membarrier_sched_in():
                                                                                      <TIF flag of t3 not set>
                                                                                      // no smp_mb() here.

, and we will miss the smp_mb() on CPU 2, right? And this could even
happen if t2 has a membarrier_register_private_expedited() preceding the
membarrier_private_expedited().

Am I missing something subtle here?

Regards,
Boqun
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
- On Sep 22, 2017, at 4:59 AM, Boqun Feng boqun.f...@gmail.com wrote:

> On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
> [...]
>> +static inline void membarrier_arch_sched_in(struct task_struct *prev,
>> +		struct task_struct *next)
>> +{
>> +	/*
>> +	 * Only need the full barrier when switching between processes.
>> +	 */
>> +	if (likely(!test_ti_thread_flag(task_thread_info(next),
>> +			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
>> +			|| prev->mm == next->mm))
>
> And we also don't need the smp_mb() if !prev->mm, because switching from
> kernel to user will have an smp_mb() implied by mmdrop()?

Right. And we also don't need it when switching from userspace to a
kernel thread either. Something like this:

static inline void membarrier_arch_sched_in(struct task_struct *prev,
		struct task_struct *next)
{
	/*
	 * Only need the full barrier when switching between processes.
	 * Barrier when switching from kernel to userspace is not
	 * required here, given that it is implied by mmdrop(). Barrier
	 * when switching from userspace to kernel is not needed after
	 * store to rq->curr.
	 */
	if (likely(!test_ti_thread_flag(task_thread_info(next),
			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
			|| !prev->mm || !next->mm || prev->mm == next->mm))
		return;

	/*
	 * The membarrier system call requires a full memory barrier
	 * after storing to rq->curr, before going back to user-space.
	 */
	smp_mb();
}

>
>> +	return;
>> +
>> +	/*
>> +	 * The membarrier system call requires a full memory barrier
>> +	 * after storing to rq->curr, before going back to user-space.
>> +	 */
>> +	smp_mb();
>> +}
>
> [...]
>
>> +static inline void membarrier_fork(struct task_struct *t,
>> +		unsigned long clone_flags)
>> +{
>> +	if (!current->mm || !t->mm)
>> +		return;
>> +	t->mm->membarrier_private_expedited =
>> +		current->mm->membarrier_private_expedited;
>
> Have we already done the copy of ->membarrier_private_expedited in
> copy_mm()?

copy_mm() is performed without holding current->sighand->siglock, so
it appears to be racing with concurrent membarrier register cmd.
However, given that it is a single flag updated with WRITE_ONCE() and
read with READ_ONCE(), it might be OK to rely on copy_mm() there. If
userspace runs registration concurrently with fork, it should not
expect the child to be specifically registered or unregistered.

So yes, I think you are right about removing this copy and relying on
copy_mm() instead. I also think we can improve membarrier_arch_fork()
on powerpc to test the current thread flag rather than using
current->mm. Which leads to those two changes:

static inline void membarrier_fork(struct task_struct *t,
		unsigned long clone_flags)
{
	/*
	 * Prior copy_mm() copies the membarrier_private_expedited field
	 * from current->mm to t->mm.
	 */
	membarrier_arch_fork(t, clone_flags);
}

And on PowerPC:

static inline void membarrier_arch_fork(struct task_struct *t,
		unsigned long clone_flags)
{
	/*
	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
	 * fork is protected by siglock. membarrier_arch_fork() is
	 * called with siglock held.
	 */
	if (test_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED))
		set_ti_thread_flag(task_thread_info(t),
				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
}

Thanks,

Mathieu

>
> Regards,
> Boqun
>
>> +	membarrier_arch_fork(t, clone_flags);
>> +}
>> +static inline void membarrier_execve(struct task_struct *t)
>> +{
>> +	t->mm->membarrier_private_expedited = 0;
>> +	membarrier_arch_execve(t);
>> +}
>> +#else
>> +static inline void membarrier_sched_in(struct task_struct *prev,
>> +		struct task_struct *next)
>> +{
>> +}
>> +static inline void membarrier_fork(struct task_struct *t,
>> +		unsigned long clone_flags)
>> +{
>> +}
>> +static inline void membarrier_execve(struct task_struct *t)
>> +{
>> +}
>> +#endif
>> +
> [...]

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
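For context, the registration-side counterpart on powerpc that the
siglock comment above refers to would look roughly like this (a sketch
reconstructed from this patch series; treat the details as illustrative,
not authoritative):

	void membarrier_arch_register_private_expedited(struct task_struct *p)
	{
		struct task_struct *t;

		if (get_nr_threads(p) == 1) {
			set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
			return;
		}
		/*
		 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against
		 * thread fork is protected by siglock: fork takes it too,
		 * so a new thread cannot be missed by this walk.
		 */
		spin_lock(&p->sighand->siglock);
		for_each_thread(p, t)
			set_ti_thread_flag(task_thread_info(t),
					TIF_MEMBARRIER_PRIVATE_EXPEDITED);
		spin_unlock(&p->sighand->siglock);
		/*
		 * Ensure all future scheduler executions will observe the
		 * new thread flag state for this process.
		 */
		synchronize_sched();
	}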
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
[...]
> +static inline void membarrier_arch_sched_in(struct task_struct *prev,
> +		struct task_struct *next)
> +{
> +	/*
> +	 * Only need the full barrier when switching between processes.
> +	 */
> +	if (likely(!test_ti_thread_flag(task_thread_info(next),
> +			TIF_MEMBARRIER_PRIVATE_EXPEDITED)
> +			|| prev->mm == next->mm))

And we also don't need the smp_mb() if !prev->mm, because switching from
kernel to user will have an smp_mb() implied by mmdrop()?

> +	return;
> +
> +	/*
> +	 * The membarrier system call requires a full memory barrier
> +	 * after storing to rq->curr, before going back to user-space.
> +	 */
> +	smp_mb();
> +}

[...]

> +static inline void membarrier_fork(struct task_struct *t,
> +		unsigned long clone_flags)
> +{
> +	if (!current->mm || !t->mm)
> +		return;
> +	t->mm->membarrier_private_expedited =
> +		current->mm->membarrier_private_expedited;

Have we already done the copy of ->membarrier_private_expedited in
copy_mm()?

Regards,
Boqun

> +	membarrier_arch_fork(t, clone_flags);
> +}
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +	t->mm->membarrier_private_expedited = 0;
> +	membarrier_arch_execve(t);
> +}
> +#else
> +static inline void membarrier_sched_in(struct task_struct *prev,
> +		struct task_struct *next)
> +{
> +}
> +static inline void membarrier_fork(struct task_struct *t,
> +		unsigned long clone_flags)
> +{
> +}
> +static inline void membarrier_execve(struct task_struct *t)
> +{
> +}
> +#endif
> +

[...]
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
On Fri, Sep 22, 2017 at 10:24:41AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 22, 2017 at 11:22:06AM +0800, Boqun Feng wrote:
>
> > The idea is in membarrier_private_expedited(), we go through all ->curr
> > on each CPU and
> >
> > 1)	If it's a userspace task and its ->mm is matched, we send an ipi
> >
> > 2)	If it's a kernel task, we skip
> >
> > 	(Because there will be an smp_mb() implied by mmdrop(), when it
> > 	switches to a userspace task).
> >
> > 3)	If it's a userspace task and its ->mm is not matched, we take
> > 	the corresponding rq->lock and check rq->curr again; if its ->mm
> > 	matches, we send an ipi, otherwise we do nothing.
> >
> > 	(Because if we observe rq->curr is not matched with rq->lock
> > 	held, then when a task having a matched ->mm schedules in, the
> > 	rq->lock pairing along with the smp_mb__after_spinlock() will
> > 	guarantee it observes all memory ops before sys_membarrier()).
>
> 3) is an insta DoS.

OK, fair enough.

Regards,
Boqun
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
On Fri, Sep 22, 2017 at 11:22:06AM +0800, Boqun Feng wrote:

> The idea is in membarrier_private_expedited(), we go through all ->curr
> on each CPU and
>
> 1)	If it's a userspace task and its ->mm is matched, we send an ipi
>
> 2)	If it's a kernel task, we skip
>
> 	(Because there will be an smp_mb() implied by mmdrop(), when it
> 	switches to a userspace task).
>
> 3)	If it's a userspace task and its ->mm is not matched, we take
> 	the corresponding rq->lock and check rq->curr again; if its ->mm
> 	matches, we send an ipi, otherwise we do nothing.
>
> 	(Because if we observe rq->curr is not matched with rq->lock
> 	held, then when a task having a matched ->mm schedules in, the
> 	rq->lock pairing along with the smp_mb__after_spinlock() will
> 	guarantee it observes all memory ops before sys_membarrier()).

3) is an insta DoS.
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
- On Sep 21, 2017, at 11:30 PM, Boqun Feng boqun.f...@gmail.com wrote:

> On Fri, Sep 22, 2017 at 11:22:06AM +0800, Boqun Feng wrote:
>> Hi Mathieu,
>>
>> On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
>> > Provide a new command allowing processes to register their intent to use
>> > the private expedited command.
>> >
>> > This allows PowerPC to skip the full memory barrier in switch_mm(), and
>> > only issue the barrier when scheduling into a task belonging to a
>> > process that has registered to use expedited private.
>> >
>> > Processes are now required to register before using
>> > MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>> >
>>
>> Sorry I'm late for the party, but I couldn't stop thinking whether we
>> could avoid the register thing at all, because the registering makes
>> sys_membarrier() more complex (both for the interface and the
>> implementation). So how about we trade off a little bit by taking
>> some (not all) of the rq->locks?
>>
>> The idea is in membarrier_private_expedited(), we go through all ->curr
>> on each CPU and
>>
>> 1)	If it's a userspace task and its ->mm is matched, we send an ipi
>>
>> 2)	If it's a kernel task, we skip
>>
>> 	(Because there will be an smp_mb() implied by mmdrop(), when it
>> 	switches to a userspace task).
>>
>> 3)	If it's a userspace task and its ->mm is not matched, we take
>> 	the corresponding rq->lock and check rq->curr again; if its ->mm
>> 	matches, we send an ipi, otherwise we do nothing.
>>
>> 	(Because if we observe rq->curr is not matched with rq->lock
>> 	held, then when a task having a matched ->mm schedules in, the
>> 	rq->lock pairing along with the smp_mb__after_spinlock() will
>> 	guarantee it observes all memory ops before sys_membarrier()).
>>
>> membarrier_private_expedited() will look like this if we choose this
>> way:
>>
>> void membarrier_private_expedited()
>> {
>> 	int cpu;
>> 	bool fallback = false;
>> 	cpumask_var_t tmpmask;
>> 	struct rq_flags rf;
>>
>> 	if (num_online_cpus() == 1)
>> 		return;
>>
>> 	smp_mb();
>>
>> 	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
>> 		/* Fallback for OOM. */
>> 		fallback = true;
>> 	}
>>
>> 	cpus_read_lock();
>> 	for_each_online_cpu(cpu) {
>> 		struct task_struct *p;
>>
>> 		if (cpu == raw_smp_processor_id())
>> 			continue;
>>
>> 		rcu_read_lock();
>> 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>>
>> 		if (!p) {
>> 			rcu_read_unlock();
>> 			continue;
>> 		}
>>
>> 		if (p->mm == current->mm) {
>> 			if (!fallback)
>> 				__cpumask_set_cpu(cpu, tmpmask);
>> 			else
>> 				smp_call_function_single(cpu, ipi_mb, NULL, 1);
>> 		}
>>
>> 		if (p->mm == current->mm || !p->mm) {
>> 			rcu_read_unlock();
>> 			continue;
>> 		}
>>
>> 		rcu_read_unlock();
>>
>> 		/*
>> 		 * This should be arch-specific code, as we don't
>> 		 * need it anywhere other than on some archs without
>> 		 * an smp_mb() in switch_mm() (i.e. powerpc).
>> 		 */
>> 		rq_lock_irq(cpu_rq(cpu), &rf);
>> 		if (p->mm == current->mm) {
>
> Oops, this one should be
>
> 	if (cpu_curr(cpu)->mm == current->mm)
>
>> 			if (!fallback)
>> 				__cpumask_set_cpu(cpu, tmpmask);
>> 			else
>> 				smp_call_function_single(cpu, ipi_mb, NULL, 1);
>
> , and this had better be moved out of the rq->lock critical section.
>
> Regards,
> Boqun
>
>> 		}
>> 		rq_unlock_irq(cpu_rq(cpu), &rf);
>> 	}
>> 	if (!fallback) {
>> 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
>> 		free_cpumask_var(tmpmask);
>> 	}
>> 	cpus_read_unlock();
>>
>> 	smp_mb();
>> }
>>
>> Thoughts?

Hi Boqun,

The main concern Peter has with the runqueue locking approach is
interference with the scheduler: hitting all CPUs' runqueue locks
repeatedly if someone performs membarrier system calls in a short loop.
Just reading the rq->curr pointer does not generate as much overhead as
grabbing each rq lock.

Thanks,

Mathieu

>>
>> Regards,
>> Boqun
>>
> [...]

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
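To make that point concrete, a sketch of the lock-free alternative:
peek at rq->curr under RCU instead of taking each rq->lock (the helper
name is hypothetical; ipi_mb and the registration guarantees are
assumed from the rest of the thread):

	static void membarrier_ipi_each_matching_cpu(void)
	{
		int cpu;

		cpus_read_lock();
		for_each_online_cpu(cpu) {
			struct task_struct *p;
			bool match;

			if (cpu == raw_smp_processor_id())
				continue;	/* the caller issues smp_mb() locally */
			rcu_read_lock();
			p = task_rcu_dereference(&cpu_rq(cpu)->curr);
			/*
			 * A racy read is fine here: registration (and its
			 * synchronize_sched()) already ordered the TIF flag
			 * updates against the scheduler.
			 */
			match = p && p->mm == current->mm;
			rcu_read_unlock();
			if (match)
				smp_call_function_single(cpu, ipi_mb, NULL, 1);
		}
		cpus_read_unlock();
	}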
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
On Fri, Sep 22, 2017 at 11:22:06AM +0800, Boqun Feng wrote:
> Hi Mathieu,
>
> On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
> > Provide a new command allowing processes to register their intent to use
> > the private expedited command.
> >
> > This allows PowerPC to skip the full memory barrier in switch_mm(), and
> > only issue the barrier when scheduling into a task belonging to a
> > process that has registered to use expedited private.
> >
> > Processes are now required to register before using
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
> >
>
> Sorry I'm late for the party, but I couldn't stop thinking whether we
> could avoid the register thing at all, because the registering makes
> sys_membarrier() more complex (both for the interface and the
> implementation). So how about we trade off a little bit by taking
> some (not all) of the rq->locks?
>
> The idea is in membarrier_private_expedited(), we go through all ->curr
> on each CPU and
>
> 1)	If it's a userspace task and its ->mm is matched, we send an ipi
>
> 2)	If it's a kernel task, we skip
>
> 	(Because there will be an smp_mb() implied by mmdrop(), when it
> 	switches to a userspace task).
>
> 3)	If it's a userspace task and its ->mm is not matched, we take
> 	the corresponding rq->lock and check rq->curr again; if its ->mm
> 	matches, we send an ipi, otherwise we do nothing.
>
> 	(Because if we observe rq->curr is not matched with rq->lock
> 	held, then when a task having a matched ->mm schedules in, the
> 	rq->lock pairing along with the smp_mb__after_spinlock() will
> 	guarantee it observes all memory ops before sys_membarrier()).
>
> membarrier_private_expedited() will look like this if we choose this
> way:
>
> void membarrier_private_expedited()
> {
> 	int cpu;
> 	bool fallback = false;
> 	cpumask_var_t tmpmask;
> 	struct rq_flags rf;
>
> 	if (num_online_cpus() == 1)
> 		return;
>
> 	smp_mb();
>
> 	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
> 		/* Fallback for OOM. */
> 		fallback = true;
> 	}
>
> 	cpus_read_lock();
> 	for_each_online_cpu(cpu) {
> 		struct task_struct *p;
>
> 		if (cpu == raw_smp_processor_id())
> 			continue;
>
> 		rcu_read_lock();
> 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>
> 		if (!p) {
> 			rcu_read_unlock();
> 			continue;
> 		}
>
> 		if (p->mm == current->mm) {
> 			if (!fallback)
> 				__cpumask_set_cpu(cpu, tmpmask);
> 			else
> 				smp_call_function_single(cpu, ipi_mb, NULL, 1);
> 		}
>
> 		if (p->mm == current->mm || !p->mm) {
> 			rcu_read_unlock();
> 			continue;
> 		}
>
> 		rcu_read_unlock();
>
> 		/*
> 		 * This should be arch-specific code, as we don't
> 		 * need it anywhere other than on some archs without
> 		 * an smp_mb() in switch_mm() (i.e. powerpc).
> 		 */
> 		rq_lock_irq(cpu_rq(cpu), &rf);
> 		if (p->mm == current->mm) {

Oops, this one should be

	if (cpu_curr(cpu)->mm == current->mm)

> 			if (!fallback)
> 				__cpumask_set_cpu(cpu, tmpmask);
> 			else
> 				smp_call_function_single(cpu, ipi_mb, NULL, 1);

, and this had better be moved out of the rq->lock critical section.

Regards,
Boqun

> 		}
> 		rq_unlock_irq(cpu_rq(cpu), &rf);
> 	}
> 	if (!fallback) {
> 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
> 		free_cpumask_var(tmpmask);
> 	}
> 	cpus_read_unlock();
>
> 	smp_mb();
> }
>
> Thoughts?
>
> Regards,
> Boqun
>
[...]
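Folding both corrections in, the rq->lock hunk of the proposal would
read roughly as follows (a sketch under the assumptions of the code
above, with the IPI moved out of the critical section):

	bool match = false;

	/* Re-check rq->curr under rq->lock... */
	rq_lock_irq(cpu_rq(cpu), &rf);
	if (cpu_curr(cpu)->mm == current->mm)
		match = true;
	rq_unlock_irq(cpu_rq(cpu), &rf);

	/* ...but send the IPI only after dropping the lock. */
	if (match) {
		if (!fallback)
			__cpumask_set_cpu(cpu, tmpmask);
		else
			smp_call_function_single(cpu, ipi_mb, NULL, 1);
	}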
Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command
Hi Mathieu,

On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
> Provide a new command allowing processes to register their intent to use
> the private expedited command.
>
> This allows PowerPC to skip the full memory barrier in switch_mm(), and
> only issue the barrier when scheduling into a task belonging to a
> process that has registered to use expedited private.
>
> Processes are now required to register before using
> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>

Sorry I'm late for the party, but I couldn't stop thinking whether we
could avoid the register thing at all, because the registering makes
sys_membarrier() more complex (both for the interface and the
implementation). So how about we trade off a little bit by taking
some (not all) of the rq->locks?

The idea is in membarrier_private_expedited(), we go through all ->curr
on each CPU and

1)	If it's a userspace task and its ->mm is matched, we send an ipi

2)	If it's a kernel task, we skip

	(Because there will be an smp_mb() implied by mmdrop(), when it
	switches to a userspace task).

3)	If it's a userspace task and its ->mm is not matched, we take
	the corresponding rq->lock and check rq->curr again; if its ->mm
	matches, we send an ipi, otherwise we do nothing.

	(Because if we observe rq->curr is not matched with rq->lock
	held, then when a task having a matched ->mm schedules in, the
	rq->lock pairing along with the smp_mb__after_spinlock() will
	guarantee it observes all memory ops before sys_membarrier()).

membarrier_private_expedited() will look like this if we choose this
way:

void membarrier_private_expedited()
{
	int cpu;
	bool fallback = false;
	cpumask_var_t tmpmask;
	struct rq_flags rf;

	if (num_online_cpus() == 1)
		return;

	smp_mb();

	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
		/* Fallback for OOM. */
		fallback = true;
	}

	cpus_read_lock();
	for_each_online_cpu(cpu) {
		struct task_struct *p;

		if (cpu == raw_smp_processor_id())
			continue;

		rcu_read_lock();
		p = task_rcu_dereference(&cpu_rq(cpu)->curr);

		if (!p) {
			rcu_read_unlock();
			continue;
		}

		if (p->mm == current->mm) {
			if (!fallback)
				__cpumask_set_cpu(cpu, tmpmask);
			else
				smp_call_function_single(cpu, ipi_mb, NULL, 1);
		}

		if (p->mm == current->mm || !p->mm) {
			rcu_read_unlock();
			continue;
		}

		rcu_read_unlock();

		/*
		 * This should be arch-specific code, as we don't
		 * need it anywhere other than on some archs without
		 * an smp_mb() in switch_mm() (i.e. powerpc).
		 */
		rq_lock_irq(cpu_rq(cpu), &rf);
		if (p->mm == current->mm) {
			if (!fallback)
				__cpumask_set_cpu(cpu, tmpmask);
			else
				smp_call_function_single(cpu, ipi_mb, NULL, 1);
		}
		rq_unlock_irq(cpu_rq(cpu), &rf);
	}
	if (!fallback) {
		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
		free_cpumask_var(tmpmask);
	}
	cpus_read_unlock();

	smp_mb();
}

Thoughts?

Regards,
Boqun

> Changes since v1:
> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>   powerpc membarrier_arch_sched_in(), given that we want to specifically
>   check the next thread state.
> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
> - Use task_thread_info() to pass thread_info from task to
>   *_ti_thread_flag().
>
> Changes since v2:
> - Move membarrier_arch_sched_in() call to finish_task_switch().
> - Check for NULL t->mm in membarrier_arch_fork().
> - Use membarrier_sched_in() in generic code, which invokes the
>   arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>   build on PowerPC.
> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>   allnoconfig build on PowerPC.
> - Build and runtime tested on PowerPC.
>
> Signed-off-by: Mathieu Desnoyers
> CC: Peter Zijlstra
> CC: Paul E. McKenney
> CC: Boqun Feng
> CC: Andrew Hunter
> CC: Maged Michael
> CC: gro...@google.com
> CC: Avi Kivity
> CC: Benjamin Herrenschmidt
> CC: Paul Mackerras
> CC: Michael Ellerman
> CC: Dave Watson
> CC: Alan Stern
> CC: Will Deacon
> CC: Andy Lutomirski
> CC: linux-a...@vger.kernel.org
> ---
>  MAINTAINERS | 2 ++
>  arch/powerpc/Kconfig
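For completeness, a sketch of how a process would use the registration
scheme this series introduces from user space (command names taken from
the series; error handling elided, so treat it as illustrative):

	#include <linux/membarrier.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int membarrier(int cmd, int flags)
	{
		return syscall(__NR_membarrier, cmd, flags);
	}

	int main(void)
	{
		/* Register once, early; this may block (synchronize_sched()). */
		membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);

		/*
		 * Later, on a fast path: IPI-based barrier covering only
		 * CPUs running threads of this process. Fails with EPERM
		 * if the process never registered.
		 */
		membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
		return 0;
	}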