On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <pet...@infradead.org>
> wrote:
> 
> > 
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> > 
> > 
> >   CPU0                  CPU1
> > 
> >   flush_tlb_mm(mm)      use_mm(mm)
> >     <send-IPI>
> >                           tsk->active_mm = mm;
> >                           <IPI>
> >                             if (tsk->active_mm == mm)
> >                               // flush TLBs
> >                           </IPI>
> >                           switch_mm(old_mm,mm,tsk);
> > 
> > 
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI lands before we actually switched.
> > 
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> > 
> > [ There are all sorts of reasons this might be harmless for various
> >   architecture specific reasons, but best not leave the door open at
> >   all. ]
> 
> Can we give the -stable maintainers (and others) more explanation of
> why they might choose to merge this?
Like so then?

---
Subject: mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra <pet...@infradead.org>
Date: Tue, 11 Feb 2020 10:25:19 +0100

For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:


  CPU0                  CPU1

  flush_tlb_mm(mm)      use_mm(mm)
    <send-IPI>
                          tsk->active_mm = mm;
                          <IPI>
                            if (tsk->active_mm == mm)
                              // flush TLBs
                          </IPI>
                          switch_mm(old_mm,mm,tsk);


Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().

Of the (SMP) architectures that have IPI based TLB invalidate:

  Alpha    - checks active_mm
  ARC      - ASID specific
  IA64     - checks active_mm
  MIPS     - ASID specific flush
  OpenRISC - shoots down world
  PARISC   - shoots down world
  SH       - ASID specific
  SPARC    - ASID specific
  x86      - N/A
  xtensa   - checks active_mm

So at the very least Alpha, IA64 and Xtensa are suspect.

On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.

Reported-by: Andy Lutomirski <l...@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Cc: sta...@kernel.org
---
 kernel/kthread.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
 	WARN_ON_ONCE(tsk->mm);
 
 	task_lock(tsk);
+	/*
+	 * Serialize the tsk->mm store and switch_mm() against TLB invalidation
+	 * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
+	 * against the scheduler writing to these variables.
+	 */
+	local_irq_disable();
 	active_mm = tsk->active_mm;
 	if (active_mm != mm) {
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
 	tsk->mm = mm;
-	switch_mm(active_mm, mm, tsk);
+	switch_mm_irqs_off(active_mm, mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
 	finish_arch_post_lock_switch();
@@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
 
 	task_lock(tsk);
 	sync_mm_rss(mm);
+	local_irq_disable();
 	tsk->mm = NULL;
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
+	local_irq_enable();
 	task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
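
[ For anyone evaluating the backport: a minimal sketch of the receive
  side the race involves. This is illustrative only, not lifted from any
  architecture's code; flush_tlb_ipi() and local_flush_tlb_mm() are
  placeholder names for whatever the arch's IPI callback and local flush
  primitive happen to be. The shape -- an smp_call_function() style
  callback that tests current->active_mm before flushing locally -- is
  the pattern the changelog describes. ]

  #include <linux/mm_types.h>	/* struct mm_struct */
  #include <linux/sched.h>	/* current, ->active_mm */

  /* Placeholder for the architecture's local flush primitive. */
  void local_flush_tlb_mm(struct mm_struct *mm);

  /* Illustrative IPI callback; names are placeholders, not a real API. */
  static void flush_tlb_ipi(void *info)
  {
  	struct mm_struct *mm = info;

  	/*
  	 * Runs in hard IRQ context on the target CPU. If the interrupted
  	 * task is inside kthread_use_mm() and has already stored
  	 * ->active_mm = mm but not yet run switch_mm(), this test passes
  	 * while the hardware still holds the old mm's translations, so
  	 * the sender wrongly concludes @mm has been flushed on this CPU.
  	 */
  	if (current->active_mm == mm)
  		local_flush_tlb_mm(mm);
  }

[ With the patch applied, the ->mm/->active_mm stores and
  switch_mm_irqs_off() happen with IRQs disabled, so such a callback can
  only observe the task either fully before or fully after the switch. ]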