On Tue, Jul 21, 2020 at 02:06:23PM -0700, Andrew Morton wrote:
> On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <pet...@infradead.org> 
> wrote:
> 
> > 
> > For SMP systems using IPI based TLB invalidation, looking at
> > current->active_mm is entirely reasonable. This then presents the
> > following race condition:
> > 
> > 
> >   CPU0                  CPU1
> > 
> >   flush_tlb_mm(mm)      use_mm(mm)
> >     <send-IPI>
> >                           tsk->active_mm = mm;
> >                           <IPI>
> >                             if (tsk->active_mm == mm)
> >                               // flush TLBs
> >                           </IPI>
> >                           switch_mm(old_mm,mm,tsk);
> > 
> > 
> > Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> > because the IPI landed before we actually switched.
> > 
> > Avoid this by disabling IRQs across changing ->active_mm and
> > switch_mm().
> > 
> > [ There are all sorts of architecture specific reasons this might be
> > harmless, but best not to leave the door open at all. ]
> 
> Can we give the -stable maintainers (and others) more explanation of
> why they might choose to merge this?

Like so then?

---
Subject: mm: Fix kthread_use_mm() vs TLB invalidate
From: Peter Zijlstra <pet...@infradead.org>
Date: Tue, 11 Feb 2020 10:25:19 +0100

For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:


  CPU0                  CPU1

  flush_tlb_mm(mm)      use_mm(mm)
    <send-IPI>
                          tsk->active_mm = mm;
                          <IPI>
                            if (tsk->active_mm == mm)
                              // flush TLBs
                          </IPI>
                          switch_mm(old_mm,mm,tsk);


Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI landed before we actually switched.

Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().

Of the (SMP) architectures that have IPI based TLB invalidate:

  Alpha    - checks active_mm
  ARC      - ASID specific
  IA64     - checks active_mm
  MIPS     - ASID specific
  OpenRISC - shoots down world
  PARISC   - shoots down world
  SH       - ASID specific
  SPARC    - ASID specific
  x86      - N/A
  xtensa   - checks active_mm

So at the very least Alpha, IA64 and Xtensa are suspect.
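
To make the "checks active_mm" pattern concrete, here is an
illustrative sketch of that style of flush IPI handler; it is not any
one architecture's actual code, and local_flush_tlb_mm() stands in for
whatever per-CPU flush primitive the architecture provides:

  /* Illustrative "checks active_mm" flush IPI handler. */
  static void flush_tlb_mm_ipi(void *info)
  {
          struct mm_struct *mm = info;

          /*
           * If this IPI lands between the tsk->active_mm store and
           * switch_mm() in use_mm(), the check below passes but the
           * CPU is still running on the old mm's page tables, so the
           * wrong TLBs get invalidated.
           */
          if (current->active_mm == mm)
                  local_flush_tlb_mm(mm);
  }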

On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(). That is
currently provided by task_lock(), but it is not sufficient on
PREEMPT_RT, where task_lock() becomes a sleeping lock and no longer
disables preemption.
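
(For reference, a condensed paraphrase of the relevant logic in
context_switch() from kernel/sched/core.c; not a verbatim copy:)

  /*
   * Condensed paraphrase of context_switch(): the scheduler reads
   * next->mm and prev->mm to choose between a real switch_mm() and
   * lazy TLB mode, so it must not observe kthread_use_mm() half-way
   * through updating these fields.
   */
  if (!next->mm) {                        /* next is a kernel thread */
          enter_lazy_tlb(prev->active_mm, next);
          next->active_mm = prev->active_mm;
          if (prev->mm)                   /* coming from a user task */
                  mmgrab(prev->active_mm);
  } else {                                /* next is a user task */
          switch_mm_irqs_off(prev->active_mm, next->mm, next);
  }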

Reported-by: Andy Lutomirski <l...@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Cc: sta...@kernel.org
---
 kernel/kthread.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1241,13 +1241,20 @@ void kthread_use_mm(struct mm_struct *mm
        WARN_ON_ONCE(tsk->mm);
 
        task_lock(tsk);
+       /*
+        * Serialize the tsk->mm store and switch_mm() against TLB invalidation
+        * IPIs. Also make sure we're non-preemptible on PREEMPT_RT to not race
+        * against the scheduler writing to these variables.
+        */
+       local_irq_disable();
        active_mm = tsk->active_mm;
        if (active_mm != mm) {
                mmgrab(mm);
                tsk->active_mm = mm;
        }
        tsk->mm = mm;
-       switch_mm(active_mm, mm, tsk);
+       switch_mm_irqs_off(active_mm, mm, tsk);
+       local_irq_enable();
        task_unlock(tsk);
 #ifdef finish_arch_post_lock_switch
        finish_arch_post_lock_switch();
@@ -1276,9 +1283,11 @@ void kthread_unuse_mm(struct mm_struct *
 
        task_lock(tsk);
        sync_mm_rss(mm);
+       local_irq_disable();
        tsk->mm = NULL;
        /* active_mm is still 'mm' */
        enter_lazy_tlb(mm, tsk);
+       local_irq_enable();
        task_unlock(tsk);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
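
For context, a minimal sketch of how a kernel thread typically uses
this API; my_worker() and the mm handoff are hypothetical, and the
caller is assumed to already hold a reference on the mm (e.g. via
mmget()):

  /* Hypothetical kthread body exercising the patched API. */
  static int my_worker(void *data)
  {
          struct mm_struct *mm = data;    /* reference held by the creator */

          kthread_use_mm(mm);             /* adopt the userspace mm */
          /* ... may now access user mappings, e.g. copy_from_user() ... */
          kthread_unuse_mm(mm);           /* drop it again */
          return 0;
  }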
