On Tue, 21 Jul 2020 17:41:06 +0200 Peter Zijlstra <pet...@infradead.org> wrote:

> 
> For SMP systems using IPI based TLB invalidation, looking at
> current->active_mm is entirely reasonable. This then presents the
> following race condition:
> 
> 
>   CPU0                      CPU1
> 
>   flush_tlb_mm(mm)          use_mm(mm)
>     <send-IPI>
>                             tsk->active_mm = mm;
>                             <IPI>
>                               if (tsk->active_mm == mm)
>                                 // flush TLBs
>                             </IPI>
>                             switch_mm(old_mm,mm,tsk);
> 
> 
> Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
> because the IPI lands before we actually switched.
> 
> Avoid this by disabling IRQs across changing ->active_mm and
> switch_mm().
> 
> [ There are all sorts of reasons this might be harmless for various
> architecture specific reasons, but best not leave the door open at
> all. ]

Can we give the -stable maintainers (and others) more explanation of
why they might choose to merge this?

> ...
>
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1241,13 +1241,15 @@ void kthread_use_mm(struct mm_struct *mm)
>       WARN_ON_ONCE(tsk->mm);
>  
>       task_lock(tsk);
> +     local_irq_disable();

A bare local_irq_disable() is one of those "what the heck is this
protecting" things.  It's the new lock_kernel().

So a little comment will help readers to understand why we did it. 
Something like this?

--- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate-fix
+++ a/kernel/kthread.c
@@ -1239,6 +1239,7 @@ void kthread_use_mm(struct mm_struct *mm
        WARN_ON_ONCE(tsk->mm);
 
        task_lock(tsk);
+       /* Hold off tlb flush IPIs while switching mm's */
        local_irq_disable();
        active_mm = tsk->active_mm;
        if (active_mm != mm) {
_

>       active_mm = tsk->active_mm;
>       if (active_mm != mm) {
>               mmgrab(mm);
>               tsk->active_mm = mm;
>       }
>       tsk->mm = mm;
> -     switch_mm(active_mm, mm, tsk);
> +     switch_mm_irqs_off(active_mm, mm, tsk);
> +     local_irq_enable();
>       task_unlock(tsk);
>  #ifdef finish_arch_post_lock_switch
>       finish_arch_post_lock_switch();
>
> ...
>
