[tip: sched/core] sched/membarrier: fix missing local execution of ipi_sync_rq_state()

2021-03-06 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/core branch of tip:

Commit-ID: ce29ddc47b91f97e7f69a0fb7cbb5845f52a9825
Gitweb:
https://git.kernel.org/tip/ce29ddc47b91f97e7f69a0fb7cbb5845f52a9825
Author: Mathieu Desnoyers
AuthorDate: Wed, 17 Feb 2021 11:56:51 -05:00
Committer: Ingo Molnar 
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched/membarrier: fix missing local execution of ipi_sync_rq_state()

The function sync_runqueues_membarrier_state() should copy the
membarrier state from the @mm received as parameter to each runqueue
currently running tasks using that mm.

However, the use of smp_call_function_many() skips the current runqueue,
which is unintended. Replace it with a call to on_each_cpu_mask().

Fixes: 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Reported-by: Nadav Amit 
Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Cc: sta...@vger.kernel.org # 5.4.x+
Link: https://lore.kernel.org/r/74f1e842-4a84-47bf-b6c2-5407dfdd4...@gmail.com
---
 kernel/sched/membarrier.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index acdae62..b5add64 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -471,9 +471,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
}
rcu_read_unlock();
 
-   preempt_disable();
-   smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);
-   preempt_enable();
+   on_each_cpu_mask(tmpmask, ipi_sync_rq_state, mm, true);
 
free_cpumask_var(tmpmask);
cpus_read_unlock();
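
For readers unfamiliar with the two SMP APIs, here is a minimal kernel-style sketch, not part of the patch and with illustrative function names, of the behavioural difference the fix relies on: smp_call_function_many() never invokes the callback on the calling CPU, while on_each_cpu_mask() also runs it locally when the current CPU is set in the mask.

#include <linux/cpumask.h>
#include <linux/preempt.h>
#include <linux/smp.h>

static void ipi_example(void *info)
{
        /* Runs on every CPU that ends up being targeted. */
}

/* Illustrative helper, not from the patch. */
static void run_on_mask(const struct cpumask *mask, void *info)
{
        /*
         * smp_call_function_many() only IPIs *other* CPUs in @mask; the
         * calling CPU is silently skipped even when it is set in the mask.
         */
        preempt_disable();
        smp_call_function_many(mask, ipi_example, info, 1);
        preempt_enable();

        /*
         * on_each_cpu_mask() additionally calls the function on the local
         * CPU (with interrupts disabled) when it is in @mask, which is the
         * behaviour sync_runqueues_membarrier_state() needs.
         */
        on_each_cpu_mask(mask, ipi_example, info, true);
}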


[tip: sched/urgent] sched/membarrier: fix missing local execution of ipi_sync_rq_state()

2021-03-01 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: fba111913e51a934eaad85734254eab801343836
Gitweb:
https://git.kernel.org/tip/fba111913e51a934eaad85734254eab801343836
Author: Mathieu Desnoyers
AuthorDate: Wed, 17 Feb 2021 11:56:51 -05:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 01 Mar 2021 11:02:15 +01:00

sched/membarrier: fix missing local execution of ipi_sync_rq_state()

The function sync_runqueues_membarrier_state() should copy the
membarrier state from the @mm received as parameter to each runqueue
currently running tasks using that mm.

However, the use of smp_call_function_many() skips the current runqueue,
which is unintended. Replace it with a call to on_each_cpu_mask().

Fixes: 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Reported-by: Nadav Amit 
Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: sta...@vger.kernel.org # 5.4.x+
Link: https://lore.kernel.org/r/74f1e842-4a84-47bf-b6c2-5407dfdd4...@gmail.com
---
 kernel/sched/membarrier.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index acdae62..b5add64 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -471,9 +471,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
}
rcu_read_unlock();
 
-   preempt_disable();
-   smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);
-   preempt_enable();
+   on_each_cpu_mask(tmpmask, ipi_sync_rq_state, mm, true);
 
free_cpumask_var(tmpmask);
cpus_read_unlock();


[tip: sched/core] sched: fix exit_mm vs membarrier (v4)

2020-10-29 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/core branch of tip:

Commit-ID: 5bc78502322a5e4eef3f1b2a2813751dc6434143
Gitweb:
https://git.kernel.org/tip/5bc78502322a5e4eef3f1b2a2813751dc6434143
Author: Mathieu Desnoyers
AuthorDate: Tue, 20 Oct 2020 09:47:13 -04:00
Committer: Peter Zijlstra 
CommitterDate: Thu, 29 Oct 2020 11:00:30 +01:00

sched: fix exit_mm vs membarrier (v4)

exit_mm should issue memory barriers after user-space memory accesses and
before clearing current->mm, to order user-space memory accesses performed
prior to exit_mm before the clearing of tsk->mm, which has the effect of
making the membarrier private expedited IPIs skip that CPU.

exit_mm should also update the runqueue's membarrier_state so
membarrier global expedited IPIs are not sent when they are not
needed.

The membarrier system call can be issued concurrently with do_exit
if we have thread groups created with CLONE_VM but not CLONE_THREAD.

Here is the scenario I have in mind:

Two thread groups are created, A and B. Thread group B is created by
issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
Let's assume we have a single thread within each thread group (Thread A
and Thread B).

AFAIU, we can have:

Userspace variables:

int x = 0, y = 0;

CPU 0                       CPU 1
Thread A                    Thread B
(in thread group A)         (in thread group B)

x = 1
barrier()
y = 1
exit()
exit_mm()
  current->mm = NULL;
                            r1 = load y
                            membarrier()
                              skips CPU 0 (no IPI) because its current mm is NULL
                            r2 = load x
                            BUG_ON(r1 == 1 && r2 == 0)

Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20201020134715.13909-2-mathieu.desnoy...@efficios.com
---
 include/linux/sched/mm.h  |  5 +
 kernel/exit.c | 16 +++-
 kernel/sched/membarrier.c | 12 
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index d5ece7a..a91fb3a 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -347,6 +347,8 @@ static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 
 extern void membarrier_exec_mmap(struct mm_struct *mm);
 
+extern void membarrier_update_current_mm(struct mm_struct *next_mm);
+
 #else
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
@@ -361,6 +363,9 @@ static inline void membarrier_exec_mmap(struct mm_struct *mm)
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
 }
+static inline void membarrier_update_current_mm(struct mm_struct *next_mm)
+{
+}
 #endif
 
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 87a2d51..a3dd6b3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -475,10 +475,24 @@ static void exit_mm(void)
BUG_ON(mm != current->active_mm);
/* more a memory barrier than a real lock */
task_lock(current);
+   /*
+* When a thread stops operating on an address space, the loop
+* in membarrier_private_expedited() may not observe that
+* tsk->mm, and the loop in membarrier_global_expedited() may
+* not observe a MEMBARRIER_STATE_GLOBAL_EXPEDITED
+* rq->membarrier_state, so those would not issue an IPI.
+* Membarrier requires a memory barrier after accessing
+* user-space memory, before clearing tsk->mm or the
+* rq->membarrier_state.
+*/
+   smp_mb__after_spinlock();
+   local_irq_disable();
current->mm = NULL;
-   mmap_read_unlock(mm);
+   membarrier_update_current_mm(NULL);
enter_lazy_tlb(mm, current);
+   local_irq_enable();
task_unlock(current);
+   mmap_read_unlock(mm);
mm_update_next_owner(mm);
mmput(mm);
if (test_thread_flag(TIF_MEMDIE))
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index e23e74d..aac3292 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -76,6 +76,18 @@ void membarrier_exec_mmap(struct mm_struct *mm)
this_cpu_write(runqueues.membarrier_state, 0);
 }
 
+void membarrier_update_current_mm(struct mm_struct *next_mm)
+{
+   struct rq *rq = this_rq();
+   int membarrier_state = 0;
+
+   if (next_mm)
+   membarrier_state = atomic_read(&next_mm->membarrier_state);
+   if (READ_ONCE(rq->membarrier_state) == membarrier_state)
+   return;
+   WRITE_ONCE(rq->membarrier_state, membarrier_state);
+}
+
 static int membarrier_global_expedited(void)
 {
int cpu;
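
As a hedged user-space sketch of the configuration described in the scenario above (names, the stack size and the synchronization are illustrative assumptions, not part of the patch): a second thread group sharing the mm is created with clone(CLONE_VM) but without CLONE_THREAD, the mm is registered for private expedited membarrier, and the second group can then issue the command concurrently with the first group's exit path.

#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static int sys_membarrier(int cmd, int flags)
{
        return syscall(__NR_membarrier, cmd, flags);
}

/* "Thread B": its own thread group, but it shares the mm of group A. */
static int thread_b(void *arg)
{
        /* The expedited command must IPI every CPU running this mm,
         * including one that is concurrently inside exit_mm(). */
        return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}

int main(void)
{
        static char stack[256 * 1024] __attribute__((aligned(16)));

        if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0))
                return 1;
        /* CLONE_VM without CLONE_THREAD: shared mm, separate thread group. */
        if (clone(thread_b, stack + sizeof(stack), CLONE_VM | SIGCHLD, NULL) == -1)
                return 1;
        wait(NULL);     /* thread group A then runs exit_mm() on its way out */
        return 0;
}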


[tip: sched/core] sched: membarrier: cover kthread_use_mm (v4)

2020-10-29 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/core branch of tip:

Commit-ID: 618758ed3a4f7d790414d020b362111748ebbf9f
Gitweb:
https://git.kernel.org/tip/618758ed3a4f7d790414d020b362111748ebbf9f
Author: Mathieu Desnoyers
AuthorDate: Tue, 20 Oct 2020 09:47:14 -04:00
Committer: Peter Zijlstra 
CommitterDate: Thu, 29 Oct 2020 11:00:31 +01:00

sched: membarrier: cover kthread_use_mm (v4)

Add comments and memory barrier to kthread_use_mm and kthread_unuse_mm
to allow the effect of membarrier(2) to apply to kthreads accessing
user-space memory as well.

Given that no prior kthread uses this guarantee and that it only affects
kthreads, adding this guarantee does not affect user-space ABI.

Refine the check in membarrier_global_expedited to exclude runqueues
running the idle thread rather than all kthreads from the IPI cpumask.

Now that membarrier_global_expedited can IPI kthreads, the scheduler
also needs to update the runqueue's membarrier_state when entering lazy
TLB state.

Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20201020134715.13909-3-mathieu.desnoy...@efficios.com
---
 kernel/kthread.c  | 21 +
 kernel/sched/idle.c   |  1 +
 kernel/sched/membarrier.c |  7 +++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index e29773c..481428f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1248,6 +1248,7 @@ void kthread_use_mm(struct mm_struct *mm)
tsk->active_mm = mm;
}
tsk->mm = mm;
+   membarrier_update_current_mm(mm);
switch_mm_irqs_off(active_mm, mm, tsk);
local_irq_enable();
task_unlock(tsk);
@@ -1255,8 +1256,19 @@ void kthread_use_mm(struct mm_struct *mm)
finish_arch_post_lock_switch();
 #endif
 
+   /*
+* When a kthread starts operating on an address space, the loop
+* in membarrier_{private,global}_expedited() may not observe
+* that tsk->mm, and not issue an IPI. Membarrier requires a
+* memory barrier after storing to tsk->mm, before accessing
+* user-space memory. A full memory barrier for membarrier
+* {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
+* mmdrop(), or explicitly with smp_mb().
+*/
if (active_mm != mm)
mmdrop(active_mm);
+   else
+   smp_mb();
 
to_kthread(tsk)->oldfs = force_uaccess_begin();
 }
@@ -1276,9 +1288,18 @@ void kthread_unuse_mm(struct mm_struct *mm)
force_uaccess_end(to_kthread(tsk)->oldfs);
 
task_lock(tsk);
+   /*
+* When a kthread stops operating on an address space, the loop
+* in membarrier_{private,global}_expedited() may not observe
+* that tsk->mm, and not issue an IPI. Membarrier requires a
+* memory barrier after accessing user-space memory, before
+* clearing tsk->mm.
+*/
+   smp_mb__after_spinlock();
sync_mm_rss(mm);
local_irq_disable();
tsk->mm = NULL;
+   membarrier_update_current_mm(NULL);
/* active_mm is still 'mm' */
enter_lazy_tlb(mm, tsk);
local_irq_enable();
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 24d0ee2..846743e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -338,6 +338,7 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY));
WARN_ON_ONCE(!duration_ns);
+   WARN_ON_ONCE(current->mm);
 
rcu_sleep_check();
preempt_disable();
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index aac3292..f223f35 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -126,12 +126,11 @@ static int membarrier_global_expedited(void)
continue;
 
/*
-* Skip the CPU if it runs a kernel thread. The scheduler
-* leaves the prior task mm in place as an optimization when
-* scheduling a kthread.
+* Skip the CPU if it runs a kernel thread which is not using
+* a task mm.
 */
p = rcu_dereference(cpu_rq(cpu)->curr);
-   if (p->flags & PF_KTHREAD)
+   if (!p->mm)
continue;
 
__cpumask_set_cpu(cpu, tmpmask);
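
The pattern being covered, as a hedged kernel-side sketch (the helper name and parameters are illustrative, not from the patch): a kthread that temporarily adopts a user mm brackets its user-space accesses with kthread_use_mm()/kthread_unuse_mm(), so the barriers added above pair with membarrier(2) issued by the user threads sharing that mm.

#include <linux/kthread.h>
#include <linux/mm.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* Illustrative helper: read one u32 from a user address on behalf of @mm. */
static int read_user_flag(struct mm_struct *mm, u32 __user *uptr, u32 *out)
{
        int ret;

        kthread_use_mm(mm);             /* store to tsk->mm + memory barrier  */
        ret = get_user(*out, uptr);     /* user-space access from a kthread   */
        kthread_unuse_mm(mm);           /* memory barrier, then tsk->mm clear */

        return ret;
}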


[tip: sched/core] sched: membarrier: document memory ordering scenarios

2020-10-29 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/core branch of tip:

Commit-ID: 25595eb6aaa9fbb31330f1e0b400642694bc6574
Gitweb:
https://git.kernel.org/tip/25595eb6aaa9fbb31330f1e0b400642694bc6574
Author: Mathieu Desnoyers
AuthorDate: Tue, 20 Oct 2020 09:47:15 -04:00
Committer: Peter Zijlstra 
CommitterDate: Thu, 29 Oct 2020 11:00:31 +01:00

sched: membarrier: document memory ordering scenarios

Document membarrier ordering scenarios in membarrier.c. Thanks to Alan
Stern for refreshing my memory. Now that I have those in mind, it seems
appropriate to serialize them to comments for posterity.

Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20201020134715.13909-4-mathieu.desnoy...@efficios.com
---
 kernel/sched/membarrier.c | 128 +-
 1 file changed, 128 insertions(+)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index f223f35..5a40b38 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -7,6 +7,134 @@
 #include "sched.h"
 
 /*
+ * For documentation purposes, here are some membarrier ordering
+ * scenarios to keep in mind:
+ *
+ * A) Userspace thread execution after IPI vs membarrier's memory
+ *barrier before sending the IPI
+ *
+ * Userspace variables:
+ *
+ * int x = 0, y = 0;
+ *
+ * The memory barrier at the start of membarrier() on CPU0 is necessary in
+ * order to enforce the guarantee that any writes occurring on CPU0 before
+ * the membarrier() is executed will be visible to any code executing on
+ * CPU1 after the IPI-induced memory barrier:
+ *
+ * CPU0                          CPU1
+ *
+ * x = 1
+ * membarrier():
+ *   a: smp_mb()
+ *   b: send IPI                 IPI-induced mb
+ *   c: smp_mb()
+ * r2 = y
+ *                               y = 1
+ *                               barrier()
+ *                               r1 = x
+ *
+ * BUG_ON(r1 == 0 && r2 == 0)
+ *
+ * The write to y and load from x by CPU1 are unordered by the hardware,
+ * so it's possible to have "r1 = x" reordered before "y = 1" at any
+ * point after (b).  If the memory barrier at (a) is omitted, then "x = 1"
+ * can be reordered after (a) (although not after (c)), so we get r1 == 0
+ * and r2 == 0.  This violates the guarantee that membarrier() is
+ * supposed to provide.
+ *
+ * The timing of the memory barrier at (a) has to ensure that it executes
+ * before the IPI-induced memory barrier on CPU1.
+ *
+ * B) Userspace thread execution before IPI vs membarrier's memory
+ *barrier after completing the IPI
+ *
+ * Userspace variables:
+ *
+ * int x = 0, y = 0;
+ *
+ * The memory barrier at the end of membarrier() on CPU0 is necessary in
+ * order to enforce the guarantee that any writes occurring on CPU1 before
+ * the membarrier() is executed will be visible to any code executing on
+ * CPU0 after the membarrier():
+ *
+ * CPU0                          CPU1
+ *
+ *                               x = 1
+ *                               barrier()
+ *                               y = 1
+ * r2 = y
+ * membarrier():
+ *   a: smp_mb()
+ *   b: send IPI                 IPI-induced mb
+ *   c: smp_mb()
+ * r1 = x
+ * BUG_ON(r1 == 0 && r2 == 1)
+ *
+ * The writes to x and y are unordered by the hardware, so it's possible to
+ * have "r2 = 1" even though the write to x doesn't execute until (b).  If
+ * the memory barrier at (c) is omitted then "r1 = x" can be reordered
+ * before (b) (although not before (a)), so we get "r1 = 0".  This violates
+ * the guarantee that membarrier() is supposed to provide.
+ *
+ * The timing of the memory barrier at (c) has to ensure that it executes
+ * after the IPI-induced memory barrier on CPU1.
+ *
+ * C) Scheduling userspace thread -> kthread -> userspace thread vs membarrier
+ *
+ *   CPU0                            CPU1
+ *
+ *   membarrier():
+ *   a: smp_mb()
+ *                                   d: switch to kthread (includes mb)
+ *   b: read rq->curr->mm == NULL
+ *                                   e: switch to user (includes mb)
+ *   c: smp_mb()
+ *
+ * Using the scenario from (A), we can show that (a) needs to be paired
+ * with (e). Using the scenario from (B), we can show that (c) needs to
+ * be paired with (d).
+ *
+ * D) exit_mm vs membarrier
+ *
+ * Two thread groups are created, A and B.  Thread group B is created by
+ * issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
+ * Let's assume we have a single thread within each thread group (Thread A
+ * and Thread B).  Thread A runs on CPU0, Thread B runs on CPU1.
+ *
+ *   CPU0                            CPU1
+ *
+ *   membarrier():
+ *
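
For illustration, a hedged user-space rendering of scenario (A) (not from the patch; MEMBARRIER_CMD_GLOBAL stands in for the expedited commands, variable names are assumptions, and error handling is elided): membarrier() on one side pairs with only a compiler barrier on the other.

#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile int x, y;

static int sys_membarrier(int cmd, int flags)
{
        return syscall(__NR_membarrier, cmd, flags);
}

/* CPU1 side of scenario (A): write y, compiler barrier, read x. */
static void *cpu1_side(void *arg)
{
        long r1;

        y = 1;
        __asm__ __volatile__("" ::: "memory");  /* barrier() */
        r1 = x;
        return (void *)r1;
}

int main(void)
{
        pthread_t t;
        void *r1;
        int r2;

        pthread_create(&t, NULL, cpu1_side, NULL);

        /* CPU0 side: write x, membarrier() (points a, b, c), read y. */
        x = 1;
        sys_membarrier(MEMBARRIER_CMD_GLOBAL, 0);
        r2 = y;

        pthread_join(t, &r1);
        /* The guarantee from scenario (A): r1 == 0 && r2 == 0 is impossible. */
        assert(!((long)r1 == 0 && r2 == 0));
        return 0;
}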

[tip: sched/urgent] sched: Fix unreliable rseq cpu_id for new tasks

2020-07-08 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db
Gitweb:
https://git.kernel.org/tip/ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db
Author: Mathieu Desnoyers
AuthorDate: Mon, 06 Jul 2020 16:49:10 -04:00
Committer: Peter Zijlstra 
CommitterDate: Wed, 08 Jul 2020 11:38:50 +02:00

sched: Fix unreliable rseq cpu_id for new tasks

While integrating rseq into glibc and replacing glibc's sched_getcpu
implementation with rseq, glibc's tests discovered an issue with
incorrect __rseq_abi.cpu_id field value right after the first time
a newly created process issues sched_setaffinity.

For the record, it triggers after building glibc and running tests, and
then issuing:

  for x in {1..2000} ; do posix/tst-affinity-static  & done

and shows up as:

error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0

This is caused by the scheduler invoking __set_task_cpu() directly from
sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which
is done by set_task_cpu().

Add the missing rseq_migrate() to both functions. The only other direct
use of __set_task_cpu() is done by init_idle(), which does not involve a
user-space task.

Based on my testing with the glibc test-case, just adding rseq_migrate()
to wake_up_new_task() is sufficient to fix the observed issue. Also add
it to sched_fork() to keep things consistent.

The reason why this never triggered so far with the rseq/basic_test
selftest is unclear.

The current use of sched_getcpu(3) does not typically require it to be
always accurate. However, use of the __rseq_abi.cpu_id field within rseq
critical sections requires it to be accurate. If it is not accurate, it
can cause corruption in the per-cpu data targeted by rseq critical
sections in user-space.

Reported-By: Florian Weimer 
Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Tested-By: Florian Weimer 
Cc: sta...@vger.kernel.org # v4.18+
Link: 
https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoy...@efficios.com
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 950ac45..e15543c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2965,6 +2965,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 * Silence PROVE_RCU.
 */
raw_spin_lock_irqsave(&p->pi_lock, flags);
+   rseq_migrate(p);
/*
 * We're setting the CPU for the first time, we don't migrate,
 * so use __set_task_cpu().
@@ -3029,6 +3030,7 @@ void wake_up_new_task(struct task_struct *p)
 * as we're not fully set-up yet.
 */
p->recent_used_cpu = task_cpu(p);
+   rseq_migrate(p);
__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
rq = __task_rq_lock(p, &rf);
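
For context, a hedged user-space sketch of how an rseq cpu_id value is obtained (not from the patch; RSEQ_SIG is the value used by the kernel selftests, and on newer glibc the explicit registration may fail with EBUSY because libc registers rseq itself):

#define _GNU_SOURCE
#include <linux/rseq.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define RSEQ_SIG 0x53053053     /* signature used by the kernel rseq selftests */

static struct rseq rs;          /* must stay valid for the thread's lifetime */

int main(void)
{
        /* After registration the kernel keeps rs.cpu_id current, including
         * across the sched_fork()/wake_up_new_task() path fixed above. */
        if (syscall(__NR_rseq, &rs, sizeof(rs), 0, RSEQ_SIG)) {
                perror("rseq register");        /* e.g. EBUSY under newer glibc */
                return 1;
        }
        printf("cpu_id = %u\n", rs.cpu_id);
        return 0;
}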


[tip: sched/urgent] sched/membarrier: Fix p->mm->membarrier_state racy load

2019-09-27 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
Gitweb:
https://git.kernel.org/tip/227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
Author: Mathieu Desnoyers
AuthorDate: Thu, 19 Sep 2019 13:37:02 -04:00
Committer: Ingo Molnar 
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Fix p->mm->membarrier_state racy load

The membarrier_state field is located within the mm_struct, which
is not guaranteed to exist when used from runqueue-lock-free iteration
on runqueues by the membarrier system call.

Copy the membarrier_state from the mm_struct into the scheduler runqueue
when the scheduler switches between mm.

When registering membarrier for mm, after setting the registration bit
in the mm membarrier state, issue a synchronize_rcu() to ensure the
scheduler observes the change. In order to take care of the case
where a runqueue keeps executing the target mm without swapping to
another mm, iterate over each runqueue and issue an IPI to copy the
membarrier_state from the mm_struct into each runqueue whose current
task is using the mm whose state has just been modified.

Move the mm membarrier_state field closer to pgd in mm_struct to use
a cache line already touched by the scheduler switch_mm.

The membarrier_execve() (now membarrier_exec_mmap) hook now needs to
clear the runqueue's membarrier state in addition to clearing the mm
membarrier state, so move its implementation into the scheduler
membarrier code so it can access the runqueue structure.

Add memory barrier in membarrier_exec_mmap() prior to clearing
the membarrier state, ensuring memory accesses executed prior to exec
are not reordered with the stores clearing the membarrier state.

As suggested by Linus, move all membarrier.c RCU read-side locks outside
of the for each cpu loops.

Suggested-by: Linus Torvalds 
Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Eric W. Biederman 
Cc: Kirill Tkhai 
Cc: Mike Galbraith 
Cc: Oleg Nesterov 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Russell King - ARM Linux admin 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20190919173705.2181-5-mathieu.desnoy...@efficios.com
Signed-off-by: Ingo Molnar 
---
 fs/exec.c |   2 +-
 include/linux/mm_types.h  |  14 ++-
 include/linux/sched/mm.h  |   8 +--
 kernel/sched/core.c   |   4 +-
 kernel/sched/membarrier.c | 175 +++--
 kernel/sched/sched.h  |  34 +++-
 6 files changed, 183 insertions(+), 54 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index f7f6a14..555e93c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1033,6 +1033,7 @@ static int exec_mmap(struct mm_struct *mm)
}
task_lock(tsk);
active_mm = tsk->active_mm;
+   membarrier_exec_mmap(mm);
tsk->mm = mm;
tsk->active_mm = mm;
activate_mm(active_mm, mm);
@@ -1825,7 +1826,6 @@ static int __do_execve_file(int fd, struct filename *filename,
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
-   membarrier_execve(current);
rseq_execve(current);
acct_update_integrals(current);
task_numa_free(current, false);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6a7a108..ec9bd3a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -383,6 +383,16 @@ struct mm_struct {
unsigned long highest_vm_end;   /* highest vma end address */
pgd_t * pgd;
 
+#ifdef CONFIG_MEMBARRIER
+   /**
+* @membarrier_state: Flags controlling membarrier behavior.
+*
+* This field is close to @pgd to hopefully fit in the same
+* cache-line, which needs to be touched by switch_mm().
+*/
+   atomic_t membarrier_state;
+#endif
+
/**
 * @mm_users: The number of users including userspace.
 *
@@ -452,9 +462,7 @@ struct mm_struct {
unsigned long flags; /* Must use atomic bitops to access */
 
struct core_state *core_state; /* coredumping support */
-#ifdef CONFIG_MEMBARRIER
-   atomic_t membarrier_state;
-#endif
+
 #ifdef CONFIG_AIO
spinlock_t  ioctx_lock;
struct kioctx_table __rcu   *ioctx_table;
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 8557ec6..e677001 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -370,10 +370,8 @@ static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
sync_core_before_usermode();
 }
 
-static inline void membarrier_execve(struct task_struct *t)
-{
-   atomic_set(&t->mm->membarrier_state, 0);
-}
+extern void membarrier_exec_mmap(struct mm_struct *mm);
+
 #else
 #ifdef CONFIG_ARCH_HAS_M
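
The core idea of the change, as a hedged sketch (the helper name is illustrative; the diff above is cut off before the corresponding kernel/sched/sched.h code): when the scheduler switches mm, it caches mm->membarrier_state in the per-runqueue field added by this patch, so the membarrier system call can test the per-rq copy instead of dereferencing another task's mm.

/* Illustrative only -- assumes the rq->membarrier_state field added by
 * this patch; not the patch's actual helper. */
static inline void rq_cache_membarrier_state(struct rq *rq, struct mm_struct *next_mm)
{
        int state = next_mm ? atomic_read(&next_mm->membarrier_state) : 0;

        if (READ_ONCE(rq->membarrier_state) != state)
                WRITE_ONCE(rq->membarrier_state, state);
}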

[tip: sched/urgent] selftests, sched/membarrier: Add multi-threaded test

2019-09-27 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 19a4ff534bb09686f53800564cb977bad2177c00
Gitweb:
https://git.kernel.org/tip/19a4ff534bb09686f53800564cb977bad2177c00
Author: Mathieu Desnoyers
AuthorDate: Thu, 19 Sep 2019 13:37:03 -04:00
Committer: Ingo Molnar 
CommitterDate: Wed, 25 Sep 2019 17:42:31 +02:00

selftests, sched/membarrier: Add multi-threaded test

membarrier commands cover very different code paths if they are in
a single-threaded vs multi-threaded process. Therefore, exercise both
scenarios in the kernel selftests to increase coverage of this selftest.

Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Eric W. Biederman 
Cc: Kirill Tkhai 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Oleg Nesterov 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Russell King - ARM Linux admin 
Cc: Shuah Khan 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20190919173705.2181-6-mathieu.desnoy...@efficios.com
Signed-off-by: Ingo Molnar 
---
 tools/testing/selftests/membarrier/.gitignore  |   3 +-
 tools/testing/selftests/membarrier/Makefile|   5 +-
 tools/testing/selftests/membarrier/membarrier_test.c   | 313 +-
 tools/testing/selftests/membarrier/membarrier_test_impl.h  | 317 ++-
 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c  |  73 -
 tools/testing/selftests/membarrier/membarrier_test_single_thread.c |  24 +-
 6 files changed, 419 insertions(+), 316 deletions(-)
 delete mode 100644 tools/testing/selftests/membarrier/membarrier_test.c
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_impl.h
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_single_thread.c

diff --git a/tools/testing/selftests/membarrier/.gitignore b/tools/testing/selftests/membarrier/.gitignore
index 020c44f..f2f7ec0 100644
--- a/tools/testing/selftests/membarrier/.gitignore
+++ b/tools/testing/selftests/membarrier/.gitignore
@@ -1 +1,2 @@
-membarrier_test
+membarrier_test_multi_thread
+membarrier_test_single_thread
diff --git a/tools/testing/selftests/membarrier/Makefile b/tools/testing/selftests/membarrier/Makefile
index 97e3bdf..34d1c81 100644
--- a/tools/testing/selftests/membarrier/Makefile
+++ b/tools/testing/selftests/membarrier/Makefile
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -g -I../../../../usr/include/
+LDLIBS += -lpthread
 
-TEST_GEN_PROGS := membarrier_test
+TEST_GEN_PROGS := membarrier_test_single_thread \
+   membarrier_test_multi_thread
 
 include ../lib.mk
-
diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
deleted file mode 100644
index 70b4ddb..000
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ /dev/null
@@ -1,313 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#define _GNU_SOURCE
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "../kselftest.h"
-
-static int sys_membarrier(int cmd, int flags)
-{
-   return syscall(__NR_membarrier, cmd, flags);
-}
-
-static int test_membarrier_cmd_fail(void)
-{
-   int cmd = -1, flags = 0;
-   const char *test_name = "sys membarrier invalid command";
-
-   if (sys_membarrier(cmd, flags) != -1) {
-   ksft_exit_fail_msg(
-   "%s test: command = %d, flags = %d. Should fail, but 
passed\n",
-   test_name, cmd, flags);
-   }
-   if (errno != EINVAL) {
-   ksft_exit_fail_msg(
-   "%s test: flags = %d. Should return (%d: \"%s\"), but 
returned (%d: \"%s\").\n",
-   test_name, flags, EINVAL, strerror(EINVAL),
-   errno, strerror(errno));
-   }
-
-   ksft_test_result_pass(
-   "%s test: command = %d, flags = %d, errno = %d. Failed as 
expected\n",
-   test_name, cmd, flags, errno);
-   return 0;
-}
-
-static int test_membarrier_flags_fail(void)
-{
-   int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
-   const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid 
flags";
-
-   if (sys_membarrier(cmd, flags) != -1) {
-   ksft_exit_fail_msg(
-   "%s test: flags = %d. Should fail, but passed\n",
-   test_name, flags);
-   }
-   if (errno != EINVAL) {
-   ksft_exit_fail_msg(
-   "%s test: flags = %d. Should return (%d: \"%s\"), but 
returned (%d: \"%s\").\n",
-   test_name, flags, EINVAL, strerror(EINVAL),
-   errno, strerror(errno)

[tip: sched/urgent] sched/membarrier: Call sync_core only before usermode for same mm

2019-09-27 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 2840cf02fae627860156737e83326df354ee4ec6
Gitweb:
https://git.kernel.org/tip/2840cf02fae627860156737e83326df354ee4ec6
Author: Mathieu Desnoyers
AuthorDate: Thu, 19 Sep 2019 13:37:01 -04:00
Committer: Ingo Molnar 
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Call sync_core only before usermode for same mm

When the prev and next task's mm change, switch_mm() provides the core
serializing guarantees before returning to usermode. The only case
where an explicit core serialization is needed is when the scheduler
keeps the same mm for prev and next.

Suggested-by: Oleg Nesterov 
Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Eric W. Biederman 
Cc: Kirill Tkhai 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Russell King - ARM Linux admin 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoy...@efficios.com
Signed-off-by: Ingo Molnar 
---
 include/linux/sched/mm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 4a79440..8557ec6 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -362,6 +362,8 @@ enum {
 
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
+   if (current->mm != mm)
+   return;
if (likely(!(atomic_read(&mm->membarrier_state) &
 MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
return;


[tip: sched/urgent] sched/membarrier: Skip IPIs when mm->mm_users == 1

2019-09-27 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: c6d68c1c4a4d6611fc0f8145d764226571d737ca
Gitweb:
https://git.kernel.org/tip/c6d68c1c4a4d6611fc0f8145d764226571d737ca
Author: Mathieu Desnoyers
AuthorDate: Thu, 19 Sep 2019 13:37:04 -04:00
Committer: Ingo Molnar 
CommitterDate: Wed, 25 Sep 2019 17:42:31 +02:00

sched/membarrier: Skip IPIs when mm->mm_users == 1

If there is only a single mm_user for the mm, the private expedited
membarrier command can skip the IPIs, because only a single thread
is using the mm.

Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Eric W. Biederman 
Cc: Kirill Tkhai 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Oleg Nesterov 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Russell King - ARM Linux admin 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20190919173705.2181-7-mathieu.desnoy...@efficios.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/membarrier.c |  9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 070cf43..fced54a 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -145,20 +145,21 @@ static int membarrier_private_expedited(int flags)
int cpu;
bool fallback = false;
cpumask_var_t tmpmask;
+   struct mm_struct *mm = current->mm;
 
if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
-   if (!(atomic_read(&current->mm->membarrier_state) &
+   if (!(atomic_read(&mm->membarrier_state) &
  MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
return -EPERM;
} else {
-   if (!(atomic_read(&current->mm->membarrier_state) &
+   if (!(atomic_read(&mm->membarrier_state) &
  MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
return -EPERM;
}
 
-   if (num_online_cpus() == 1)
+   if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
return 0;
 
/*
@@ -194,7 +195,7 @@ static int membarrier_private_expedited(int flags)
continue;
rcu_read_lock();
p = rcu_dereference(cpu_rq(cpu)->curr);
-   if (p && p->mm == current->mm) {
+   if (p && p->mm == mm) {
if (!fallback)
__cpumask_set_cpu(cpu, tmpmask);
else


[tip: sched/urgent] sched/membarrier: Remove redundant check

2019-09-27 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 09554009c0cad4cb2223dd943c813c9257c6883a
Gitweb:
https://git.kernel.org/tip/09554009c0cad4cb2223dd943c813c9257c6883a
Author: Mathieu Desnoyers
AuthorDate: Thu, 19 Sep 2019 13:37:00 -04:00
Committer: Ingo Molnar 
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Remove redundant check

Checking that the number of threads is 1 is redundant with checking
mm_users == 1.

No change in functionality intended.

Suggested-by: Oleg Nesterov 
Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Eric W. Biederman 
Cc: Kirill Tkhai 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Russell King - ARM Linux admin 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20190919173705.2181-3-mathieu.desnoy...@efficios.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/membarrier.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index d48b95f..7ccbd0e 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -186,7 +186,7 @@ static int membarrier_register_global_expedited(void)
MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY)
return 0;
atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state);
-   if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) {
+   if (atomic_read(&mm->mm_users) == 1) {
/*
 * For single mm user, single threaded process, we can
 * simply issue a memory barrier after setting
@@ -232,7 +232,7 @@ static int membarrier_register_private_expedited(int flags)
if (flags & MEMBARRIER_FLAG_SYNC_CORE)
atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
  &mm->membarrier_state);
-   if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
+   if (atomic_read(&mm->mm_users) != 1) {
/*
 * Ensure all future scheduler executions will observe the
 * new thread flag state for this process.


[tip: sched/urgent] sched/membarrier: Return -ENOMEM to userspace on memory allocation failure

2019-09-27 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: c172e0a3e8e65a4c6fffec5bc4d6de08d6f894f7
Gitweb:
https://git.kernel.org/tip/c172e0a3e8e65a4c6fffec5bc4d6de08d6f894f7
Author: Mathieu Desnoyers
AuthorDate: Thu, 19 Sep 2019 13:37:05 -04:00
Committer: Ingo Molnar 
CommitterDate: Wed, 25 Sep 2019 17:42:31 +02:00

sched/membarrier: Return -ENOMEM to userspace on memory allocation failure

Remove the IPI fallback code from membarrier to deal with very
infrequent cpumask memory allocation failure. Use GFP_KERNEL rather
than GFP_NOWAIT, and relax the blocking guarantees for the expedited
membarrier system call commands, allowing them to block while waiting for
memory to be made available.

In addition, now -ENOMEM can be returned to user-space if the cpumask
memory allocation fails.

Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Eric W. Biederman 
Cc: Kirill Tkhai 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Oleg Nesterov 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Russell King - ARM Linux admin 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20190919173705.2181-8-mathieu.desnoy...@efficios.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/membarrier.c | 63 --
 1 file changed, 20 insertions(+), 43 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index fced54a..a39bed2 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -66,7 +66,6 @@ void membarrier_exec_mmap(struct mm_struct *mm)
 static int membarrier_global_expedited(void)
 {
int cpu;
-   bool fallback = false;
cpumask_var_t tmpmask;
 
if (num_online_cpus() == 1)
@@ -78,15 +77,8 @@ static int membarrier_global_expedited(void)
 */
smp_mb();   /* system call entry is not a mb. */
 
-   /*
-* Expedited membarrier commands guarantee that they won't
-* block, hence the GFP_NOWAIT allocation flag and fallback
-* implementation.
-*/
-   if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
-   /* Fallback for OOM. */
-   fallback = true;
-   }
+   if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+   return -ENOMEM;
 
cpus_read_lock();
rcu_read_lock();
@@ -117,18 +109,15 @@ static int membarrier_global_expedited(void)
if (p->flags & PF_KTHREAD)
continue;
 
-   if (!fallback)
-   __cpumask_set_cpu(cpu, tmpmask);
-   else
-   smp_call_function_single(cpu, ipi_mb, NULL, 1);
+   __cpumask_set_cpu(cpu, tmpmask);
}
rcu_read_unlock();
-   if (!fallback) {
-   preempt_disable();
-   smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-   preempt_enable();
-   free_cpumask_var(tmpmask);
-   }
+
+   preempt_disable();
+   smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+   preempt_enable();
+
+   free_cpumask_var(tmpmask);
cpus_read_unlock();
 
/*
@@ -143,7 +132,6 @@ static int membarrier_global_expedited(void)
 static int membarrier_private_expedited(int flags)
 {
int cpu;
-   bool fallback = false;
cpumask_var_t tmpmask;
struct mm_struct *mm = current->mm;
 
@@ -168,15 +156,8 @@ static int membarrier_private_expedited(int flags)
 */
smp_mb();   /* system call entry is not a mb. */
 
-   /*
-* Expedited membarrier commands guarantee that they won't
-* block, hence the GFP_NOWAIT allocation flag and fallback
-* implementation.
-*/
-   if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
-   /* Fallback for OOM. */
-   fallback = true;
-   }
+   if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+   return -ENOMEM;
 
cpus_read_lock();
rcu_read_lock();
@@ -195,20 +176,16 @@ static int membarrier_private_expedited(int flags)
continue;
rcu_read_lock();
p = rcu_dereference(cpu_rq(cpu)->curr);
-   if (p && p->mm == mm) {
-   if (!fallback)
-   __cpumask_set_cpu(cpu, tmpmask);
-   else
-   smp_call_function_single(cpu, ipi_mb, NULL, 1);
-   }
+   if (p && p->mm == mm)
+   __cpumask_set_cpu(cpu, tmpmask);
}
rcu_read_unlock();
-   if (!fallback) {
-   preempt_disable();
-   smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-   preempt_enable();
-   free_cpumask_var(tmpmask);
-   }
+
+   preempt_disable();
+   smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+   preempt_enable();
+
+
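
From the caller's side, a hedged user-space sketch of what this change implies (names and the handling strategy are assumptions): the expedited commands may now sleep on the cpumask allocation and, if that allocation fails, return -1 with errno set to ENOMEM, so the result should be checked rather than assumed to be success.

#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static int sys_membarrier(int cmd, int flags)
{
        return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
        if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0))
                return 1;
        if (sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0)) {
                if (errno == ENOMEM)    /* new with this change */
                        fprintf(stderr, "membarrier: transient allocation failure\n");
                return 1;
        }
        return 0;
}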

[tip: sched/urgent] sched/membarrier: Fix private expedited registration check

2019-09-27 Thread tip-bot2 for Mathieu Desnoyers
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: fc0d77387cb5ae883fd774fc559e056a8dde024c
Gitweb:
https://git.kernel.org/tip/fc0d77387cb5ae883fd774fc559e056a8dde024c
Author: Mathieu Desnoyers
AuthorDate: Thu, 19 Sep 2019 13:36:59 -04:00
Committer: Ingo Molnar 
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Fix private expedited registration check

Fix a logic flaw in the way membarrier_register_private_expedited()
handles ready state checks for private expedited sync core and private
expedited registrations.

If a private expedited membarrier registration is first performed, and
then a private expedited sync_core registration is performed, the ready
state check will skip the second registration when it really should not.

Signed-off-by: Mathieu Desnoyers 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Eric W. Biederman 
Cc: Kirill Tkhai 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Oleg Nesterov 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Russell King - ARM Linux admin 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoy...@efficios.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/membarrier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b14250a..d48b95f 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -226,7 +226,7 @@ static int membarrier_register_private_expedited(int flags)
 * groups, which use the same mm. (CLONE_VM but not
 * CLONE_THREAD).
 */
-   if (atomic_read(&mm->membarrier_state) & state)
+   if ((atomic_read(&mm->membarrier_state) & state) == state)
return 0;
atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
if (flags & MEMBARRIER_FLAG_SYNC_CORE)
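
To make the logic flaw concrete, a hedged stand-alone illustration (the flag values are made up, not the kernel's): with two ready bits, testing "any requested bit set" wrongly reports ready when only the plain private expedited registration has completed, whereas testing "all requested bits set" does not.

#include <assert.h>

#define READY_PLAIN     0x1     /* illustrative values, not the kernel flags */
#define READY_SYNC_CORE 0x2

int main(void)
{
        int state = READY_PLAIN;                        /* plain registration done first */
        int want  = READY_PLAIN | READY_SYNC_CORE;      /* now registering sync_core too */

        assert(state & want);           /* old check: wrongly claims "already ready" */
        assert((state & want) != want); /* fixed check: correctly "not ready yet"    */
        return 0;
}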