Re: [PATCH 4.4 1/3] futex: Change locking rules

2021-03-09 Thread Greg KH
On Tue, Mar 09, 2021 at 11:06:03AM +0800, Zheng Yejian wrote:
> From: Peter Zijlstra 
> 
> Currently futex-pi relies on hb->lock to serialize everything. But hb->lock
> creates another set of problems, especially priority inversions on RT where
> hb->lock becomes a rt_mutex itself.
> 
> The rt_mutex::wait_lock is the most obvious protection for keeping the
> futex user space value and the kernel internal pi_state in sync.
> 
> Rework and document the locking so rt_mutex::wait_lock is held across all
> operations which modify the user space value and the pi state.
> 
> This allows invoking rt_mutex_unlock() (including deboost) without holding
> hb->lock as a next step.
> 
> Nothing yet relies on the new locking rules.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: juri.le...@arm.com
> Cc: bige...@linutronix.de
> Cc: xlp...@redhat.com
> Cc: rost...@goodmis.org
> Cc: mathieu.desnoy...@efficios.com
> Cc: jdesfos...@efficios.com
> Cc: dvh...@infradead.org
> Cc: bris...@redhat.com
> Link: http://lkml.kernel.org/r/20170322104151.751993...@infradead.org
> Signed-off-by: Thomas Gleixner 
> [Lee: Back-ported in support of a previous futex back-port attempt]
> Signed-off-by: Lee Jones 
> Signed-off-by: Greg Kroah-Hartman 
> Signed-off-by: Zheng Yejian 
> ---
>  kernel/futex.c | 138 +++--
>  1 file changed, 112 insertions(+), 26 deletions(-)

What is the git commit id of this patch in Linus's tree?

thanks,

greg k-h
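The usual way to answer that question is to grep the subject line in a kernel checkout. A minimal, self-contained sketch (demonstrated on a throwaway repository here; in practice run the same `git log --grep` in Linus's tree):

```shell
# Create a disposable repo with one commit carrying the subject line,
# then locate it by subject -- the same command works in a real tree.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "futex: Change locking rules"
git log --oneline --grep='futex: Change locking rules'
```

The output is the abbreviated commit id followed by the subject; in a real kernel tree that id is what would go into the stable submission.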


[PATCH 4.4 1/3] futex: Change locking rules

2021-03-08 Thread Zheng Yejian
From: Peter Zijlstra 

Currently futex-pi relies on hb->lock to serialize everything. But hb->lock
creates another set of problems, especially priority inversions on RT where
hb->lock becomes a rt_mutex itself.

The rt_mutex::wait_lock is the most obvious protection for keeping the
futex user space value and the kernel internal pi_state in sync.

Rework and document the locking so rt_mutex::wait_lock is held across all
operations which modify the user space value and the pi state.

This allows invoking rt_mutex_unlock() (including deboost) without holding
hb->lock as a next step.

Nothing yet relies on the new locking rules.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: juri.le...@arm.com
Cc: bige...@linutronix.de
Cc: xlp...@redhat.com
Cc: rost...@goodmis.org
Cc: mathieu.desnoy...@efficios.com
Cc: jdesfos...@efficios.com
Cc: dvh...@infradead.org
Cc: bris...@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.751993...@infradead.org
Signed-off-by: Thomas Gleixner 
[Lee: Back-ported in support of a previous futex back-port attempt]
Signed-off-by: Lee Jones 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Zheng Yejian 
---
 kernel/futex.c | 138 +++--
 1 file changed, 112 insertions(+), 26 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index a14b7ef90e5c..b410752f5ad1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1014,6 +1014,39 @@ static void exit_pi_state_list(struct task_struct *curr)
  * [10] There is no transient state which leaves owner and user space
  * TID out of sync. Except one error case where the kernel is denied
  * write access to the user address, see fixup_pi_state_owner().
+ *
+ *
+ * Serialization and lifetime rules:
+ *
+ * hb->lock:
+ *
+ * hb -> futex_q, relation
+ * futex_q -> pi_state, relation
+ *
+ * (cannot be raw because hb can contain arbitrary amount
+ *  of futex_q's)
+ *
+ * pi_mutex->wait_lock:
+ *
+ * {uval, pi_state}
+ *
+ * (and pi_mutex 'obviously')
+ *
+ * p->pi_lock:
+ *
+ * p->pi_state_list -> pi_state->list, relation
+ *
+ * pi_state->refcount:
+ *
+ * pi_state lifetime
+ *
+ *
+ * Lock order:
+ *
+ *   hb->lock
+ * pi_mutex->wait_lock
+ *   p->pi_lock
+ *
  */
 
 /*
@@ -1021,10 +1054,12 @@ static void exit_pi_state_list(struct task_struct *curr)
  * the pi_state against the user space value. If correct, attach to
  * it.
  */
-static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
+static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
+ struct futex_pi_state *pi_state,
  struct futex_pi_state **ps)
 {
pid_t pid = uval & FUTEX_TID_MASK;
+   u32 uval2;
+   int ret;
 
/*
 * Userspace might have messed up non-PI and PI futexes [3]
@@ -1032,8 +1067,33 @@ static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
if (unlikely(!pi_state))
return -EINVAL;
 
+   /*
+* We get here with hb->lock held, and having found a
+* futex_top_waiter(). This means that futex_lock_pi() of said futex_q
+* has dropped the hb->lock in between queue_me() and unqueue_me_pi(),
+* which in turn means that futex_lock_pi() still has a reference on
+* our pi_state.
+*/
	WARN_ON(!atomic_read(&pi_state->refcount));
 
+   /*
+* Now that we have a pi_state, we can acquire wait_lock
+* and do the state validation.
+*/
+   raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+
+   /*
+* Since {uval, pi_state} is serialized by wait_lock, and our current
+* uval was read without holding it, it can have changed. Verify it
+* still is what we expect it to be, otherwise retry the entire
+* operation.
+*/
+   if (get_futex_value_locked(&uval2, uaddr))
+   goto out_efault;
+
+   if (uval != uval2)
+   goto out_eagain;
+
/*
 * Handle the owner died case:
 */
@@ -1049,11 +1109,11 @@ static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
 * is not 0. Inconsistent state. [5]
 */
if (pid)
-   return -EINVAL;
+   goto out_einval;
/*
 * Take a ref on the state and return success. [4]
 */
-   goto out_state;
+   goto out_attach;
}
 
/*
@@ -1065,14 +1125,14 @@ static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
 * Take a ref on the state and return success. [6]
 */
if (!pid)
-   goto out_state;
+   goto out_attach;
} else {
/*
 * If the owner died bit is not set,