On Wed, Jun 15, 2016 at 08:25:07AM +0100, Juri Lelli wrote:
> I guess it's not that likely, but yes, it could potentially happen that a
> waiter is optimistically spinning, depletes its runtime, gets throttled,
> and is then replenished while still spinning. Maybe it doesn't really
> make sense to continue spinning in this situation, but I guess things
> get really complicated. :-/
> 
> Anyway, as said, I think this patch is OK. Maybe we want to add a
> comment just to record what situation can cause an issue if we don't
> do this? The patch changelog would be OK for such a comment as well,
> IMHO.


OK, so I went to write a simple comment and ended up with the below :/

While writing the comment I noticed two issues:

 - we update the waiter order fields while the entry is still enqueued
   on the pi_waiters tree, which is also sorted by these exact fields
   (a sketch of why that is a problem follows below).

 - another one of these pure ->prio comparisons, which cannot order two
   SCHED_DEADLINE tasks against each other since they all share the same
   ->prio; the deadline has to be compared as well (see the sketch after
   the patch).

Please double check, there be dragons here.
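
To make the first issue concrete, here's a userspace sketch. This is
not kernel code: a plain sorted list stands in for the rbtree, and
enqueue()/dequeue()/struct waiter are made-up names for illustration.
The point is that a sort key may only change while the entry is off
the list, which is the dequeue -> update -> enqueue pattern the [7]
hunk below uses for the lock waiter tree:

#include <assert.h>
#include <stdio.h>

struct waiter {
        int prio;                       /* sort key, as in rt_mutex_waiter */
        struct waiter *next;
};

/* Insert in ascending ->prio order (lower value = higher priority). */
static void enqueue(struct waiter **head, struct waiter *w)
{
        while (*head && (*head)->prio <= w->prio)
                head = &(*head)->next;
        w->next = *head;
        *head = w;
}

static void dequeue(struct waiter **head, struct waiter *w)
{
        while (*head != w)
                head = &(*head)->next;
        *head = w->next;
}

int main(void)
{
        struct waiter a = { .prio = 10 }, b = { .prio = 20 };
        struct waiter *head = NULL;

        enqueue(&head, &a);
        enqueue(&head, &b);

        /*
         * Broken: update b's key while it is still enqueued. With
         * prio 1, b should now be the top waiter, but the head still
         * points at a. This is the pi_waiters situation above.
         */
        b.prio = 1;
        assert(head == &a);

        /* Safe: dequeue first, update the key off-list, re-enqueue. */
        dequeue(&head, &b);
        b.prio = 1;
        enqueue(&head, &b);
        assert(head == &b);

        printf("top waiter prio: %d\n", head->prio);
        return 0;
}

The pi_waiters tree gets away with leaving the entry enqueued across
the update only because [L] serializes all access and rb_erase() never
looks at the keys of the node it removes, which is what the new
comment below spells out.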

---

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -158,6 +158,9 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
+#define cmp_task(p)    \
+       &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
                     struct rt_mutex_waiter *right)
@@ -583,8 +586,26 @@ static int rt_mutex_adjust_prio_chain(st
 
        /* [7] Requeue the waiter in the lock waiter tree. */
        rt_mutex_dequeue(lock, waiter);
+
+       /*
+        * Update the waiter prio fields now that we're dequeued.
+        *
+        * These values can have changed through either:
+        *
+        *   sys_sched_set_scheduler() / sys_sched_setattr()
+        *
+        * or
+        *
+        *   DL CBS enforcement advancing the effective deadline.
+        *
+        * Even though pi_waiters also uses these fields, and that tree is only
+        * updated in [11], we can do this here, since we hold [L], which
+        * serializes all pi_waiters access and rb_erase() does not care about
+        * the values of the node being removed.
+        */
        waiter->prio = task->prio;
        waiter->deadline = task->dl.deadline;
+
        rt_mutex_enqueue(lock, waiter);
 
        /* [8] Release the task */
@@ -711,6 +732,8 @@ static int rt_mutex_adjust_prio_chain(st
 static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
                                struct rt_mutex_waiter *waiter)
 {
+       lockdep_assert_held(&lock->wait_lock);
+
        /*
         * Before testing whether we can acquire @lock, we set the
         * RT_MUTEX_HAS_WAITERS bit in @lock->owner. This forces all
@@ -770,7 +793,8 @@ static int try_to_take_rt_mutex(struct r
                         * the top waiter priority (kernel view),
                         * @task lost.
                         */
-                       if (task->prio >= rt_mutex_top_waiter(lock)->prio)
+                       if (!rt_mutex_waiter_less(cmp_task(task),
+                                                 rt_mutex_top_waiter(lock)))
                                return 0;
 
                        /*
@@ -838,6 +862,8 @@ static int task_blocks_on_rt_mutex(struc
        struct rt_mutex *next_lock;
        int chain_walk = 0, res;
 
+       lockdep_assert_held(&lock->wait_lock);
+
        /*
         * Early deadlock detection. We really don't want the task to
         * enqueue on itself just to untangle the mess later. It's not
@@ -973,6 +999,8 @@ static void remove_waiter(struct rt_mute
        struct task_struct *owner = rt_mutex_owner(lock);
        struct rt_mutex *next_lock;
 
+       lockdep_assert_held(&lock->wait_lock);
+
        raw_spin_lock(&current->pi_lock);
        rt_mutex_dequeue(lock, waiter);
        current->pi_blocked_on = NULL;
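
And to make the second issue concrete: every SCHED_DEADLINE task has
the same ->prio, so the old "task->prio >= rt_mutex_top_waiter(lock)->prio"
test can never order two DL tasks against each other; their deadlines
have to break the tie, which is what going through rt_mutex_waiter_less()
via the cmp_task() wrapper gets us. Below is a standalone userspace
sketch of the comparison semantics; the helpers mirror the kernel's
dl_prio() and dl_time_before(), but it is an illustration, not the
kernel implementation:

#include <stdint.h>
#include <stdio.h>

#define MAX_DL_PRIO     0               /* DL tasks have prio < 0 */

struct waiter {
        int prio;
        uint64_t deadline;
};

/* Mirrors dl_prio(): true for SCHED_DEADLINE priorities. */
static int dl_prio(int prio)
{
        return prio < MAX_DL_PRIO;
}

/* Mirrors dl_time_before(): wrap-safe "a is earlier than b". */
static int dl_time_before(uint64_t a, uint64_t b)
{
        return (int64_t)(a - b) < 0;
}

/*
 * Sketch of rt_mutex_waiter_less(): a lower ->prio value wins; among
 * DL waiters, which all share the same ->prio, the earlier deadline
 * wins. A pure ->prio compare cannot make that second distinction.
 */
static int waiter_less(struct waiter *left, struct waiter *right)
{
        if (left->prio < right->prio)
                return 1;
        if (dl_prio(left->prio) && dl_prio(right->prio))
                return dl_time_before(left->deadline, right->deadline);
        return 0;
}

int main(void)
{
        struct waiter task = { .prio = -1, .deadline = 100 };
        struct waiter top  = { .prio = -1, .deadline = 200 };

        /* Pure ->prio: -1 >= -1, so @task would wrongly lose ... */
        printf("prio only:      task %s\n",
               task.prio >= top.prio ? "loses" : "wins");

        /* ... while the deadline-aware compare lets it win. */
        printf("deadline aware: task %s\n",
               waiter_less(&task, &top) ? "wins" : "loses");
        return 0;
}

Note the in-tree rt_mutex_waiter_less() only tests dl_prio() on the
left waiter and relies on the tree invariant to guarantee the right
one is DL as well; the sketch checks both so it stands on its own.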
