On Wed, Jun 15, 2016 at 08:25:07AM +0100, Juri Lelli wrote: > I guess it's not that likely, but yes it could potentially happen that a > waiter is optimistically spinning, depletes its runtime, gets throttled > and then replenished when still spinning. Maybe it doesn't really make > sense continuing spinning in this situation, but I guess things get > really complicated. :-/ > > Anyway, as said, I think this patch is OK. Maybe we want to add a > comment just to remember what situation can cause an issue if we don't > do this? Patch changelog would be OK as well for such a comment IMHO.
OK, so I went to write a simple comment and ended up with the below :/ While writing the comment I noticed two issues: - we update the waiter order fields while the entry is still enqueued on the pi_waiters tree, which is also sorted by these exact fields. - another one of these pure ->prio comparisons Please double check, there be dragons here. --- --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -158,6 +158,9 @@ static inline bool unlock_rt_mutex_safe( } #endif +#define cmp_task(p) \ + &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline } + static inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left, struct rt_mutex_waiter *right) @@ -583,8 +586,26 @@ static int rt_mutex_adjust_prio_chain(st /* [7] Requeue the waiter in the lock waiter tree. */ rt_mutex_dequeue(lock, waiter); + + /* + * Update the waiter prio fields now that we're dequeued. + * + * These values can have changed through either: + * + * sys_sched_set_scheduler() / sys_sched_setattr() + * + * or + * + * DL CBS enforcement advancing the effective deadline. + * + * Even though pi_waiters also uses these fields, and that tree is only + * updated in [11], we can do this here, since we hold [L], which + * serializes all pi_waiters access and rb_erase() does not care about + * the values of the node being removed. + */ waiter->prio = task->prio; waiter->deadline = task->dl.deadline; + rt_mutex_enqueue(lock, waiter); /* [8] Release the task */ @@ -711,6 +732,8 @@ static int rt_mutex_adjust_prio_chain(st static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, struct rt_mutex_waiter *waiter) { + lockdep_assert_held(&lock->wait_lock); + /* * Before testing whether we can acquire @lock, we set the * RT_MUTEX_HAS_WAITERS bit in @lock->owner. This forces all @@ -770,7 +793,8 @@ static int try_to_take_rt_mutex(struct r * the top waiter priority (kernel view), * @task lost. */ - if (task->prio >= rt_mutex_top_waiter(lock)->prio) + if (!rt_mutex_waiter_less(cmp_task(task), + rt_mutex_top_waiter(lock))) return 0; /* @@ -838,6 +862,8 @@ static int task_blocks_on_rt_mutex(struc struct rt_mutex *next_lock; int chain_walk = 0, res; + lockdep_assert_held(&lock->wait_lock); + /* * Early deadlock detection. We really don't want the task to * enqueue on itself just to untangle the mess later. It's not @@ -973,6 +999,8 @@ static void remove_waiter(struct rt_mute struct task_struct *owner = rt_mutex_owner(lock); struct rt_mutex *next_lock; + lockdep_assert_held(&lock->wait_lock); + raw_spin_lock(¤t->pi_lock); rt_mutex_dequeue(lock, waiter); current->pi_blocked_on = NULL;