Re: [PATCH v3 2/6] sched/rtmutex/deadline: Fix a PI crash for deadline tasks
Hi Peter,

On 2016/04/20 at 21:49, Xunlei Pang wrote:
> On 2016/04/20 at 21:19, Peter Zijlstra wrote:
>> On Thu, Apr 14, 2016 at 07:37:03PM +0800, Xunlei Pang wrote:
>>> +	/* Updated under pi_lock and rtmutex lock */
>>>  	struct rb_node *pi_waiters_leftmost;
>>> +	struct rb_node *pi_waiters_leftmost_copy;
>>>
>>>  struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
>>>  {
>>> +	if (!task->pi_waiters_leftmost_copy)
>>>  		return NULL;
>>>
>>> +	return rb_entry(task->pi_waiters_leftmost_copy,
>>> +			struct rt_mutex_waiter, pi_tree_entry)->task;
>>>  }
>>
>> why ?! Why not keep a regular task_struct pointer and avoid this stuff?
>
> I meant to make it semantically consistent, but I can change it since you
> think task_struct is better.

What do you think of this version of PATCH1 and PATCH2? If you are fine with
it, I can send it out as v4 separately, then we can focus on the issue in
PATCH5. Thanks!

> Regards,
> Xunlei
Re: [PATCH v3 2/6] sched/rtmutex/deadline: Fix a PI crash for deadline tasks
On 2016/04/20 at 21:19, Peter Zijlstra wrote:
> On Thu, Apr 14, 2016 at 07:37:03PM +0800, Xunlei Pang wrote:
>> +	/* Updated under pi_lock and rtmutex lock */
>>  	struct rb_node *pi_waiters_leftmost;
>> +	struct rb_node *pi_waiters_leftmost_copy;
>>
>>  struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
>>  {
>> +	if (!task->pi_waiters_leftmost_copy)
>>  		return NULL;
>>
>> +	return rb_entry(task->pi_waiters_leftmost_copy,
>> +			struct rt_mutex_waiter, pi_tree_entry)->task;
>>  }
>
> why ?! Why not keep a regular task_struct pointer and avoid this stuff?

I meant to make it semantically consistent, but I can change it since you
think task_struct is better.

Regards,
Xunlei
Re: [PATCH v3 2/6] sched/rtmutex/deadline: Fix a PI crash for deadline tasks
On Thu, Apr 14, 2016 at 07:37:03PM +0800, Xunlei Pang wrote:
> +	/* Updated under pi_lock and rtmutex lock */
>  	struct rb_node *pi_waiters_leftmost;
> +	struct rb_node *pi_waiters_leftmost_copy;
>
>  struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
>  {
> +	if (!task->pi_waiters_leftmost_copy)
>  		return NULL;
>
> +	return rb_entry(task->pi_waiters_leftmost_copy,
> +			struct rt_mutex_waiter, pi_tree_entry)->task;
>  }

why ?! Why not keep a regular task_struct pointer and avoid this stuff?
[PATCH v3 2/6] sched/rtmutex/deadline: Fix a PI crash for deadline tasks
A crash happened while I was playing with the deadline PI rtmutex.

    BUG: unable to handle kernel NULL pointer dereference at 0018
    IP: [] rt_mutex_get_top_task+0x1f/0x30
    PGD 232a75067 PUD 230947067 PMD 0
    Oops: [#1] SMP
    CPU: 1 PID: 10994 Comm: a.out Not tainted
    Call Trace:
    [] ? enqueue_task_dl+0x2a/0x320
    [] enqueue_task+0x2c/0x80
    [] activate_task+0x23/0x30
    [] pull_dl_task+0x1d5/0x260
    [] pre_schedule_dl+0x16/0x20
    [] __schedule+0xd3/0x900
    [] schedule+0x29/0x70
    [] __rt_mutex_slowlock+0x4b/0xc0
    [] rt_mutex_slowlock+0xd1/0x190
    [] rt_mutex_timed_lock+0x53/0x60
    [] futex_lock_pi.isra.18+0x28c/0x390
    [] ? enqueue_task_dl+0x195/0x320
    [] ? prio_changed_dl+0x6c/0x90
    [] do_futex+0x190/0x5b0
    [] SyS_futex+0x80/0x180
    [] system_call_fastpath+0x16/0x1b
    RIP [] rt_mutex_get_top_task+0x1f/0x30

This happens because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi() hold
only pi_lock when operating on the pi waiters, while rt_mutex_get_top_task()
(replaced by rt_mutex_get_top_waiter() in the previous patches; same issue)
accesses them with the rq lock held but without holding pi_lock.

To tackle this, introduce a new "pi_waiters_leftmost_copy" field in
task_struct, and add a new function, rt_mutex_update_copy(), to do the copy.
It can be called from rt_mutex_setprio(), which holds the owner's pi_lock,
the rq lock, and the rt_mutex lock.

One exception: rt_mutex_setprio() can be called via rt_mutex_adjust_prio()
after mark_wakeup_next_waiter() without the rtmutex locked, so
mark_wakeup_next_waiter() needs new code to take the rq lock first and then
call rt_mutex_update_copy(). We avoid adding new logic by calling
__rt_mutex_adjust_prio() directly in mark_wakeup_next_waiter() and removing
the old rt_mutex_adjust_prio() call outside the lock.

Since this moves the deboost point earlier, the current process could be
preempted (due to the earlier deboost) before finishing wake_up_q(); to
avoid that, preempt_disable() is also moved to before the rtmutex is
unlocked.
Originally-From: Peter Zijlstra
Signed-off-by: Xunlei Pang
---
 include/linux/init_task.h      |  3 ++-
 include/linux/sched.h          |  3 +++
 include/linux/sched/deadline.h |  2 ++
 kernel/fork.c                  |  1 +
 kernel/locking/rtmutex.c       | 61 +-
 kernel/sched/core.c            |  2 ++
 6 files changed, 35 insertions(+), 37 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f2cb8d4..7be5a83 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -162,7 +162,8 @@ extern struct task_group root_task_group;
 #ifdef CONFIG_RT_MUTEXES
 # define INIT_RT_MUTEXES(tsk)						\
 	.pi_waiters = RB_ROOT,						\
-	.pi_waiters_leftmost = NULL,
+	.pi_waiters_leftmost = NULL,					\
+	.pi_waiters_leftmost_copy = NULL,
 #else
 # define INIT_RT_MUTEXES(tsk)
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 45e848c..332a6b5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1621,7 +1621,10 @@ struct task_struct {
 #ifdef CONFIG_RT_MUTEXES
 	/* PI waiters blocked on a rt_mutex held by this task */
 	struct rb_root pi_waiters;
+	/* Updated under pi_lock and rtmutex lock */
 	struct rb_node *pi_waiters_leftmost;
+	/* Updated under pi_lock, rq_lock, and rtmutex lock */
+	struct rb_node *pi_waiters_leftmost_copy;
 	/* Deadlock detection and priority inheritance handling */
 	struct rt_mutex_waiter *pi_blocked_on;
 #endif
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 9089a2a..e8304d4 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -26,4 +26,6 @@ static inline bool dl_time_before(u64 a, u64 b)
 	return (s64)(a - b) < 0;
 }

+extern void rt_mutex_update_copy(struct task_struct *p);
+
 #endif /* _SCHED_DEADLINE_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index a453546..4b847e4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1225,6 +1225,7 @@ static void rt_mutex_init_task(struct task_struct *p)
 #ifdef CONFIG_RT_MUTEXES
 	p->pi_waiters = RB_ROOT;
 	p->pi_waiters_leftmost = NULL;
+	p->pi_waiters_leftmost_copy = NULL;
 	p->pi_blocked_on = NULL;
 #endif
 }
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 42bc59b..00c6560 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -273,10 +273,11 @@ int rt_mutex_getprio(struct task_struct *task)
 struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
 {
-	if (likely(!task_has_pi_waiters(task)))
+	if (!task->pi_waiters_leftmost_copy)
 		return NULL;

-	return task_top_pi_waiter(task)->task;
+	return rb_entry(task->pi_waiters_leftmost_copy,
+			struct