[PATCH 1/1] powerpc: get_wchan(): solve possible race scenario due to parallel wakeup
Add a check for p->state == TASK_RUNNING so that any wake-ups on task_struct p in the interim lead to 0 being returned by get_wchan(). Signed-off-by: Kautuk Consul <kautuk.consul.1...@gmail.com> --- arch/powerpc/kernel/process.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index b8500b4..f233352 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1785,7 +1785,8 @@ unsigned long get_wchan(struct task_struct *p) do { sp = *(unsigned long *)sp; - if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD)) + if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) || + p->state == TASK_RUNNING) return 0; if (count > 0) { ip = ((unsigned long *)sp)[STACK_FRAME_LR_SAVE]; -- 1.7.9.5
[PATCH 1/1] powerpc: get_wchan(): solve possible race scenario due to parallel wakeup
Add a check for p->state == TASK_RUNNING so that any wake-ups on task_struct p in the interim lead to 0 being returned by get_wchan(). Signed-off-by: Kautuk Consul --- arch/powerpc/kernel/process.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index b8500b4..f233352 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1785,7 +1785,8 @@ unsigned long get_wchan(struct task_struct *p) do { sp = *(unsigned long *)sp; - if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD)) + if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) || + p->state == TASK_RUNNING) return 0; if (count > 0) { ip = ((unsigned long *)sp)[STACK_FRAME_LR_SAVE]; -- 1.7.9.5
Re: [PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()
Sorry folks, I got one thing wrong: >From some more code review, both __down_common() and do_wait_for_common() inspect the signal_pending() only while in TASK_RUNNING. So I think that it cannot be possible that this happened on my system due to __down_common() and/or wait_for_common(). Which only leaves out the possibility that the BUG() on my embedded system happened due to a driver which was trying to implement its own sleeping primitive by first setting the task state to TASK_INTERRUPTIBLE and then checking for signal_pending/signal_pending_state. (But this is anyway generally frowned upon and not really acceptable nowadays). I'll review those drivers for this kind of mistake. On Mon, Aug 25, 2014 at 9:27 PM, Oleg Nesterov wrote: > On 08/25, Kautuk Consul wrote: >> >> I encountered a BUG() scenario within do_exit() on an ARM system. >> >> The problem is due to a race scenario between do_exit() and try_to_wake_up() >> on different CPUs due to usage of sleeping primitives such as __down_common >> and wait_for_common. >> >> Race Scenario >> = >> >> Let us assume there are 2 CPUs A and B execute code in the following order: >> 1)CPU A was running in user-mode and enters kernel mode via some >> syscall/exception handler. >> 2)CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via >> __down_common >> or wait_for_common. >> 3)CPU A checks for signal_pending() and returns due to TIF_SIGPENDING >> being set in t's threadinfo due to a previous signal(say SIGKILL) being >> received on this task t. >> 4)CPU A returns returns back to the assembly trap handler and calls >> do_work_pending() -> do_signal() -> get_signal() -> do_group_exit() >>-> do_exit() >> CPU A has not yet executed the following line of code before the final >> call to schedule: >> /* causes final put_task_struct in finish_task_switch(). */ >> tsk->state = TASK_DEAD; >> 5)CPU B tries to send a signal to task t (currently executing on CPU A) >> and thus enters: signal_wake_up_state() -> wake_up_state() -> >>try_to_wake_up() >> 6)CPU B executes all code in try_to_wake_up() till the call to >> ttwu_queue -> ttwu_do_activate -> ttwu_do_wakeup(). >> CPU B has still not executed the following code in ttwu_do_wakeup(): >> p->state = TASK_RUNNING; >> 7)CPU A executes the following line of code: >> /* causes final put_task_struct in finish_task_switch(). */ >> tsk->state = TASK_DEAD; >> 8)CPU B executes the following code in ttwu_do_wakeup(): >> p->state = TASK_RUNNING; >> 9)CPU A continues to the call to do_exit() -> schedule(). >> Since the tsk->state is TASK_RUNNING, the call to schedule() returns >> and >> do_exit() -> BUG() is hit on CPU A. >> >> Alternate Solution >> == >> >> An alternate solution would be to simply set the current task state to >> TASK_RUNNING in __down_common(), wait_for_common() and all other >> interruptible >> sleeping primitives in their if(signal_pending/signal_pending_state) >> conditional >> blocks. >> >> But this change seems to me to be more logical because: >> i)This will involve lesser changes to the kernel core code. >> ii) Any further sleeping primitives in the kernel also need not >> suffer from >> this kind of race scenario. >> >> Signed-off-by: Kautuk Consul >> --- >> kernel/exit.c | 10 ++ >> 1 file changed, 6 insertions(+), 4 deletions(-) >> >> diff --git a/kernel/exit.c b/kernel/exit.c >> index 32c58f7..69a8231 100644 >> --- a/kernel/exit.c >> +++ b/kernel/exit.c >> @@ -824,14 +824,16 @@ void do_exit(long code) >>* (or hypervisor of virtual machine switches to other guest) >>* As a result, we may become TASK_RUNNING after becoming TASK_DEAD >>* >> - * To avoid it, we have to wait for releasing tsk->pi_lock which >> - * is held by try_to_wake_up() >> + * To solve this, we have to compete for tsk->pi_lock which is held by >> + * try_to_wake_up(). >>*/ >> - smp_mb(); >> - raw_spin_unlock_wait(>pi_lock); >> + raw_spin_lock(>pi_lock); >> >> /* causes final put_task_struct in finish_task_switch(). */ >> tsk->state = TASK_DEAD; >> + >> + raw_spin_unlock(>pi_lock); >> + >> ts
[PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()
I encountered a BUG() scenario within do_exit() on an ARM system. The problem is due to a race scenario between do_exit() and try_to_wake_up() on different CPUs due to usage of sleeping primitives such as __down_common and wait_for_common. Race Scenario = Let us assume there are 2 CPUs A and B execute code in the following order: 1) CPU A was running in user-mode and enters kernel mode via some syscall/exception handler. 2) CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via __down_common or wait_for_common. 3) CPU A checks for signal_pending() and returns due to TIF_SIGPENDING being set in t's threadinfo due to a previous signal(say SIGKILL) being received on this task t. 4) CPU A returns returns back to the assembly trap handler and calls do_work_pending() -> do_signal() -> get_signal() -> do_group_exit() -> do_exit() CPU A has not yet executed the following line of code before the final call to schedule: /* causes final put_task_struct in finish_task_switch(). */ tsk->state = TASK_DEAD; 5) CPU B tries to send a signal to task t (currently executing on CPU A) and thus enters: signal_wake_up_state() -> wake_up_state() -> try_to_wake_up() 6) CPU B executes all code in try_to_wake_up() till the call to ttwu_queue -> ttwu_do_activate -> ttwu_do_wakeup(). CPU B has still not executed the following code in ttwu_do_wakeup(): p->state = TASK_RUNNING; 7) CPU A executes the following line of code: /* causes final put_task_struct in finish_task_switch(). */ tsk->state = TASK_DEAD; 8) CPU B executes the following code in ttwu_do_wakeup(): p->state = TASK_RUNNING; 9) CPU A continues to the call to do_exit() -> schedule(). Since the tsk->state is TASK_RUNNING, the call to schedule() returns and do_exit() -> BUG() is hit on CPU A. Alternate Solution == An alternate solution would be to simply set the current task state to TASK_RUNNING in __down_common(), wait_for_common() and all other interruptible sleeping primitives in their if(signal_pending/signal_pending_state) conditional blocks. But this change seems to me to be more logical because: i) This will involve lesser changes to the kernel core code. ii) Any further sleeping primitives in the kernel also need not suffer from this kind of race scenario. Signed-off-by: Kautuk Consul --- kernel/exit.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 32c58f7..69a8231 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -824,14 +824,16 @@ void do_exit(long code) * (or hypervisor of virtual machine switches to other guest) * As a result, we may become TASK_RUNNING after becoming TASK_DEAD * -* To avoid it, we have to wait for releasing tsk->pi_lock which -* is held by try_to_wake_up() +* To solve this, we have to compete for tsk->pi_lock which is held by +* try_to_wake_up(). */ - smp_mb(); - raw_spin_unlock_wait(>pi_lock); + raw_spin_lock(>pi_lock); /* causes final put_task_struct in finish_task_switch(). */ tsk->state = TASK_DEAD; + + raw_spin_unlock(>pi_lock); + tsk->flags |= PF_NOFREEZE; /* tell freezer to ignore us */ schedule(); BUG(); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()
I encountered a BUG() scenario within do_exit() on an ARM system. The problem is due to a race scenario between do_exit() and try_to_wake_up() on different CPUs due to usage of sleeping primitives such as __down_common and wait_for_common. Race Scenario = Let us assume there are 2 CPUs A and B execute code in the following order: 1) CPU A was running in user-mode and enters kernel mode via some syscall/exception handler. 2) CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via __down_common or wait_for_common. 3) CPU A checks for signal_pending() and returns due to TIF_SIGPENDING being set in t's threadinfo due to a previous signal(say SIGKILL) being received on this task t. 4) CPU A returns returns back to the assembly trap handler and calls do_work_pending() - do_signal() - get_signal() - do_group_exit() - do_exit() CPU A has not yet executed the following line of code before the final call to schedule: /* causes final put_task_struct in finish_task_switch(). */ tsk-state = TASK_DEAD; 5) CPU B tries to send a signal to task t (currently executing on CPU A) and thus enters: signal_wake_up_state() - wake_up_state() - try_to_wake_up() 6) CPU B executes all code in try_to_wake_up() till the call to ttwu_queue - ttwu_do_activate - ttwu_do_wakeup(). CPU B has still not executed the following code in ttwu_do_wakeup(): p-state = TASK_RUNNING; 7) CPU A executes the following line of code: /* causes final put_task_struct in finish_task_switch(). */ tsk-state = TASK_DEAD; 8) CPU B executes the following code in ttwu_do_wakeup(): p-state = TASK_RUNNING; 9) CPU A continues to the call to do_exit() - schedule(). Since the tsk-state is TASK_RUNNING, the call to schedule() returns and do_exit() - BUG() is hit on CPU A. Alternate Solution == An alternate solution would be to simply set the current task state to TASK_RUNNING in __down_common(), wait_for_common() and all other interruptible sleeping primitives in their if(signal_pending/signal_pending_state) conditional blocks. But this change seems to me to be more logical because: i) This will involve lesser changes to the kernel core code. ii) Any further sleeping primitives in the kernel also need not suffer from this kind of race scenario. Signed-off-by: Kautuk Consul consul.kau...@gmail.com --- kernel/exit.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 32c58f7..69a8231 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -824,14 +824,16 @@ void do_exit(long code) * (or hypervisor of virtual machine switches to other guest) * As a result, we may become TASK_RUNNING after becoming TASK_DEAD * -* To avoid it, we have to wait for releasing tsk-pi_lock which -* is held by try_to_wake_up() +* To solve this, we have to compete for tsk-pi_lock which is held by +* try_to_wake_up(). */ - smp_mb(); - raw_spin_unlock_wait(tsk-pi_lock); + raw_spin_lock(tsk-pi_lock); /* causes final put_task_struct in finish_task_switch(). */ tsk-state = TASK_DEAD; + + raw_spin_unlock(tsk-pi_lock); + tsk-flags |= PF_NOFREEZE; /* tell freezer to ignore us */ schedule(); BUG(); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()
Sorry folks, I got one thing wrong: From some more code review, both __down_common() and do_wait_for_common() inspect the signal_pending() only while in TASK_RUNNING. So I think that it cannot be possible that this happened on my system due to __down_common() and/or wait_for_common(). Which only leaves out the possibility that the BUG() on my embedded system happened due to a driver which was trying to implement its own sleeping primitive by first setting the task state to TASK_INTERRUPTIBLE and then checking for signal_pending/signal_pending_state. (But this is anyway generally frowned upon and not really acceptable nowadays). I'll review those drivers for this kind of mistake. On Mon, Aug 25, 2014 at 9:27 PM, Oleg Nesterov o...@redhat.com wrote: On 08/25, Kautuk Consul wrote: I encountered a BUG() scenario within do_exit() on an ARM system. The problem is due to a race scenario between do_exit() and try_to_wake_up() on different CPUs due to usage of sleeping primitives such as __down_common and wait_for_common. Race Scenario = Let us assume there are 2 CPUs A and B execute code in the following order: 1)CPU A was running in user-mode and enters kernel mode via some syscall/exception handler. 2)CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via __down_common or wait_for_common. 3)CPU A checks for signal_pending() and returns due to TIF_SIGPENDING being set in t's threadinfo due to a previous signal(say SIGKILL) being received on this task t. 4)CPU A returns returns back to the assembly trap handler and calls do_work_pending() - do_signal() - get_signal() - do_group_exit() - do_exit() CPU A has not yet executed the following line of code before the final call to schedule: /* causes final put_task_struct in finish_task_switch(). */ tsk-state = TASK_DEAD; 5)CPU B tries to send a signal to task t (currently executing on CPU A) and thus enters: signal_wake_up_state() - wake_up_state() - try_to_wake_up() 6)CPU B executes all code in try_to_wake_up() till the call to ttwu_queue - ttwu_do_activate - ttwu_do_wakeup(). CPU B has still not executed the following code in ttwu_do_wakeup(): p-state = TASK_RUNNING; 7)CPU A executes the following line of code: /* causes final put_task_struct in finish_task_switch(). */ tsk-state = TASK_DEAD; 8)CPU B executes the following code in ttwu_do_wakeup(): p-state = TASK_RUNNING; 9)CPU A continues to the call to do_exit() - schedule(). Since the tsk-state is TASK_RUNNING, the call to schedule() returns and do_exit() - BUG() is hit on CPU A. Alternate Solution == An alternate solution would be to simply set the current task state to TASK_RUNNING in __down_common(), wait_for_common() and all other interruptible sleeping primitives in their if(signal_pending/signal_pending_state) conditional blocks. But this change seems to me to be more logical because: i)This will involve lesser changes to the kernel core code. ii) Any further sleeping primitives in the kernel also need not suffer from this kind of race scenario. Signed-off-by: Kautuk Consul consul.kau...@gmail.com --- kernel/exit.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/kernel/exit.c b/kernel/exit.c index 32c58f7..69a8231 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -824,14 +824,16 @@ void do_exit(long code) * (or hypervisor of virtual machine switches to other guest) * As a result, we may become TASK_RUNNING after becoming TASK_DEAD * - * To avoid it, we have to wait for releasing tsk-pi_lock which - * is held by try_to_wake_up() + * To solve this, we have to compete for tsk-pi_lock which is held by + * try_to_wake_up(). */ - smp_mb(); - raw_spin_unlock_wait(tsk-pi_lock); + raw_spin_lock(tsk-pi_lock); /* causes final put_task_struct in finish_task_switch(). */ tsk-state = TASK_DEAD; + + raw_spin_unlock(tsk-pi_lock); + tsk-flags |= PF_NOFREEZE; /* tell freezer to ignore us */ schedule(); BUG(); -- Peter, do you remember another problem with TASK_DEAD we discussed recently? (prev_state == TASK_DEAD detection in finish_task_switch() still looks racy). I am starting to think that perhaps we need something like below, what do you all think? Oleg. --- x/kernel/sched/core.c +++ x/kernel/sched/core.c @@ -2205,9 +2205,10 @@ static void finish_task_switch(struct rq __releases(rq-lock) { struct mm_struct *mm = rq-prev_mm; - long prev_state; + struct task_struct *dead = rq-dead; rq-prev_mm = NULL; + rq-dead = NULL; /* * A task struct has one reference
Re: [QUERY]: Understanding the calculations in mm/page-writeback.c
> > Here is the slides I used in LinuxCon Japan 2012, please feel free to > ask more specific questions on it :) > > http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_wu.pdf > Thanks for your help and early response ! I shall definitely study this and ping you as and when doubts start arising in my mind. :-) > Thanks, > Fengguang > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: mailto:"d...@kvack.org;> em...@kvack.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [QUERY]: Understanding the calculations in mm/page-writeback.c
Here is the slides I used in LinuxCon Japan 2012, please feel free to ask more specific questions on it :) http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_wu.pdf Thanks for your help and early response ! I shall definitely study this and ping you as and when doubts start arising in my mind. :-) Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/