[PATCH 1/1] powerpc: get_wchan(): solve possible race scenario due to parallel wakeup

2016-04-19 Thread Kautuk Consul
Add a check for p->state == TASK_RUNNING so that any wake-ups on
task_struct p in the interim lead to 0 being returned by get_wchan().

Signed-off-by: Kautuk Consul <kautuk.consul.1...@gmail.com>
---
 arch/powerpc/kernel/process.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b8500b4..f233352 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1785,7 +1785,8 @@ unsigned long get_wchan(struct task_struct *p)
 
do {
sp = *(unsigned long *)sp;
-   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD))
+   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) ||
+   p->state == TASK_RUNNING)
return 0;
if (count > 0) {
ip = ((unsigned long *)sp)[STACK_FRAME_LR_SAVE];
-- 
1.7.9.5



[PATCH 1/1] powerpc: get_wchan(): solve possible race scenario due to parallel wakeup

2016-04-19 Thread Kautuk Consul
Add a check for p->state == TASK_RUNNING so that any wake-ups on
task_struct p in the interim lead to 0 being returned by get_wchan().

Signed-off-by: Kautuk Consul 
---
 arch/powerpc/kernel/process.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b8500b4..f233352 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1785,7 +1785,8 @@ unsigned long get_wchan(struct task_struct *p)
 
do {
sp = *(unsigned long *)sp;
-   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD))
+   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) ||
+   p->state == TASK_RUNNING)
return 0;
if (count > 0) {
ip = ((unsigned long *)sp)[STACK_FRAME_LR_SAVE];
-- 
1.7.9.5



Re: [PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()

2014-08-25 Thread Kautuk Consul
Sorry folks,

I got one thing wrong:
>From some more code review, both __down_common() and
do_wait_for_common() inspect the signal_pending() only while in
TASK_RUNNING.

So I think that it cannot be possible that this happened on my system
due to __down_common() and/or wait_for_common().

Which only leaves out the possibility that the BUG() on my embedded
system happened due to a driver which was trying to implement its own
sleeping primitive
by first setting the task state to TASK_INTERRUPTIBLE and then
checking for signal_pending/signal_pending_state.
(But this is anyway generally frowned upon and not really acceptable nowadays).

I'll review those drivers for this kind of mistake.


On Mon, Aug 25, 2014 at 9:27 PM, Oleg Nesterov  wrote:
> On 08/25, Kautuk Consul wrote:
>>
>> I encountered a BUG() scenario within do_exit() on an ARM system.
>>
>> The problem is due to a race scenario between do_exit() and try_to_wake_up()
>> on different CPUs due to usage of sleeping primitives such as __down_common
>> and wait_for_common.
>>
>> Race Scenario
>> =
>>
>> Let us assume there are 2 CPUs A and B execute code in the following order:
>> 1)CPU A was running in user-mode and enters kernel mode via some
>>   syscall/exception handler.
>> 2)CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via 
>> __down_common
>>   or wait_for_common.
>> 3)CPU A checks for signal_pending() and returns due to TIF_SIGPENDING
>>   being set in t's threadinfo due to a previous signal(say SIGKILL) being
>>   received on this task t.
>> 4)CPU A returns returns back to the assembly trap handler and calls
>>   do_work_pending() -> do_signal() -> get_signal() -> do_group_exit()
>>-> do_exit()
>>   CPU A  has not yet executed the following line of code before the final
>>   call to schedule:
>> /* causes final put_task_struct in finish_task_switch(). */
>> tsk->state = TASK_DEAD;
>> 5)CPU B tries to send a signal to task t (currently executing on CPU A)
>>   and thus enters: signal_wake_up_state() -> wake_up_state() ->
>>try_to_wake_up()
>> 6)CPU B executes all code in try_to_wake_up() till the call to
>>   ttwu_queue -> ttwu_do_activate -> ttwu_do_wakeup().
>>   CPU B has still not executed the following code in ttwu_do_wakeup():
>>   p->state = TASK_RUNNING;
>> 7)CPU A executes the following line of code:
>> /* causes final put_task_struct in finish_task_switch(). */
>> tsk->state = TASK_DEAD;
>> 8)CPU B executes the following code in ttwu_do_wakeup():
>>   p->state = TASK_RUNNING;
>> 9)CPU A continues to the call to do_exit() -> schedule().
>>   Since the tsk->state is TASK_RUNNING, the call to schedule() returns 
>> and
>>   do_exit() -> BUG() is hit on CPU A.
>>
>> Alternate Solution
>> ==
>>
>> An alternate solution would be to simply set the current task state to
>> TASK_RUNNING in __down_common(), wait_for_common() and all other 
>> interruptible
>> sleeping primitives in their if(signal_pending/signal_pending_state) 
>> conditional
>> blocks.
>>
>> But this change seems to me to be more logical because:
>> i)This will involve lesser changes to the kernel core code.
>> ii)   Any further sleeping primitives in the kernel also need not 
>> suffer from
>>   this kind of race scenario.
>>
>> Signed-off-by: Kautuk Consul 
>> ---
>>  kernel/exit.c |   10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/exit.c b/kernel/exit.c
>> index 32c58f7..69a8231 100644
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -824,14 +824,16 @@ void do_exit(long code)
>>* (or hypervisor of virtual machine switches to other guest)
>>*  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
>>*
>> -  * To avoid it, we have to wait for releasing tsk->pi_lock which
>> -  * is held by try_to_wake_up()
>> +  * To solve this, we have to compete for tsk->pi_lock which is held by
>> +  * try_to_wake_up().
>>*/
>> - smp_mb();
>> - raw_spin_unlock_wait(>pi_lock);
>> + raw_spin_lock(>pi_lock);
>>
>>   /* causes final put_task_struct in finish_task_switch(). */
>>   tsk->state = TASK_DEAD;
>> +
>> + raw_spin_unlock(>pi_lock);
>> +
>>   ts

[PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()

2014-08-25 Thread Kautuk Consul
I encountered a BUG() scenario within do_exit() on an ARM system.

The problem is due to a race scenario between do_exit() and try_to_wake_up()
on different CPUs due to usage of sleeping primitives such as __down_common
and wait_for_common.

Race Scenario
=

Let us assume there are 2 CPUs A and B execute code in the following order:
1)  CPU A was running in user-mode and enters kernel mode via some
syscall/exception handler.
2)  CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via 
__down_common
or wait_for_common.
3)  CPU A checks for signal_pending() and returns due to TIF_SIGPENDING
being set in t's threadinfo due to a previous signal(say SIGKILL) being
received on this task t.
4)  CPU A returns returns back to the assembly trap handler and calls
do_work_pending() -> do_signal() -> get_signal() -> do_group_exit()
 -> do_exit()
CPU A  has not yet executed the following line of code before the final
call to schedule:
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
5)  CPU B tries to send a signal to task t (currently executing on CPU A)
and thus enters: signal_wake_up_state() -> wake_up_state() ->
 try_to_wake_up()
6)  CPU B executes all code in try_to_wake_up() till the call to
ttwu_queue -> ttwu_do_activate -> ttwu_do_wakeup().
CPU B has still not executed the following code in ttwu_do_wakeup():
p->state = TASK_RUNNING;
7)  CPU A executes the following line of code:
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
8)  CPU B executes the following code in ttwu_do_wakeup():
p->state = TASK_RUNNING;
9)  CPU A continues to the call to do_exit() -> schedule().
Since the tsk->state is TASK_RUNNING, the call to schedule() returns and
do_exit() -> BUG() is hit on CPU A.

Alternate Solution
==

An alternate solution would be to simply set the current task state to
TASK_RUNNING in __down_common(), wait_for_common() and all other interruptible
sleeping primitives in their if(signal_pending/signal_pending_state) conditional
blocks.

But this change seems to me to be more logical because:
i)  This will involve lesser changes to the kernel core code.
ii) Any further sleeping primitives in the kernel also need not 
suffer from
    this kind of race scenario.

Signed-off-by: Kautuk Consul 
---
 kernel/exit.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 32c58f7..69a8231 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -824,14 +824,16 @@ void do_exit(long code)
 * (or hypervisor of virtual machine switches to other guest)
 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
 *
-* To avoid it, we have to wait for releasing tsk->pi_lock which
-* is held by try_to_wake_up()
+* To solve this, we have to compete for tsk->pi_lock which is held by
+* try_to_wake_up().
 */
-   smp_mb();
-   raw_spin_unlock_wait(>pi_lock);
+   raw_spin_lock(>pi_lock);
 
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
+
+   raw_spin_unlock(>pi_lock);
+
tsk->flags |= PF_NOFREEZE;  /* tell freezer to ignore us */
schedule();
BUG();
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()

2014-08-25 Thread Kautuk Consul
I encountered a BUG() scenario within do_exit() on an ARM system.

The problem is due to a race scenario between do_exit() and try_to_wake_up()
on different CPUs due to usage of sleeping primitives such as __down_common
and wait_for_common.

Race Scenario
=

Let us assume there are 2 CPUs A and B execute code in the following order:
1)  CPU A was running in user-mode and enters kernel mode via some
syscall/exception handler.
2)  CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via 
__down_common
or wait_for_common.
3)  CPU A checks for signal_pending() and returns due to TIF_SIGPENDING
being set in t's threadinfo due to a previous signal(say SIGKILL) being
received on this task t.
4)  CPU A returns returns back to the assembly trap handler and calls
do_work_pending() - do_signal() - get_signal() - do_group_exit()
 - do_exit()
CPU A  has not yet executed the following line of code before the final
call to schedule:
/* causes final put_task_struct in finish_task_switch(). */
tsk-state = TASK_DEAD;
5)  CPU B tries to send a signal to task t (currently executing on CPU A)
and thus enters: signal_wake_up_state() - wake_up_state() -
 try_to_wake_up()
6)  CPU B executes all code in try_to_wake_up() till the call to
ttwu_queue - ttwu_do_activate - ttwu_do_wakeup().
CPU B has still not executed the following code in ttwu_do_wakeup():
p-state = TASK_RUNNING;
7)  CPU A executes the following line of code:
/* causes final put_task_struct in finish_task_switch(). */
tsk-state = TASK_DEAD;
8)  CPU B executes the following code in ttwu_do_wakeup():
p-state = TASK_RUNNING;
9)  CPU A continues to the call to do_exit() - schedule().
Since the tsk-state is TASK_RUNNING, the call to schedule() returns and
do_exit() - BUG() is hit on CPU A.

Alternate Solution
==

An alternate solution would be to simply set the current task state to
TASK_RUNNING in __down_common(), wait_for_common() and all other interruptible
sleeping primitives in their if(signal_pending/signal_pending_state) conditional
blocks.

But this change seems to me to be more logical because:
i)  This will involve lesser changes to the kernel core code.
ii) Any further sleeping primitives in the kernel also need not 
suffer from
this kind of race scenario.

Signed-off-by: Kautuk Consul consul.kau...@gmail.com
---
 kernel/exit.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 32c58f7..69a8231 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -824,14 +824,16 @@ void do_exit(long code)
 * (or hypervisor of virtual machine switches to other guest)
 *  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
 *
-* To avoid it, we have to wait for releasing tsk-pi_lock which
-* is held by try_to_wake_up()
+* To solve this, we have to compete for tsk-pi_lock which is held by
+* try_to_wake_up().
 */
-   smp_mb();
-   raw_spin_unlock_wait(tsk-pi_lock);
+   raw_spin_lock(tsk-pi_lock);
 
/* causes final put_task_struct in finish_task_switch(). */
tsk-state = TASK_DEAD;
+
+   raw_spin_unlock(tsk-pi_lock);
+
tsk-flags |= PF_NOFREEZE;  /* tell freezer to ignore us */
schedule();
BUG();
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] do_exit(): Solve possibility of BUG() due to race with try_to_wake_up()

2014-08-25 Thread Kautuk Consul
Sorry folks,

I got one thing wrong:
From some more code review, both __down_common() and
do_wait_for_common() inspect the signal_pending() only while in
TASK_RUNNING.

So I think that it cannot be possible that this happened on my system
due to __down_common() and/or wait_for_common().

Which only leaves out the possibility that the BUG() on my embedded
system happened due to a driver which was trying to implement its own
sleeping primitive
by first setting the task state to TASK_INTERRUPTIBLE and then
checking for signal_pending/signal_pending_state.
(But this is anyway generally frowned upon and not really acceptable nowadays).

I'll review those drivers for this kind of mistake.


On Mon, Aug 25, 2014 at 9:27 PM, Oleg Nesterov o...@redhat.com wrote:
 On 08/25, Kautuk Consul wrote:

 I encountered a BUG() scenario within do_exit() on an ARM system.

 The problem is due to a race scenario between do_exit() and try_to_wake_up()
 on different CPUs due to usage of sleeping primitives such as __down_common
 and wait_for_common.

 Race Scenario
 =

 Let us assume there are 2 CPUs A and B execute code in the following order:
 1)CPU A was running in user-mode and enters kernel mode via some
   syscall/exception handler.
 2)CPU A sets the current task(t) state to TASK_INTERRUPTIBLE via 
 __down_common
   or wait_for_common.
 3)CPU A checks for signal_pending() and returns due to TIF_SIGPENDING
   being set in t's threadinfo due to a previous signal(say SIGKILL) being
   received on this task t.
 4)CPU A returns returns back to the assembly trap handler and calls
   do_work_pending() - do_signal() - get_signal() - do_group_exit()
- do_exit()
   CPU A  has not yet executed the following line of code before the final
   call to schedule:
 /* causes final put_task_struct in finish_task_switch(). */
 tsk-state = TASK_DEAD;
 5)CPU B tries to send a signal to task t (currently executing on CPU A)
   and thus enters: signal_wake_up_state() - wake_up_state() -
try_to_wake_up()
 6)CPU B executes all code in try_to_wake_up() till the call to
   ttwu_queue - ttwu_do_activate - ttwu_do_wakeup().
   CPU B has still not executed the following code in ttwu_do_wakeup():
   p-state = TASK_RUNNING;
 7)CPU A executes the following line of code:
 /* causes final put_task_struct in finish_task_switch(). */
 tsk-state = TASK_DEAD;
 8)CPU B executes the following code in ttwu_do_wakeup():
   p-state = TASK_RUNNING;
 9)CPU A continues to the call to do_exit() - schedule().
   Since the tsk-state is TASK_RUNNING, the call to schedule() returns 
 and
   do_exit() - BUG() is hit on CPU A.

 Alternate Solution
 ==

 An alternate solution would be to simply set the current task state to
 TASK_RUNNING in __down_common(), wait_for_common() and all other 
 interruptible
 sleeping primitives in their if(signal_pending/signal_pending_state) 
 conditional
 blocks.

 But this change seems to me to be more logical because:
 i)This will involve lesser changes to the kernel core code.
 ii)   Any further sleeping primitives in the kernel also need not 
 suffer from
   this kind of race scenario.

 Signed-off-by: Kautuk Consul consul.kau...@gmail.com
 ---
  kernel/exit.c |   10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

 diff --git a/kernel/exit.c b/kernel/exit.c
 index 32c58f7..69a8231 100644
 --- a/kernel/exit.c
 +++ b/kernel/exit.c
 @@ -824,14 +824,16 @@ void do_exit(long code)
* (or hypervisor of virtual machine switches to other guest)
*  As a result, we may become TASK_RUNNING after becoming TASK_DEAD
*
 -  * To avoid it, we have to wait for releasing tsk-pi_lock which
 -  * is held by try_to_wake_up()
 +  * To solve this, we have to compete for tsk-pi_lock which is held by
 +  * try_to_wake_up().
*/
 - smp_mb();
 - raw_spin_unlock_wait(tsk-pi_lock);
 + raw_spin_lock(tsk-pi_lock);

   /* causes final put_task_struct in finish_task_switch(). */
   tsk-state = TASK_DEAD;
 +
 + raw_spin_unlock(tsk-pi_lock);
 +
   tsk-flags |= PF_NOFREEZE;  /* tell freezer to ignore us */
   schedule();
   BUG();
 --

 Peter, do you remember another problem with TASK_DEAD we discussed recently?
 (prev_state == TASK_DEAD detection in finish_task_switch() still looks racy).

 I am starting to think that perhaps we need something like below, what do
 you all think?

 Oleg.

 --- x/kernel/sched/core.c
 +++ x/kernel/sched/core.c
 @@ -2205,9 +2205,10 @@ static void finish_task_switch(struct rq
 __releases(rq-lock)
  {
 struct mm_struct *mm = rq-prev_mm;
 -   long prev_state;
 +   struct task_struct *dead = rq-dead;

 rq-prev_mm = NULL;
 +   rq-dead = NULL;

 /*
  * A task struct has one reference

Re: [QUERY]: Understanding the calculations in mm/page-writeback.c

2012-07-24 Thread Kautuk Consul
>
> Here is the slides I used in LinuxCon Japan 2012, please feel free to
> ask more specific questions on it :)
>
> http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_wu.pdf
>

Thanks for your help and early response !
I shall definitely study this and ping you as and when doubts start
arising in my mind. :-)

> Thanks,
> Fengguang
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [QUERY]: Understanding the calculations in mm/page-writeback.c

2012-07-24 Thread Kautuk Consul

 Here is the slides I used in LinuxCon Japan 2012, please feel free to
 ask more specific questions on it :)

 http://events.linuxfoundation.org/images/stories/pdf/lcjp2012_wu.pdf


Thanks for your help and early response !
I shall definitely study this and ping you as and when doubts start
arising in my mind. :-)

 Thanks,
 Fengguang

 --
 To unsubscribe, send a message with 'unsubscribe linux-mm' in
 the body to majord...@kvack.org.  For more info on Linux MM,
 see: http://www.linux-mm.org/ .
 Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/