Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-11-03 Thread Thomas Gleixner
On Sun, 2 Nov 2014, Daniel J Blueman wrote:
> I just got a window to test this, and it reliably addresses the boot-time core
> onlining race that we've seen occasionally on a 2000-core customer system.
> Splendid work, Thomas.
> 
> Tested-by: Daniel J Blueman 

Well, you forgot to read my follow up mail on this:

  https://lkml.org/lkml/2014/6/26/554

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-11-03 Thread Thomas Gleixner
On Sun, 2 Nov 2014, Daniel J Blueman wrote:
 I just got a window to test this, and it reliably addresses the boot-time core
 onlining race that we've seen occasionally on a 2000-core customer system.
 Splendid work, Thomas.
 
 Tested-by: Daniel J Blueman dan...@numascale.com

Well, you forgot to read my follow up mail on this:

  https://lkml.org/lkml/2014/6/26/554

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-11-02 Thread Daniel J Blueman

On Thursday, June 26, 2014 8:50:01 AM UTC+8, Thomas Gleixner wrote:
> On Wed, 25 Jun 2014, Subbaraman Narayanamurthy wrote:
> > While stressing the CPU hotplug path, sometimes we hit a problem
> > as shown below.
> >
> > [57056.416774] [ cut here ]
> > [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
> > [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
> > [57056.489259] [ cut here ]
> > [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
> > [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> > [57056.519055] Modules linked in: wlan(O) mhi(O)
> > [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O
> > 3.10.0-g3677c61-8-g180c060 #1
> > [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
> > [57056.537991] PC is at smpboot_thread_fn+0x124/0x218
> > [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
> > [57056.547528] pc : []lr : [] psr: 200f0013
> > [57056.547528] sp : f0e79f30  ip :   fp : 
> > [57056.558983] r10: 0001  r9 :   r8 : f0e78000
> > [57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : 
f0e5fd00
> > [57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 


> >
> > This issue was always seen in the context of "ksoftirqd". It seems to
> > be happening because of a potential race condition in __kthread_parkme
> > where just after completing the parked completion, before the
> > ksoftirqd task has been scheduled again, it can go into running state.
>
> This explanation does not make any sense. You completely fail to
> explain the details of the race. And your patch does not make any
> sense either, because the real issue is this:
>
> Task   CPU 0   CPU 1
>
> T1 unplug cpu1
>kthread_park(T2)
>set_bit(KTHREAD_SHOULD_PARK);
>  wait_for_completion()
> T2 parkme(X)
>  __set_current_state(TASK_PARKED);
>  while (test_bit(KTHREAD_SHOULD_PARK)) {
>if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
>  complete();
>schedule();
> T1   plug cpu1
>
> --> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
> CPU 0
>
> T2__set_current_state(TASK_PARKED);
>
> --> Preemption by the plug thread
>
> T1 thread_unpark(T2)
>  clear_bit(KTHREAD_SHOULD_PARK);
>
> --> Preemption by the softirq thread which breaks out of the
> while(test_bit(KTHREAD_SHOULD_PARK)) loop because
> KTHREAD_SHOULD_PARK is not longer set.
>
> T2   }
> clear_bit(KTHREAD_IS_PARKED);
>
> --> Now T2 happily continues to run on CPU0 which rightfully casues
> the BUG to trigger.
>
> T1
>__kthread_bind(T2)
>
> --> Too late 
>
> So the real issue is that the park/unpark code is not able to handle
> the premature wakeup of T2 and that needs to be fixed.
>
> Your changelog says:
>
> > It seems to be happening because of a potential race condition in
>
> Potential is wrong to begin with. A race condition is either real and
> explainable or it does not exist.
>
> > __kthread_parkme where just after completing the parked completion,
> > before the ksoftirqd task has been scheduled again, it can go into
> > running state.
>
> What exactly has this to do with state RUNNING or PARKED?
>
>   Nothing, the task state is completely irrelevant as the real issue
>   is the task->*PARK flags state.
>
> So what is your patch solving here ?
>
>   You put a wait for task->state == TASK_PARKED after the
>   wait_for_completion. What does it solve? Actually nothing. It just
>   changes the propability of that issue. Go and apply it between any
>   step of the above and figure out what it solves. Nothing, really.
>
>   Now just as an extra thought experiment assume that you have only
>   two cpus and T1 is a SCHED_FIFO task and T2 is SCHED_OTHER 
>
> Please do not misunderstand me, but "fixing" races without proper
> understanding them is plain wrong. Providing a vague changelog which
> does neither explain what the issue is and why the fix works is even
> more wrong.
>
> The next time you hit something like this, please take the time and
> sit down, get out the old fashioned piece of paper and a pencil and
> draw the picture so you can actually understand what the root cause of
> the observed issue is before sending out halfarsed duct tape fixes
> which just paper over the root cause. If you cannot figure it out,
> send a proper bug report.
>
> Thanks,
>
>tglx
> -->
>
> Subject: kthread: Plug park/ unplug race
> From: Thomas Gleixner 
> Date: Thu, 26 Jun 2014 01:24:36 +0200
>
> The kthread park/unpark logic has the following issue:
>
> Task   CPU 0   CPU 1
>
> T1 unplug cpu1
>

Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-11-02 Thread Daniel J Blueman

On Thursday, June 26, 2014 8:50:01 AM UTC+8, Thomas Gleixner wrote:
 On Wed, 25 Jun 2014, Subbaraman Narayanamurthy wrote:
  While stressing the CPU hotplug path, sometimes we hit a problem
  as shown below.
 
  [57056.416774] [ cut here ]
  [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
  [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
  [57056.489259] [ cut here ]
  [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
  [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
  [57056.519055] Modules linked in: wlan(O) mhi(O)
  [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O
  3.10.0-g3677c61-8-g180c060 #1
  [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
  [57056.537991] PC is at smpboot_thread_fn+0x124/0x218
  [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
  [57056.547528] pc : [c01931e8]lr : [c01931e0] psr: 200f0013
  [57056.547528] sp : f0e79f30  ip :   fp : 
  [57056.558983] r10: 0001  r9 :   r8 : f0e78000
  [57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : 
f0e5fd00
  [57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 


 
  This issue was always seen in the context of ksoftirqd. It seems to
  be happening because of a potential race condition in __kthread_parkme
  where just after completing the parked completion, before the
  ksoftirqd task has been scheduled again, it can go into running state.

 This explanation does not make any sense. You completely fail to
 explain the details of the race. And your patch does not make any
 sense either, because the real issue is this:

 Task   CPU 0   CPU 1

 T1 unplug cpu1
kthread_park(T2)
set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
 T2 parkme(X)
  __set_current_state(TASK_PARKED);
  while (test_bit(KTHREAD_SHOULD_PARK)) {
if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
  complete();
schedule();
 T1   plug cpu1

 -- premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
 CPU 0

 T2__set_current_state(TASK_PARKED);

 -- Preemption by the plug thread

 T1 thread_unpark(T2)
  clear_bit(KTHREAD_SHOULD_PARK);

 -- Preemption by the softirq thread which breaks out of the
 while(test_bit(KTHREAD_SHOULD_PARK)) loop because
 KTHREAD_SHOULD_PARK is not longer set.

 T2   }
 clear_bit(KTHREAD_IS_PARKED);

 -- Now T2 happily continues to run on CPU0 which rightfully casues
 the BUG to trigger.

 T1
__kthread_bind(T2)

 -- Too late 

 So the real issue is that the park/unpark code is not able to handle
 the premature wakeup of T2 and that needs to be fixed.

 Your changelog says:

  It seems to be happening because of a potential race condition in

 Potential is wrong to begin with. A race condition is either real and
 explainable or it does not exist.

  __kthread_parkme where just after completing the parked completion,
  before the ksoftirqd task has been scheduled again, it can go into
  running state.

 What exactly has this to do with state RUNNING or PARKED?

   Nothing, the task state is completely irrelevant as the real issue
   is the task-*PARK flags state.

 So what is your patch solving here ?

   You put a wait for task-state == TASK_PARKED after the
   wait_for_completion. What does it solve? Actually nothing. It just
   changes the propability of that issue. Go and apply it between any
   step of the above and figure out what it solves. Nothing, really.

   Now just as an extra thought experiment assume that you have only
   two cpus and T1 is a SCHED_FIFO task and T2 is SCHED_OTHER 

 Please do not misunderstand me, but fixing races without proper
 understanding them is plain wrong. Providing a vague changelog which
 does neither explain what the issue is and why the fix works is even
 more wrong.

 The next time you hit something like this, please take the time and
 sit down, get out the old fashioned piece of paper and a pencil and
 draw the picture so you can actually understand what the root cause of
 the observed issue is before sending out halfarsed duct tape fixes
 which just paper over the root cause. If you cannot figure it out,
 send a proper bug report.

 Thanks,

tglx
 --

 Subject: kthread: Plug park/ unplug race
 From: Thomas Gleixner t...@linutronix.de
 Date: Thu, 26 Jun 2014 01:24:36 +0200

 The kthread park/unpark logic has the following issue:

 Task   CPU 0   CPU 1

 T1 unplug cpu1
kthread_park(T2)
set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
 T2 parkme(X)
 

Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-26 Thread Thomas Gleixner
On Thu, 26 Jun 2014, Subbaraman Narayanamurthy wrote:
> On 06/25/14 17:43, Thomas Gleixner wrote:
> > The kthread park/unpark logic has the following issue:
> > 
> > Task   CPU 0CPU 1
> > 
> > T1 unplug cpu1
> > kthread_park(T2)
> > set_bit(KTHREAD_SHOULD_PARK);
> >   wait_for_completion()
> > T2  parkme(X)
> >   __set_current_state(TASK_PARKED);
> >   while
> > (test_bit(KTHREAD_SHOULD_PARK)) {
> > if
> > (!test_and_set_bit(KTHREAD_IS_PARKED))
> >   complete();
> > schedule();
> > T1   plug cpu1
> > 
> > --> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
> >  CPU 0

> I understood the explanation above. But still I don't understand how this
> premature wakeup of T2 is happening/possible?

Come on. You have a machine which reproduces the issue. So some
moderate tracing should tell you that ...

Without using my lost crystal ball, I bet that it's a premature per
cpu timer interrupt.

> Also, what will happen if the task state is not in TASK_PARKED when
> __kthread_unpark is called?  __kthread_bind will fail silently
> causing the same problem.

Right you are, but thinking more about it:

Nothing is supposed to wakeup a parked thread except the unpark
machinery. So the real question is: What causes the premature wakeup?

Darn, I should have thought about that before, but you tricked my
overloaded brain into believing that this is a real issue.

No, it's not.

The parked state is not any different from creating a new kthread,
advertise the thread to possible wakers and then do the bind.

So yes, the code is fine and the BUG_ON() is rightfully asserting
here.

> Thanks for the patch. I've tested (running hotplug tests) it for sometime and
> looks good so far. Can you please submit it?

So you have a legitimate question about the correctness of the patch
and then you ask me to apply it?

Again, we do not apply patches which "fix" an issue just because we do
not observe it anymore. We apply them when the problem at hand is
fully understood and the solution solves all aspects.

Thanks,

tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-26 Thread Subbaraman Narayanamurthy

On 06/25/14 17:43, Thomas Gleixner wrote:

The kthread park/unpark logic has the following issue:

Task   CPU 0CPU 1

T1 unplug cpu1
kthread_park(T2)
set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while (test_bit(KTHREAD_SHOULD_PARK)) 
{
if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
  complete();
schedule();
T1   plug cpu1

--> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
 CPU 0
I understood the explanation above. But still I don't understand how 
this premature wakeup of T2 is happening/possible? Also, what will 
happen if the task state is not in TASK_PARKED when __kthread_unpark is 
called? __kthread_bind will fail silently causing the same problem.

Reorder the logic so that the unplug code binds the thread to the
target cpu before clearing the KTHREAD_SHOULD_PARK bit.

Reported-by: Subbaraman Narayanamurthy
Signed-off-by: Thomas Gleixner
Cc:sta...@vger.kernel.org

---
  kernel/kthread.c |   14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux/kernel/kthread.c
===
--- linux.orig/kernel/kthread.c
+++ linux/kernel/kthread.c
@@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp
  
  static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)

  {
+   /*
+* Rebind the thread to the target cpu first if it is a per
+* cpu thread unconditionally because it must be bound to the
+* target cpu before it can observe the KTHREAD_SHOULD_PARK
+* bit cleared.
+*/
+   if (test_bit(KTHREAD_IS_PER_CPU, >flags))
+   __kthread_bind(k, kthread->cpu, TASK_PARKED);
+
clear_bit(KTHREAD_SHOULD_PARK, >flags);
/*
 * We clear the IS_PARKED bit here as we don't wait
@@ -389,11 +398,8 @@ static void __kthread_unpark(struct task
 * park before that happens we'd see the IS_PARKED bit
 * which might be about to be cleared.
 */
-   if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags)) {
-   if (test_bit(KTHREAD_IS_PER_CPU, >flags))
-   __kthread_bind(k, kthread->cpu, TASK_PARKED);
+   if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags))
wake_up_state(k, TASK_PARKED);
-   }
  }
  
  /**







Thanks for the patch. I've tested (running hotplug tests) it for 
sometime and looks good so far. Can you please submit it?


--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-26 Thread Subbaraman Narayanamurthy

On 06/25/14 17:43, Thomas Gleixner wrote:

The kthread park/unpark logic has the following issue:

Task   CPU 0CPU 1

T1 unplug cpu1
kthread_park(T2)
set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while (test_bit(KTHREAD_SHOULD_PARK)) 
{
if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
  complete();
schedule();
T1   plug cpu1

-- premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
 CPU 0
I understood the explanation above. But still I don't understand how 
this premature wakeup of T2 is happening/possible? Also, what will 
happen if the task state is not in TASK_PARKED when __kthread_unpark is 
called? __kthread_bind will fail silently causing the same problem.

Reorder the logic so that the unplug code binds the thread to the
target cpu before clearing the KTHREAD_SHOULD_PARK bit.

Reported-by: Subbaraman Narayanamurthysubba...@codeaurora.org
Signed-off-by: Thomas Gleixnert...@linutronix.de
Cc:sta...@vger.kernel.org

---
  kernel/kthread.c |   14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux/kernel/kthread.c
===
--- linux.orig/kernel/kthread.c
+++ linux/kernel/kthread.c
@@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp
  
  static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)

  {
+   /*
+* Rebind the thread to the target cpu first if it is a per
+* cpu thread unconditionally because it must be bound to the
+* target cpu before it can observe the KTHREAD_SHOULD_PARK
+* bit cleared.
+*/
+   if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
+   __kthread_bind(k, kthread-cpu, TASK_PARKED);
+
clear_bit(KTHREAD_SHOULD_PARK, kthread-flags);
/*
 * We clear the IS_PARKED bit here as we don't wait
@@ -389,11 +398,8 @@ static void __kthread_unpark(struct task
 * park before that happens we'd see the IS_PARKED bit
 * which might be about to be cleared.
 */
-   if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags)) {
-   if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
-   __kthread_bind(k, kthread-cpu, TASK_PARKED);
+   if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags))
wake_up_state(k, TASK_PARKED);
-   }
  }
  
  /**







Thanks for the patch. I've tested (running hotplug tests) it for 
sometime and looks good so far. Can you please submit it?


--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-26 Thread Thomas Gleixner
On Thu, 26 Jun 2014, Subbaraman Narayanamurthy wrote:
 On 06/25/14 17:43, Thomas Gleixner wrote:
  The kthread park/unpark logic has the following issue:
  
  Task   CPU 0CPU 1
  
  T1 unplug cpu1
  kthread_park(T2)
  set_bit(KTHREAD_SHOULD_PARK);
wait_for_completion()
  T2  parkme(X)
__set_current_state(TASK_PARKED);
while
  (test_bit(KTHREAD_SHOULD_PARK)) {
  if
  (!test_and_set_bit(KTHREAD_IS_PARKED))
complete();
  schedule();
  T1   plug cpu1
  
  -- premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
   CPU 0

 I understood the explanation above. But still I don't understand how this
 premature wakeup of T2 is happening/possible?

Come on. You have a machine which reproduces the issue. So some
moderate tracing should tell you that ...

Without using my lost crystal ball, I bet that it's a premature per
cpu timer interrupt.

 Also, what will happen if the task state is not in TASK_PARKED when
 __kthread_unpark is called?  __kthread_bind will fail silently
 causing the same problem.

Right you are, but thinking more about it:

Nothing is supposed to wakeup a parked thread except the unpark
machinery. So the real question is: What causes the premature wakeup?

Darn, I should have thought about that before, but you tricked my
overloaded brain into believing that this is a real issue.

No, it's not.

The parked state is not any different from creating a new kthread,
advertise the thread to possible wakers and then do the bind.

So yes, the code is fine and the BUG_ON() is rightfully asserting
here.

 Thanks for the patch. I've tested (running hotplug tests) it for sometime and
 looks good so far. Can you please submit it?

So you have a legitimate question about the correctness of the patch
and then you ask me to apply it?

Again, we do not apply patches which fix an issue just because we do
not observe it anymore. We apply them when the problem at hand is
fully understood and the solution solves all aspects.

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Steven Rostedt
On Wed, Jun 25, 2014 at 10:00:22PM -0400, Steven Rostedt wrote:
> On Thu, Jun 26, 2014 at 02:43:56AM +0200, Thomas Gleixner wrote:
> > 
> > Subject: kthread: Plug park/ unplug race
> > From: Thomas Gleixner 
> > Date: Thu, 26 Jun 2014 01:24:36 +0200
> > 
> > The kthread park/unpark logic has the following issue:
> > 
> > Task   CPU 0CPU 1
> > 
> > T1 unplug cpu1
> >kthread_park(T2)
> >set_bit(KTHREAD_SHOULD_PARK);
> >   wait_for_completion()
> > T2  parkme(X)
> 
> But with your patch, isn't it possible for T1 to call thread_unpark here?

Let me answer that No, it can't.

I missed the wait_for_completion() above, which will prevent this from 
happening.

Nevermind, I'll go work on something less brain intensive.

-- Steve

> 
> Then looking at the code I see this turn of events:
> 
>   if (test_bit(KTHREAD_IS_PER_CPU, >flags))
> __kthread_bind(k, kthread->cpu, TASK_PARKED);
> 
> Which in __kthread_bind() (state == TASK_PARKED)
> 
>   if (!wait_task_inactive(p, state)) {
>   WARN_ON(1);
>   return;
>   }
> 
> Where wait_task_inactive() does:
> 
>   while (task_running(rq, p)) {
>   if (match_state && unlikely(p->state != match_state))
>   return 0;
> 
> As match_state is non zero and p->state != match_state because it hasn't been
> set yet. The wait_task_inactive() returns zero, and then we hit the WARN_ON()
> in __kthread_bind().
> 
> -- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Steven Rostedt
On Thu, Jun 26, 2014 at 02:43:56AM +0200, Thomas Gleixner wrote:
> 
> Subject: kthread: Plug park/ unplug race
> From: Thomas Gleixner 
> Date: Thu, 26 Jun 2014 01:24:36 +0200
> 
> The kthread park/unpark logic has the following issue:
> 
> Task   CPU 0  CPU 1
> 
> T1 unplug cpu1
>kthread_park(T2)
>set_bit(KTHREAD_SHOULD_PARK);
> wait_for_completion()
> T2parkme(X)

But with your patch, isn't it possible for T1 to call thread_unpark here?

Then looking at the code I see this turn of events:

  if (test_bit(KTHREAD_IS_PER_CPU, >flags))
__kthread_bind(k, kthread->cpu, TASK_PARKED);

Which in __kthread_bind() (state == TASK_PARKED)

if (!wait_task_inactive(p, state)) {
WARN_ON(1);
return;
}

Where wait_task_inactive() does:

while (task_running(rq, p)) {
if (match_state && unlikely(p->state != match_state))
return 0;

As match_state is non zero and p->state != match_state because it hasn't been
set yet. The wait_task_inactive() returns zero, and then we hit the WARN_ON()
in __kthread_bind().

-- Steve



> __set_current_state(TASK_PARKED);
> while (test_bit(KTHREAD_SHOULD_PARK)) 
> {
>   if 
> (!test_and_set_bit(KTHREAD_IS_PARKED))
> complete();
>   schedule();
> T1   plug cpu1
> 
> --> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
> CPU 0
> 
> T2__set_current_state(TASK_PARKED);
> 
> --> Preemption by the plug thread
> 
> T1 thread_unpark(T2)
>  clear_bit(KTHREAD_SHOULD_PARK);
> 
> --> Preemption by the softirq thread which breaks out of the
> while(test_bit(KTHREAD_SHOULD_PARK)) loop because
> KTHREAD_SHOULD_PARK is not longer set.
> 
> T2   }
> clear_bit(KTHREAD_IS_PARKED);
> 
> --> Now T2 happily continues to run on CPU0 which rightfully causes
> the BUG_ON(T2->cpu != smp_processor_id()) to trigger.
> 
> T1
>   __kthread_bind(T2)
> 
> --> Too late 
> 
> Reorder the logic so that the unplug code binds the thread to the
> target cpu before clearing the KTHREAD_SHOULD_PARK bit.
> 
> Reported-by: Subbaraman Narayanamurthy 
> Signed-off-by: Thomas Gleixner 
> Cc: sta...@vger.kernel.org
> 
> ---
>  kernel/kthread.c |   14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> Index: linux/kernel/kthread.c
> ===
> --- linux.orig/kernel/kthread.c
> +++ linux/kernel/kthread.c
> @@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp
>  
>  static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)
>  {
> + /*
> +  * Rebind the thread to the target cpu first if it is a per
> +  * cpu thread unconditionally because it must be bound to the
> +  * target cpu before it can observe the KTHREAD_SHOULD_PARK
> +  * bit cleared.
> +  */
> + if (test_bit(KTHREAD_IS_PER_CPU, >flags))
> + __kthread_bind(k, kthread->cpu, TASK_PARKED);
> +
>   clear_bit(KTHREAD_SHOULD_PARK, >flags);
>   /*
>* We clear the IS_PARKED bit here as we don't wait
> @@ -389,11 +398,8 @@ static void __kthread_unpark(struct task
>* park before that happens we'd see the IS_PARKED bit
>* which might be about to be cleared.
>*/
> - if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags)) {
> - if (test_bit(KTHREAD_IS_PER_CPU, >flags))
> - __kthread_bind(k, kthread->cpu, TASK_PARKED);
> + if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags))
>   wake_up_state(k, TASK_PARKED);
> - }
>  }
>  
>  /**
> 
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Thomas Gleixner
On Wed, 25 Jun 2014, Subbaraman Narayanamurthy wrote:

> While stressing the CPU hotplug path, sometimes we hit a problem
> as shown below.
> 
> [57056.416774] [ cut here ]
> [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
> [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
> [57056.489259] [ cut here ]
> [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
> [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> [57056.519055] Modules linked in: wlan(O) mhi(O)
> [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O
> 3.10.0-g3677c61-8-g180c060 #1
> [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
> [57056.537991] PC is at smpboot_thread_fn+0x124/0x218
> [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
> [57056.547528] pc : []lr : [] psr: 200f0013
> [57056.547528] sp : f0e79f30  ip :   fp : 
> [57056.558983] r10: 0001  r9 :   r8 : f0e78000
> [57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
> [57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 
> 
> This issue was always seen in the context of "ksoftirqd". It seems to
> be happening because of a potential race condition in __kthread_parkme
> where just after completing the parked completion, before the
> ksoftirqd task has been scheduled again, it can go into running state.

This explanation does not make any sense. You completely fail to
explain the details of the race. And your patch does not make any
sense either, because the real issue is this:

Task   CPU 0CPU 1

T1 unplug cpu1
   kthread_park(T2)
   set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while (test_bit(KTHREAD_SHOULD_PARK)) 
{
if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
  complete();
schedule();
T1   plug cpu1

--> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
CPU 0

T2__set_current_state(TASK_PARKED);

--> Preemption by the plug thread

T1 thread_unpark(T2)
 clear_bit(KTHREAD_SHOULD_PARK);

--> Preemption by the softirq thread which breaks out of the
while(test_bit(KTHREAD_SHOULD_PARK)) loop because
KTHREAD_SHOULD_PARK is not longer set.

T2   }
clear_bit(KTHREAD_IS_PARKED);

--> Now T2 happily continues to run on CPU0 which rightfully casues
the BUG to trigger.

T1
__kthread_bind(T2)

--> Too late 

So the real issue is that the park/unpark code is not able to handle
the premature wakeup of T2 and that needs to be fixed.

Your changelog says:

> It seems to be happening because of a potential race condition in

Potential is wrong to begin with. A race condition is either real and
explainable or it does not exist.

> __kthread_parkme where just after completing the parked completion,
> before the ksoftirqd task has been scheduled again, it can go into
> running state.

What exactly has this to do with state RUNNING or PARKED?

  Nothing, the task state is completely irrelevant as the real issue
  is the task->*PARK flags state.

So what is your patch solving here ?

  You put a wait for task->state == TASK_PARKED after the
  wait_for_completion. What does it solve? Actually nothing. It just
  changes the propability of that issue. Go and apply it between any
  step of the above and figure out what it solves. Nothing, really.

  Now just as an extra thought experiment assume that you have only
  two cpus and T1 is a SCHED_FIFO task and T2 is SCHED_OTHER 

Please do not misunderstand me, but "fixing" races without proper
understanding them is plain wrong. Providing a vague changelog which
does neither explain what the issue is and why the fix works is even
more wrong.

The next time you hit something like this, please take the time and
sit down, get out the old fashioned piece of paper and a pencil and
draw the picture so you can actually understand what the root cause of
the observed issue is before sending out halfarsed duct tape fixes
which just paper over the root cause. If you cannot figure it out,
send a proper bug report.

Thanks,

tglx
-->

Subject: kthread: Plug park/ unplug race
From: Thomas Gleixner 
Date: Thu, 26 Jun 2014 01:24:36 +0200

The kthread park/unpark logic has the following issue:

Task   CPU 0CPU 1

T1 unplug cpu1
   kthread_park(T2)
   set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while 

[PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Subbaraman Narayanamurthy

While stressing the CPU hotplug path, sometimes we hit a problem
as shown below.

[57056.416774] [ cut here ]
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
[57056.489259] [ cut here ]
[57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 
3.10.0-g3677c61-8-g180c060 #1

[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : []lr : [] psr: 200f0013
[57056.547528] sp : f0e79f30  ip :   fp : 
[57056.558983] r10: 0001  r9 :   r8 : f0e78000
[57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 

This issue was always seen in the context of "ksoftirqd". It seems to
be happening because of a potential race condition in __kthread_parkme
where just after completing the parked completion, before the
ksoftirqd task has been scheduled again, it can go into running state.

Fix this by waiting for the task state to parked after waiting for the
parked completion.

Signed-off-by: Subbaraman Narayanamurthy 
---
 kernel/kthread.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 760e86d..c56c6f8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -398,6 +398,8 @@ int kthread_park(struct task_struct *k)
 if (k != current) {
 wake_up_process(k);
 wait_for_completion(>parked);
+while (k->state != TASK_PARKED)
+cond_resched();
 }
 }
 ret = 0;
--
1.8.2.1

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Subbaraman Narayanamurthy

While stressing the CPU hotplug path, sometimes we hit a problem
as shown below.

[57056.416774] [ cut here ]
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
[57056.489259] [ cut here ]
[57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 
3.10.0-g3677c61-8-g180c060 #1

[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : [c01931e8]lr : [c01931e0] psr: 200f0013
[57056.547528] sp : f0e79f30  ip :   fp : 
[57056.558983] r10: 0001  r9 :   r8 : f0e78000
[57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 

This issue was always seen in the context of ksoftirqd. It seems to
be happening because of a potential race condition in __kthread_parkme
where just after completing the parked completion, before the
ksoftirqd task has been scheduled again, it can go into running state.

Fix this by waiting for the task state to parked after waiting for the
parked completion.

Signed-off-by: Subbaraman Narayanamurthy subba...@codeaurora.org
---
 kernel/kthread.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 760e86d..c56c6f8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -398,6 +398,8 @@ int kthread_park(struct task_struct *k)
 if (k != current) {
 wake_up_process(k);
 wait_for_completion(kthread-parked);
+while (k-state != TASK_PARKED)
+cond_resched();
 }
 }
 ret = 0;
--
1.8.2.1

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Thomas Gleixner
On Wed, 25 Jun 2014, Subbaraman Narayanamurthy wrote:

 While stressing the CPU hotplug path, sometimes we hit a problem
 as shown below.
 
 [57056.416774] [ cut here ]
 [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
 [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
 [57056.489259] [ cut here ]
 [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
 [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
 [57056.519055] Modules linked in: wlan(O) mhi(O)
 [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O
 3.10.0-g3677c61-8-g180c060 #1
 [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
 [57056.537991] PC is at smpboot_thread_fn+0x124/0x218
 [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
 [57056.547528] pc : [c01931e8]lr : [c01931e0] psr: 200f0013
 [57056.547528] sp : f0e79f30  ip :   fp : 
 [57056.558983] r10: 0001  r9 :   r8 : f0e78000
 [57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
 [57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 
 
 This issue was always seen in the context of ksoftirqd. It seems to
 be happening because of a potential race condition in __kthread_parkme
 where just after completing the parked completion, before the
 ksoftirqd task has been scheduled again, it can go into running state.

This explanation does not make any sense. You completely fail to
explain the details of the race. And your patch does not make any
sense either, because the real issue is this:

Task   CPU 0CPU 1

T1 unplug cpu1
   kthread_park(T2)
   set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while (test_bit(KTHREAD_SHOULD_PARK)) 
{
if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
  complete();
schedule();
T1   plug cpu1

-- premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
CPU 0

T2__set_current_state(TASK_PARKED);

-- Preemption by the plug thread

T1 thread_unpark(T2)
 clear_bit(KTHREAD_SHOULD_PARK);

-- Preemption by the softirq thread which breaks out of the
while(test_bit(KTHREAD_SHOULD_PARK)) loop because
KTHREAD_SHOULD_PARK is not longer set.

T2   }
clear_bit(KTHREAD_IS_PARKED);

-- Now T2 happily continues to run on CPU0 which rightfully casues
the BUG to trigger.

T1
__kthread_bind(T2)

-- Too late 

So the real issue is that the park/unpark code is not able to handle
the premature wakeup of T2 and that needs to be fixed.

Your changelog says:

 It seems to be happening because of a potential race condition in

Potential is wrong to begin with. A race condition is either real and
explainable or it does not exist.

 __kthread_parkme where just after completing the parked completion,
 before the ksoftirqd task has been scheduled again, it can go into
 running state.

What exactly has this to do with state RUNNING or PARKED?

  Nothing, the task state is completely irrelevant as the real issue
  is the task-*PARK flags state.

So what is your patch solving here ?

  You put a wait for task-state == TASK_PARKED after the
  wait_for_completion. What does it solve? Actually nothing. It just
  changes the propability of that issue. Go and apply it between any
  step of the above and figure out what it solves. Nothing, really.

  Now just as an extra thought experiment assume that you have only
  two cpus and T1 is a SCHED_FIFO task and T2 is SCHED_OTHER 

Please do not misunderstand me, but fixing races without proper
understanding them is plain wrong. Providing a vague changelog which
does neither explain what the issue is and why the fix works is even
more wrong.

The next time you hit something like this, please take the time and
sit down, get out the old fashioned piece of paper and a pencil and
draw the picture so you can actually understand what the root cause of
the observed issue is before sending out halfarsed duct tape fixes
which just paper over the root cause. If you cannot figure it out,
send a proper bug report.

Thanks,

tglx
--

Subject: kthread: Plug park/ unplug race
From: Thomas Gleixner t...@linutronix.de
Date: Thu, 26 Jun 2014 01:24:36 +0200

The kthread park/unpark logic has the following issue:

Task   CPU 0CPU 1

T1 unplug cpu1
   kthread_park(T2)
   set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while 

Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Steven Rostedt
On Thu, Jun 26, 2014 at 02:43:56AM +0200, Thomas Gleixner wrote:
 
 Subject: kthread: Plug park/ unplug race
 From: Thomas Gleixner t...@linutronix.de
 Date: Thu, 26 Jun 2014 01:24:36 +0200
 
 The kthread park/unpark logic has the following issue:
 
 Task   CPU 0  CPU 1
 
 T1 unplug cpu1
kthread_park(T2)
set_bit(KTHREAD_SHOULD_PARK);
 wait_for_completion()
 T2parkme(X)

But with your patch, isn't it possible for T1 to call thread_unpark here?

Then looking at the code I see this turn of events:

  if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
__kthread_bind(k, kthread-cpu, TASK_PARKED);

Which in __kthread_bind() (state == TASK_PARKED)

if (!wait_task_inactive(p, state)) {
WARN_ON(1);
return;
}

Where wait_task_inactive() does:

while (task_running(rq, p)) {
if (match_state  unlikely(p-state != match_state))
return 0;

As match_state is non zero and p-state != match_state because it hasn't been
set yet. The wait_task_inactive() returns zero, and then we hit the WARN_ON()
in __kthread_bind().

-- Steve



 __set_current_state(TASK_PARKED);
 while (test_bit(KTHREAD_SHOULD_PARK)) 
 {
   if 
 (!test_and_set_bit(KTHREAD_IS_PARKED))
 complete();
   schedule();
 T1   plug cpu1
 
 -- premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
 CPU 0
 
 T2__set_current_state(TASK_PARKED);
 
 -- Preemption by the plug thread
 
 T1 thread_unpark(T2)
  clear_bit(KTHREAD_SHOULD_PARK);
 
 -- Preemption by the softirq thread which breaks out of the
 while(test_bit(KTHREAD_SHOULD_PARK)) loop because
 KTHREAD_SHOULD_PARK is not longer set.
 
 T2   }
 clear_bit(KTHREAD_IS_PARKED);
 
 -- Now T2 happily continues to run on CPU0 which rightfully causes
 the BUG_ON(T2-cpu != smp_processor_id()) to trigger.
 
 T1
   __kthread_bind(T2)
 
 -- Too late 
 
 Reorder the logic so that the unplug code binds the thread to the
 target cpu before clearing the KTHREAD_SHOULD_PARK bit.
 
 Reported-by: Subbaraman Narayanamurthy subba...@codeaurora.org
 Signed-off-by: Thomas Gleixner t...@linutronix.de
 Cc: sta...@vger.kernel.org
 
 ---
  kernel/kthread.c |   14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)
 
 Index: linux/kernel/kthread.c
 ===
 --- linux.orig/kernel/kthread.c
 +++ linux/kernel/kthread.c
 @@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp
  
  static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)
  {
 + /*
 +  * Rebind the thread to the target cpu first if it is a per
 +  * cpu thread unconditionally because it must be bound to the
 +  * target cpu before it can observe the KTHREAD_SHOULD_PARK
 +  * bit cleared.
 +  */
 + if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
 + __kthread_bind(k, kthread-cpu, TASK_PARKED);
 +
   clear_bit(KTHREAD_SHOULD_PARK, kthread-flags);
   /*
* We clear the IS_PARKED bit here as we don't wait
 @@ -389,11 +398,8 @@ static void __kthread_unpark(struct task
* park before that happens we'd see the IS_PARKED bit
* which might be about to be cleared.
*/
 - if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags)) {
 - if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
 - __kthread_bind(k, kthread-cpu, TASK_PARKED);
 + if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags))
   wake_up_state(k, TASK_PARKED);
 - }
  }
  
  /**
 
 
 
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Steven Rostedt
On Wed, Jun 25, 2014 at 10:00:22PM -0400, Steven Rostedt wrote:
 On Thu, Jun 26, 2014 at 02:43:56AM +0200, Thomas Gleixner wrote:
  
  Subject: kthread: Plug park/ unplug race
  From: Thomas Gleixner t...@linutronix.de
  Date: Thu, 26 Jun 2014 01:24:36 +0200
  
  The kthread park/unpark logic has the following issue:
  
  Task   CPU 0CPU 1
  
  T1 unplug cpu1
 kthread_park(T2)
 set_bit(KTHREAD_SHOULD_PARK);
wait_for_completion()
  T2  parkme(X)
 
 But with your patch, isn't it possible for T1 to call thread_unpark here?

Let me answer that No, it can't.

I missed the wait_for_completion() above, which will prevent this from 
happening.

Nevermind, I'll go work on something less brain intensive.

-- Steve

 
 Then looking at the code I see this turn of events:
 
   if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
 __kthread_bind(k, kthread-cpu, TASK_PARKED);
 
 Which in __kthread_bind() (state == TASK_PARKED)
 
   if (!wait_task_inactive(p, state)) {
   WARN_ON(1);
   return;
   }
 
 Where wait_task_inactive() does:
 
   while (task_running(rq, p)) {
   if (match_state  unlikely(p-state != match_state))
   return 0;
 
 As match_state is non zero and p-state != match_state because it hasn't been
 set yet. The wait_task_inactive() returns zero, and then we hit the WARN_ON()
 in __kthread_bind().
 
 -- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/