On Thursday, June 26, 2014 8:50:01 AM UTC+8, Thomas Gleixner wrote:
> On Wed, 25 Jun 2014, Subbaraman Narayanamurthy wrote:
> > While stressing the CPU hotplug path, sometimes we hit a problem
> > as shown below.
> >
> > [57056.416774] ------------[ cut here ]------------
> > [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
> > [57056.489245] Code: e594a000 eb085236 e15a0000 0a000000 (e7f001f2)
> > [57056.489259] ------------[ cut here ]------------
> > [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
> > [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> > [57056.519055] Modules linked in: wlan(O) mhi(O)
> > [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: G        W O
> > 3.10.0-g3677c61-00008-g180c060 #1
> > [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
> > [57056.537991] PC is at smpboot_thread_fn+0x124/0x218
> > [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
> > [57056.547528] pc : [<c01931e8>]    lr : [<c01931e0>] psr: 200f0013
> > [57056.547528] sp : f0e79f30  ip : 00000000  fp : 00000000
> > [57056.558983] r10: 00000001  r9 : 00000000  r8 : f0e78000
> > [57056.564192] r7 : 00000001 r6 : c1195758 r5 : f0e78000 r4 : f0e5fd00 > > [57056.570701] r3 : 00000001 r2 : f0e79f20 r1 : 00000000 r0 : 00000000
> >
> > This issue was always seen in the context of "ksoftirqd". It seems to
> > be happening because of a potential race condition in __kthread_parkme
> > where just after completing the parked completion, before the
> > ksoftirqd task has been scheduled again, it can go into running state.
>
> This explanation does not make any sense. You completely fail to
> explain the details of the race. And your patch does not make any
> sense either, because the real issue is this:
>
> Task   CPU 0                               CPU 1
>
> T1     unplug cpu1
>        kthread_park(T2)
>        set_bit(KTHREAD_SHOULD_PARK);
>      wait_for_completion()
> T2                                 parkme(X)
>                                      __set_current_state(TASK_PARKED);
>                                      while (test_bit(KTHREAD_SHOULD_PARK)) {
>                                        if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
>                                          complete();
>                                        schedule();
> T1   plug cpu1
>
> --> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
>     CPU 0
>
> T2    __set_current_state(TASK_PARKED);
>
> --> Preemption by the plug thread
>
> T1     thread_unpark(T2)
>          clear_bit(KTHREAD_SHOULD_PARK);
>
> --> Preemption by the softirq thread which breaks out of the
>     while(test_bit(KTHREAD_SHOULD_PARK)) loop because
>     KTHREAD_SHOULD_PARK is not longer set.
>
> T2   }
>     clear_bit(KTHREAD_IS_PARKED);
>
> --> Now T2 happily continues to run on CPU0 which rightfully casues
>     the BUG to trigger.
>
> T1
>    __kthread_bind(T2)
>
> --> Too late ....
>
> So the real issue is that the park/unpark code is not able to handle
> the premature wakeup of T2 and that needs to be fixed.
>
> Your changelog says:
>
> > It seems to be happening because of a potential race condition in
>
> Potential is wrong to begin with. A race condition is either real and
> explainable or it does not exist.
>
> > __kthread_parkme where just after completing the parked completion,
> > before the ksoftirqd task has been scheduled again, it can go into
> > running state.
>
> What exactly has this to do with state RUNNING or PARKED?
>
>   Nothing, the task state is completely irrelevant as the real issue
>   is the task->*PARK flags state.
>
> So what is your patch solving here ?
>
>   You put a wait for task->state == TASK_PARKED after the
>   wait_for_completion. What does it solve? Actually nothing. It just
>   changes the propability of that issue. Go and apply it between any
>   step of the above and figure out what it solves. Nothing, really.
>
>   Now just as an extra thought experiment assume that you have only
>   two cpus and T1 is a SCHED_FIFO task and T2 is SCHED_OTHER ....
>
> Please do not misunderstand me, but "fixing" races without proper
> understanding them is plain wrong. Providing a vague changelog which
> does neither explain what the issue is and why the fix works is even
> more wrong.
>
> The next time you hit something like this, please take the time and
> sit down, get out the old fashioned piece of paper and a pencil and
> draw the picture so you can actually understand what the root cause of
> the observed issue is before sending out halfarsed duct tape fixes
> which just paper over the root cause. If you cannot figure it out,
> send a proper bug report.
>
> Thanks,
>
>    tglx
> ------------------>
>
> Subject: kthread: Plug park/ unplug race
> From: Thomas Gleixner <t...@linutronix.de>
> Date: Thu, 26 Jun 2014 01:24:36 +0200
>
> The kthread park/unpark logic has the following issue:
>
> Task   CPU 0                               CPU 1
>
> T1     unplug cpu1
>        kthread_park(T2)
>        set_bit(KTHREAD_SHOULD_PARK);
>      wait_for_completion()
> T2                                 parkme(X)
>                                      __set_current_state(TASK_PARKED);
>                                      while (test_bit(KTHREAD_SHOULD_PARK)) {
>                                        if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
>                                          complete();
>                                        schedule();
> T1   plug cpu1
>
> --> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
>     CPU 0
>
> T2    __set_current_state(TASK_PARKED);
>
> --> Preemption by the plug thread
>
> T1     thread_unpark(T2)
>          clear_bit(KTHREAD_SHOULD_PARK);
>
> --> Preemption by the softirq thread which breaks out of the
>     while(test_bit(KTHREAD_SHOULD_PARK)) loop because
>     KTHREAD_SHOULD_PARK is not longer set.
>
> T2   }
>     clear_bit(KTHREAD_IS_PARKED);
>
> --> Now T2 happily continues to run on CPU0 which rightfully causes
>     the BUG_ON(T2->cpu != smp_processor_id()) to trigger.
>
> T1
>    __kthread_bind(T2)
>
> --> Too late ....
>
> Reorder the logic so that the unplug code binds the thread to the
> target cpu before clearing the KTHREAD_SHOULD_PARK bit.
>
> Reported-by: Subbaraman Narayanamurthy <subba...@codeaurora.org>
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> Cc: sta...@vger.kernel.org
>
> ---
>  kernel/kthread.c |   14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> Index: linux/kernel/kthread.c
> ===================================================================
> --- linux.orig/kernel/kthread.c
> +++ linux/kernel/kthread.c
> @@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp
>
> static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)
>  {
> +  /*
> +   * Rebind the thread to the target cpu first if it is a per
> +   * cpu thread unconditionally because it must be bound to the
> +   * target cpu before it can observe the KTHREAD_SHOULD_PARK
> +   * bit cleared.
> +   */
> +  if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
> +          __kthread_bind(k, kthread->cpu, TASK_PARKED);
> +
>    clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
>    /*
>     * We clear the IS_PARKED bit here as we don't wait
> @@ -389,11 +398,8 @@ static void __kthread_unpark(struct task
>     * park before that happens we'd see the IS_PARKED bit
>     * which might be about to be cleared.
>     */
> -  if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags)) {
> -          if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
> -                  __kthread_bind(k, kthread->cpu, TASK_PARKED);
> +  if (test_and_clear_bit(KTHREAD_IS_PARKED, &kthread->flags))
>            wake_up_state(k, TASK_PARKED);
> -  }
>  }
>
>  /**

I just got a window to test this, and it reliably addresses the boot-time core onlining race that we've seen occasionally on a 2000-core customer system. Splendid work, Thomas.

Tested-by: Daniel J Blueman <dan...@numascale.com>

Many thanks,
  Daniel
--
Daniel J Blueman
Principal Software Engineer, Numascale
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to