[PATCH] spmi: spmi-pmic-arb: Fix hw_irq overflow
Currently, when handling the SPMI summary interrupt, the hw_irq number is calculated based on SID, Peripheral ID, IRQ index and APID. This is then passed to irq_find_mapping() to see if a mapping exists for this hw_irq and if available, invoke the interrupt handler. Since the IRQ index uses an "int" type, hw_irq which is of unsigned long data type can take a large value when SID has its MSB set to 1 and the type conversion happens. Because of this, irq_find_mapping() returns 0 as there is no mapping for this hw_irq. This ends up invoking cleanup_irq() as if the interrupt is spurious whereas it is actually a valid interrupt. Fix this by using the proper data type (u32) for id. Cc: sta...@vger.kernel.org Signed-off-by: Subbaraman Narayanamurthy --- drivers/spmi/spmi-pmic-arb.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c index de844b4..bbbd311 100644 --- a/drivers/spmi/spmi-pmic-arb.c +++ b/drivers/spmi/spmi-pmic-arb.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2012-2015, 2017, The Linux Foundation. All rights reserved. + * Copyright (c) 2012-2015, 2017, 2021, The Linux Foundation. All rights reserved. */ #include #include @@ -505,8 +505,7 @@ static void cleanup_irq(struct spmi_pmic_arb *pmic_arb, u16 apid, int id) static void periph_interrupt(struct spmi_pmic_arb *pmic_arb, u16 apid) { unsigned int irq; - u32 status; - int id; + u32 status, id; u8 sid = (pmic_arb->apid_data[apid].ppid >> 8) & 0xF; u8 per = pmic_arb->apid_data[apid].ppid & 0xFF; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
[RESEND PATCH] nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
"sdam->pdev" is uninitialized and it is used to print error logs. Fix it. Since device pointer can be used from sdam_config, use it directly thereby removing pdev pointer. Cc: sta...@vger.kernel.org Signed-off-by: Subbaraman Narayanamurthy --- drivers/nvmem/qcom-spmi-sdam.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/nvmem/qcom-spmi-sdam.c b/drivers/nvmem/qcom-spmi-sdam.c index a72704c..f6e9f96 100644 --- a/drivers/nvmem/qcom-spmi-sdam.c +++ b/drivers/nvmem/qcom-spmi-sdam.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2017, 2020 The Linux Foundation. All rights reserved. + * Copyright (c) 2017, 2020-2021, The Linux Foundation. All rights reserved. */ #include @@ -18,7 +18,6 @@ #define SDAM_PBS_TRIG_CLR 0xE6 struct sdam_chip { - struct platform_device *pdev; struct regmap *regmap; struct nvmem_config sdam_config; unsigned intbase; @@ -65,7 +64,7 @@ static int sdam_read(void *priv, unsigned int offset, void *val, size_t bytes) { struct sdam_chip *sdam = priv; - struct device *dev = >pdev->dev; + struct device *dev = sdam->sdam_config.dev; int rc; if (!sdam_is_valid(sdam, offset, bytes)) { @@ -86,7 +85,7 @@ static int sdam_write(void *priv, unsigned int offset, void *val, size_t bytes) { struct sdam_chip *sdam = priv; - struct device *dev = >pdev->dev; + struct device *dev = sdam->sdam_config.dev; int rc; if (!sdam_is_valid(sdam, offset, bytes)) { -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
[RESEND PATCH] spmi: spmi-pmic-arb: Fix hw_irq overflow
Currently, when handling the SPMI summary interrupt, the hw_irq number is calculated based on SID, Peripheral ID, IRQ index and APID. This is then passed to irq_find_mapping() to see if a mapping exists for this hw_irq and if available, invoke the interrupt handler. Since the IRQ index uses an "int" type, hw_irq which is of unsigned long data type can take a large value when SID has its MSB set to 1 and the type conversion happens. Because of this, irq_find_mapping() returns 0 as there is no mapping for this hw_irq. This ends up invoking cleanup_irq() as if the interrupt is spurious whereas it is actually a valid interrupt. Fix this by using the proper data type (u32) for id. Cc: sta...@vger.kernel.org Signed-off-by: Subbaraman Narayanamurthy --- drivers/spmi/spmi-pmic-arb.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c index de844b4..bbbd311 100644 --- a/drivers/spmi/spmi-pmic-arb.c +++ b/drivers/spmi/spmi-pmic-arb.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2012-2015, 2017, The Linux Foundation. All rights reserved. + * Copyright (c) 2012-2015, 2017, 2021, The Linux Foundation. All rights reserved. */ #include #include @@ -505,8 +505,7 @@ static void cleanup_irq(struct spmi_pmic_arb *pmic_arb, u16 apid, int id) static void periph_interrupt(struct spmi_pmic_arb *pmic_arb, u16 apid) { unsigned int irq; - u32 status; - int id; + u32 status, id; u8 sid = (pmic_arb->apid_data[apid].ppid >> 8) & 0xF; u8 per = pmic_arb->apid_data[apid].ppid & 0xFF; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH] nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer
"sdam->pdev" is uninitialized and it is used to print error logs. Fix it. Since device pointer can be used from sdam_config, use it directly thereby removing pdev pointer. Signed-off-by: Subbaraman Narayanamurthy --- --- drivers/nvmem/qcom-spmi-sdam.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/nvmem/qcom-spmi-sdam.c b/drivers/nvmem/qcom-spmi-sdam.c index a72704c..f6e9f96 100644 --- a/drivers/nvmem/qcom-spmi-sdam.c +++ b/drivers/nvmem/qcom-spmi-sdam.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2017, 2020 The Linux Foundation. All rights reserved. + * Copyright (c) 2017, 2020-2021, The Linux Foundation. All rights reserved. */ #include @@ -18,7 +18,6 @@ #define SDAM_PBS_TRIG_CLR 0xE6 struct sdam_chip { - struct platform_device *pdev; struct regmap *regmap; struct nvmem_config sdam_config; unsigned intbase; @@ -65,7 +64,7 @@ static int sdam_read(void *priv, unsigned int offset, void *val, size_t bytes) { struct sdam_chip *sdam = priv; - struct device *dev = >pdev->dev; + struct device *dev = sdam->sdam_config.dev; int rc; if (!sdam_is_valid(sdam, offset, bytes)) { @@ -86,7 +85,7 @@ static int sdam_write(void *priv, unsigned int offset, void *val, size_t bytes) { struct sdam_chip *sdam = priv; - struct device *dev = >pdev->dev; + struct device *dev = sdam->sdam_config.dev; int rc; if (!sdam_is_valid(sdam, offset, bytes)) { -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH] spmi: spmi-pmic-arb: Fix hw_irq overflow
Currently, when handling the SPMI summary interrupt, the hw_irq number is calculated based on SID, Peripheral ID, IRQ index and APID. This is then passed to irq_find_mapping() to see if a mapping exists for this hw_irq and if available, invoke the interrupt handler. Since the IRQ index uses an "int" type, hw_irq which is of unsigned long data type can take a large value when SID has its MSB set to 1 and the type conversion happens. Because of this, irq_find_mapping() returns 0 as there is no mapping for this hw_irq. This ends up invoking cleanup_irq() as if the interrupt is spurious whereas it is actually a valid interrupt. Fix this by using the proper data type (u32) for id. Signed-off-by: Subbaraman Narayanamurthy --- drivers/spmi/spmi-pmic-arb.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c index de844b4..bbbd311 100644 --- a/drivers/spmi/spmi-pmic-arb.c +++ b/drivers/spmi/spmi-pmic-arb.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * Copyright (c) 2012-2015, 2017, The Linux Foundation. All rights reserved. + * Copyright (c) 2012-2015, 2017, 2021, The Linux Foundation. All rights reserved. */ #include #include @@ -505,8 +505,7 @@ static void cleanup_irq(struct spmi_pmic_arb *pmic_arb, u16 apid, int id) static void periph_interrupt(struct spmi_pmic_arb *pmic_arb, u16 apid) { unsigned int irq; - u32 status; - int id; + u32 status, id; u8 sid = (pmic_arb->apid_data[apid].ppid >> 8) & 0xF; u8 per = pmic_arb->apid_data[apid].ppid & 0xFF; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH] kthread: Fix the race condition when kthread is parked
On 06/25/14 17:43, Thomas Gleixner wrote: The kthread park/unpark logic has the following issue: Task CPU 0CPU 1 T1 unplug cpu1 kthread_park(T2) set_bit(KTHREAD_SHOULD_PARK); wait_for_completion() T2 parkme(X) __set_current_state(TASK_PARKED); while (test_bit(KTHREAD_SHOULD_PARK)) { if (!test_and_set_bit(KTHREAD_IS_PARKED)) complete(); schedule(); T1 plug cpu1 --> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on CPU 0 I understood the explanation above. But still I don't understand how this premature wakeup of T2 is happening/possible? Also, what will happen if the task state is not in TASK_PARKED when __kthread_unpark is called? __kthread_bind will fail silently causing the same problem. Reorder the logic so that the unplug code binds the thread to the target cpu before clearing the KTHREAD_SHOULD_PARK bit. Reported-by: Subbaraman Narayanamurthy Signed-off-by: Thomas Gleixner Cc:sta...@vger.kernel.org --- kernel/kthread.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) Index: linux/kernel/kthread.c === --- linux.orig/kernel/kthread.c +++ linux/kernel/kthread.c @@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp static void __kthread_unpark(struct task_struct *k, struct kthread *kthread) { + /* +* Rebind the thread to the target cpu first if it is a per +* cpu thread unconditionally because it must be bound to the +* target cpu before it can observe the KTHREAD_SHOULD_PARK +* bit cleared. +*/ + if (test_bit(KTHREAD_IS_PER_CPU, >flags)) + __kthread_bind(k, kthread->cpu, TASK_PARKED); + clear_bit(KTHREAD_SHOULD_PARK, >flags); /* * We clear the IS_PARKED bit here as we don't wait @@ -389,11 +398,8 @@ static void __kthread_unpark(struct task * park before that happens we'd see the IS_PARKED bit * which might be about to be cleared. */ - if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags)) { - if (test_bit(KTHREAD_IS_PER_CPU, >flags)) - __kthread_bind(k, kthread->cpu, TASK_PARKED); + if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags)) wake_up_state(k, TASK_PARKED); - } } /** Thanks for the patch. I've tested (running hotplug tests) it for sometime and looks good so far. Can you please submit it? -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kthread: Fix the race condition when kthread is parked
On 06/25/14 17:43, Thomas Gleixner wrote: The kthread park/unpark logic has the following issue: Task CPU 0CPU 1 T1 unplug cpu1 kthread_park(T2) set_bit(KTHREAD_SHOULD_PARK); wait_for_completion() T2 parkme(X) __set_current_state(TASK_PARKED); while (test_bit(KTHREAD_SHOULD_PARK)) { if (!test_and_set_bit(KTHREAD_IS_PARKED)) complete(); schedule(); T1 plug cpu1 -- premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on CPU 0 I understood the explanation above. But still I don't understand how this premature wakeup of T2 is happening/possible? Also, what will happen if the task state is not in TASK_PARKED when __kthread_unpark is called? __kthread_bind will fail silently causing the same problem. Reorder the logic so that the unplug code binds the thread to the target cpu before clearing the KTHREAD_SHOULD_PARK bit. Reported-by: Subbaraman Narayanamurthysubba...@codeaurora.org Signed-off-by: Thomas Gleixnert...@linutronix.de Cc:sta...@vger.kernel.org --- kernel/kthread.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) Index: linux/kernel/kthread.c === --- linux.orig/kernel/kthread.c +++ linux/kernel/kthread.c @@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp static void __kthread_unpark(struct task_struct *k, struct kthread *kthread) { + /* +* Rebind the thread to the target cpu first if it is a per +* cpu thread unconditionally because it must be bound to the +* target cpu before it can observe the KTHREAD_SHOULD_PARK +* bit cleared. +*/ + if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags)) + __kthread_bind(k, kthread-cpu, TASK_PARKED); + clear_bit(KTHREAD_SHOULD_PARK, kthread-flags); /* * We clear the IS_PARKED bit here as we don't wait @@ -389,11 +398,8 @@ static void __kthread_unpark(struct task * park before that happens we'd see the IS_PARKED bit * which might be about to be cleared. */ - if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags)) { - if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags)) - __kthread_bind(k, kthread-cpu, TASK_PARKED); + if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags)) wake_up_state(k, TASK_PARKED); - } } /** Thanks for the patch. I've tested (running hotplug tests) it for sometime and looks good so far. Can you please submit it? -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] kthread: Fix the race condition when kthread is parked
While stressing the CPU hotplug path, sometimes we hit a problem as shown below. [57056.416774] [ cut here ] [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8 [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2) [57056.489259] [ cut here ] [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134! [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [57056.519055] Modules linked in: wlan(O) mhi(O) [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 3.10.0-g3677c61-8-g180c060 #1 [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000 [57056.537991] PC is at smpboot_thread_fn+0x124/0x218 [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218 [57056.547528] pc : []lr : [] psr: 200f0013 [57056.547528] sp : f0e79f30 ip : fp : [57056.558983] r10: 0001 r9 : r8 : f0e78000 [57056.564192] r7 : 0001 r6 : c1195758 r5 : f0e78000 r4 : f0e5fd00 [57056.570701] r3 : 0001 r2 : f0e79f20 r1 : r0 : This issue was always seen in the context of "ksoftirqd". It seems to be happening because of a potential race condition in __kthread_parkme where just after completing the parked completion, before the ksoftirqd task has been scheduled again, it can go into running state. Fix this by waiting for the task state to parked after waiting for the parked completion. Signed-off-by: Subbaraman Narayanamurthy --- kernel/kthread.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/kthread.c b/kernel/kthread.c index 760e86d..c56c6f8 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -398,6 +398,8 @@ int kthread_park(struct task_struct *k) if (k != current) { wake_up_process(k); wait_for_completion(>parked); +while (k->state != TASK_PARKED) +cond_resched(); } } ret = 0; -- 1.8.2.1 -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] kthread: Fix the race condition when kthread is parked
While stressing the CPU hotplug path, sometimes we hit a problem as shown below. [57056.416774] [ cut here ] [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8 [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2) [57056.489259] [ cut here ] [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134! [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [57056.519055] Modules linked in: wlan(O) mhi(O) [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 3.10.0-g3677c61-8-g180c060 #1 [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000 [57056.537991] PC is at smpboot_thread_fn+0x124/0x218 [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218 [57056.547528] pc : [c01931e8]lr : [c01931e0] psr: 200f0013 [57056.547528] sp : f0e79f30 ip : fp : [57056.558983] r10: 0001 r9 : r8 : f0e78000 [57056.564192] r7 : 0001 r6 : c1195758 r5 : f0e78000 r4 : f0e5fd00 [57056.570701] r3 : 0001 r2 : f0e79f20 r1 : r0 : This issue was always seen in the context of ksoftirqd. It seems to be happening because of a potential race condition in __kthread_parkme where just after completing the parked completion, before the ksoftirqd task has been scheduled again, it can go into running state. Fix this by waiting for the task state to parked after waiting for the parked completion. Signed-off-by: Subbaraman Narayanamurthy subba...@codeaurora.org --- kernel/kthread.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/kthread.c b/kernel/kthread.c index 760e86d..c56c6f8 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -398,6 +398,8 @@ int kthread_park(struct task_struct *k) if (k != current) { wake_up_process(k); wait_for_completion(kthread-parked); +while (k-state != TASK_PARKED) +cond_resched(); } } ret = 0; -- 1.8.2.1 -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
kernel BUG at kernel/smpboot.c:134
Hi, While stressing the CPU hotplug path, sometimes we hit the problem as shown below. Kernel is based off 3.10 and has the commit "f2530dc71cf082" already. [57056.416774] [ cut here ] [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8 [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2) [57056.489259] [ cut here ] [57056.492840] kernel BUG at kernel/smpboot.c:134! [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [57056.519055] Modules linked in: wlan(O) mhi(O) [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 3.10.0-g3677c61-8-g180c060 #1 [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000 [57056.537991] PC is at smpboot_thread_fn+0x124/0x218 [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218 [57056.547528] pc : []lr : []psr: 200f0013 [57056.547528] sp : f0e79f30 ip : fp : [57056.558983] r10: 0001 r9 : r8 : f0e78000 [57056.564192] r7 : 0001 r6 : c1195758 r5 : f0e78000 r4 : f0e5fd00 [57056.570701] r3 : 0001 r2 : f0e79f20 r1 : r0 : Flow of events looks like below. ksoftirqd/2 migration/2 cpu_up task -- -- smpboot_thread_fn() kthread_parkme() complete(>parked) spin_unlock_irq() preempt_schedule() __schedule() migrate_tasks(2) __kthread_unpark(ksoftirqd/2) test_and_clear_bit(KTHREAD_IS_PARKED,>flags) __kthread_bind(k,kthread->cpu,TASK_PARKED); schedule() wake_up_state(k,TASK_PARKED); __set_current_state(TASK_PARKED); clear_bit(KTHREAD_IS_PARKED, >flags); __set_current_state(TASK_RUNNING); ... set_current_state(TASK_INTERRUPTIBLE); preempt_disable(); ... BUG_ON(td->cpu != smp_processor_id()) While debugging with adding a BUG_ON() in remote_cpu_softirq_notify (for CPU_DEAD action), at a particular instance,I could confirm that the "ksoftirqd" (for the CPU which is bought down) is not in parked state (512) but in running state (0). If the thread is not in parked state when the CPU is bought up again, then __kthread_bind() can fail making the kthread to run on a wrong CPU. Is it possible that this can happen because of the following potential race condition? In __kthread_parkme, just after completing the parked completion, before the ksoftirqd task has been scheduled again, it can go into running state because it got woken up by the wake_up_process() from kthread_park(). Thanks, Subbaraman -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
kernel BUG at kernel/smpboot.c:134
Hi, While stressing the CPU hotplug path, sometimes we hit the problem as shown below. Kernel is based off 3.10 and has the commit f2530dc71cf082 already. [57056.416774] [ cut here ] [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8 [57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2) [57056.489259] [ cut here ] [57056.492840] kernel BUG at kernel/smpboot.c:134! [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [57056.519055] Modules linked in: wlan(O) mhi(O) [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 3.10.0-g3677c61-8-g180c060 #1 [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000 [57056.537991] PC is at smpboot_thread_fn+0x124/0x218 [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218 [57056.547528] pc : [c01931e8]lr : [c01931e0]psr: 200f0013 [57056.547528] sp : f0e79f30 ip : fp : [57056.558983] r10: 0001 r9 : r8 : f0e78000 [57056.564192] r7 : 0001 r6 : c1195758 r5 : f0e78000 r4 : f0e5fd00 [57056.570701] r3 : 0001 r2 : f0e79f20 r1 : r0 : Flow of events looks like below. ksoftirqd/2 migration/2 cpu_up task -- -- smpboot_thread_fn() kthread_parkme() complete(self-parked) spin_unlock_irq() preempt_schedule() __schedule() migrate_tasks(2) migrate to CPU 0 __kthread_unpark(ksoftirqd/2) test_and_clear_bit(KTHREAD_IS_PARKED,kthread-flags) __kthread_bind(k,kthread-cpu,TASK_PARKED); schedule() wake_up_state(k,TASK_PARKED); __set_current_state(TASK_PARKED); clear_bit(KTHREAD_IS_PARKED, self-flags); __set_current_state(TASK_RUNNING); ... set_current_state(TASK_INTERRUPTIBLE); preempt_disable(); ... BUG_ON(td-cpu != smp_processor_id()) While debugging with adding a BUG_ON() in remote_cpu_softirq_notify (for CPU_DEAD action), at a particular instance,I could confirm that the ksoftirqd (for the CPU which is bought down) is not in parked state (512) but in running state (0). If the thread is not in parked state when the CPU is bought up again, then __kthread_bind() can fail making the kthread to run on a wrong CPU. Is it possible that this can happen because of the following potential race condition? In __kthread_parkme, just after completing the parked completion, before the ksoftirqd task has been scheduled again, it can go into running state because it got woken up by the wake_up_process() from kthread_park(). Thanks, Subbaraman -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/