[PATCH] spmi: spmi-pmic-arb: Fix hw_irq overflow

2021-02-08 Thread Subbaraman Narayanamurthy
Currently, when handling the SPMI summary interrupt, the hw_irq
number is calculated based on SID, Peripheral ID, IRQ index and
APID. This is then passed to irq_find_mapping() to see if a
mapping exists for this hw_irq and if available, invoke the
interrupt handler. Since the IRQ index uses an "int" type, hw_irq
which is of unsigned long data type can take a large value when
SID has its MSB set to 1 and the type conversion happens. Because
of this, irq_find_mapping() returns 0 as there is no mapping
for this hw_irq. This ends up invoking cleanup_irq() as if
the interrupt is spurious whereas it is actually a valid
interrupt. Fix this by using the proper data type (u32) for id.

Cc: sta...@vger.kernel.org
Signed-off-by: Subbaraman Narayanamurthy 
---
 drivers/spmi/spmi-pmic-arb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
index de844b4..bbbd311 100644
--- a/drivers/spmi/spmi-pmic-arb.c
+++ b/drivers/spmi/spmi-pmic-arb.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2012-2015, 2017, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2012-2015, 2017, 2021, The Linux Foundation. All rights 
reserved.
  */
 #include 
 #include 
@@ -505,8 +505,7 @@ static void cleanup_irq(struct spmi_pmic_arb *pmic_arb, u16 
apid, int id)
 static void periph_interrupt(struct spmi_pmic_arb *pmic_arb, u16 apid)
 {
unsigned int irq;
-   u32 status;
-   int id;
+   u32 status, id;
u8 sid = (pmic_arb->apid_data[apid].ppid >> 8) & 0xF;
u8 per = pmic_arb->apid_data[apid].ppid & 0xFF;
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RESEND PATCH] nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer

2021-02-03 Thread Subbaraman Narayanamurthy
"sdam->pdev" is uninitialized and it is used to print error logs.
Fix it. Since device pointer can be used from sdam_config, use it
directly thereby removing pdev pointer.

Cc: sta...@vger.kernel.org
Signed-off-by: Subbaraman Narayanamurthy 
---
 drivers/nvmem/qcom-spmi-sdam.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/nvmem/qcom-spmi-sdam.c b/drivers/nvmem/qcom-spmi-sdam.c
index a72704c..f6e9f96 100644
--- a/drivers/nvmem/qcom-spmi-sdam.c
+++ b/drivers/nvmem/qcom-spmi-sdam.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2017, 2020 The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017, 2020-2021, The Linux Foundation. All rights reserved.
  */
 
 #include 
@@ -18,7 +18,6 @@
 #define SDAM_PBS_TRIG_CLR  0xE6
 
 struct sdam_chip {
-   struct platform_device  *pdev;
struct regmap   *regmap;
struct nvmem_config sdam_config;
unsigned intbase;
@@ -65,7 +64,7 @@ static int sdam_read(void *priv, unsigned int offset, void 
*val,
size_t bytes)
 {
struct sdam_chip *sdam = priv;
-   struct device *dev = >pdev->dev;
+   struct device *dev = sdam->sdam_config.dev;
int rc;
 
if (!sdam_is_valid(sdam, offset, bytes)) {
@@ -86,7 +85,7 @@ static int sdam_write(void *priv, unsigned int offset, void 
*val,
size_t bytes)
 {
struct sdam_chip *sdam = priv;
-   struct device *dev = >pdev->dev;
+   struct device *dev = sdam->sdam_config.dev;
int rc;
 
if (!sdam_is_valid(sdam, offset, bytes)) {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RESEND PATCH] spmi: spmi-pmic-arb: Fix hw_irq overflow

2021-01-29 Thread Subbaraman Narayanamurthy
Currently, when handling the SPMI summary interrupt, the hw_irq
number is calculated based on SID, Peripheral ID, IRQ index and
APID. This is then passed to irq_find_mapping() to see if a
mapping exists for this hw_irq and if available, invoke the
interrupt handler. Since the IRQ index uses an "int" type, hw_irq
which is of unsigned long data type can take a large value when
SID has its MSB set to 1 and the type conversion happens. Because
of this, irq_find_mapping() returns 0 as there is no mapping
for this hw_irq. This ends up invoking cleanup_irq() as if
the interrupt is spurious whereas it is actually a valid
interrupt. Fix this by using the proper data type (u32) for id.

Cc: sta...@vger.kernel.org
Signed-off-by: Subbaraman Narayanamurthy 
---
 drivers/spmi/spmi-pmic-arb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
index de844b4..bbbd311 100644
--- a/drivers/spmi/spmi-pmic-arb.c
+++ b/drivers/spmi/spmi-pmic-arb.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2012-2015, 2017, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2012-2015, 2017, 2021, The Linux Foundation. All rights 
reserved.
  */
 #include 
 #include 
@@ -505,8 +505,7 @@ static void cleanup_irq(struct spmi_pmic_arb *pmic_arb, u16 
apid, int id)
 static void periph_interrupt(struct spmi_pmic_arb *pmic_arb, u16 apid)
 {
unsigned int irq;
-   u32 status;
-   int id;
+   u32 status, id;
u8 sid = (pmic_arb->apid_data[apid].ppid >> 8) & 0xF;
u8 per = pmic_arb->apid_data[apid].ppid & 0xFF;
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointer

2021-01-26 Thread Subbaraman Narayanamurthy
"sdam->pdev" is uninitialized and it is used to print error logs.
Fix it. Since device pointer can be used from sdam_config, use it
directly thereby removing pdev pointer.

Signed-off-by: Subbaraman Narayanamurthy 
---
---
 drivers/nvmem/qcom-spmi-sdam.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/nvmem/qcom-spmi-sdam.c b/drivers/nvmem/qcom-spmi-sdam.c
index a72704c..f6e9f96 100644
--- a/drivers/nvmem/qcom-spmi-sdam.c
+++ b/drivers/nvmem/qcom-spmi-sdam.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2017, 2020 The Linux Foundation. All rights reserved.
+ * Copyright (c) 2017, 2020-2021, The Linux Foundation. All rights reserved.
  */
 
 #include 
@@ -18,7 +18,6 @@
 #define SDAM_PBS_TRIG_CLR  0xE6
 
 struct sdam_chip {
-   struct platform_device  *pdev;
struct regmap   *regmap;
struct nvmem_config sdam_config;
unsigned intbase;
@@ -65,7 +64,7 @@ static int sdam_read(void *priv, unsigned int offset, void 
*val,
size_t bytes)
 {
struct sdam_chip *sdam = priv;
-   struct device *dev = >pdev->dev;
+   struct device *dev = sdam->sdam_config.dev;
int rc;
 
if (!sdam_is_valid(sdam, offset, bytes)) {
@@ -86,7 +85,7 @@ static int sdam_write(void *priv, unsigned int offset, void 
*val,
size_t bytes)
 {
struct sdam_chip *sdam = priv;
-   struct device *dev = >pdev->dev;
+   struct device *dev = sdam->sdam_config.dev;
int rc;
 
if (!sdam_is_valid(sdam, offset, bytes)) {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] spmi: spmi-pmic-arb: Fix hw_irq overflow

2021-01-22 Thread Subbaraman Narayanamurthy
Currently, when handling the SPMI summary interrupt, the hw_irq
number is calculated based on SID, Peripheral ID, IRQ index and
APID. This is then passed to irq_find_mapping() to see if a
mapping exists for this hw_irq and if available, invoke the
interrupt handler. Since the IRQ index uses an "int" type, hw_irq
which is of unsigned long data type can take a large value when
SID has its MSB set to 1 and the type conversion happens. Because
of this, irq_find_mapping() returns 0 as there is no mapping
for this hw_irq. This ends up invoking cleanup_irq() as if
the interrupt is spurious whereas it is actually a valid
interrupt. Fix this by using the proper data type (u32) for id.

Signed-off-by: Subbaraman Narayanamurthy 
---
 drivers/spmi/spmi-pmic-arb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
index de844b4..bbbd311 100644
--- a/drivers/spmi/spmi-pmic-arb.c
+++ b/drivers/spmi/spmi-pmic-arb.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2012-2015, 2017, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2012-2015, 2017, 2021, The Linux Foundation. All rights 
reserved.
  */
 #include 
 #include 
@@ -505,8 +505,7 @@ static void cleanup_irq(struct spmi_pmic_arb *pmic_arb, u16 
apid, int id)
 static void periph_interrupt(struct spmi_pmic_arb *pmic_arb, u16 apid)
 {
unsigned int irq;
-   u32 status;
-   int id;
+   u32 status, id;
u8 sid = (pmic_arb->apid_data[apid].ppid >> 8) & 0xF;
u8 per = pmic_arb->apid_data[apid].ppid & 0xFF;
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-26 Thread Subbaraman Narayanamurthy

On 06/25/14 17:43, Thomas Gleixner wrote:

The kthread park/unpark logic has the following issue:

Task   CPU 0CPU 1

T1 unplug cpu1
kthread_park(T2)
set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while (test_bit(KTHREAD_SHOULD_PARK)) 
{
if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
  complete();
schedule();
T1   plug cpu1

--> premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
 CPU 0
I understood the explanation above. But still I don't understand how 
this premature wakeup of T2 is happening/possible? Also, what will 
happen if the task state is not in TASK_PARKED when __kthread_unpark is 
called? __kthread_bind will fail silently causing the same problem.

Reorder the logic so that the unplug code binds the thread to the
target cpu before clearing the KTHREAD_SHOULD_PARK bit.

Reported-by: Subbaraman Narayanamurthy
Signed-off-by: Thomas Gleixner
Cc:sta...@vger.kernel.org

---
  kernel/kthread.c |   14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux/kernel/kthread.c
===
--- linux.orig/kernel/kthread.c
+++ linux/kernel/kthread.c
@@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp
  
  static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)

  {
+   /*
+* Rebind the thread to the target cpu first if it is a per
+* cpu thread unconditionally because it must be bound to the
+* target cpu before it can observe the KTHREAD_SHOULD_PARK
+* bit cleared.
+*/
+   if (test_bit(KTHREAD_IS_PER_CPU, >flags))
+   __kthread_bind(k, kthread->cpu, TASK_PARKED);
+
clear_bit(KTHREAD_SHOULD_PARK, >flags);
/*
 * We clear the IS_PARKED bit here as we don't wait
@@ -389,11 +398,8 @@ static void __kthread_unpark(struct task
 * park before that happens we'd see the IS_PARKED bit
 * which might be about to be cleared.
 */
-   if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags)) {
-   if (test_bit(KTHREAD_IS_PER_CPU, >flags))
-   __kthread_bind(k, kthread->cpu, TASK_PARKED);
+   if (test_and_clear_bit(KTHREAD_IS_PARKED, >flags))
wake_up_state(k, TASK_PARKED);
-   }
  }
  
  /**







Thanks for the patch. I've tested (running hotplug tests) it for 
sometime and looks good so far. Can you please submit it?


--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kthread: Fix the race condition when kthread is parked

2014-06-26 Thread Subbaraman Narayanamurthy

On 06/25/14 17:43, Thomas Gleixner wrote:

The kthread park/unpark logic has the following issue:

Task   CPU 0CPU 1

T1 unplug cpu1
kthread_park(T2)
set_bit(KTHREAD_SHOULD_PARK);
  wait_for_completion()
T2  parkme(X)
  __set_current_state(TASK_PARKED);
  while (test_bit(KTHREAD_SHOULD_PARK)) 
{
if 
(!test_and_set_bit(KTHREAD_IS_PARKED))
  complete();
schedule();
T1   plug cpu1

-- premature wakeup of T2, i.e. before unpark, so T2 gets scheduled on
 CPU 0
I understood the explanation above. But still I don't understand how 
this premature wakeup of T2 is happening/possible? Also, what will 
happen if the task state is not in TASK_PARKED when __kthread_unpark is 
called? __kthread_bind will fail silently causing the same problem.

Reorder the logic so that the unplug code binds the thread to the
target cpu before clearing the KTHREAD_SHOULD_PARK bit.

Reported-by: Subbaraman Narayanamurthysubba...@codeaurora.org
Signed-off-by: Thomas Gleixnert...@linutronix.de
Cc:sta...@vger.kernel.org

---
  kernel/kthread.c |   14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux/kernel/kthread.c
===
--- linux.orig/kernel/kthread.c
+++ linux/kernel/kthread.c
@@ -382,6 +382,15 @@ struct task_struct *kthread_create_on_cp
  
  static void __kthread_unpark(struct task_struct *k, struct kthread *kthread)

  {
+   /*
+* Rebind the thread to the target cpu first if it is a per
+* cpu thread unconditionally because it must be bound to the
+* target cpu before it can observe the KTHREAD_SHOULD_PARK
+* bit cleared.
+*/
+   if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
+   __kthread_bind(k, kthread-cpu, TASK_PARKED);
+
clear_bit(KTHREAD_SHOULD_PARK, kthread-flags);
/*
 * We clear the IS_PARKED bit here as we don't wait
@@ -389,11 +398,8 @@ static void __kthread_unpark(struct task
 * park before that happens we'd see the IS_PARKED bit
 * which might be about to be cleared.
 */
-   if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags)) {
-   if (test_bit(KTHREAD_IS_PER_CPU, kthread-flags))
-   __kthread_bind(k, kthread-cpu, TASK_PARKED);
+   if (test_and_clear_bit(KTHREAD_IS_PARKED, kthread-flags))
wake_up_state(k, TASK_PARKED);
-   }
  }
  
  /**







Thanks for the patch. I've tested (running hotplug tests) it for 
sometime and looks good so far. Can you please submit it?


--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Subbaraman Narayanamurthy

While stressing the CPU hotplug path, sometimes we hit a problem
as shown below.

[57056.416774] [ cut here ]
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
[57056.489259] [ cut here ]
[57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 
3.10.0-g3677c61-8-g180c060 #1

[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : []lr : [] psr: 200f0013
[57056.547528] sp : f0e79f30  ip :   fp : 
[57056.558983] r10: 0001  r9 :   r8 : f0e78000
[57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 

This issue was always seen in the context of "ksoftirqd". It seems to
be happening because of a potential race condition in __kthread_parkme
where just after completing the parked completion, before the
ksoftirqd task has been scheduled again, it can go into running state.

Fix this by waiting for the task state to parked after waiting for the
parked completion.

Signed-off-by: Subbaraman Narayanamurthy 
---
 kernel/kthread.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 760e86d..c56c6f8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -398,6 +398,8 @@ int kthread_park(struct task_struct *k)
 if (k != current) {
 wake_up_process(k);
 wait_for_completion(>parked);
+while (k->state != TASK_PARKED)
+cond_resched();
 }
 }
 ret = 0;
--
1.8.2.1

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] kthread: Fix the race condition when kthread is parked

2014-06-25 Thread Subbaraman Narayanamurthy

While stressing the CPU hotplug path, sometimes we hit a problem
as shown below.

[57056.416774] [ cut here ]
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
[57056.489259] [ cut here ]
[57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW O 
3.10.0-g3677c61-8-g180c060 #1

[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : [c01931e8]lr : [c01931e0] psr: 200f0013
[57056.547528] sp : f0e79f30  ip :   fp : 
[57056.558983] r10: 0001  r9 :   r8 : f0e78000
[57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 

This issue was always seen in the context of ksoftirqd. It seems to
be happening because of a potential race condition in __kthread_parkme
where just after completing the parked completion, before the
ksoftirqd task has been scheduled again, it can go into running state.

Fix this by waiting for the task state to parked after waiting for the
parked completion.

Signed-off-by: Subbaraman Narayanamurthy subba...@codeaurora.org
---
 kernel/kthread.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 760e86d..c56c6f8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -398,6 +398,8 @@ int kthread_park(struct task_struct *k)
 if (k != current) {
 wake_up_process(k);
 wait_for_completion(kthread-parked);
+while (k-state != TASK_PARKED)
+cond_resched();
 }
 }
 ret = 0;
--
1.8.2.1

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kernel BUG at kernel/smpboot.c:134

2014-06-23 Thread Subbaraman Narayanamurthy

Hi,
While stressing the CPU hotplug path, sometimes we hit the problem as shown below. Kernel 
is based off 3.10 and has the commit "f2530dc71cf082" already.

[57056.416774] [ cut here ]
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
[57056.489259] [ cut here ]
[57056.492840] kernel BUG at kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW  O
3.10.0-g3677c61-8-g180c060 #1
[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : []lr : []psr: 200f0013
[57056.547528] sp : f0e79f30  ip :   fp : 
[57056.558983] r10: 0001  r9 :   r8 : f0e78000
[57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 

Flow of events looks like below.

ksoftirqd/2   migration/2 cpu_up task
--   --   

smpboot_thread_fn()
 kthread_parkme()
  complete(>parked)
   spin_unlock_irq()
preempt_schedule()
__schedule()
 migrate_tasks(2)
 
  
__kthread_unpark(ksoftirqd/2)
   
test_and_clear_bit(KTHREAD_IS_PARKED,>flags)
   
__kthread_bind(k,kthread->cpu,TASK_PARKED);
  schedule()
   
wake_up_state(k,TASK_PARKED);
  __set_current_state(TASK_PARKED);
  clear_bit(KTHREAD_IS_PARKED, >flags);
  __set_current_state(TASK_RUNNING);
 ...
 set_current_state(TASK_INTERRUPTIBLE);
 preempt_disable();
 ...
 BUG_ON(td->cpu != smp_processor_id())

While debugging with adding a BUG_ON() in remote_cpu_softirq_notify (for CPU_DEAD 
action), at a particular instance,I could confirm that the "ksoftirqd" (for the 
CPU which is bought down) is not in parked state (512) but in running state (0).

If the thread is not in parked state when the CPU is bought up again, then 
__kthread_bind() can fail making the kthread to run on a wrong CPU. Is it 
possible that this can happen because of the following potential race condition?

In __kthread_parkme, just after completing the parked completion, before the 
ksoftirqd task has been scheduled again, it can go into running state because 
it got woken up by the wake_up_process() from kthread_park().

Thanks,
Subbaraman

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kernel BUG at kernel/smpboot.c:134

2014-06-23 Thread Subbaraman Narayanamurthy

Hi,
While stressing the CPU hotplug path, sometimes we hit the problem as shown below. Kernel 
is based off 3.10 and has the commit f2530dc71cf082 already.

[57056.416774] [ cut here ]
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a 0a00 (e7f001f2)
[57056.489259] [ cut here ]
[57056.492840] kernel BUG at kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: GW  O
3.10.0-g3677c61-8-g180c060 #1
[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : [c01931e8]lr : [c01931e0]psr: 200f0013
[57056.547528] sp : f0e79f30  ip :   fp : 
[57056.558983] r10: 0001  r9 :   r8 : f0e78000
[57056.564192] r7 : 0001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 0001  r2 : f0e79f20  r1 :   r0 : 

Flow of events looks like below.

ksoftirqd/2   migration/2 cpu_up task
--   --   

smpboot_thread_fn()
 kthread_parkme()
  complete(self-parked)
   spin_unlock_irq()
preempt_schedule()
__schedule()
 migrate_tasks(2)
 migrate to CPU 0
  
__kthread_unpark(ksoftirqd/2)
   
test_and_clear_bit(KTHREAD_IS_PARKED,kthread-flags)
   
__kthread_bind(k,kthread-cpu,TASK_PARKED);
  schedule()
   
wake_up_state(k,TASK_PARKED);
  __set_current_state(TASK_PARKED);
  clear_bit(KTHREAD_IS_PARKED, self-flags);
  __set_current_state(TASK_RUNNING);
 ...
 set_current_state(TASK_INTERRUPTIBLE);
 preempt_disable();
 ...
 BUG_ON(td-cpu != smp_processor_id())

While debugging with adding a BUG_ON() in remote_cpu_softirq_notify (for CPU_DEAD 
action), at a particular instance,I could confirm that the ksoftirqd (for the 
CPU which is bought down) is not in parked state (512) but in running state (0).

If the thread is not in parked state when the CPU is bought up again, then 
__kthread_bind() can fail making the kthread to run on a wrong CPU. Is it 
possible that this can happen because of the following potential race condition?

In __kthread_parkme, just after completing the parked completion, before the 
ksoftirqd task has been scheduled again, it can go into running state because 
it got woken up by the wake_up_process() from kthread_park().

Thanks,
Subbaraman

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/