Re: Panda ES board hang when using GPIO as interrupt

2012-06-29 Thread Jon Hunter

On 06/28/2012 11:07 PM, DebBarma, Tarun Kanti wrote:
> On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin  wrote:
>> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>>
>>> I am wondering if this could be the bug ... on start-up I see that we do
>>> a context restore on bank1 during the probe which is before we have done
>>> the first suspend! In other words, we could restore a bad/uninitialised
>>> context for bank1. In the case of bank1, the loss count starts at 1 and
>>> not 0 and so we falsely think we need to perform a restore :-(
>>>
>>> [0.176269] omap_gpio_runtime_resume: bank @ 0xfc31
>>> [0.177276] omap_gpio_runtime_resume: count 0, now 1
>>> [0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>>> [0.177642] omap_gpio_runtime_suspend: bank @ 0xfc31
>>>
>>> Can you try ...
>>>
>>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>>> index c4ed172..9623408 100644
>>> --- a/drivers/gpio/gpio-omap.c
>>> +++ b/drivers/gpio/gpio-omap.c
>>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>>>  #ifdef CONFIG_OF_GPIO
>>>         bank->chip.of_node = of_node_get(node);
>>>  #endif
>>> +       if (bank->get_context_loss_count)
>>> +               bank->context_loss_count =
>>> +                               bank->get_context_loss_count(bank->dev);
>>>
>>>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>>         if (bank->irq_base < 0) {
>>>
>>
>> Looks like you found the culprit. :) It does fix the problem.
> So this looks similar to what NeilBrown reported in another thread.
> The reason was context_loss_count = 1 for GPIO BANK#0 which of course is in
> the WKUP domain. In fact he tried out with the same fix. Anyways, we should
> hear from Kevin now whether it is feasible to fix the context_loss_count for
> the WKUP GPIO bank or to put the workaround here in the gpio driver.

Ok, so I have been looking at this some more today. I believe that the
actual bug is that we are not checking to see if "loses_context" is true
before populating "get_context_loss_count" (see omap dmtimer driver).
For bank0 loses_context is false and so we should never be calling
"get_context_loss_count" in the first place.

I will send out a patch to fix this and will copy Kevin and Franky.
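
To make the loses_context check described above a bit more concrete, here
is a minimal standalone sketch. It is not the actual patch: the names
pdata_sketch, bank_sketch, bank_init_sketch, resume_needs_restore_sketch
and fake_loss_count are made up for illustration and only mirror the
"loses_context" and "get_context_loss_count" fields discussed in this
thread. The assumption is that a bank which never loses context should not
get the callback at all, so resume can never see a count mismatch for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the platform data. */
struct pdata_sketch {
    bool loses_context;
    int (*get_context_loss_count)(void *dev);
};

/* Hypothetical, simplified stand-in for the per-bank driver state. */
struct bank_sketch {
    bool loses_context;
    int context_loss_count;
    int (*get_context_loss_count)(void *dev);
};

/* Pretend loss counter that already reads 1 at boot, as seen in the log. */
static int fake_loss_count(void *dev)
{
    (void)dev;
    return 1;
}

/* Only hook up the callback when the bank can actually lose context. */
static void bank_init_sketch(struct bank_sketch *bank,
                             const struct pdata_sketch *pdata)
{
    assert(bank != NULL && pdata != NULL);

    bank->loses_context = pdata->loses_context;
    bank->context_loss_count = 0;
    bank->get_context_loss_count = NULL;

    if (bank->loses_context) {
        bank->get_context_loss_count = pdata->get_context_loss_count;
        /* Seed the stored count so the first resume sees no change. */
        if (bank->get_context_loss_count)
            bank->context_loss_count =
                bank->get_context_loss_count(NULL);
    }
}

/* Mirrors the comparison made in runtime resume. */
static bool resume_needs_restore_sketch(const struct bank_sketch *bank)
{
    assert(bank != NULL);

    if (!bank->get_context_loss_count)
        return false;
    return bank->get_context_loss_count(NULL) != bank->context_loss_count;
}
```

With loses_context false (as for bank0 in the WKUP domain), the callback
stays NULL in this model and resume_needs_restore_sketch() always returns
false.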

Franky, if you can test and confirm it works that would be great.

Kevin, if you can review that would be great too.

Cheers
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread DebBarma, Tarun Kanti
On Fri, Jun 29, 2012 at 6:29 AM, Franky Lin  wrote:
> On 06/28/2012 04:54 PM, Jon Hunter wrote:
>>
>> I am wondering if this could be the bug ... on start-up I see that we do
>> a context restore on bank1 during the probe which is before we have done
>> the first suspend! In other words, we could restore a bad/uninitialised
>> context for bank1. In the case of bank1, the loss count starts at 1 and
>> not 0 and so we falsely think we need to perform a restore :-(
>>
>> [    0.176269] omap_gpio_runtime_resume: bank @ 0xfc31
>> [    0.177276] omap_gpio_runtime_resume: count 0, now 1
>> [    0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
>> [    0.177642] omap_gpio_runtime_suspend: bank @ 0xfc31
>>
>> Can you try ...
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..9623408 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>>  #ifdef CONFIG_OF_GPIO
>>         bank->chip.of_node = of_node_get(node);
>>  #endif
>> +       if (bank->get_context_loss_count)
>> +               bank->context_loss_count =
>> +                               bank->get_context_loss_count(bank->dev);
>>
>>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>>         if (bank->irq_base < 0) {
>>
>
> Looks like you found the culprit. :) It does fix the problem.
So this looks similar to what NeilBrown reported in another thread.
The reason was context_loss_count = 1 for GPIO BANK#0 which of course is in
the WKUP domain. In fact he tried out with the same fix. Anyways, we should
hear from Kevin now whether it is feasible to fix the context_loss_count for
the WKUP GPIO bank or to put the workaround here in the gpio driver.
--
Tarun
>
> Franky
>


Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Franky Lin

On 06/28/2012 04:54 PM, Jon Hunter wrote:
> I am wondering if this could be the bug ... on start-up I see that we do
> a context restore on bank1 during the probe which is before we have done
> the first suspend! In other words, we could restore a bad/uninitialised
> context for bank1. In the case of bank1, the loss count starts at 1 and
> not 0 and so we falsely think we need to perform a restore :-(
>
> [0.176269] omap_gpio_runtime_resume: bank @ 0xfc31
> [0.177276] omap_gpio_runtime_resume: count 0, now 1
> [0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
> [0.177642] omap_gpio_runtime_suspend: bank @ 0xfc31
>
> Can you try ...
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..9623408 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>  #ifdef CONFIG_OF_GPIO
>         bank->chip.of_node = of_node_get(node);
>  #endif
> +       if (bank->get_context_loss_count)
> +               bank->context_loss_count =
> +                               bank->get_context_loss_count(bank->dev);
>
>         bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
>         if (bank->irq_base < 0) {
>

Looks like you found the culprit. :) It does fix the problem.

Franky



Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Jon Hunter

On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
> 
> OK.
> This is what I saw in the log:
> [0.171844] dummy:
> [0.172912] NET: Registered protocol family 16
> [0.173431] GPMC revision 6.0
> [0.173492] gpmc: irq-52 could not claim: err -22
> [0.177551] ??omap_gpio_runtime_resume
> [0.178619] OMAP GPIO hardware version 0.1
> [0.178649] !omap_gpio_runtime_suspend
> [0.178771] ??omap_gpio_runtime_resume
> [0.179351] !omap_gpio_runtime_suspend
> [0.179504] ??omap_gpio_runtime_resume
> [0.180023] !omap_gpio_runtime_suspend
> [0.180145] ??omap_gpio_runtime_resume
> [0.180694] !omap_gpio_runtime_suspend
> [0.180847] ??omap_gpio_runtime_resume
> [0.181365] !omap_gpio_runtime_suspend
> [0.181518] ??omap_gpio_runtime_resume
> [0.182037] !omap_gpio_runtime_suspend
> [0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [0.186584] error setting wl12xx data: -38
> [0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [0.239501] ??omap_gpio_runtime_resume
> [0.239532] ??omap_gpio_runtime_resume
> [0.241058]  usbhs_omap: alias fck already exists
> [0.244781] ??omap_gpio_runtime_resume

I am wondering if this could be the bug ... on start-up I see that we do
a context restore on bank1 during the probe which is before we have done
the first suspend! In other words, we could restore a bad/uninitialised
context for bank1. In the case of bank1, the loss count starts at 1 and
not 0 and so we falsely think we need to perform a restore :-(

[0.176269] omap_gpio_runtime_resume: bank @ 0xfc31
[0.177276] omap_gpio_runtime_resume: count 0, now 1
[0.177276] gpiochip_add: registered GPIOs 0 to 31 on device: gpio
[0.177642] omap_gpio_runtime_suspend: bank @ 0xfc31

Can you try ...

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..9623408 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1086,6 +1086,9 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
 #ifdef CONFIG_OF_GPIO
        bank->chip.of_node = of_node_get(node);
 #endif
+       if (bank->get_context_loss_count)
+               bank->context_loss_count =
+                               bank->get_context_loss_count(bank->dev);

        bank->irq_base = irq_alloc_descs(-1, 0, bank->width, 0);
        if (bank->irq_base < 0) {
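
As a rough illustration of what the patch above is meant to change, the
following standalone sketch models the comparison done in runtime resume.
The names (bank_model, model_probe, model_resume,
loss_count_starts_at_one) are hypothetical and the real driver does much
more. The only assumption carried over from the log is a bank whose loss
count already reads 1 at probe time while the driver's stored value is
still 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one GPIO bank. */
struct bank_model {
    int context_loss_count;              /* value last stored by the driver */
    int (*get_context_loss_count)(void); /* reads the current loss count */
    int restores;                        /* how often a restore was triggered */
};

/* Loss counter that reads 1 from the start, as in the log above. */
static int loss_count_starts_at_one(void)
{
    return 1;
}

/* Probe: optionally seed the stored count, which is what the patch adds. */
static void model_probe(struct bank_model *bank, bool seed_count)
{
    assert(bank != NULL);

    bank->context_loss_count = 0;
    bank->restores = 0;
    bank->get_context_loss_count = loss_count_starts_at_one;

    if (seed_count && bank->get_context_loss_count)
        bank->context_loss_count = bank->get_context_loss_count();
}

/* Resume: a mismatch between stored and current count means "restore". */
static void model_resume(struct bank_model *bank)
{
    int now;

    assert(bank != NULL);

    if (!bank->get_context_loss_count)
        return;

    now = bank->get_context_loss_count();
    if (now != bank->context_loss_count)
        bank->restores++; /* stands in for omap_gpio_restore_context() */
}
```

In this model, probing without seeding and then resuming once triggers a
restore of a context that was never saved, while probing with seeding does
not.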

Thanks
Jon



Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Jon Hunter

On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
> 
> OK.
> This is what I saw in the log:
> [0.171844] dummy:
> [0.172912] NET: Registered protocol family 16
> [0.173431] GPMC revision 6.0
> [0.173492] gpmc: irq-52 could not claim: err -22
> [0.177551] ??omap_gpio_runtime_resume
> [0.178619] OMAP GPIO hardware version 0.1
> [0.178649] !omap_gpio_runtime_suspend
> [0.178771] ??omap_gpio_runtime_resume
> [0.179351] !omap_gpio_runtime_suspend
> [0.179504] ??omap_gpio_runtime_resume
> [0.180023] !omap_gpio_runtime_suspend
> [0.180145] ??omap_gpio_runtime_resume
> [0.180694] !omap_gpio_runtime_suspend
> [0.180847] ??omap_gpio_runtime_resume
> [0.181365] !omap_gpio_runtime_suspend
> [0.181518] ??omap_gpio_runtime_resume
> [0.182037] !omap_gpio_runtime_suspend
> [0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [0.186584] error setting wl12xx data: -38
> [0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [0.239501] ??omap_gpio_runtime_resume
> [0.239532] ??omap_gpio_runtime_resume
> [0.241058]  usbhs_omap: alias fck already exists
> [0.244781] ??omap_gpio_runtime_resume

Sorry, can you do one more test? :-)

Add the following and send me the output?

Thanks!
Jon

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..3aa0f96 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1155,6 +1155,7 @@ static int omap_gpio_runtime_suspend(struct device *dev)
        unsigned long flags;
        u32 wake_low, wake_hi;

+       pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
        spin_lock_irqsave(&bank->lock, flags);

        /*
@@ -1221,6 +1222,7 @@ static int omap_gpio_runtime_resume(struct device *dev)
        u32 l = 0, gen, gen0, gen1;
        unsigned long flags;

+       pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
        spin_lock_irqsave(&bank->lock, flags);
        _gpio_dbck_enable(bank);

@@ -1239,6 +1241,7 @@ static int omap_gpio_runtime_resume(struct device *dev)
                context_lost_cnt_after =
                        bank->get_context_loss_count(bank->dev);
                if (context_lost_cnt_after != bank->context_loss_count) {
+                       pr_info("%s: count %d, now %d", __func__, bank->context_loss_count, context_lost_cnt_after);
                        omap_gpio_restore_context(bank);
                } else {
                        spin_unlock_irqrestore(&bank->lock, flags);
@@ -1341,6 +1344,7 @@ void omap2_gpio_resume_after_idle(void)
 #if defined(CONFIG_PM_RUNTIME)
 static void omap_gpio_restore_context(struct gpio_bank *bank)
 {
+       pr_info("%s: bank @ 0x%x\n", __func__, (u32)bank->base);
        __raw_writel(bank->context.wake_en,
                                bank->base + bank->regs->wkup_en);
        __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);



Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Jon Hunter

On 06/28/2012 06:10 PM, Franky Lin wrote:
> On 06/28/2012 03:59 PM, Jon Hunter wrote:
>>
>> On 06/28/2012 05:53 PM, Franky Lin wrote:
>>> I found one interesting thing. When I added the print info to see when
>>> runtime_suspend/resume get called, it seems like the suspend/resume is
>>> unbalance during boot. Resume got called more than suspend. So I hack
>>> the code to make sure suspend and resume are called in pair. A resume
>>> without suspend will do nothing and return immediately. This also makes
>>> the hang vanish.
>>
>> I am not 100% sure I follow. On boot I would expect to see a
>> resume/suspend due to the probe on the irq bank and then I would expect
>> to see another resume from the acquisition of the gpio, however, I would
>> not expect a suspend until the gpio is freed, which I don't believe you
>> are doing.
>>
>> Can you share your hack? Just paste the diff? This may help me
>> understand more.
>>
> 
> OK.
> This is what I saw in the log:
> [0.171844] dummy:
> [0.172912] NET: Registered protocol family 16
> [0.173431] GPMC revision 6.0
> [0.173492] gpmc: irq-52 could not claim: err -22
> [0.177551] ??omap_gpio_runtime_resume
> [0.178619] OMAP GPIO hardware version 0.1
> [0.178649] !omap_gpio_runtime_suspend
> [0.178771] ??omap_gpio_runtime_resume
> [0.179351] !omap_gpio_runtime_suspend
> [0.179504] ??omap_gpio_runtime_resume
> [0.180023] !omap_gpio_runtime_suspend
> [0.180145] ??omap_gpio_runtime_resume
> [0.180694] !omap_gpio_runtime_suspend
> [0.180847] ??omap_gpio_runtime_resume
> [0.181365] !omap_gpio_runtime_suspend
> [0.181518] ??omap_gpio_runtime_resume
> [0.182037] !omap_gpio_runtime_suspend

There are 6 resume/suspend pairs here, one for probing each of the 6 gpio
banks. So this makes sense.

> [0.185089] omap_mux_init: Add partition: #1: core, flags: 2
> [0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
> [0.186584] error setting wl12xx data: -38
> [0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
> [0.239501] ??omap_gpio_runtime_resume
> [0.239532] ??omap_gpio_runtime_resume
> [0.241058]  usbhs_omap: alias fck already exists
> [0.244781] ??omap_gpio_runtime_resume

Yes, these 3 resumes at the end are most likely caused by calls to
omap_gpio_request(). In other words, 3 gpios are acquired. So that is
expected and looks fine to me.

> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..bca3985 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1146,7 +1146,7 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)
>
>  #if defined(CONFIG_PM_RUNTIME)
>  static void omap_gpio_restore_context(struct gpio_bank *bank);
> -
> +static int flag = 0;
>  static int omap_gpio_runtime_suspend(struct device *dev)
>  {
>         struct platform_device *pdev = to_platform_device(dev);
> @@ -1155,6 +1155,8 @@ static int omap_gpio_runtime_suspend(struct device *dev)
>         unsigned long flags;
>         u32 wake_low, wake_hi;
>
> +       flag ++;
> +
>         spin_lock_irqsave(&bank->lock, flags);
>
>         /*
> @@ -1221,6 +1223,11 @@ static int omap_gpio_runtime_resume(struct device *dev)
>         u32 l = 0, gen, gen0, gen1;
>         unsigned long flags;
>
> +       if (flag)
> +               flag--;
> +       else
> +               return 0;
> +
>         spin_lock_irqsave(&bank->lock, flags);
>         _gpio_dbck_enable(bank);

I guess that this would also avoid the context restore, so I could see
it would work, but this is definitely not right. Ok, well let me look
into the restore.
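
For what it is worth, replaying the boot log through a small standalone
model of the hack shows which resumes get short-circuited. The helper names
(model_suspend, model_resume, count_skipped_resumes) and the event string
are made up; 'R' stands for a runtime resume and 'S' for a runtime suspend,
in the order they appear in the log above. One thing the model makes
visible is that "flag" is a single global shared by all banks, and since it
starts at 0 the very first resume (the one during the first bank's probe)
returns early as well.

```c
#include <assert.h>
#include <stddef.h>

/* Global counter from the hack; note that it is shared by all banks. */
static int flag;

/* The hack only bumps the counter on suspend. */
static void model_suspend(void)
{
    flag++;
}

/* Returns 1 if the resume body would run, 0 if it returns early. */
static int model_resume(void)
{
    if (flag) {
        flag--;
        return 1;
    }
    return 0;
}

/* Replay a string of 'R'/'S' events and count the resumes that bail out. */
static int count_skipped_resumes(const char *events)
{
    int skipped = 0;

    assert(events != NULL);

    flag = 0;
    for (; *events; events++) {
        if (*events == 'S')
            model_suspend();
        else if (*events == 'R' && !model_resume())
            skipped++;
    }
    return skipped;
}
```

Feeding it "RSRSRSRSRSRSRRR" (six probe pairs followed by three resumes)
gives three skipped resumes: the first probe resume and the last two of the
three trailing ones.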

Thanks
Jon


Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Franky Lin

On 06/28/2012 03:59 PM, Jon Hunter wrote:
>
> On 06/28/2012 05:53 PM, Franky Lin wrote:
>> I found one interesting thing. When I added the print info to see when
>> runtime_suspend/resume get called, it seems like the suspend/resume is
>> unbalance during boot. Resume got called more than suspend. So I hack
>> the code to make sure suspend and resume are called in pair. A resume
>> without suspend will do nothing and return immediately. This also makes
>> the hang vanish.
>
> I am not 100% sure I follow. On boot I would expect to see a
> resume/suspend due to the probe on the irq bank and then I would expect
> to see another resume from the acquisition of the gpio, however, I would
> not expect a suspend until the gpio is freed, which I don't believe you
> are doing.
>
> Can you share your hack? Just paste the diff? This may help me
> understand more.

OK.
This is what I saw in the log:
[0.171844] dummy:
[0.172912] NET: Registered protocol family 16
[0.173431] GPMC revision 6.0
[0.173492] gpmc: irq-52 could not claim: err -22
[0.177551] ??omap_gpio_runtime_resume
[0.178619] OMAP GPIO hardware version 0.1
[0.178649] !omap_gpio_runtime_suspend
[0.178771] ??omap_gpio_runtime_resume
[0.179351] !omap_gpio_runtime_suspend
[0.179504] ??omap_gpio_runtime_resume
[0.180023] !omap_gpio_runtime_suspend
[0.180145] ??omap_gpio_runtime_resume
[0.180694] !omap_gpio_runtime_suspend
[0.180847] ??omap_gpio_runtime_resume
[0.181365] !omap_gpio_runtime_suspend
[0.181518] ??omap_gpio_runtime_resume
[0.182037] !omap_gpio_runtime_suspend
[0.185089] omap_mux_init: Add partition: #1: core, flags: 2
[0.186462] omap_mux_init: Add partition: #2: wkup, flags: 2
[0.186584] error setting wl12xx data: -38
[0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
[0.189788] _omap_mux_get_by_name: Could not find signal uart1_rx.uart1_rx
[0.239501] ??omap_gpio_runtime_resume
[0.239532] ??omap_gpio_runtime_resume
[0.241058]  usbhs_omap: alias fck already exists
[0.244781] ??omap_gpio_runtime_resume

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..bca3985 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1146,7 +1146,7 @@ static int __devinit omap_gpio_probe(struct platform_device *pdev)

 #if defined(CONFIG_PM_RUNTIME)
 static void omap_gpio_restore_context(struct gpio_bank *bank);
-
+static int flag = 0;
 static int omap_gpio_runtime_suspend(struct device *dev)
 {
        struct platform_device *pdev = to_platform_device(dev);
@@ -1155,6 +1155,8 @@ static int omap_gpio_runtime_suspend(struct device *dev)
        unsigned long flags;
        u32 wake_low, wake_hi;

+       flag ++;
+
        spin_lock_irqsave(&bank->lock, flags);

        /*
@@ -1221,6 +1223,11 @@ static int omap_gpio_runtime_resume(struct device *dev)
        u32 l = 0, gen, gen0, gen1;
        unsigned long flags;

+       if (flag)
+               flag--;
+       else
+               return 0;
+
        spin_lock_irqsave(&bank->lock, flags);
        _gpio_dbck_enable(bank);

Regards,
Franky



Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Jon Hunter

On 06/28/2012 05:53 PM, Franky Lin wrote:
> On 06/28/2012 02:55 PM, Jon Hunter wrote:
>> Ok. Any way to manually reset the wlan module to deactivate the gpio
>> when it is hung? I am wondering if the gpio is deactivated if the board
>> comes back to life, indicating it is stuck in the interrupt somewhere.
> 
> The only way I can think of is removing the module manually. But it
> didn't bring the board back to live.
> 
>> Well, at least that is consistent with what I see, but also perplexing
>> that it takes sometime to fail. Can you try the following as a debug
>> patch to see if it is in the context restore that is the problem. From
>> your testing and bisect, the only possible difference in the current
>> kernel is that it could perform the context restore when acquiring the
>> gpio.
>>
>> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
>> index c4ed172..a2401bd 100644
>> --- a/drivers/gpio/gpio-omap.c
>> +++ b/drivers/gpio/gpio-omap.c
>> @@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
>>   #if defined(CONFIG_PM_RUNTIME)
>>   static void omap_gpio_restore_context(struct gpio_bank *bank)
>>   {
>> +   return;
>> +
>>  __raw_writel(bank->context.wake_en,
>>  bank->base + bank->regs->wkup_en);
>>  __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);
>>
> 
> This one works! It can run more than 20 mins.

Great! I need to dig into the context restore some more.

> I found one interesting thing. When I added the print info to see when
> runtime_suspend/resume get called, it seems like the suspend/resume is
> unbalance during boot. Resume got called more than suspend. So I hack
> the code to make sure suspend and resume are called in pair. A resume
> without suspend will do nothing and return immediately. This also makes
> the hang vanish.

I am not 100% sure I follow. On boot I would expect to see a
resume/suspend due to the probe on the irq bank and then I would expect
to see another resume from the acquisition of the gpio, however, I would
not expect a suspend until the gpio is freed, which I don't believe you
are doing.

Can you share your hack? Just paste the diff? This may help me
understand more.

Thanks
Jon


Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Franky Lin

On 06/28/2012 02:55 PM, Jon Hunter wrote:
> Ok. Any way to manually reset the wlan module to deactivate the gpio
> when it is hung? I am wondering if the gpio is deactivated if the board
> comes back to life, indicating it is stuck in the interrupt somewhere.

The only way I can think of is removing the module manually. But it
didn't bring the board back to life.

> Well, at least that is consistent with what I see, but also perplexing
> that it takes sometime to fail. Can you try the following as a debug
> patch to see if it is in the context restore that is the problem. From
> your testing and bisect, the only possible difference in the current
> kernel is that it could perform the context restore when acquiring the gpio.
>
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> index c4ed172..a2401bd 100644
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
>  #if defined(CONFIG_PM_RUNTIME)
>  static void omap_gpio_restore_context(struct gpio_bank *bank)
>  {
> +       return;
> +
>         __raw_writel(bank->context.wake_en,
>                                 bank->base + bank->regs->wkup_en);
>         __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);

This one works! It can run more than 20 mins.

I found one interesting thing. When I added prints to see when
runtime_suspend/resume get called, it seems the suspend/resume calls are
unbalanced during boot. Resume got called more often than suspend. So I
hacked the code to make sure suspend and resume are called in pairs. A
resume without a preceding suspend will do nothing and return immediately.
This also makes the hang vanish.


Regards,
Franky




Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Jon Hunter

On 06/28/2012 04:24 PM, Franky Lin wrote:
> On 06/28/2012 08:42 AM, Jon Hunter wrote:
>>
>> On 06/27/2012 07:41 PM, Franky Lin wrote:
>>> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>>>> "Franky Lin" writes:
>>>>> I noticed Kevin raised some similar cases on other platforms and also
>>>>> provided two patches in the patch mail thread. But unfortunately those
>>>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>>>> mainline kernel and the issue is still there. I can only "fix" the
>>>>> hang by either reverting the commit or disabling
>>>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>>>> Panda with 4430 works good.
>>>>>
>>>>> Any thoughts and suggestions?
>>>>
>>>> If reverting the patch fixes your problem, can you isolate down to which
>>>> part of that patch causes the problem?  IOW, can you fix your problem if
>>>> you undo just the hunk added in runtime_suspend or undo just the moved
>>>> hunk runtime_resume?  Or is reverting both required?
>>>>
>>>> I suspect the added runtime_suspend hunk is causing the problems, so can
>>>> you see if just undoing that part works[1].  If that works, I will give
>>>> a bit more of a thinking on it tomorrow.
>>>
>>> runtime_suspend hunk is fine. The hang still exist after reverting it.
>>> The culprit is the moved hunk in runtime_resume. Reverting it makes the
>>> hang disappear.
>>
>> Thanks. From reviewing the code the only thing that appears suspect based
>> upon your findings is the return if we find the context has not been lost.
>> We are not checking if "workaround_enabled" is set before we return.
>>
>> Could you try the following change on top of v3.5-rc3?
>>
> 
> The patch doesn't help. And I also managed to probe the signal. It's
> active when it hung.

Ok. Any way to manually reset the wlan module to deactivate the gpio
when it is hung? I am wondering if the gpio is deactivated if the board
comes back to life, indicating it is stuck in the interrupt somewhere.

>> Also, could you add a print in the runtime_suspend/resume() functions so
>> we can see how often these are being called. In my case, I really don't see
>> these being exercised and I am wondering how often you see suspend/resume
>> being called in your setup.
> 
> Well, the runtime_suspend/resume never get called during the test.

Well, at least that is consistent with what I see, but it is also perplexing
that it takes some time to fail. Can you try the following as a debug
patch to see if it is the context restore that is the problem? From
your testing and bisect, the only possible difference in the current
kernel is that it could perform the context restore when acquiring the gpio.

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..a2401bd 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1341,6 +1341,8 @@ void omap2_gpio_resume_after_idle(void)
 #if defined(CONFIG_PM_RUNTIME)
 static void omap_gpio_restore_context(struct gpio_bank *bank)
 {
+       return;
+
        __raw_writel(bank->context.wake_en,
                                bank->base + bank->regs->wkup_en);
        __raw_writel(bank->context.ctrl, bank->base + bank->regs->ctrl);

Cheers
Jon


Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Franky Lin

On 06/28/2012 08:42 AM, Jon Hunter wrote:
>
> On 06/27/2012 07:41 PM, Franky Lin wrote:
>> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>>> "Franky Lin" writes:
>>>> I noticed Kevin raised some similar cases on other platforms and also
>>>> provided two patches in the patch mail thread. But unfortunately those
>>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>>> mainline kernel and the issue is still there. I can only "fix" the
>>>> hang by either reverting the commit or disabling
>>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>>> Panda with 4430 works good.
>>>>
>>>> Any thoughts and suggestions?
>>>
>>> If reverting the patch fixes your problem, can you isolate down to which
>>> part of that patch causes the problem?  IOW, can you fix your problem if
>>> you undo just the hunk added in runtime_suspend or undo just the moved
>>> hunk runtime_resume?  Or is reverting both required?
>>>
>>> I suspect the added runtime_suspend hunk is causing the problems, so can
>>> you see if just undoing that part works[1].  If that works, I will give
>>> a bit more of a thinking on it tomorrow.
>>
>> runtime_suspend hunk is fine. The hang still exist after reverting it.
>> The culprit is the moved hunk in runtime_resume. Reverting it makes the
>> hang disappear.
>
> Thanks. From reviewing the code the only thing that appears suspect based
> upon your findings is the return if we find the context has not been lost.
> We are not checking if "workaround_enabled" is set before we return.
>
> Could you try the following change on top of v3.5-rc3?

The patch doesn't help. And I also managed to probe the signal. It's
active when it hung.

> Also, could you add a print in the runtime_suspend/resume() functions so
> we can see how often these are being called. In my case, I really don't see
> these being exercised and I am wondering how often you see suspend/resume
> being called in your setup.

Well, the runtime_suspend/resume never get called during the test.

Thanks,
Franky



Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Jon Hunter

On 06/27/2012 07:41 PM, Franky Lin wrote:
> On 06/26/2012 08:37 PM, Kevin Hilman wrote:
>> "Franky Lin"  writes:
>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the
>>> hang by either reverting the commit or disabling
>>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>>> Panda with 4430 works good.
>>>
>>> Any thoughts and suggestions?
>>
>> If reverting the patch fixes your problem, can you isolate down to which
>> part of that patch causes the problem?  IOW, can you fix your problem if
>> you undo just the hunk added in runtime_suspend or undo just the moved
>> hunk runtime_resume?  Or is reverting both required?
>>
>> I suspect the added runtime_suspend hunk is causing the problems, so can
>> you see if just undoing that part works[1].  If that works, I will give
>> a bit more of a thinking on it tomorrow.
> 
> runtime_suspend hunk is fine. The hang still exist after reverting it.
> The culprit is the moved hunk in runtime_resume. Reverting it makes the
> hang disappear.

Thanks. From reviewing the code, the only thing that looks suspect given
your findings is the early return taken when the context has not been
lost: we do not check whether "workaround_enabled" is set before returning.

Could you try the following change on top of v3.5-rc3?

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..3b89e85 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1238,12 +1238,8 @@ static int omap_gpio_runtime_resume(struct device *dev)
 	if (bank->get_context_loss_count) {
 		context_lost_cnt_after =
 			bank->get_context_loss_count(bank->dev);
-		if (context_lost_cnt_after != bank->context_loss_count) {
+		if (context_lost_cnt_after != bank->context_loss_count)
 			omap_gpio_restore_context(bank);
-		} else {
-			spin_unlock_irqrestore(&bank->lock, flags);
-			return 0;
-		}
 	}
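
For readers following the thread, the effect of the removed else branch can
be modelled in userspace. This is only a sketch: struct bank_model,
model_resume() and the early_return flag are hypothetical names, not driver
code, and the model assumes (per the description above) that a
"workaround_enabled" check follows the loss-count comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-bank state (not struct gpio_bank). */
struct bank_model {
	int context_loss_count;   /* loss count recorded at suspend time */
	bool workaround_enabled;  /* set when suspend armed the workaround */
	int restores;             /* times the context restore would run */
	int workaround_runs;      /* times the workaround path would run */
};

/*
 * Model of the resume path. With early_return set, an unchanged loss count
 * returns immediately (the else branch the diff removes); otherwise the
 * code falls through to the workaround_enabled check.
 */
static void model_resume(struct bank_model *b, int loss_count_now,
			 bool early_return)
{
	assert(b);

	if (loss_count_now != b->context_loss_count)
		b->restores++;
	else if (early_return)
		return;		/* workaround_enabled is never consulted */

	if (!b->workaround_enabled)
		return;

	b->workaround_runs++;
	b->workaround_enabled = false;
}
```

Calling model_resume() with an unchanged loss count and early_return set
leaves workaround_enabled untouched, while the same call with early_return
cleared reaches the workaround path. Whether that difference matters on real
hardware is what the test request is meant to find out.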

Also, could you add a print in the runtime_suspend/resume() functions so
we can see how often these are being called. In my case, I really don't see
these being exercised and I am wondering how often you see suspend/resume
being called in your setup.

Cheers
Jon


Re: Panda ES board hang when using GPIO as interrupt

2012-06-28 Thread Jon Hunter
Hi Franky,

On 06/27/2012 08:03 PM, Franky Lin wrote:
> On 06/27/2012 04:43 PM, Jon Hunter wrote:
>> Hi Franky,
>>
>> On 06/25/2012 03:52 PM, Franky Lin wrote:
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>>> within 5 mins during a network traffic test. After bisecting we found
>>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>>> *_runtime_suspend()" [1].
>>
>> I have been looking into this today to see if I can replicate the
>> problem that you have reported. However, so far I have not had any luck.
>> Please note that my test setup is not exactly the same as yours as I
>> don't have your wlan module. However, I have been using a 2nd board to
>> generate gpio events to a panda-es to see I can make it lock up. I have
>> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
>> problems after sending 100k gpio events (over many minutes). My setup is
>> as follows ...
>>
>> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
>> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
>> - Created a simple kernel module that acquires gpio-138 and sets up a
>>IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
>> - GPIO events are triggered roughly every 1ms
> 
> Don't know if it's related, but we also mux several other pins on
> connector A:
> /* MMC2 Mux for extension board */
> /* MMC2 CMD */
> OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> /* MMC2 CLK */
> OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> /* MMC2 DAT 0-3 */
> OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
> /* GPIO MUX for OOB interupt of dongle */
> OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
> /* GPIO MUX for WLAN_ENABLE for dongle */
> OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

I would not have thought so, but I will think about it. Thanks.

>> Can you confirm ...
>> 1. You are just using omap2plus_defconfig with no changes?
> No, we enable following options
> CONFIG_DEVTMPFS=y
> CONFIG_DEVTMPFS_MOUNT=y
> CONFIG_USB_OHCI_HCD=y

Ok, thanks.

>> 2. Rough frequency of gpio events?
> 3367 interrupts were triggered during a 10 secs throughput test.
> 
>> 3. Is the gpio configured for active low or high?
> active high
> 
>> 4. When the hang occurs, what is the state of the gpio? Active or
>> inactive? Can you probe it with a scope? If it was always active I
>> could see that this would lock the device up, but I am not sure how
>> that would relate to the results from your bisect???
> 
> I dont have a scope nearby. Let me see if I can find one tomorrow.

Great, that would be good.

>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline kernel and the issue is still there. I can only "fix" the hang
>>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>>
>> It does not make sense to me yet why this would only impact 4460, but I
>> will keep this in mind.
>>
>> In your wlan driver are you acquiring and freeing the gpio often? Or are
>> you only acquiring the gpio on boot?
>>
>> The reason I ask is because for omap4, it seems that we are not
>> currently calling omap2_gpio_prepare_for_idle() during idle and so the
>> only time I see us call the runtime_suspend/resume handlers for omap4 is
>> during probe and when we acquire and free the gpio.
>>
>> So if you were not acquiring and freeing the gpio and are using the
>> stock kernel, then as far as I can tell, the runtime pm code is not
>> being exercised much. My test is not acquiring and releasing the gpio
>> and so I am wondering if that is the secret to reproducing this
>> problem :-)
> 
> We only request the irq once during initialization. But we do frequently
> disable and re-enable it since we need to access to the module through
> SDIO to clear the interrupt. Apparently we can't finish all this in irq
> handler.

Ok, thanks. I don't see why that would cause a problem, but I can try
that too.

> Hope these could help.

Yes, good info to have.

Thanks
Jon


Re: Panda ES board hang when using GPIO as interrupt

2012-06-27 Thread Franky Lin

On 06/27/2012 04:43 PM, Jon Hunter wrote:
> Hi Franky,
>
> On 06/25/2012 03:52 PM, Franky Lin wrote:
>> Hi Kevin, Tarun,
>>
>> We are using the expansion connector A on Panda board to mount a SDIO
>> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
>> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
>> within 5 mins during a network traffic test. After bisecting we found
>> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
>> *_runtime_suspend()" [1].
>
> I have been looking into this today to see if I can replicate the
> problem that you have reported. However, so far I have not had any luck.
> Please note that my test setup is not exactly the same as yours as I
> don't have your wlan module. However, I have been using a 2nd board to
> generate gpio events to a panda-es to see I can make it lock up. I have
> tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
> problems after sending 100k gpio events (over many minutes). My setup is
> as follows ...
>
> - OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
> - Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
> - Created a simple kernel module that acquires gpio-138 and sets up a
>   IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
> - GPIO events are triggered roughly every 1ms

Don't know if it's related, but we also mux several other pins on
connector A:

/* MMC2 Mux for extension board */
/* MMC2 CMD */
OMAP4_MUX(GPMC_NWE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
/* MMC2 CLK */
OMAP4_MUX(GPMC_NOE, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
/* MMC2 DAT 0-3 */
OMAP4_MUX(GPMC_AD0, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
OMAP4_MUX(GPMC_AD1, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
OMAP4_MUX(GPMC_AD2, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
OMAP4_MUX(GPMC_AD3, OMAP_MUX_MODE1 | OMAP_PIN_INPUT_PULLUP),
/* GPIO MUX for OOB interrupt of dongle */
OMAP4_MUX(MCSPI1_CS1, OMAP_MUX_MODE3 | OMAP_PIN_INPUT_PULLDOWN),
/* GPIO MUX for WLAN_ENABLE for dongle */
OMAP4_MUX(MCSPI1_CLK, OMAP_MUX_MODE3 | OMAP_PIN_OUTPUT),

> Can you confirm ...
> 1. You are just using omap2plus_defconfig with no changes?

No, we enable the following options:
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_USB_OHCI_HCD=y

> 2. Rough frequency of gpio events?

3367 interrupts were triggered during a 10-second throughput test.

> 3. Is the gpio configured for active low or high?

active high

> 4. When the hang occurs, what is the state of the gpio? Active or
>    inactive? Can you probe it with a scope? If it was always active I
>    could see that this would lock the device up, but I am not sure how
>    that would relate to the results from your bisect???

I don't have a scope nearby. Let me see if I can find one tomorrow.

>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those
>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>> mainline kernel and the issue is still there. I can only "fix" the hang
>> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
>> hang only happens on Panda ES board. Old Panda with 4430 works good.
>
> It does not make sense to me yet why this would only impact 4460, but I
> will keep this in mind.
>
> In your wlan driver are you acquiring and freeing the gpio often? Or are
> you only acquiring the gpio on boot?
>
> The reason I ask is because for omap4, it seems that we are not
> currently calling omap2_gpio_prepare_for_idle() during idle and so the
> only time I see us call the runtime_suspend/resume handlers for omap4 is
> during probe and when we acquire and free the gpio.
>
> So if you were not acquiring and freeing the gpio and are using the
> stock kernel, then as far as I can tell, the runtime pm code is not
> being exercised much. My test is not acquiring and releasing the gpio
> and so I am wondering if that is the secret to reproducing this problem :-)

We only request the irq once during initialization. But we do frequently
disable and re-enable it, since we need to access the module through
SDIO to clear the interrupt. Apparently we can't finish all of this in
the irq handler.

Hope this helps.

Regards,
Franky



Re: Panda ES board hang when using GPIO as interrupt

2012-06-27 Thread Franky Lin

On 06/26/2012 08:37 PM, Kevin Hilman wrote:
> "Franky Lin"  writes:
>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those
>> two patches doesn't help in our case. I tested the driver with 3.5-rc3
>> mainline kernel and the issue is still there. I can only "fix" the
>> hang by either reverting the commit or disabling
>> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
>> Panda with 4430 works good.
>>
>> Any thoughts and suggestions?
>
> If reverting the patch fixes your problem, can you isolate down to which
> part of that patch causes the problem?  IOW, can you fix your problem if
> you undo just the hunk added in runtime_suspend or undo just the moved
> hunk runtime_resume?  Or is reverting both required?
>
> I suspect the added runtime_suspend hunk is causing the problems, so can
> you see if just undoing that part works[1].  If that works, I will give
> a bit more of a thinking on it tomorrow.

The runtime_suspend hunk is fine. The hang still exists after reverting it.
The culprit is the moved hunk in runtime_resume. Reverting it makes the
hang disappear.

> Thanks for reporting the problem!   Bug reports like this that have
> clearly been thoroughly researched and bisected are greatly appreciated!
>
> Kevin

You are welcome.

Regards,
Franky



Re: Panda ES board hang when using GPIO as interrupt

2012-06-27 Thread Jon Hunter
Hi Franky,

On 06/25/2012 03:52 PM, Franky Lin wrote:
> Hi Kevin, Tarun,
> 
> We are using the expansion connector A on Panda board to mount a SDIO
> WiFi dongle on MMC2 with a level triggered interrupt signal connected to
> GPIO 138. It's been working fine until 3.5 rc1. The board hang randomly
> within 5 mins during a network traffic test. After bisecting we found
> the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
> *_runtime_suspend()" [1].

I have been looking into this today to see if I can replicate the
problem that you have reported. However, so far I have not had any luck.
Please note that my test setup is not exactly the same as yours as I
don't have your wlan module. However, I have been using a 2nd board to
generate gpio events to a panda-es to see if I can make it lock up. I have
tried mainline kernel 3.5-rc1 and 3.5-rc3 but I have not seen any
problems after sending 100k gpio events (over many minutes). My setup is
as follows ...

- OMAP4460 panda-es with gpio-138 connected to OMAP3430 beagle gpio-11.
- Mainline kernel 3.5-rc1/3 using omap2plus_defconfig (no changes)
- Created a simple kernel module that acquires gpio-138 and sets up an
  IRQ with flag IRQF_TRIGGER_HIGH (for active high level interrupt).
- GPIO events are triggered roughly every 1ms

Can you confirm ...
1. You are just using omap2plus_defconfig with no changes?
2. Rough frequency of gpio events?
3. Is the gpio configured for active low or high?
4. When the hang occurs, what is the state of the gpio? Active or
   inactive? Can you probe it with a scope? If it was always active I
   could see that this would lock the device up, but I am not sure how
   that would relate to the results from your bisect???

> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those
> two patches doesn't help in our case. I tested the driver with 3.5-rc3
> mainline kernel and the issue is still there. I can only "fix" the hang
> by either reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the
> hang only happens on Panda ES board. Old Panda with 4430 works good.

It does not make sense to me yet why this would only impact 4460, but I
will keep this in mind.

In your wlan driver are you acquiring and freeing the gpio often? Or are
you only acquiring the gpio on boot?

The reason I ask is because for omap4, it seems that we are not
currently calling omap2_gpio_prepare_for_idle() during idle and so the
only time I see us call the runtime_suspend/resume handlers for omap4 is
during probe and when we acquire and free the gpio.

So if you were not acquiring and freeing the gpio and are using the
stock kernel, then as far as I can tell, the runtime pm code is not
being exercised much. My test is not acquiring and releasing the gpio
and so I am wondering if that is the secret to reproducing this problem :-)
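
The point about when the runtime PM handlers run can be illustrated with a
small userspace model. It is a sketch under stated assumptions, not the
driver: struct bank_usage and the model_* functions are hypothetical, and
the model simply encodes the observation above that resume/suspend happen
when a gpio in the bank is acquired or freed, and assumes that enabling or
disabling the IRQ does not count as either.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-bank bookkeeping; the field names are illustrative only. */
struct bank_usage {
	unsigned int requested;   /* bitmask of GPIO lines currently requested */
	unsigned int irq_enabled; /* bitmask of lines with their IRQ enabled */
	int resume_calls;         /* times the runtime_resume handler would run */
	int suspend_calls;        /* times the runtime_suspend handler would run */
};

/* Requesting the first line in an idle bank triggers a resume. */
static void model_gpio_request(struct bank_usage *b, unsigned int line)
{
	assert(line < 32);
	if (b->requested == 0)
		b->resume_calls++;
	b->requested |= 1u << line;
}

/* Freeing the last requested line in a bank triggers a suspend. */
static void model_gpio_free(struct bank_usage *b, unsigned int line)
{
	assert(line < 32);
	if (!(b->requested & (1u << line)))
		return;		/* line was never requested; nothing to do */
	b->requested &= ~(1u << line);
	b->irq_enabled &= ~(1u << line);
	if (b->requested == 0)
		b->suspend_calls++;
}

/* Enabling or disabling the IRQ leaves the request bookkeeping alone. */
static void model_irq_set_enabled(struct bank_usage *b, unsigned int line,
				  bool on)
{
	assert(line < 32);
	if (on)
		b->irq_enabled |= 1u << line;
	else
		b->irq_enabled &= ~(1u << line);
}
```

In this model a driver that requests line 10 of its bank once (GPIO 138 is
138 % 32 = 10) and then toggles the IRQ thousands of times sees exactly one
resume and no suspend until the line is freed.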

Cheers
Jon



Re: Panda ES board hang when using GPIO as interrupt

2012-06-27 Thread DebBarma, Tarun Kanti
On Tue, Jun 26, 2012 at 11:50 PM, Franky Lin  wrote:
> On 06/26/2012 12:21 AM, DebBarma, Tarun Kanti wrote:
>>
>> On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin  wrote:
>>>
>>> Hi Kevin, Tarun,
>>>
>>> We are using the expansion connector A on Panda board to mount a SDIO
>>> WiFi
>>> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
>>> 138. It's been working fine until 3.5 rc1. The board hang randomly within
>>> 5
>>> mins during a network traffic test. After bisecting we found the culprit
>>> is
>>> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>>>
>>> I noticed Kevin raised some similar cases on other platforms and also
>>> provided two patches in the patch mail thread. But unfortunately those
>>> two
>>> patches doesn't help in our case. I tested the driver with 3.5-rc3
>>> mainline
>>> kernel and the issue is still there. I can only "fix" the hang by either
>>> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
>>> happens on Panda ES board. Old Panda with 4430 works good.
>>>
>>> Any thoughts and suggestions?
>>
>> I just had a quick look at the code. Can you please check if the
>> attached patch solves
>> the issue? I just boot tested on Panda and Blaze.
>> --
>> Tarun
>>
>
> Thanks for the prompt reply.
>
> Booting is fine even without the patch and revert. The wifi dongle generates
> interrupt whenever there is data packet available for host to read. So
> during a traffic test a significant numbers of interrupt will be triggered
> through the GPIO. So I assume it has something to do with the interrupt
> GPIO.
>
> With the patch, the kernel still crashes. But the symptom is slightly
> different. Now it has a panic log every time. See attachment.
I tried comparing the present code with older version with regard
to enabled_non_wakeup_gpios check. The obvious difference I
observed is that this check is performed after off-mode check,
unlike the present case where the check is done just prior to
off-mode check. But then, as Kevin pointed out, we need to understand
the exact problem. I am trying to have a setup to reproduce the
problem. BTW, you can ignore my patch because I realized that
saved_datain is part of the workaround.
---
Tarun

>
> Regards,
> Franky


Re: Panda ES board hang when using GPIO as interrupt

2012-06-26 Thread Kevin Hilman
Hello,

"Franky Lin"  writes:

> Hi Kevin, Tarun,
>
> We are using the expansion connector A on Panda board to mount a SDIO
> WiFi dongle on MMC2 with a level triggered interrupt signal connected
> to GPIO 138. It's been working fine until 3.5 rc1. The board hang
> randomly within 5 mins during a network traffic test. After bisecting
> we found the culprit is "[PATCH 8/8] gpio/omap: fix missing check in
> *_runtime_suspend()" [1].



As you might guess, that patch has caused me enough headaches that
reverting it sounds like a good idea now.  But, I'd still like to better
understand exactly what's going on.

> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those
> two patches doesn't help in our case. I tested the driver with 3.5-rc3
> mainline kernel and the issue is still there. I can only "fix" the
> hang by either reverting the commit or disabling
> CONFIG_PM_RUNTIME. Also, the hang only happens on Panda ES board. Old
> Panda with 4430 works good.
>
> Any thoughts and suggestions?

If reverting the patch fixes your problem, can you isolate down to which
part of that patch causes the problem?  IOW, can you fix your problem if
you undo just the hunk added in runtime_suspend or undo just the moved
hunk runtime_resume?  Or is reverting both required?

I suspect the added runtime_suspend hunk is causing the problems, so can
you see if just undoing that part works[1].  If that works, I will give
it a bit more thought tomorrow.

Thanks for reporting the problem!   Bug reports like this that have
clearly been thoroughly researched and bisected are greatly appreciated!

Kevin

[1] patch against v3.5-rc4

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..2a6067f 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1177,9 +1177,6 @@ static int omap_gpio_runtime_suspend(struct device *dev)
 		__raw_writel(wake_hi | bank->context.risingdetect,
 			 bank->base + bank->regs->risingdetect);
 
-	if (!bank->enabled_non_wakeup_gpios)
-		goto update_gpio_context_count;
-
 	if (bank->power_mode != OFF_MODE) {
 		bank->power_mode = 0;
 		goto update_gpio_context_count;





Re: Panda ES board hang when using GPIO as interrupt

2012-06-26 Thread Franky Lin

On 06/26/2012 12:21 AM, DebBarma, Tarun Kanti wrote:
> On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin  wrote:
>> Hi Kevin, Tarun,
>>
>> We are using the expansion connector A on Panda board to mount a SDIO WiFi
>> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
>> 138. It's been working fine until 3.5 rc1. The board hang randomly within 5
>> mins during a network traffic test. After bisecting we found the culprit is
>> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>>
>> I noticed Kevin raised some similar cases on other platforms and also
>> provided two patches in the patch mail thread. But unfortunately those two
>> patches doesn't help in our case. I tested the driver with 3.5-rc3 mainline
>> kernel and the issue is still there. I can only "fix" the hang by either
>> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
>> happens on Panda ES board. Old Panda with 4430 works good.
>>
>> Any thoughts and suggestions?
>
> I just had a quick look at the code. Can you please check if the
> attached patch solves
> the issue? I just boot tested on Panda and Blaze.
> --
> Tarun

Thanks for the prompt reply.

Booting is fine even without the patch and the revert. The wifi dongle
generates an interrupt whenever a data packet is available for the host to
read, so during a traffic test a significant number of interrupts are
triggered through the GPIO. So I assume it has something to do with
the interrupt GPIO.

With the patch, the kernel still crashes, but the symptom is slightly
different: it now produces a panic log every time. See attachment.


Regards,
Franky
[  636.143585] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[  636.150634] Modules linked in: brcmfmac brcmutil cfg80211
[  636.156311] CPU: 0    Not tainted  (3.5.0-rc4+ #3)
[  636.161346] PC is at __lock_acquire+0x65c/0x1d88
[  636.166198] LR is at 0x6093
[  636.169494] pc : []    lr : [<6093>]    psr: 2093
[  636.169494] sp : c06b1e18  ip : 9e370001  fp : c0724f70
[  636.181549] r10: c06b  r9 : 001e  r8 : c0b92998
[  636.187042] r7 : c06d2cc8  r6 :   r5 : c0746d64  r4 : c06d2868
[  636.193908] r3 : 3b0e  r2 : ec3b001d  r1 : 0001d870  r0 : 001d
[  636.200744] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  636.208526] Control: 10c53c7d  Table: ae39c04a  DAC: 0017
[  636.214569] Process swapper/0 (pid: 0, stack limit = 0xc06b02f8)
[  636.220855] Stack: (0xc06b1e18 to 0xc06b2000)
[  636.225433] 1e00:   c06d00f8 0002
[  636.234039] 1e20: c0807968 0001  0002 001d  0001 0001d870
[  636.242614] 1e40: c08070e8 0001  0002 0002   c00903e4
[  636.251220] 1e60: 0002 0080  c0066838   6093
[  636.259796] 1e80: 6093  c06b4324 c06b   0002

Re: Panda ES board hang when using GPIO as interrupt

2012-06-26 Thread DebBarma, Tarun Kanti
On Tue, Jun 26, 2012 at 2:22 AM, Franky Lin  wrote:
> Hi Kevin, Tarun,
>
> We are using the expansion connector A on Panda board to mount a SDIO WiFi
> dongle on MMC2 with a level triggered interrupt signal connected to GPIO
> 138. It's been working fine until 3.5 rc1. The board hang randomly within 5
> mins during a network traffic test. After bisecting we found the culprit is
> "[PATCH 8/8] gpio/omap: fix missing check in *_runtime_suspend()" [1].
>
> I noticed Kevin raised some similar cases on other platforms and also
> provided two patches in the patch mail thread. But unfortunately those two
> patches doesn't help in our case. I tested the driver with 3.5-rc3 mainline
> kernel and the issue is still there. I can only "fix" the hang by either
> reverting the commit or disabling CONFIG_PM_RUNTIME. Also, the hang only
> happens on Panda ES board. Old Panda with 4430 works good.
>
> Any thoughts and suggestions?
I just had a quick look at the code. Can you please check if the
attached patch solves
the issue? I just boot tested on Panda and Blaze.
--
Tarun

>
> Thanks,
> Franky
>
> [1] http://article.gmane.org/gmane.linux.ports.arm.omap/75708/
>
From 0e1b322451b7a49487d2d17a147db1aa1d1119fa Mon Sep 17 00:00:00 2001
From: Tarun Kanti DebBarma 
Date: Tue, 26 Jun 2012 12:13:47 +0530
Subject: [PATCH] gpio/omap: enabled_non_wakeup_gpios check skips bank->saved_datain

Commit b3c64bc30af67ed328a8d919e41160942b870451
(gpio/omap: (re)fix wakeups on level-triggered GPIOs)
still skips update of bank->saved_datain in *_runtime_suspend()
which must be done irrespective of edge/level trigger types.
Therefore, move the enbaled_non_wakeup_gpios check after the
bank->saved_datain is updated.

Signed-off-by: Tarun Kanti DebBarma 
---
 drivers/gpio/gpio-omap.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index c4ed172..94ecdcf 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -1177,9 +1177,6 @@ static int omap_gpio_runtime_suspend(struct device *dev)
 		__raw_writel(wake_hi | bank->context.risingdetect,
 			 bank->base + bank->regs->risingdetect);
 
-	if (!bank->enabled_non_wakeup_gpios)
-		goto update_gpio_context_count;
-
 	if (bank->power_mode != OFF_MODE) {
 		bank->power_mode = 0;
 		goto update_gpio_context_count;
@@ -1191,6 +1188,10 @@ static int omap_gpio_runtime_suspend(struct device *dev)
 	 */
 	bank->saved_datain = __raw_readl(bank->base +
 		bank->regs->datain);
+
+	if (!bank->enabled_non_wakeup_gpios)
+		goto update_gpio_context_count;
+
 	l1 = bank->context.fallingdetect;
 	l2 = bank->context.risingdetect;
 
-- 
1.7.0.4