NeilBrown <ne...@suse.de> writes:

> On Fri, 10 Aug 2012 11:49:27 -0700 Kevin Hilman <khil...@ti.com> wrote:
>
>> Hello,
>> 
>> In doing some automated testing of suspend/resume I noticed that
>> repeated attempts to suspend and resume via RTC wakeup fail on
>> 3530/Beagle and 3730/Beagle-xM, but work fine on 3430/n900, 3530/Overo,
>> 3730/OveroSTORM and 4430/Panda.
>> 
>> When RTC wakeup fails, a UART wakeup will work, and in the logs, you'll
>> see this:
>> 
>> [  316.036132] twl: i2c_read failed to transfer all messages
>> [  316.036163] twl4030: I2C error -13 reading PIH ISR
>> 
>> My guess about what might be happening is that very late in the suspend
>> process (during the noirq hooks), a PMIC interrupt fires, but by this
>> time the I2C driver is runtime suspended (and clock gated.)  Since
>> runtime PM is disabled at this point, I2C reads fail, so the twl4030 IRQ
>> driver cannot talk over I2C to the PMIC to determine the interrupt
>> source.
>
> This area seems to be rife with opportunity for bugs.  I wrote about some of
> it here:  https://lwn.net/Articles/482345/

Yeah, that was a great article.

> I don't know that I saw quite what you are seeing though.
>
> If a PMIC interrupt fires during the noirq phase, the interrupt handler
> shouldn't be run (it isn't marked NOSUSPEND).  However there is probably room
> for a race between the 'suspend' phase and the 'noirq' phase.
>
> When the suspend processing handles the I2C device, the last thing that
> __device_suspend does is
>
>               __pm_runtime_disable(dev, false);
>
> which will freeze the current runtime_pm state of the I2C device.  If it is
> off, it stays off.  If on, it stays on.

Correct.

> As the noirq phase hasn't been entered yet an interrupt from the PMIC could
> still be handled.  If it is, you get exactly the error you see.

Right, this is what I suspect is happening.  I just haven't had the time
to confirm and/or fix.

> I'm not convinced that the __pm_runtime_disable call is correct.  It think we
> need to stop async runtime_suspends, but we don't need to stop sync
> runtime_resumes. So just a pm_runtime_get should be enough.  But there  is
> possibly an important point I am missing.

Yes, I went back and forth with Rafael on the interactions between
system PM and runtime PM and he pursuaded me that there were a handful
of reasons that system PM needed to block runtime PM transitions.  I
don't recall what they are at the moment (my brain is already being
pre-flushed in preparation for some time off.)

> However if my analysis is correct, then this can be 'fixed' by changing the
> omap i2c suspend routine to do a pm_runtime_get, and the resume routine to do
> a pm_runtime_put.  The I2C will still be put to sleep during suspend by the
> noirq suspend handler, but we will be sure of it being awake during the
> crucial suspend and resume transition.

Hmm, that's a good idea.  Hopefully someone can give it a try before I
get back from vacation. :)

Thanks for the suggestion.

> See also https://lwn.net/Articles/505683/.  Particularly (towards the end)
>
>     If the device might be needed to power down other devices, such as an I2C
>     controller that might be needed to tell some regulator to turn off, then 
> the
>     device should be activated for runtime PM purposes so that it will still 
> be
>     active when runtime PM is disabled. 
>
> (Rafael reviewed this article so it shouldn't be very far from the mark).
>
>
>> 
>> The real mystery is why this happens on Beagle and Beagle-xM, but none
>> of the other OMAP3 boards (at least the ones I have.)
>
> Maybe didn't components of the PMIC are active and have the potential to
> generate an interrupt at an awkward time.  USB and battery chargers seem good
> at that.
>
> Or maybe due to the particular components active and the particular timing,
> the pm_runtime_disable ends up freezing the runtime_pm state in 'on' rather
> than 'off'.
>
>> 
>> Reproducing is easy.  Simply run rtcwake in a loop:
>> 
>>   # while true; do rtcwake -m mem -s 1; done
>> 
>> In my tests, this happens using omap2plus_defconfig (+ initramfs) on
>> v3.6-rc1, v3.5, v3.4, v3.3 but seems to work fine on v3.2.
>> 
>> I'm going on vacation for a few weeks, so any help debugging this would
>> be greatly appreciated. 
>
> Enjoy your vacation!   I don't suppose it ends up in San Diego in late August
> for one of the multitude of conferences there?

No, I'll miss the summer conferences this year, this vacation will be
unplugged.

Kevin

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to