Re: Issues with calling pm_runtime functions in platform_pm_suspend_noirq/IRQ disabled context.

Kevin Hilman Wed, 04 Aug 2010 14:48:12 -0700

"Basak, Partha" <p-bas...@ti.com> writes:

>> -----Original Message-----
>> From: Kevin Hilman [mailto:khil...@deeprootsystems.com]
>> Sent: Tuesday, August 03, 2010 11:06 PM
>> To: Basak, Partha
>> Cc: Paul Walmsley; linux-omap@vger.kernel.org; Kalliguddi, Hema; Raja,
>> Govindraj; Varadarajan, Charulatha; Nayak, Rajendra; Cousson, Benoit
>> Subject: Re: Issues with calling pm_runtime functions in
>> platform_pm_suspend_noirq/IRQ disabled context.
>> 
>> "Basak, Partha" <p-bas...@ti.com> writes:
>> 
>> > Resending with the corrected mailing list
>> >
>> > Hello Kevin,
>> >
>> > I want to draw your attention to an issue the following patch
>> > introduces. http://dev.omapzoom.org/?p=swarch/linux-omap-
>> adv.git;a=commitdiff;h=8041359e18e49bf8a3d41f15894db9e476f3a8fc
>> 
>> [...]
>> 
>> > I believe, it is not correct to call the pm_runtime APIs from
>> > interrupt locked context since the underlying functions
>> > __pm_runtime_suspend & __pm_runtime_resume enable interrupts and call
>> > schedule().
>> >
>> > Further, from a s/w layering standpoint, platform-bus is a deeper
>> > layer than the Runtime layer, which the runtime layer calls into. So
>> > it may not be correct to have this layer calling into pm_runtime(). It
>> > should directly call omap_device APIs.
>> 
>> The original version of this patch did directly call the omap_device
>> APIs.  However, Paul didn't like that approach and there were
>> use-counting problems to handle as well (e.g. if a driver was not (yet)
>> in use, it would already be disabled, and a suspend would trigger an
>> omap_device_disable() which would fail since device was already
>> disabled.)
>> 
>> Paul and I agreed that this current approach would solve the
>> use-counting problems, but you're correct in that it introduces
>> new problems as these functions can _possibly_ sleep/schedule.
>> 
>> That being said, have you actually seen problems here?   I would like
>> to see how they are triggered?
>
> Scenario 1:
>
> Consider the case where a driver has already called
> pm_runtime_put_sync as part of aggressive clock cutting
> scheme. Subsequently, when system suspend is called,
> pm_runtime_put_sync is called again.


> Due to atomic_dec_and_test check
> on usage_count, the second call goes through w/o error.

I don't think you're fully understanding the point of the put/get in the
suspend/resume path.

The reason for the 'put' in the suspend path is because the PM core has
done a 'get' in dpm_prepare() [c.f. drivers/base/power/main.c] so that
runtime PM transitions are prevented during suspend/resume.

On OMAP, we want to allow those transitions to happen so we can use the
same runtime PM calls in the idle and suspend paths.

> At System resume, pm_runtime_get_sync is called twice, once by resume,
> once by the driver itself. 

Yes, but there is a 'put_sync' in between those done by the PM core
during resume (c.f. dpm_complete() in drivers/base/power/main.c]

> Because of the extra reference count, the
> aggressive clock control by the driver is broken, as call to put_sync
> by a driver reduces to usage_count to 1. As a result clock transition
> in Idle-path is broken.


> Scenario2:
>
> Tested GPIO for Suspend with these changes & obtained dump of Circular 
> Locking dependencies:

I don't think these to problems are related, AFAICT, they are separate
issues.  I'll respond to this scenario in a different reply.

>> 
>> [...]
>> 
>> The UART driver is a special case as it is managed from deep inside the
>> idle path.
>> 
>> > Can you feedback on my observations and comment on the approach taken
>> > for these drivers?
>> 
>> I cannot comment until I understand why these drivers need to
>> enable/disable with interrupts off.
>> 
>> In general, any use of the interrupts-off versions of the omap_device
>> APIs need to be thorougly justified.
>> 
>> Even the UART one will go away when the omap-serial driver is converted
>> to runtime PM.
>>
>
> For GPIO, it is a must to relinquish clocks in Idle-path as it is
> possible to have a GPIO bank requested and still allow PER domain to
> go to OFF.

These the kind of things that I would expect to be discussed/described
in the changelogs to patches posted to the list.

> For USB, this is required because of a h/w issue.

Again, patches with descriptive changelogs describing the "h/w issue"
please.

> My point is, just by exposing a _omap_hwmod_idle()(mutex-free version) will 
> not let us call pm_runtime APIs in Idle path as we are not supposed to enable 
> interrupts there. We have faced problems with USB big-time in Idle path while 
> using pm_runtime functions. So, we are resorting to calling 
> omap_device_idle() in CPU_Idle path, and runtime functions in thread-context. 
>
> Is this acceptable, given the limitations?

No, this is not (yet) acceptable.

The limitations you've come up against are not fully understood yet.
Rather trying to hack around the limitations, we need to fully
understand the root causes, which has not yet been done.

It's possible (and likely) that there are different ways to handle these
problems, and entirely possible that there are locking issues we have to
resolve in the omap_device/hwmod layers as well.

Kevin

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Issues with calling pm_runtime functions in platform_pm_suspend_noirq/IRQ disabled context.

Reply via email to