Re: Common clock and dvfs

2011-05-06 Thread Paul Walmsley

Not that this is particularly related to DVFS, but:

On Thu, 5 May 2011, Colin Cross wrote:

> On Thu, May 5, 2011 at 2:08 PM, Cousson, Benoit  wrote:
>
> > Colin Cross wrote:
> 
> >> omap_hwmod is entirely omap specific, and any generic solution cannot
> >> be based on it.
> >
> > For the moment, because it is a fairly new design, but nothing should
> > prevent us from making it generic if this abstraction is relevant for other SoCs.
> 
> That's not how you design abstractions.

Oh really?

> You can't abstract one case without considering other SoCs, and then 
> make it generic if it fits other SoCs - it will never fit other SoCs.  
> You have to consider all the cases you want it to cover, and design an 
> abstraction that makes sense for the superset.

In actual practice, one often does not know in advance the entire universe 
of cases that one needs to cover.  Even just for one SoC.

Consider that you mentioned earlier that you had to rewrite the Tegra 
clock code several times.  Now, add several other families of SoCs to the 
requirements.  If the documentation for these chips is even available at 
all, it is often misleading or wrong.

Attempting to create an abstraction before one knows the underlying 
requirements of what one is actually trying to abstract is a plan for 
intense suffering.  There's little glory in it.

...

In the specific case of omap_hwmod, the core of the omap_hwmod data 
structures was designed such that it could apply to any SoC with a 
complex interconnect.  The design was based on hardware principles common 
to any SoC: interconnects, IP blocks, reset lines, etc.  There are 
OMAP-specific parts, but if others found omap_hwmod useful, those parts 
would be trivial to abstract away.  We haven't sought to force it on others.


- Paul


Re: Common clock and dvfs

2011-05-06 Thread MyungJoo Ham
On Fri, May 6, 2011 at 6:08 AM, Cousson, Benoit  wrote:
[]
>
> Devices will indeed never care about voltage directly, but that will happen
> indirectly because of:
> - voltage domain dependencies: changing the MPU or IVA voltage domain might
> force the CORE voltage domain to increase its voltage due to HW limitations.
> We cannot have the CPU at 1GHz while the interconnect is at the lowest OPP.
> - a voltage domain increase due to one device's frequency increase might force
> the other devices in that voltage domain to increase their frequency.
> - Thermal management might be a good example as well, but in general
> changing the main contributors' frequency (MPU, GPU) should be enough.
>
> In these cases, the indirect voltage change will potentially trigger a
> frequency change.
>
> vdd1 <--> vdd2
>   |         |
>  +----+    +----+
>  |    |    |    |
> devA devB devC devD
>
> With such partitioning, an increase of devA's OPP will increase vdd1, which
> will trigger an increase of vdd2, which will then be broadcast to the devices
> that belong to it. devC and devD might or might not increase their frequency
> to reduce energy consumption.
> Any device that can run fast and idle, like a processor, must run at the max
> frequency allowed by the current voltage.

As long as the voltage change in vdd1, which changes vdd2 (vdd1 and vdd2
are consumers of the same regulator, right?), can update the related OPP
entries (enabling/disabling entries), devfreq can handle this.

If the related clocks and devices (A~D) are using devfreq, disabling,
enabling, and adding OPPs will instantly affect devfreq, which will adjust
the clock frequency based only on the enabled OPP entries. Thus, a module
that is increasing the voltage just needs to disable some low-voltage OPP
entries, although the set_min/max APIs mentioned by Colin would be more
useful.
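
A minimal sketch of the mechanism described above, assuming the
opp_enable()/opp_disable() helpers from the kernel OPP library of that
era (later renamed dev_pm_opp_*).  The devA table, the voltages and the
vdd_manager_sync_opps() name are hypothetical illustrations, not code
from any tree; the same two calls can equally be used to drop the
needlessly low entries when a domain dependency forces the voltage up.

#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/opp.h>

struct freq_volt {
	unsigned long rate;	/* clock rate in Hz */
	unsigned long u_volt;	/* minimum rail voltage for this rate, in uV */
};

/* hypothetical frequency/voltage table for devA */
static const struct freq_volt devA_table[] = {
	{ 100000000,  950000 },
	{ 200000000, 1100000 },
	{ 400000000, 1250000 },
};

/*
 * Called by whatever entity owns the voltage domain when a dependency
 * (e.g. vdd1 forcing vdd2 up) changes the voltage the rail will run at.
 * Only the OPP entries legal at that voltage stay enabled, so a
 * devfreq-style governor picks from them on its next adjustment.
 */
static void vdd_manager_sync_opps(struct device *devA, unsigned long rail_uV)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(devA_table); i++) {
		if (devA_table[i].u_volt > rail_uV)
			opp_disable(devA, devA_table[i].rate);
		else
			opp_enable(devA, devA_table[i].rate);
	}
}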


-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858


Re: Common clock and dvfs

2011-05-05 Thread Colin Cross
On Thu, May 5, 2011 at 2:08 PM, Cousson, Benoit  wrote:
> On 5/5/2011 8:11 AM, Colin Cross wrote:
>>
>> On Wed, May 4, 2011 at 10:08 PM, Cousson, Benoit  wrote:
>>>
>>> (Cc folks with some DVFS interest)
>>>
>>> Hi Colin,
>>>
>>> On Fri, 22 Apr 2011, Colin Cross wrote:

 Now that we are approaching a common clock management implementation,
 I was thinking it might be the right place to put a common dvfs
 implementation as well.

 It is very common for SoC manufacturers to provide a table of the
 minimum voltage required on a voltage rail for a clock to run at a
 given frequency.  There may be multiple clocks in a voltage rail that
 each can specify their own minimum voltage, and one clock may affect
 multiple voltage rails.  I have seen two ways to handle keeping the
 clocks and voltages within spec:

 The Tegra way is to put everything dvfs related under the clock
 framework.  Enabling (or preparing, in the new clock world) or raising
 the frequency calls dvfs_set_rate before touching the clock, which
 looks up the required voltage on a voltage rail, aggregates it with
 the other voltage requests, and passes the minimum voltage required to
 the regulator api.  Disabling or unpreparing, or lowering the
 frequency changes the clock first, and then calls dvfs_set_rate.  For
 a generic implementation, an SoC would provide the clock/dvfs
 framework with a list of clocks, the voltages required for each
 frequency step on the clock, and the regulator name to change.  The
 frequency/voltage tables are similar to OPP, except that OPP gets
 voltages for a device instead of a clock.  In a few odd cases (Tegra
 always has a few odd cases), a clock that is internal to a device and
 not exposed to the clock framework (pclk output on the display, for
 example) has a voltage requirement, which requires some devices to
 manually call dvfs_set_rate directly, but with a common clock
 framework it would probably be possible for the display driver to
 export pclk as a real clock.
>>>
>>> Those kinds of exceptions are somewhat the rule for an OMAP4 device. Most
>>> scalable devices use internal dividers or even an internal PLL to control
>>> the scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 Data Manual
>>> [1] provides the various clock rate limitations depending on the OPP).
>>> And none of these internal dividers are handled by the clock framework today.
>>>
>>> For sure, it should be possible to extend the clock data with internal
>>> device clock nodes (like the UART baud rate divider, for example), but then
>>> we would have to handle a bunch of nodes that may not always be available
>>> depending on the device state. In order to do that, you have to tie these
>>> clock nodes to the device that contains them.
>>
>> I agree there are cases where the clock framework may not be a fit for
>> a specific divider, but it would be simple to export the same
>> dvfs_set_rate functions that the generic clk_set_rate calls, and allow
>> drivers that need to scale their own clocks to take advantage of the
>> common tables.
>>
>>> And for the clocks that do not belong to any device, like most PRCM source
>>> clocks or DPLLs inside OMAP, we can easily define a PRCM device or several
>>> CM (Clock Manager) devices that will handle all these clock nodes.
>>>
 The proposed OMAP4 way (I believe, correct me if I am wrong) is to
 create a new api outside the clock api that calls into both the clock
 api and the regulator api in the correct order for each operation,
 using OPP to determine the voltage.  This has a few disadvantages
 (obviously, I am biased, having written the Tegra code) - clocks and
 voltages are tied to a device, which is not always the case for
 platforms outside of OMAP, and drivers must know if their hardware
 requires voltage scaling.  The clock api becomes unsafe to use on any
 device that requires dvfs, as it could change the frequency higher
 than the supported voltage.
>>>
>>> You have to tie clock and voltage to a device. Most of the time a clock
>>> does not have any clear relation with a voltage domain. It can even cross
>>> power / voltage domains without any issue.
>>> The efficiency of the DVFS technique is mainly due to the reduction of the
>>> voltage rail that supplies a device. In order to achieve that you have to
>>> reduce the clock rate of one or several clock nodes that supply the
>>> critical path inside the HW.
>>
>> A clock crossing a voltage domain is not a problem, a single clock can
>> have relationships to multiple regulators.  But a clock does not need
>> to be tied to a device.  From the silicon perspective, it doesn't
>> matter how you divide up the devices in the kernel, a clock is just a
>> line toggling at a rate, and the maximum speed it can toggle is
>> determined by the silicon it feeds and the voltage that silicon is operating at.

Re: Common clock and dvfs

2011-05-05 Thread Cousson, Benoit

On 5/5/2011 8:11 AM, Colin Cross wrote:

On Wed, May 4, 2011 at 10:08 PM, Cousson, Benoit  wrote:

(Cc folks with some DVFS interest)

Hi Colin,

On Fri, 22 Apr 2011, Colin Cross wrote:


Now that we are approaching a common clock management implementation,
I was thinking it might be the right place to put a common dvfs
implementation as well.

It is very common for SoC manufacturers to provide a table of the
minimum voltage required on a voltage rail for a clock to run at a
given frequency.  There may be multiple clocks in a voltage rail that
each can specify their own minimum voltage, and one clock may affect
multiple voltage rails.  I have seen two ways to handle keeping the
clocks and voltages within spec:

The Tegra way is to put everything dvfs related under the clock
framework.  Enabling (or preparing, in the new clock world) or raising
the frequency calls dvfs_set_rate before touching the clock, which
looks up the required voltage on a voltage rail, aggregates it with
the other voltage requests, and passes the minimum voltage required to
the regulator api.  Disabling or unpreparing, or lowering the
frequency changes the clock first, and then calls dvfs_set_rate.  For
a generic implementation, an SoC would provide the clock/dvfs
framework with a list of clocks, the voltages required for each
frequency step on the clock, and the regulator name to change.  The
frequency/voltage tables are similar to OPP, except that OPP gets
voltages for a device instead of a clock.  In a few odd cases (Tegra
always has a few odd cases), a clock that is internal to a device and
not exposed to the clock framework (pclk output on the display, for
example) has a voltage requirement, which requires some devices to
manually call dvfs_set_rate directly, but with a common clock
framework it would probably be possible for the display driver to
export pclk as a real clock.


Those kinds of exceptions are somewhat the rule for an OMAP4 device. Most
scalable devices use internal dividers or even an internal PLL to control the
scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 Data Manual [1]
provides the various clock rate limitations depending on the OPP).
And none of these internal dividers are handled by the clock framework today.

For sure, it should be possible to extend the clock data with internal
device clock nodes (like the UART baud rate divider, for example), but then
we would have to handle a bunch of nodes that may not always be available
depending on the device state. In order to do that, you have to tie these
clock nodes to the device that contains them.


I agree there are cases where the clock framework may not be a fit for
a specific divider, but it would be simple to export the same
dvfs_set_rate functions that the generic clk_set_rate calls, and allow
drivers that need to scale their own clocks to take advantage of the
common tables.


And for the clocks that do not belong to any device, like most PRCM source
clocks or DPLLs inside OMAP, we can easily define a PRCM device or several CM
(Clock Manager) devices that will handle all these clock nodes.


The proposed OMAP4 way (I believe, correct me if I am wrong) is to
create a new api outside the clock api that calls into both the clock
api and the regulator api in the correct order for each operation,
using OPP to determine the voltage.  This has a few disadvantages
(obviously, I am biased, having written the Tegra code) - clocks and
voltages are tied to a device, which is not always the case for
platforms outside of OMAP, and drivers must know if their hardware
requires voltage scaling.  The clock api becomes unsafe to use on any
device that requires dvfs, as it could change the frequency higher
than the supported voltage.


You have to tie clock and voltage to a device. Most of the time a clock does
not have any clear relation with a voltage domain. It can even cross power /
voltage domains without any issue.
The efficiency of the DVFS technique is mainly due to the reduction of the
voltage rail that supplies a device. In order to achieve that you have to
reduce the clock rate of one or several clock nodes that supply the
critical path inside the HW.


A clock crossing a voltage domain is not a problem, a single clock can
have relationships to multiple regulators.  But a clock does not need
to be tied to a device.  From the silicon perspective, it doesn't
matter how you divide up the devices in the kernel, a clock is just a
line toggling at a rate, and the maximum speed it can toggle is
determined by the silicon it feeds and the voltage that silicon is
operating at.  If a device can be turned on or off, that's a clock
gate, and the line downstream from the clock gate is a separate clock.


Fully agree.

Just to clarify the terminology, I'm using device to represent the IP 
block as well. The mapping is not necessarily one to one, but for most 
relevant IPs this is mostly true. In our case, the hwmod will represent 
the HW device.


My point is that a S

Re: Common clock and dvfs

2011-05-05 Thread Mark Brown
On Wed, May 04, 2011 at 11:50:52PM -0700, Colin Cross wrote:

> True, that was an oversimplification. I meant the minimum voltage that
> scales with clock frequencies only depends on the clock frequency, not
> the device.  Devices do need to be able to specify a higher minimum
> voltage, and the regulator api needs to handle it.

The regulator API already supports this so we're fine there.
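
A hedged sketch of what that consumer-side support looks like: a driver
states the floor its silicon (or external I/O) needs, and the regulator
core aggregates that with every other consumer on the same rail.  The
"vdd_devA" supply name and the voltage values are made up for the
example; only regulator_get()/regulator_set_voltage() are real API.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/regulator/consumer.h>

static struct regulator *devA_vdd;	/* kept for a later regulator_put() */

static int devA_request_voltage_floor(struct device *dev)
{
	devA_vdd = regulator_get(dev, "vdd_devA");
	if (IS_ERR(devA_vdd))
		return PTR_ERR(devA_vdd);

	/* ask for at least 1.2 V; the core honours the highest floor
	 * among all consumers and the machine constraints */
	return regulator_set_voltage(devA_vdd, 1200000, 1350000);
}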


Re: Common clock and dvfs

2011-05-04 Thread Colin Cross
On Wed, May 4, 2011 at 11:35 PM, Paul Walmsley  wrote:
> On Wed, 4 May 2011, Colin Cross wrote:
>
>> Imagine a chip where a clock can feed devices A, B, and C.  If the
>> devices are always clocked at the same rate, and can't gate their
>> clocks, the minimum voltage that can be applied to a rail is
>> determined ONLY by the rate of the clock.
>
> That's not so -- although admittedly it's a side issue, and not
> particularly related to DVFS.
>
> For example, the device may have some external I/O lines which need to be
> at least some minimum voltage level for the externally-connected device to
> function.  This minimum voltage level can be unrelated to the device's
> clock frequency.

True, that was an oversimplification. I meant the minimum voltage that
scales with clock frequencies only depends on the clock frequency, not
the device.  Devices do need to be able to specify a higher minimum
voltage, and the regulator api needs to handle it.


Re: Common clock and dvfs

2011-05-04 Thread Paul Walmsley
On Wed, 4 May 2011, Colin Cross wrote:

> Imagine a chip where a clock can feed devices A, B, and C.  If the
> devices are always clocked at the same rate, and can't gate their
> clocks, the minimum voltage that can be applied to a rail is
> determined ONLY by the rate of the clock.

That's not so -- although admittedly it's a side issue, and not 
particularly related to DVFS.

For example, the device may have some external I/O lines which need to be 
at least some minimum voltage level for the externally-connected device to 
function.  This minimum voltage level can be unrelated to the device's 
clock frequency.


- Paul


Re: Common clock and dvfs

2011-05-04 Thread Paul Walmsley
On Thu, 5 May 2011, Cousson, Benoit wrote:

> Those kinds of exceptions are somewhat the rule for an OMAP4 device. 
> Most scalable devices use internal dividers or even an internal PLL to 
> control the scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 
> Data Manual [1] provides the various clock rate limitations depending on 
> the OPP). And none of these internal dividers are handled by the clock 
> framework today.

That's mostly because no one has taken the time to implement them, not 
really for any technical reason.

> For sure, it should be possible to extend the clock data with internal 
> device clock nodes (like the UART baud rate divider, for example), but 
> then we would have to handle a bunch of nodes that may not always be 
> available depending on the device state. In order to do that, you have to 
> tie these clock nodes to the device that contains them.

It's only necessary to do that for the device where the clock's control 
registers are located.  In many cases (almost all on OMAP), this is a 
different device from the device that the clock actually drives.

> And for the clocks that do not belong to any device, like most PRCM 
> source clocks or DPLLs inside OMAP, we can easily define a PRCM device or 
> several CM (Clock Manager) devices that will handle all these clock 
> nodes.
> 
> > The proposed OMAP4 way (I believe, correct me if I am wrong) is to 
> > create a new api outside the clock api that calls into both the clock 
> > api and the regulator api in the correct order for each operation, 
> > using OPP to determine the voltage.  This has a few disadvantages 
> > (obviously, I am biased, having written the Tegra code) - clocks and 
> > voltages are tied to a device, which is not always the case for 
> > platforms outside of OMAP, and drivers must know if their hardware 
> > requires voltage scaling.  The clock api becomes unsafe to use on any 
> > device that requires dvfs, as it could change the frequency higher 
> > than the supported voltage.
> 
> You have to tie clock and voltage to a device. 

As you mentioned above, there are several clocks that aren't associated 
with any specific "device" outside of the clock itself, or which are 
associated with multiple devices.

> Most of the time a clock does not have any clear relation with a voltage 
> domain.  It can even cross power / voltage domains without any issue.

Each instance of a clock signal -- a conductor on a chip that carries an 
AC signal that is used to drive some gates -- can only be driven by 
one voltage rail.  How could it be otherwise?

In the unusual instances where a clock crosses voltage rails (by virtue of 
some gates between the rails that handle the translation) and it is 
important for Linux to know this, then in the Linux-OMAP code, the 
intention is for separate struct clks to be used for the clock signals on 
either side of the voltage rail crossing.
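
As a hedged illustration of that "separate struct clks on either side of
the crossing" arrangement (the structure, field names and rail names
below are illustrative only, not the mach-omap2 clock data):

/* one node per voltage rail, so each node's frequency/voltage
 * constraints only refer to the rail that actually powers the gates
 * it drives */
struct sketch_clk {
	const char *name;
	struct sketch_clk *parent;	/* upstream clock node */
	const char *rail;		/* rail feeding the gates this node clocks */
};

/* the signal as it leaves the DPLL, inside the "core" rail */
static struct sketch_clk func_clk_core = {
	.name	= "func_clk_core",
	.rail	= "vdd_core",
};

/* the same electrical signal after it crosses into the "mpu" rail */
static struct sketch_clk func_clk_mpu = {
	.name	= "func_clk_mpu",
	.parent	= &func_clk_core,
	.rail	= "vdd_mpu",
};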

> The clock node itself does not know anything about the device, and that's 
> why it is not the proper structure to do DVFS.

What aspects of the device are you referring to that the clock node would 
need to know?

> OMAP moved away from using the clock nodes to represent IP blocks 
> because the clock abstraction was not enough to represent the way an IP 
> interacts with clocks. That's why omap_hwmod was introduced to 
> represent an IP block.

omap_hwmod was introduced to represent IP blocks and their 
interconnection.  Separating IP block gating from individual clock gating 
was one part of this, but not the only one; and gating isn't really 
related to DVFS.

> Because the clock is not the central piece of the DVFS sequence, I don't 
> think it deserves to handle the whole sequence including voltage 
> scaling.
> 
> A change to a clock rate might trigger a voltage change, but the 
> opposite is true as well. A reduction of the voltage could trigger a 
> clock rate change inside all the devices that belong to the voltage 
> domain. Because of that, both frameworks are siblings. This is not a 
> parent-child relationship.

What's the use case for voltage reduction that isn't triggered by a clock 
rate reduction?

> Another important point is that in order to trigger a DVFS sequence you 
> have to do some voting to take into account shared clock and shared 
> voltage domains.
> 
> Moreover, playing directly with a clock rate is not necessarily 
> appropriate or sufficient for some devices. For example, the 
> interconnect should expose a BW knob instead of a clock rate one.  In 
> general, some more abstract information like BW, latency or performance 
> level (P-state) should be what is exposed at the driver level.

It's definitely true that, say, the SDMA driver should not specify its 
interconnect bandwidth requirements in terms of an interconnect clock 
frequency.  It should specify some variant of bytes per second.  But 
that's only possible because the goal is to provide the interconnect 
driver with enough information

Re: Common clock and dvfs

2011-05-04 Thread Colin Cross
On Wed, May 4, 2011 at 10:08 PM, Cousson, Benoit  wrote:
> (Cc folks with some DVFS interest)
>
> Hi Colin,
>
> On Fri, 22 Apr 2011, Colin Cross wrote:
>>
>> Now that we are approaching a common clock management implementation,
>> I was thinking it might be the right place to put a common dvfs
>> implementation as well.
>>
>> It is very common for SoC manufacturers to provide a table of the
>> minimum voltage required on a voltage rail for a clock to run at a
>> given frequency.  There may be multiple clocks in a voltage rail that
>> each can specify their own minimum voltage, and one clock may affect
>> multiple voltage rails.  I have seen two ways to handle keeping the
>> clocks and voltages within spec:
>>
>> The Tegra way is to put everything dvfs related under the clock
>> framework.  Enabling (or preparing, in the new clock world) or raising
>> the frequency calls dvfs_set_rate before touching the clock, which
>> looks up the required voltage on a voltage rail, aggregates it with
>> the other voltage requests, and passes the minimum voltage required to
>> the regulator api.  Disabling or unpreparing, or lowering the
>> frequency changes the clock first, and then calls dvfs_set_rate.  For
>> a generic implementation, an SoC would provide the clock/dvfs
>> framework with a list of clocks, the voltages required for each
>> frequency step on the clock, and the regulator name to change.  The
>> frequency/voltage tables are similar to OPP, except that OPP gets
>> voltages for a device instead of a clock.  In a few odd cases (Tegra
>> always has a few odd cases), a clock that is internal to a device and
>> not exposed to the clock framework (pclk output on the display, for
>> example) has a voltage requirement, which requires some devices to
>> manually call dvfs_set_rate directly, but with a common clock
>> framework it would probably be possible for the display driver to
>> export pclk as a real clock.
>
> Those kinds of exceptions are somewhat the rule for an OMAP4 device. Most
> scalable devices use internal dividers or even an internal PLL to control the
> scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 Data Manual [1]
> provides the various clock rate limitations depending on the OPP).
> And none of these internal dividers are handled by the clock framework today.
>
> For sure, it should be possible to extend the clock data with internal
> device clock nodes (like the UART baud rate divider, for example), but then
> we would have to handle a bunch of nodes that may not always be available
> depending on the device state. In order to do that, you have to tie these
> clock nodes to the device that contains them.

I agree there are cases where the clock framework may not be a fit for
a specific divider, but it would be simple to export the same
dvfs_set_rate functions that the generic clk_set_rate calls, and allow
drivers that need to scale their own clocks to take advantage of the
common tables.
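
A hedged sketch of what such an export could look like from a driver
that scales an internal divider the clock framework never sees.
dvfs_set_rate(), struct dvfs, struct display_priv and DISP_PCLK_DIV are
hypothetical names for illustration; the ordering only covers raising
the rate, which is why the voltage request comes first.

#include <linux/io.h>
#include <linux/kernel.h>

struct dvfs;						/* opaque handle, hypothetical */
extern int dvfs_set_rate(struct dvfs *d, unsigned long rate);

struct display_priv {
	void __iomem *base;
	unsigned long parent_rate;
	struct dvfs *pclk_dvfs;
};

#define DISP_PCLK_DIV	0x40	/* made-up internal divider register offset */

static int display_raise_pclk(struct display_priv *disp, unsigned long rate)
{
	int ret;

	/* satisfy the voltage requirement from the common tables first... */
	ret = dvfs_set_rate(disp->pclk_dvfs, rate);
	if (ret)
		return ret;

	/* ...then program the divider that only this driver knows about */
	writel(DIV_ROUND_UP(disp->parent_rate, rate),
	       disp->base + DISP_PCLK_DIV);
	return 0;
}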

> And for the clocks that do not belong to any device, like most PRCM source
> clocks or DPLLs inside OMAP, we can easily define a PRCM device or several CM
> (Clock Manager) devices that will handle all these clock nodes.
>
>> The proposed OMAP4 way (I believe, correct me if I am wrong) is to
>> create a new api outside the clock api that calls into both the clock
>> api and the regulator api in the correct order for each operation,
>> using OPP to determine the voltage.  This has a few disadvantages
>> (obviously, I am biased, having written the Tegra code) - clocks and
>> voltages are tied to a device, which is not always the case for
>> platforms outside of OMAP, and drivers must know if their hardware
>> requires voltage scaling.  The clock api becomes unsafe to use on any
>> device that requires dvfs, as it could change the frequency higher
>> than the supported voltage.
>
> You have to tie clock and voltage to a device. Most of the time a clock does
> not have any clear relation with a voltage domain. It can even cross power /
> voltage domains without any issue.
> The efficiency of the DVFS technique is mainly due to the reduction of the
> voltage rail that supplies a device. In order to achieve that you have to
> reduce the clock rate of one or several clock nodes that supply the
> critical path inside the HW.

A clock crossing a voltage domain is not a problem, a single clock can
have relationships to multiple regulators.  But a clock does not need
to be tied to a device.  From the silicon perspective, it doesn't
matter how you divide up the devices in the kernel, a clock is just a
line toggling at a rate, and the maximum speed it can toggle is
determined by the silicon it feeds and the voltage that silicon is
operating at.  If a device can be turned on or off, that's a clock
gate, and the line downstream from the clock gate is a separate clock.

> The clock node itself does not know anything about the device, and that's why
> it is not the proper structure to do DVFS.

One of us is co

Re: Common clock and dvfs

2011-05-04 Thread Cousson, Benoit

(Cc folks with some DVFS interest)

Hi Colin,

On Fri, 22 Apr 2011, Colin Cross wrote:

Now that we are approaching a common clock management implementation,
I was thinking it might be the right place to put a common dvfs
implementation as well.

It is very common for SoC manufacturers to provide a table of the
minimum voltage required on a voltage rail for a clock to run at a
given frequency.  There may be multiple clocks in a voltage rail that
each can specify their own minimum voltage, and one clock may affect
multiple voltage rails.  I have seen two ways to handle keeping the
clocks and voltages within spec:

The Tegra way is to put everything dvfs related under the clock
framework.  Enabling (or preparing, in the new clock world) or raising
the frequency calls dvfs_set_rate before touching the clock, which
looks up the required voltage on a voltage rail, aggregates it with
the other voltage requests, and passes the minimum voltage required to
the regulator api.  Disabling or unpreparing, or lowering the
frequency changes the clock first, and then calls dvfs_set_rate.  For
a generic implementation, an SoC would provide the clock/dvfs
framework with a list of clocks, the voltages required for each
frequency step on the clock, and the regulator name to change.  The
frequency/voltage tables are similar to OPP, except that OPP gets
voltages for a device instead of a clock.  In a few odd cases (Tegra
always has a few odd cases), a clock that is internal to a device and
not exposed to the clock framework (pclk output on the display, for
example) has a voltage requirement, which requires some devices to
manually call dvfs_set_rate directly, but with a common clock
framework it would probably be possible for the display driver to
export pclk as a real clock.
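
A hedged sketch of the ordering described above, not the actual Tegra
implementation: struct dvfs, its table and clk_set_rate_dvfs() are
hypothetical.  A real version would also aggregate the voltage requests
of every clock sharing the rail before touching the regulator.

#include <linux/clk.h>
#include <linux/errno.h>
#include <linux/regulator/consumer.h>

struct dvfs_entry {
	unsigned long rate;	/* Hz */
	int min_uV;		/* minimum rail voltage for this rate */
};

struct dvfs {
	struct regulator *reg;
	const struct dvfs_entry *table;	/* sorted by ascending rate */
	int num_entries;
	int max_uV;			/* absolute rail maximum */
};

static int dvfs_set_rate(struct dvfs *d, unsigned long rate)
{
	int i;

	/* pick the first table entry that can carry this rate */
	for (i = 0; i < d->num_entries; i++)
		if (rate <= d->table[i].rate)
			return regulator_set_voltage(d->reg,
						     d->table[i].min_uV,
						     d->max_uV);
	return -EINVAL;
}

static int clk_set_rate_dvfs(struct clk *clk, struct dvfs *d,
			     unsigned long new_rate)
{
	unsigned long old_rate = clk_get_rate(clk);
	int ret;

	if (new_rate > old_rate) {
		/* speeding up: raise the rail before touching the clock */
		ret = dvfs_set_rate(d, new_rate);
		if (ret)
			return ret;
		return clk_set_rate(clk, new_rate);
	}

	/* slowing down: change the clock first, then let the rail drop */
	ret = clk_set_rate(clk, new_rate);
	if (ret)
		return ret;
	return dvfs_set_rate(d, new_rate);
}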


Those kinds of exceptions are somewhat the rule for an OMAP4 device. 
Most scalable devices use internal dividers or even an internal PLL to 
control the scalable clock rate (DSS, HSI, MMC, McBSP... the OMAP4430 
Data Manual [1] provides the various clock rate limitations depending on 
the OPP).

And none of these internal dividers are handled by the clock framework today.

For sure, it should be possible to extend the clock data with internal 
device clock nodes (like the UART baud rate divider, for example), but 
then we would have to handle a bunch of nodes that may not always be 
available depending on the device state. In order to do that, you have to 
tie these clock nodes to the device that contains them.


And for the clocks that do not belong to any device, like most PRCM 
source clocks or DPLLs inside OMAP, we can easily define a PRCM device or 
several CM (Clock Manager) devices that will handle all these clock nodes.



The proposed OMAP4 way (I believe, correct me if I am wrong) is to
create a new api outside the clock api that calls into both the clock
api and the regulator api in the correct order for each operation,
using OPP to determine the voltage.  This has a few disadvantages
(obviously, I am biased, having written the Tegra code) - clocks and
voltages are tied to a device, which is not always the case for
platforms outside of OMAP, and drivers must know if their hardware
requires voltage scaling.  The clock api becomes unsafe to use on any
device that requires dvfs, as it could change the frequency higher
than the supported voltage.


You have to tie clock and voltage to a device. Most of the time a clock 
does not have any clear relation with a voltage domain. It can even 
cross power / voltage domains without any issue.
The efficiency of the DVFS technique is mainly due to the reduction of 
the voltage rail that supplies a device. In order to achieve that you 
have to reduce the clock rate of one or several clock nodes that supply 
the critical path inside the HW.


The clock node itself does not know anything about the device, and that's 
why it is not the proper structure to do DVFS.


OMAP moved away from using the clock nodes to represent IP blocks 
because the clock abstraction was not enough to represent the way an IP 
interacts with clocks. That's why omap_hwmod was introduced to 
represent an IP block.



Is the clock api the right place to do dvfs, or should the clock api
be kept simple, and more complicated operations like dvfs be kept
outside?


In terms of SW layering, so far we have the clock framework and the 
regulator framework. Since DVFS is about both clock and voltage scaling, 
it makes more sense to me to handle DVFS on top of both existing 
frameworks. Let's stick to the "do one thing and do it well" principle 
instead of hacking an existing framework with what I consider to be 
unrelated functionality.


Moreover, the only existing DVFS SW on Linux today is CPUFreq, so 
extending this framework to a devfreq kind of framework seems a more 
logical approach to me.
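
A hedged sketch of the device-centric layering argued for here: a helper
sitting on top of the existing clock, regulator and OPP frameworks
rather than inside the clock framework.  The device_scale() name and the
way the clk/regulator handles are passed in are hypothetical; only the
clk_*, regulator_* and opp_* calls are existing APIs of that era, and a
real version would pass the rail's true maximum instead of repeating
u_volt.

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/opp.h>
#include <linux/rcupdate.h>
#include <linux/regulator/consumer.h>

static int device_scale(struct device *dev, struct clk *fclk,
			struct regulator *vdd, unsigned long target_rate)
{
	struct opp *opp;
	unsigned long cur_rate = clk_get_rate(fclk);
	unsigned long u_volt;
	int ret;

	/* the frequency/voltage pair is looked up per device, not per clock */
	rcu_read_lock();
	opp = opp_find_freq_ceil(dev, &target_rate);
	if (IS_ERR(opp)) {
		rcu_read_unlock();
		return PTR_ERR(opp);
	}
	u_volt = opp_get_voltage(opp);
	rcu_read_unlock();

	if (target_rate > cur_rate) {
		/* going up: raise the rail before the clock */
		ret = regulator_set_voltage(vdd, u_volt, u_volt);
		if (ret)
			return ret;
		return clk_set_rate(fclk, target_rate);
	}

	/* going down: lower the clock first, then the rail */
	ret = clk_set_rate(fclk, target_rate);
	if (ret)
		return ret;
	return regulator_set_voltage(vdd, u_volt, u_volt);
}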


The important point is that, IMO, the device should be the central 
component of any DVFS implementation. Both clock and voltage are just 
some device resources that have