Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-30 Thread Nishanth Menon
On 05/29/2014 11:47 PM, Mike Turquette wrote:
> Quoting Nishanth Menon (2014-05-29 16:22:45)
>> On 05/26/2014 08:07 AM, Thierry Reding wrote:
>>> On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
>>>> Quoting Thierry Reding (2014-05-14 07:27:40)
>>> [...]
>>>>> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
>>>>> Using clocks for that doesn't seem like the best option to me. While it
>>>>> can probably fix the immediate issue of choosing an appropriate
>>>>> frequency for the EMC clock it isn't a complete solution for the problem
>>>>> that we're trying to solve. From what I understand EMC scaling is one
>>>>> part of ensuring quality of service. The current implementations of that
>>>>> seems to abuse clocks (essentially one X.emc clock per X clock) to
>>>>> signal the amount of memory bandwidth required by any given device. But
>>>>> there are other parts to the puzzle. Latency allowance is one. The value
>>>>> programmed to the latency allowance registers for example depends on the
>>>>> EMC frequency.
>>>>>
>>>>> Has anyone ever looked into using a different framework to model all of
>>>>> these requirements? PM QoS looks like it might fit, but if none of the
>>>>> existing frameworks have what we need, perhaps something new can be
>>>>> created.
>>>>
>>>> It has been discussed. Using a QoS throughput constraint could help
>>>> scale frequency. But this deserves a wider discussion and starts to
>>>> stray into both PM QoS territory and also into "should we have a DVFS
>>>> framework" territory.
>>>
>>> I've looked into this for a bit and it doesn't look like PM QoS is going
>>> to be a good match after all. One of the issues I found was that PM QoS
>>> deals with individual devices and there's no builtin way to collect the
>>> requests from multiple devices to produce a global constraint. So if we
>>> want to add something like that either the API would need to be extended
>>> or it would need to be tacked on using the notifier mechanism and some
>>> way of tracking (and filtering) the individual devices.
>>>
>>> Looking at devfreq it seems to be the DVFS framework that you mentioned,
>>> but from what I can tell it suffers from mostly the same problems. The
>>> governor applies some frequency scaling policy to a single device and
>>> does not allow multiple devices to register constraints against a single
>>> (global) constraint so that the result can be accumulated.
>>>
>>> For Tegra EMC scaling what we need is something more along the lines of
>>> this: we have a resource (external memory) that is shared by multiple
>>> devices in the system. Each of those devices requires a certain amount
>>> of that resource (memory bandwidth). The resource driver will need to
>>> accumulate all requests for the resource and apply the resulting
>>> constraint so that all requests can be satisfied.
>>>
>>> One solution I could imagine to make this work with PM QoS would be to
>>> add the concept of a pm_qos_group to manage a set of pm_qos_requests,
>>> but that will require a bunch of extra checks to make sure that requests
>>> are of the correct type and so on. In other words it would still be
>>> tacked on.
>>
>> just a minor note from previous experience: We(at TI) had attempted in
>> our product kernel[1] to use QoS constraint for certain SoCs for
>> rather unspectacular results.
>>
>> Our use case was similar: devices -> L3(core bus)->memory. We had the
>> following intent:
>> a) wanted to scale L3 based on QoS requests coming in from various
>> device drivers. intent was to scale either to 133MHz or 266MHz (two
>> OPPs we supported on our devices) based on performance needs -> So we
>> asked drivers to report QoS requirements using an standard function -
>> except drivers cannot always report it satisfactorily - example bursty
>> transfer devices - ended up with consolidated requests > total
>> bandwidth possible on the bus -> (and never in practise hitting the
>> lower frequency).
> 
> My opinion on why L3 QoS failed on OMAP is that we only used it for
> DVFS. The voltage domain corresponding to the L3 interconnect had only
> two OPPs, which meant that drivers submitting their constraints only had
> two options: slow vs fast in the best case, or non-functional vs
> functional in the worst case (some IPs simply did not work at the lower
> OPP).
> 
> But the big failure in my opinion was that we did not use any of the
> traffic-shaping or priority handling features of the L3 NoC. OMAP4 used
> the Arteris NoC[1] which has plenty of capabilities for setting
> initiator priorities and bandwidth-throttling on a point-to-point basis,
> which is exactly that QoS is for. If these had been exposed a bit more
> in software then I imagine some of the constraints that always resulted
> in running at the fast OPP might have instead resulted in running at the
> slow OPP, but with a fine-grained mixture of varying bandwidth
> allocations and access priorities.

Thanks Mike for reminding me about it. I agree that the modelling of the
complete bus capability was never complete or accurate. That is just
one factor, not to mention that some of those abilities are not modifiable
on secure devices without adequate handlers inside the secure side.

Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-29 Thread Mike Turquette
Quoting Nishanth Menon (2014-05-29 16:22:45)
> On 05/26/2014 08:07 AM, Thierry Reding wrote:
> > On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
> >> Quoting Thierry Reding (2014-05-14 07:27:40)
> > [...]
> >>> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> >>> Using clocks for that doesn't seem like the best option to me. While it
> >>> can probably fix the immediate issue of choosing an appropriate
> >>> frequency for the EMC clock it isn't a complete solution for the problem
> >>> that we're trying to solve. From what I understand EMC scaling is one
> >>> part of ensuring quality of service. The current implementations of that
> >>> seems to abuse clocks (essentially one X.emc clock per X clock) to
> >>> signal the amount of memory bandwidth required by any given device. But
> >>> there are other parts to the puzzle. Latency allowance is one. The value
> >>> programmed to the latency allowance registers for example depends on the
> >>> EMC frequency.
> >>>
> >>> Has anyone ever looked into using a different framework to model all of
> >>> these requirements? PM QoS looks like it might fit, but if none of the
> >>> existing frameworks have what we need, perhaps something new can be
> >>> created.
> >>
> >> It has been discussed. Using a QoS throughput constraint could help
> >> scale frequency. But this deserves a wider discussion and starts to
> >> stray into both PM QoS territory and also into "should we have a DVFS
> >> framework" territory.
> > 
> > I've looked into this for a bit and it doesn't look like PM QoS is going
> > to be a good match after all. One of the issues I found was that PM QoS
> > deals with individual devices and there's no builtin way to collect the
> > requests from multiple devices to produce a global constraint. So if we
> > want to add something like that either the API would need to be extended
> > or it would need to be tacked on using the notifier mechanism and some
> > way of tracking (and filtering) the individual devices.
> > 
> > Looking at devfreq it seems to be the DVFS framework that you mentioned,
> > but from what I can tell it suffers from mostly the same problems. The
> > governor applies some frequency scaling policy to a single device and
> > does not allow multiple devices to register constraints against a single
> > (global) constraint so that the result can be accumulated.
> > 
> > For Tegra EMC scaling what we need is something more along the lines of
> > this: we have a resource (external memory) that is shared by multiple
> > devices in the system. Each of those devices requires a certain amount
> > of that resource (memory bandwidth). The resource driver will need to
> > accumulate all requests for the resource and apply the resulting
> > constraint so that all requests can be satisfied.
> > 
> > One solution I could imagine to make this work with PM QoS would be to
> > add the concept of a pm_qos_group to manage a set of pm_qos_requests,
> > but that will require a bunch of extra checks to make sure that requests
> > are of the correct type and so on. In other words it would still be
> > tacked on.
> 
> just a minor note from previous experience: We(at TI) had attempted in
> our product kernel[1] to use QoS constraint for certain SoCs for
> rather unspectacular results.
> 
> Our use case was similar: devices -> L3(core bus)->memory. We had the
> following intent:
> a) wanted to scale L3 based on QoS requests coming in from various
> device drivers. intent was to scale either to 133MHz or 266MHz (two
> OPPs we supported on our devices) based on performance needs -> So we
> asked drivers to report QoS requirements using an standard function -
> except drivers cannot always report it satisfactorily - example bursty
> transfer devices - ended up with consolidated requests > total
> bandwidth possible on the bus -> (and never in practise hitting the
> lower frequency).

My opinion on why L3 QoS failed on OMAP is that we only used it for
DVFS. The voltage domain corresponding to the L3 interconnect had only
two OPPs, which meant that drivers submitting their constraints only had
two options: slow vs fast in the best case, or non-functional vs
functional in the worst case (some IPs simply did not work at the lower
OPP).

But the big failure in my opinion was that we did not use any of the
traffic-shaping or priority handling features of the L3 NoC. OMAP4 used
the Arteris NoC[1] which has plenty of capabilities for setting
initiator priorities and bandwidth-throttling on a point-to-point basis,
which is exactly what QoS is for. If these had been exposed a bit more
in software then I imagine some of the constraints that always resulted
in running at the fast OPP might have instead resulted in running at the
slow OPP, but with a fine-grained mixture of varying bandwidth
allocations and access priorities.

All of the hardware in the world doesn't do any good if we don't have
software that uses it. Anyways, just my $0.02.

Regards,
Mike

[1] http://www.arteris.com/OMAP4_QoS_security_NoC

Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-29 Thread Nishanth Menon
On 05/26/2014 08:07 AM, Thierry Reding wrote:
> On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
>> Quoting Thierry Reding (2014-05-14 07:27:40)
> [...]
>>> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
>>> Using clocks for that doesn't seem like the best option to me. While it
>>> can probably fix the immediate issue of choosing an appropriate
>>> frequency for the EMC clock it isn't a complete solution for the problem
>>> that we're trying to solve. From what I understand EMC scaling is one
>>> part of ensuring quality of service. The current implementations of that
>>> seems to abuse clocks (essentially one X.emc clock per X clock) to
>>> signal the amount of memory bandwidth required by any given device. But
>>> there are other parts to the puzzle. Latency allowance is one. The value
>>> programmed to the latency allowance registers for example depends on the
>>> EMC frequency.
>>>
>>> Has anyone ever looked into using a different framework to model all of
>>> these requirements? PM QoS looks like it might fit, but if none of the
>>> existing frameworks have what we need, perhaps something new can be
>>> created.
>>
>> It has been discussed. Using a QoS throughput constraint could help
>> scale frequency. But this deserves a wider discussion and starts to
>> stray into both PM QoS territory and also into "should we have a DVFS
>> framework" territory.
> 
> I've looked into this for a bit and it doesn't look like PM QoS is going
> to be a good match after all. One of the issues I found was that PM QoS
> deals with individual devices and there's no builtin way to collect the
> requests from multiple devices to produce a global constraint. So if we
> want to add something like that either the API would need to be extended
> or it would need to be tacked on using the notifier mechanism and some
> way of tracking (and filtering) the individual devices.
> 
> Looking at devfreq it seems to be the DVFS framework that you mentioned,
> but from what I can tell it suffers from mostly the same problems. The
> governor applies some frequency scaling policy to a single device and
> does not allow multiple devices to register constraints against a single
> (global) constraint so that the result can be accumulated.
> 
> For Tegra EMC scaling what we need is something more along the lines of
> this: we have a resource (external memory) that is shared by multiple
> devices in the system. Each of those devices requires a certain amount
> of that resource (memory bandwidth). The resource driver will need to
> accumulate all requests for the resource and apply the resulting
> constraint so that all requests can be satisfied.
> 
> One solution I could imagine to make this work with PM QoS would be to
> add the concept of a pm_qos_group to manage a set of pm_qos_requests,
> but that will require a bunch of extra checks to make sure that requests
> are of the correct type and so on. In other words it would still be
> tacked on.

Just a minor note from previous experience: we (at TI) attempted in
our product kernel[1] to use QoS constraints on certain SoCs, with
rather unspectacular results.

Our use case was similar: devices -> L3 (core bus) -> memory. We had the
following intent:
a) We wanted to scale L3 based on QoS requests coming in from various
device drivers. The intent was to scale to either 133MHz or 266MHz (the
two OPPs we supported on our devices) based on performance needs, so we
asked drivers to report QoS requirements using a standard function -
except drivers cannot always report this satisfactorily (for example,
bursty-transfer devices), and we ended up with consolidated requests >
the total bandwidth possible on the bus (and never, in practice, hitting
the lower frequency). A rough sketch of this kind of summed-request
aggregation follows below.
b) Timing closure issues on certain devices such as USB, which can
only function based on async bridge closure requirements on the core
bus etc. These would require the bus to be at a higher frequency - the
QoS model was "misused" for such requirements.
b.1) A variation: interdependent constraints -> if the MPU is > freq X,
timing closure required L3 to be at 266MHz. Again, this is not a QoS
requirement per se, just a dependency requirement that cannot easily be
addressed using a pure QoS-like framework solution.

Even though EMC does sound like (a), I suspect you might want to be
100% sure that you don't have variations of (b) in the SoC as well;
betting completely on a QoS approach might not actually work in practice.
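
To make (a) concrete, here is a minimal sketch of that style of summed
throughput voting. Everything below is a hypothetical illustration
(bus_tput_request, bus_tput_add(), the OPP capacity numbers), not the
actual helper from [1]; the point is only that summing per-driver votes
tends to exceed the bus capacity and pin L3 at the higher OPP.

/* Hypothetical sketch of summed bus-throughput voting - not the TI API. */
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mutex.h>

struct bus_tput_request {
        struct list_head node;
        unsigned long kib_per_sec;      /* this driver's bandwidth vote */
};

static LIST_HEAD(tput_requests);
static DEFINE_MUTEX(tput_lock);

/* Illustrative capacities for the two L3 OPPs (133 MHz / 266 MHz). */
#define L3_OPP_LOW_KIBPS        (133000UL * 4)
#define L3_OPP_HIGH_KIBPS       (266000UL * 4)

static unsigned long l3_pick_opp_locked(void)
{
        struct bus_tput_request *req;
        unsigned long total = 0;

        list_for_each_entry(req, &tput_requests, node)
                total += req->kib_per_sec;

        /*
         * Bursty devices over-report, so 'total' routinely exceeds even
         * the high OPP's capacity and the low OPP is effectively never
         * selected.
         */
        return total > L3_OPP_LOW_KIBPS ? L3_OPP_HIGH_KIBPS : L3_OPP_LOW_KIBPS;
}

/* Called by a driver to register its bandwidth vote. */
static void bus_tput_add(struct bus_tput_request *req, unsigned long kibps)
{
        req->kib_per_sec = kibps;
        mutex_lock(&tput_lock);
        list_add_tail(&req->node, &tput_requests);
        /* ...then reprogram L3 according to l3_pick_opp_locked()... */
        mutex_unlock(&tput_lock);
}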

> 
> Adding the linux-pm mailing list for more visibility. Perhaps somebody
For folks new to the discussion, the complete thread is here:
http://thread.gmane.org/gmane.linux.drivers.devicetree/73967

> has some ideas on how to extend any of the existing frameworks to make
> it work for Tegra's EMC scaling (or how to implement the requirements of
> Tegra's EMC scaling within the existing frameworks).
> 


[1]
https://android.googlesource.com/kernel/omap.git/+/android-omap-panda-3.0/arch/arm/plat-omap/omap-pm-helper.c

-- 
Regards,
Nishanth Menon

Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-26 Thread Thierry Reding
On Wed, May 14, 2014 at 12:35:18PM -0700, Mike Turquette wrote:
> Quoting Thierry Reding (2014-05-14 07:27:40)
[...]
> > As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> > Using clocks for that doesn't seem like the best option to me. While it
> > can probably fix the immediate issue of choosing an appropriate
> > frequency for the EMC clock it isn't a complete solution for the problem
> > that we're trying to solve. From what I understand EMC scaling is one
> > part of ensuring quality of service. The current implementations of that
> > seems to abuse clocks (essentially one X.emc clock per X clock) to
> > signal the amount of memory bandwidth required by any given device. But
> > there are other parts to the puzzle. Latency allowance is one. The value
> > programmed to the latency allowance registers for example depends on the
> > EMC frequency.
> > 
> > Has anyone ever looked into using a different framework to model all of
> > these requirements? PM QoS looks like it might fit, but if none of the
> > existing frameworks have what we need, perhaps something new can be
> > created.
> 
> It has been discussed. Using a QoS throughput constraint could help
> scale frequency. But this deserves a wider discussion and starts to
> stray into both PM QoS territory and also into "should we have a DVFS
> framework" territory.

I've looked into this for a bit and it doesn't look like PM QoS is going
to be a good match after all. One of the issues I found was that PM QoS
deals with individual devices and there's no built-in way to collect the
requests from multiple devices to produce a global constraint. So if we
want to add something like that either the API would need to be extended
or it would need to be tacked on using the notifier mechanism and some
way of tracking (and filtering) the individual devices.

Looking at devfreq it seems to be the DVFS framework that you mentioned,
but from what I can tell it suffers from mostly the same problems. The
governor applies some frequency scaling policy to a single device and
does not allow multiple devices to register constraints against a single
(global) constraint so that the result can be accumulated.

For Tegra EMC scaling what we need is something more along the lines of
this: we have a resource (external memory) that is shared by multiple
devices in the system. Each of those devices requires a certain amount
of that resource (memory bandwidth). The resource driver will need to
accumulate all requests for the resource and apply the resulting
constraint so that all requests can be satisfied.

One solution I could imagine to make this work with PM QoS would be to
add the concept of a pm_qos_group to manage a set of pm_qos_requests,
but that will require a bunch of extra checks to make sure that requests
are of the correct type and so on. In other words it would still be
tacked on.
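
To make that idea a bit more concrete, the API shape could be roughly as
sketched below. This is purely hypothetical: pm_qos_group and all of the
pm_qos_group_*() calls do not exist in the kernel today; only struct
pm_qos_request and struct device are real. The key difference from the
existing PM QoS classes is that the group accumulates (e.g. sums) the
per-device requests before applying the result to the shared resource.

/* Hypothetical API sketch - none of the pm_qos_group symbols exist today. */
struct pm_qos_group;

struct pm_qos_group_ops {
        /*
         * Called whenever the accumulated value changes, e.g. to pick an
         * EMC rate that satisfies the sum of all bandwidth requests.
         */
        int (*apply)(struct pm_qos_group *grp, s32 accumulated_value);
};

struct pm_qos_group *pm_qos_group_create(const char *name, int qos_class,
                                         const struct pm_qos_group_ops *ops);

/*
 * Per-device requests against the group; values are accumulated (summed
 * for throughput-style classes) rather than reduced with min/max as the
 * existing global classes are.
 */
int pm_qos_group_add_request(struct pm_qos_group *grp, struct device *dev,
                             struct pm_qos_request *req, s32 value);
int pm_qos_group_update_request(struct pm_qos_request *req, s32 new_value);
void pm_qos_group_remove_request(struct pm_qos_request *req);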

Adding the linux-pm mailing list for more visibility. Perhaps somebody
has some ideas on how to extend any of the existing frameworks to make
it work for Tegra's EMC scaling (or how to implement the requirements of
Tegra's EMC scaling within the existing frameworks).

Thierry


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-16 Thread Mike Turquette
Quoting Stephen Warren (2014-05-15 13:20:21)
> On 05/15/2014 04:52 AM, Peter De Schrijver wrote:
> > On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
> >> * PGP Signed by an unknown key
> >>
> >> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> >>> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> >>>> Add shared and cbus clocks to the Tegra124 clock implementation.
> >>>
> >>>> diff --git a/include/dt-bindings/clock/tegra124-car.h 
> >>>> b/include/dt-bindings/clock/tegra124-car.h
> >>>
> >>>> +#define TEGRA124_CLK_C2BUS 401
> >>>> +#define TEGRA124_CLK_C3BUS 402
> >>>> +#define TEGRA124_CLK_GR3D_CBUS 403
> >>>> +#define TEGRA124_CLK_GR2D_CBUS 404
> >>> ...
> >>>
> >>> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> >>> but are more a way of SW applying policy to the clock that do exist in
> >>> HW. As such, I'm not convinced it's a good idea to expose these clock
> >>> IDS to DT, since DT is supposed to represent the HW, and not be
> >>> influenced by internal SW implementation details.
> >>>
> >>> Do any DTs actually need to used these new clock IDs? I don't think we
> >>> could actually use these value in e.g. tegra124.dtsi's clocks
> >>> properties, since these clocks don't exist in HW. Was it your intent to
> >>> do that? If not, can't we just define these SW-internal clock IDs in the
> >>> header inside the Tegra clock driver, so the values are invisible to DT?
> >>
> >> I'm beginning to wonder if abusing clocks in this way is really the best
> >> solution. From what I understand there are two problems here that are
> >> mostly orthogonal though they're implemented using similar techniques.
> >>
> >> The reason for introducing cbus clocks are still unclear to me. From the
> >> cover letter of this patch series it seems like these should be
> >> completely hidden from drivers and as such they don't belong in device
> >> tree. Also if they are an implementation detail, why are they even
> >> implemented as clocks? Perhaps an example use-case would help illustrate
> >> the need for this.
> > 
> > We don't have a PLL per engine, hence we have to use a PLL as parent for
> > several module clocks. However you can't change a PLLs rate with
> > active clients. So for scaling the PLL clocking eg. VIC or MSENC, you need 
> > to
> > change their parent to a different PLL, change the original PLL rate and 
> > change
> > the parent back to the original PLL, all while ensuring you never exceed the
> > maximum allowed clock at the current voltage. You also want to take into
> > account if a module is clocked so you don't bother handling clocks which are
> > disabled. (eg. if only the VIC clock is enabled, there is no point in 
> > changing
> > the MSENC parent). All this is handled by the 'cbus' clock.
> 
> Presumably though we can handle this "cbus" concept entirely inside the
> clock driver.
> 
> What happens right now is that when a DT node references a clock, the
> driver gets a clock and then manipulates it directly. What if the clock
> core was reworked a bit such that every single clock was a "cbus" clock.
> clk_get() wouldn't return the raw clock object itself, but rather a
> "clock client" object, which would forward requests on to the underlying
> clk. If there's only 1 clk_get(), there's only 1 client, so all requests
> get forwarded automatically. If there are n clk_get_requests(), the
> clock object gets to implement the appropriate voting/... algorithm to
> mediate the requests.

This was proposed before[1][2] and is something that would be great to
have. The scary thing is to start introducing policy into the clock
framework, which I'd like to avoid as much as possible. But arbitration
of resources (with requisite reference counting) is pretty much
non-existent for clock rates (last call to clk_set_rate always wins),
and is very rudimentary for prepare/enable (we have use counting, but it
does not track individual clients/clock consumers).
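
To illustrate the "last call wins" point for anyone following along (the
device pointers and the "emc" consumer name below are just placeholders;
clk_get(), clk_set_rate() and clk_get_rate() are the real consumer API):

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/printk.h>

static void shared_rate_example(struct device *dev_a, struct device *dev_b)
{
        struct clk *ca = clk_get(dev_a, "emc"); /* consumer A */
        struct clk *cb = clk_get(dev_b, "emc"); /* consumer B, same HW clock */

        clk_set_rate(ca, 400000000);    /* A asks for 400 MHz */
        clk_set_rate(cb, 100000000);    /* B asks for 100 MHz and "wins" */

        /*
         * The framework keeps no per-consumer record of A's request, so
         * the shared clock now runs at 100 MHz although A needs 400 MHz.
         */
        pr_info("emc now at %lu Hz\n", clk_get_rate(ca));

        clk_put(cb);
        clk_put(ca);
}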

Revisiting Rabin's patches has been at the bottom of my todo list for a
while now. I'm happy for someone else to take a crack at it.

Regards,
Mike

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/135290.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-November/135574.html

> 
> That way, we don't have to expose any of this logic in the device tree,
> or hopefully/mostly even outside the HW clock's implementation.

Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-15 Thread Stephen Warren
On 05/15/2014 04:52 AM, Peter De Schrijver wrote:
> On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
>> * PGP Signed by an unknown key
>>
>> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
>>> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
>>>> Add shared and cbus clocks to the Tegra124 clock implementation.
>>>
>>>> diff --git a/include/dt-bindings/clock/tegra124-car.h 
>>>> b/include/dt-bindings/clock/tegra124-car.h
>>>
>>>> +#define TEGRA124_CLK_C2BUS 401
>>>> +#define TEGRA124_CLK_C3BUS 402
>>>> +#define TEGRA124_CLK_GR3D_CBUS 403
>>>> +#define TEGRA124_CLK_GR2D_CBUS 404
>>> ...
>>>
>>> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
>>> but are more a way of SW applying policy to the clock that do exist in
>>> HW. As such, I'm not convinced it's a good idea to expose these clock
>>> IDS to DT, since DT is supposed to represent the HW, and not be
>>> influenced by internal SW implementation details.
>>>
>>> Do any DTs actually need to used these new clock IDs? I don't think we
>>> could actually use these value in e.g. tegra124.dtsi's clocks
>>> properties, since these clocks don't exist in HW. Was it your intent to
>>> do that? If not, can't we just define these SW-internal clock IDs in the
>>> header inside the Tegra clock driver, so the values are invisible to DT?
>>
>> I'm beginning to wonder if abusing clocks in this way is really the best
>> solution. From what I understand there are two problems here that are
>> mostly orthogonal though they're implemented using similar techniques.
>>
>> The reason for introducing cbus clocks are still unclear to me. From the
>> cover letter of this patch series it seems like these should be
>> completely hidden from drivers and as such they don't belong in device
>> tree. Also if they are an implementation detail, why are they even
>> implemented as clocks? Perhaps an example use-case would help illustrate
>> the need for this.
> 
> We don't have a PLL per engine, hence we have to use a PLL as parent for
> several module clocks. However you can't change a PLLs rate with
> active clients. So for scaling the PLL clocking eg. VIC or MSENC, you need to
> change their parent to a different PLL, change the original PLL rate and 
> change
> the parent back to the original PLL, all while ensuring you never exceed the
> maximum allowed clock at the current voltage. You also want to take into
> account if a module is clocked so you don't bother handling clocks which are
> disabled. (eg. if only the VIC clock is enabled, there is no point in changing
> the MSENC parent). All this is handled by the 'cbus' clock.

Presumably though we can handle this "cbus" concept entirely inside the
clock driver.

What happens right now is that when a DT node references a clock, the
driver gets a clock and then manipulates it directly. What if the clock
core was reworked a bit such that every single clock was a "cbus" clock.
clk_get() wouldn't return the raw clock object itself, but rather a
"clock client" object, which would forward requests on to the underlying
clk. If there's only 1 clk_get(), there's only 1 client, so all requests
get forwarded automatically. If there are n clk_get() requests, the
clock object gets to implement the appropriate voting/... algorithm to
mediate the requests.

That way, we don't have to expose any of this logic in the device tree,
or hopefully/mostly even outside the HW clock's implementation.
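
A very rough sketch of what such a per-consumer handle might look like,
purely as an illustration (none of these structures exist in the common
clock framework; the max-wins policy is just one possible choice):

/* Hypothetical per-consumer handle - not an existing kernel interface. */
#include <linux/list.h>

struct clk_core_obj;                    /* the one real, underlying clock */

struct clk_client {
        struct list_head node;
        struct clk_core_obj *core;
        unsigned long requested_rate;   /* this consumer's vote, 0 = none */
};

struct clk_core_obj {
        struct list_head clients;       /* every handle given out by clk_get() */
        unsigned long (*resolve)(struct clk_core_obj *core);
        int (*set_hw_rate)(struct clk_core_obj *core, unsigned long rate);
};

/* Example policy: the fastest request wins (other policies are possible). */
static unsigned long clk_core_resolve_max(struct clk_core_obj *core)
{
        struct clk_client *c;
        unsigned long rate = 0;

        list_for_each_entry(c, &core->clients, node)
                if (c->requested_rate > rate)
                        rate = c->requested_rate;
        return rate;
}

/*
 * What clk_set_rate() on a client handle would do: record this consumer's
 * vote, then re-resolve across all clients instead of blindly overriding.
 */
static int clk_client_set_rate(struct clk_client *c, unsigned long rate)
{
        c->requested_rate = rate;
        return c->core->set_hw_rate(c->core, c->core->resolve(c->core));
}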


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-15 Thread Peter De Schrijver
On Wed, May 14, 2014 at 09:35:18PM +0200, Mike Turquette wrote:
> Quoting Thierry Reding (2014-05-14 07:27:40)
> > On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > > 
> > > > diff --git a/include/dt-bindings/clock/tegra124-car.h 
> > > > b/include/dt-bindings/clock/tegra124-car.h
> > > 
> > > > +#define TEGRA124_CLK_C2BUS 401
> > > > +#define TEGRA124_CLK_C3BUS 402
> > > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > > ...
> > > 
> > > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > > but are more a way of SW applying policy to the clock that do exist in
> > > HW. As such, I'm not convinced it's a good idea to expose these clock
> > > IDS to DT, since DT is supposed to represent the HW, and not be
> > > influenced by internal SW implementation details.
> > > 
> > > Do any DTs actually need to used these new clock IDs? I don't think we
> > > could actually use these value in e.g. tegra124.dtsi's clocks
> > > properties, since these clocks don't exist in HW. Was it your intent to
> > > do that? If not, can't we just define these SW-internal clock IDs in the
> > > header inside the Tegra clock driver, so the values are invisible to DT?
> > 
> > I'm beginning to wonder if abusing clocks in this way is really the best
> > solution. From what I understand there are two problems here that are
> > mostly orthogonal though they're implemented using similar techniques.
> 
> Ack. "Virtual clocks" have been implemented by vendors before as a way
> to manage complicated clock rate changes. I do not think we should
> support such a method upstream.
> 
> I'm working with another engineer in Linaro on a "coordinated clock rate
> change" series that might help solve some of the problems that this
> patch series is trying to achieve.
> 

Any preview? :)

For this to be useful to us, it needs to be possible to:

1) change to a different parent during a rate change
2) adjust a clock's divider when changing parents
3) ignore disabled child clocks
4) have notifiers to hook voltage scaling into (a rough sketch of such a
hook follows below)
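
Point (4) can already be approximated with the existing clock rate-change
notifiers; a minimal sketch of hooking voltage scaling into them might look
like the following (struct my_dvfs, uv_for() and the voltage numbers are
made-up placeholders; clk_notifier_register() and the PRE/POST_RATE_CHANGE
events are the real API, and error handling is omitted):

#include <linux/clk.h>
#include <linux/kernel.h>
#include <linux/notifier.h>
#include <linux/regulator/consumer.h>

/* Made-up driver context: the scaled clock's supply plus our notifier. */
struct my_dvfs {
        struct notifier_block nb;
        struct regulator *vdd;
};

/* Made-up OPP lookup: voltage needed for a given rate. */
static int uv_for(unsigned long rate)
{
        return rate > 400000000UL ? 1200000 : 1000000;
}

static int my_dvfs_clk_notify(struct notifier_block *nb,
                              unsigned long event, void *data)
{
        struct clk_notifier_data *cnd = data;
        struct my_dvfs *d = container_of(nb, struct my_dvfs, nb);
        int uv = uv_for(cnd->new_rate);

        /* Raise the voltage before speeding up, lower it after slowing down. */
        if (event == PRE_RATE_CHANGE && cnd->new_rate > cnd->old_rate)
                regulator_set_voltage(d->vdd, uv, uv);
        else if (event == POST_RATE_CHANGE && cnd->new_rate < cnd->old_rate)
                regulator_set_voltage(d->vdd, uv, uv);

        return NOTIFY_OK;
}

/* Registered once with: d->nb.notifier_call = my_dvfs_clk_notify;
 *                       clk_notifier_register(clk, &d->nb);           */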

> > 
> > The reason for introducing cbus clocks are still unclear to me. From the
> > cover letter of this patch series it seems like these should be
> > completely hidden from drivers and as such they don't belong in device
> > tree. Also if they are an implementation detail, why are they even
> > implemented as clocks? Perhaps an example use-case would help illustrate
> > the need for this.
> 
> I also have this question. Does "cbus" come from your TRM or data sheet?
> Or is it purely a software solution to coordinating rate changes within
> known limits and for validated combinations?
> 

cbus is a software solution. It's not mentioned in any TRM or hardware
document.

Cheers,

Peter.


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-15 Thread Peter De Schrijver
On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > 
> > > diff --git a/include/dt-bindings/clock/tegra124-car.h 
> > > b/include/dt-bindings/clock/tegra124-car.h
> > 
> > > +#define TEGRA124_CLK_C2BUS 401
> > > +#define TEGRA124_CLK_C3BUS 402
> > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > ...
> > 
> > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > but are more a way of SW applying policy to the clock that do exist in
> > HW. As such, I'm not convinced it's a good idea to expose these clock
> > IDS to DT, since DT is supposed to represent the HW, and not be
> > influenced by internal SW implementation details.
> > 
> > Do any DTs actually need to used these new clock IDs? I don't think we
> > could actually use these value in e.g. tegra124.dtsi's clocks
> > properties, since these clocks don't exist in HW. Was it your intent to
> > do that? If not, can't we just define these SW-internal clock IDs in the
> > header inside the Tegra clock driver, so the values are invisible to DT?
> 
> I'm beginning to wonder if abusing clocks in this way is really the best
> solution. From what I understand there are two problems here that are
> mostly orthogonal though they're implemented using similar techniques.
> 
> The reason for introducing cbus clocks are still unclear to me. From the
> cover letter of this patch series it seems like these should be
> completely hidden from drivers and as such they don't belong in device
> tree. Also if they are an implementation detail, why are they even
> implemented as clocks? Perhaps an example use-case would help illustrate
> the need for this.

We don't have a PLL per engine, hence we have to use a PLL as the parent
for several module clocks. However, you can't change a PLL's rate while it
has active clients. So to scale the PLL clocking e.g. VIC or MSENC, you
need to change their parent to a different PLL, change the original PLL's
rate and change the parent back to the original PLL, all while ensuring
you never exceed the maximum allowed clock at the current voltage. You
also want to take into account whether a module is clocked, so you don't
bother handling clocks which are disabled (e.g. if only the VIC clock is
enabled, there is no point in changing the MSENC parent). All this is
handled by the 'cbus' clock.
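
As a purely illustrative sketch of that sequence (the clock names, the
choice of backup parent and the error handling are placeholders; the real
logic lives inside the cbus clk_ops and also has to respect the
voltage-dependent maximum when picking new_rate):

#include <linux/clk.h>
#include <linux/clk-provider.h>

/*
 * Re-rate a shared PLL that several module clocks (e.g. vic, msenc) hang
 * off, without ever running an enabled module from a PLL that is changing.
 */
static int cbus_set_rate(struct clk *pll_c, struct clk *backup_pll,
                         struct clk *users[], int nusers,
                         unsigned long new_rate)
{
        int i, ret;

        /* 1) Move every *enabled* user to a safe backup parent. */
        for (i = 0; i < nusers; i++) {
                if (!__clk_is_enabled(users[i]))
                        continue;       /* disabled: nothing to do */
                ret = clk_set_parent(users[i], backup_pll);
                if (ret)
                        return ret;
        }

        /*
         * 2) Now the PLL has no active children and may change rate.
         *    (new_rate must respect the max allowed at the current voltage.)
         */
        ret = clk_set_rate(pll_c, new_rate);
        if (ret)
                return ret;

        /* 3) Move the users back to the retuned PLL. */
        for (i = 0; i < nusers; i++)
                if (__clk_is_enabled(users[i]))
                        clk_set_parent(users[i], pll_c);

        return 0;
}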

> 
> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> Using clocks for that doesn't seem like the best option to me. While it
> can probably fix the immediate issue of choosing an appropriate
> frequency for the EMC clock it isn't a complete solution for the problem
> that we're trying to solve. From what I understand EMC scaling is one
> part of ensuring quality of service. The current implementations of that
> seems to abuse clocks (essentially one X.emc clock per X clock) to
> signal the amount of memory bandwidth required by any given device. But
> there are other parts to the puzzle. Latency allowance is one. The value
> programmed to the latency allowance registers for example depends on the
> EMC frequency.
> 

There are more use cases:

1) sclk scaling, which is similar to emc in that it has many modules that
want to influence this clock. The problem here is that sclk clocks the AVP
and is used as a parent for the AHB and APB clocks. So there are many
drivers that want to vote on this.

2) thermal capping. We want to limit e.g. the GPU clock rate due to
thermals.

3) 'cbus' scaling. We want to scale the PLLs clocking several modules and
therefore several drivers need to be able to 'vote' on the rate. Also here
we don't want to take disabled clocks into account for the final rate
calculation.

Cases 1 and 2 could presumably be handled by PM QoS, although for case 2 we
need to enforce an upper bound rather than a minimum rate. The PM QoS
maintainer has up to now rejected any patches which add PM QoS constraints
to limit a clock or another variable. For case 1 we could add a
'bus throughput' QoS parameter to control sclk. The units would then be
MiB/s or something similar. However, sclk also clocks the AVP, and it would
be rather strange to require a driver to set a certain bus throughput
requirement just to ensure the AVP runs fast enough.

For case 3, I don't see any existing mechanism to handle this. I don't think
PM QoS helps here, because PM QoS is supposed to deal with higher-level
units (e.g. MiB/s), but in this case the only relationship between the
modules is that they run from the same PLL. So there is no higher-level
unit which makes sense here.


Cheers,

Peter.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-15 Thread Peter De Schrijver
On Wed, May 14, 2014 at 07:58:26PM +0200, Andrew Bresticker wrote:
> On Wed, May 14, 2014 at 7:27 AM, Thierry Reding
>  wrote:
> > As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> > Using clocks for that doesn't seem like the best option to me. While it
> > can probably fix the immediate issue of choosing an appropriate
> > frequency for the EMC clock it isn't a complete solution for the problem
> > that we're trying to solve. From what I understand EMC scaling is one
> > part of ensuring quality of service. The current implementations of that
> > seems to abuse clocks (essentially one X.emc clock per X clock) to
> > signal the amount of memory bandwidth required by any given device. But
> > there are other parts to the puzzle. Latency allowance is one. The value
> > programmed to the latency allowance registers for example depends on the
> > EMC frequency.
> >
> > Has anyone ever looked into using a different framework to model all of
> > these requirements? PM QoS looks like it might fit, but if none of the
> > existing frameworks have what we need, perhaps something new can be
> > created.
> 
> On Exynos we use devfreq, though in that case we monitor performance
> counters to determine how internal buses should be scaled - not sure
> if Tegra SoCs have similar counters that could be used for this
> purpose.  It seems like EMC scaling would fit nicely within the PM QoS
> framework, perhaps with a new PM_QOS_MEMORY_THROUGHPUT class.

We do have counters; however, counters are reactive, which is a problem
for some isochronous clients (e.g. display). Counters also don't solve
the latency problem.

Cheers,

Peter.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-15 Thread Peter De Schrijver
On Wed, May 14, 2014 at 09:35:18PM +0200, Mike Turquette wrote:
> Quoting Thierry Reding (2014-05-14 07:27:40)
> > On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > > 
> > > > diff --git a/include/dt-bindings/clock/tegra124-car.h 
> > > > b/include/dt-bindings/clock/tegra124-car.h
> > > 
> > > > +#define TEGRA124_CLK_C2BUS 401
> > > > +#define TEGRA124_CLK_C3BUS 402
> > > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > > ...
> > > 
> > > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > > but are more a way of SW applying policy to the clock that do exist in
> > > HW. As such, I'm not convinced it's a good idea to expose these clock
> > > IDS to DT, since DT is supposed to represent the HW, and not be
> > > influenced by internal SW implementation details.
> > > 
> > > Do any DTs actually need to used these new clock IDs? I don't think we
> > > could actually use these value in e.g. tegra124.dtsi's clocks
> > > properties, since these clocks don't exist in HW. Was it your intent to
> > > do that? If not, can't we just define these SW-internal clock IDs in the
> > > header inside the Tegra clock driver, so the values are invisible to DT?
> > 
> > I'm beginning to wonder if abusing clocks in this way is really the best
> > solution. From what I understand there are two problems here that are
> > mostly orthogonal though they're implemented using similar techniques.
> 
> Ack. "Virtual clocks" have been implemented by vendors before as a way
> to manage complicated clock rate changes. I do not think we should
> support such a method upstream.
> 
> I'm working with another engineer in Linaro on a "coordinated clock rate
> change" series that might help solve some of the problems that this
> patch series is trying to achieve.
> 

Any preview? :)

For this to be useful to us, it needs to be possible to:

1) change to a different parent during a rate change
2) adjust a clock's divider when changing parents
3) ignore disabled child clocks
4) have notifiers to hook voltage scaling into (a rough sketch of such a
notifier follows below)
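
For point 4, the existing clk rate-change notifiers already provide the
hook; a minimal sketch (the voltage lookup is only a stub here, a real
driver would consult an OPP table or similar):

#include <linux/clk.h>
#include <linux/notifier.h>

/* stub: program the rail voltage needed for @rate */
static void set_voltage_for(unsigned long rate)
{
}

static int dvfs_clk_notify(struct notifier_block *nb,
                           unsigned long event, void *data)
{
        struct clk_notifier_data *cnd = data;

        switch (event) {
        case PRE_RATE_CHANGE:
                /* raise the voltage before the clock speeds up */
                if (cnd->new_rate > cnd->old_rate)
                        set_voltage_for(cnd->new_rate);
                break;
        case POST_RATE_CHANGE:
                /* lower the voltage once the clock has slowed down */
                if (cnd->new_rate < cnd->old_rate)
                        set_voltage_for(cnd->new_rate);
                break;
        case ABORT_RATE_CHANGE:
                /* rate change failed, return to the old operating point */
                set_voltage_for(cnd->old_rate);
                break;
        }

        return NOTIFY_OK;
}

static struct notifier_block dvfs_nb = {
        .notifier_call = dvfs_clk_notify,
};

/* registered elsewhere with clk_notifier_register(clk, &dvfs_nb) */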

  
> > The reason for introducing cbus clocks are still unclear to me. From the
> > cover letter of this patch series it seems like these should be
> > completely hidden from drivers and as such they don't belong in device
> > tree. Also if they are an implementation detail, why are they even
> > implemented as clocks? Perhaps an example use-case would help illustrate
> > the need for this.
> 
> I also have this question. Does "cbus" come from your TRM or data sheet?
> Or is it purely a software solution to coordinating rate changes within
> known limits and for validated combinations?
> 

cbus is a software solution. It's not mentioned in any TRM or hardware
document.

Cheers,

Peter.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-15 Thread Stephen Warren
On 05/15/2014 04:52 AM, Peter De Schrijver wrote:
> On Wed, May 14, 2014 at 04:27:40PM +0200, Thierry Reding wrote:
> > * PGP Signed by an unknown key
> > 
> > On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > > 
> > > > diff --git a/include/dt-bindings/clock/tegra124-car.h 
> > > > b/include/dt-bindings/clock/tegra124-car.h
> > > 
> > > > +#define TEGRA124_CLK_C2BUS 401
> > > > +#define TEGRA124_CLK_C3BUS 402
> > > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > > ...
> > > 
> > > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > > but are more a way of SW applying policy to the clock that do exist in
> > > HW. As such, I'm not convinced it's a good idea to expose these clock
> > > IDS to DT, since DT is supposed to represent the HW, and not be
> > > influenced by internal SW implementation details.
> > > 
> > > Do any DTs actually need to used these new clock IDs? I don't think we
> > > could actually use these value in e.g. tegra124.dtsi's clocks
> > > properties, since these clocks don't exist in HW. Was it your intent to
> > > do that? If not, can't we just define these SW-internal clock IDs in the
> > > header inside the Tegra clock driver, so the values are invisible to DT?
> > 
> > I'm beginning to wonder if abusing clocks in this way is really the best
> > solution. From what I understand there are two problems here that are
> > mostly orthogonal though they're implemented using similar techniques.
> > 
> > The reason for introducing cbus clocks are still unclear to me. From the
> > cover letter of this patch series it seems like these should be
> > completely hidden from drivers and as such they don't belong in device
> > tree. Also if they are an implementation detail, why are they even
> > implemented as clocks? Perhaps an example use-case would help illustrate
> > the need for this.
> 
> We don't have a PLL per engine, hence we have to use a PLL as parent for
> several module clocks. However you can't change a PLLs rate with
> active clients. So for scaling the PLL clocking eg. VIC or MSENC, you need to
> change their parent to a different PLL, change the original PLL rate and change
> the parent back to the original PLL, all while ensuring you never exceed the
> maximum allowed clock at the current voltage. You also want to take into
> account if a module is clocked so you don't bother handling clocks which are
> disabled. (eg. if only the VIC clock is enabled, there is no point in changing
> the MSENC parent). All this is handled by the 'cbus' clock.

Presumably though we can handle this cbus concept entirely inside the
clock driver.

What happens right now is that when a DT node references a clock, the
driver gets a clock and then manipulates it directly. What if the clock
core was reworked a bit such that every single clock was a cbus clock?
clk_get() wouldn't return the raw clock object itself, but rather a
clock client object, which would forward requests on to the underlying
clk. If there's only 1 clk_get(), there's only 1 client, so all requests
get forwarded automatically. If there are n clk_get() calls, the
clock object gets to implement the appropriate voting/... algorithm to
mediate the requests.

That way, we don't have to expose any of this logic in the device tree,
or hopefully/mostly even outside the HW clock's implementation.
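
Something like the sketch below is what I have in mind for the client
object (none of this exists in the clk core today; the max() vote is just
one possible mediation policy):

#include <linux/clk.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mutex.h>

/* Sketch only: per-consumer handles voting on one underlying clock. */
struct shared_clk {
        struct clk *clk;                /* the real, HW-backed clock */
        struct list_head clients;
        struct mutex lock;
};

struct shared_clk_client {
        struct shared_clk *owner;
        struct list_head node;
        unsigned long rate;             /* this client's request, 0 = don't care */
};

/* apply the highest request of all clients to the underlying clock */
static int shared_clk_update(struct shared_clk *sc)
{
        struct shared_clk_client *c;
        unsigned long rate = 0;

        list_for_each_entry(c, &sc->clients, node)
                rate = max(rate, c->rate);

        return rate ? clk_set_rate(sc->clk, rate) : 0;
}

/* what clk_set_rate() on a client handle would boil down to */
static int shared_clk_client_set_rate(struct shared_clk_client *c,
                                      unsigned long rate)
{
        int err;

        mutex_lock(&c->owner->lock);
        c->rate = rate;
        err = shared_clk_update(c->owner);
        mutex_unlock(&c->owner->lock);

        return err;
}
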
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-14 Thread Mike Turquette
Quoting Thierry Reding (2014-05-14 07:27:40)
> On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> > On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > > Add shared and cbus clocks to the Tegra124 clock implementation.
> > 
> > > diff --git a/include/dt-bindings/clock/tegra124-car.h 
> > > b/include/dt-bindings/clock/tegra124-car.h
> > 
> > > +#define TEGRA124_CLK_C2BUS 401
> > > +#define TEGRA124_CLK_C3BUS 402
> > > +#define TEGRA124_CLK_GR3D_CBUS 403
> > > +#define TEGRA124_CLK_GR2D_CBUS 404
> > ...
> > 
> > I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> > but are more a way of SW applying policy to the clock that do exist in
> > HW. As such, I'm not convinced it's a good idea to expose these clock
> > IDS to DT, since DT is supposed to represent the HW, and not be
> > influenced by internal SW implementation details.
> > 
> > Do any DTs actually need to used these new clock IDs? I don't think we
> > could actually use these value in e.g. tegra124.dtsi's clocks
> > properties, since these clocks don't exist in HW. Was it your intent to
> > do that? If not, can't we just define these SW-internal clock IDs in the
> > header inside the Tegra clock driver, so the values are invisible to DT?
> 
> I'm beginning to wonder if abusing clocks in this way is really the best
> solution. From what I understand there are two problems here that are
> mostly orthogonal though they're implemented using similar techniques.

Ack. "Virtual clocks" have been implemented by vendors before as a way
to manage complicated clock rate changes. I do not think we should
support such a method upstream.

I'm working with another engineer in Linaro on a "coordinated clock rate
change" series that might help solve some of the problems that this
patch series is trying to achieve.

> 
> The reason for introducing cbus clocks are still unclear to me. From the
> cover letter of this patch series it seems like these should be
> completely hidden from drivers and as such they don't belong in device
> tree. Also if they are an implementation detail, why are they even
> implemented as clocks? Perhaps an example use-case would help illustrate
> the need for this.

I also have this question. Does "cbus" come from your TRM or data sheet?
Or is it purely a software solution to coordinating rate changes within
known limits and for validated combinations?

> 
> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> Using clocks for that doesn't seem like the best option to me. While it
> can probably fix the immediate issue of choosing an appropriate
> frequency for the EMC clock it isn't a complete solution for the problem
> that we're trying to solve. From what I understand EMC scaling is one
> part of ensuring quality of service. The current implementations of that
> seems to abuse clocks (essentially one X.emc clock per X clock) to
> signal the amount of memory bandwidth required by any given device. But
> there are other parts to the puzzle. Latency allowance is one. The value
> programmed to the latency allowance registers for example depends on the
> EMC frequency.
> 
> Has anyone ever looked into using a different framework to model all of
> these requirements? PM QoS looks like it might fit, but if none of the
> existing frameworks have what we need, perhaps something new can be
> created.

It has been discussed. Using a QoS throughput constraint could help
scale frequency. But this deserves a wider discussion and starts to
stray into both PM QoS territory and also into "should we have a DVFS
framework" territory.

Regards,
Mike

> 
> Thierry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-14 Thread Andrew Bresticker
On Wed, May 14, 2014 at 7:27 AM, Thierry Reding
 wrote:
> As for shared clocks I'm only aware of one use-case, namely EMC scaling.
> Using clocks for that doesn't seem like the best option to me. While it
> can probably fix the immediate issue of choosing an appropriate
> frequency for the EMC clock it isn't a complete solution for the problem
> that we're trying to solve. From what I understand EMC scaling is one
> part of ensuring quality of service. The current implementations of that
> seems to abuse clocks (essentially one X.emc clock per X clock) to
> signal the amount of memory bandwidth required by any given device. But
> there are other parts to the puzzle. Latency allowance is one. The value
> programmed to the latency allowance registers for example depends on the
> EMC frequency.
>
> Has anyone ever looked into using a different framework to model all of
> these requirements? PM QoS looks like it might fit, but if none of the
> existing frameworks have what we need, perhaps something new can be
> created.

On Exynos we use devfreq, though in that case we monitor performance
counters to determine how internal buses should be scaled - not sure
if Tegra SoCs have similar counters that could be used for this
purpose.  It seems like EMC scaling would fit nicely within the PM QoS
framework, perhaps with a new PM_QOS_MEMORY_THROUGHPUT class.
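
For illustration, a consumer of such a class (which doesn't exist today;
the class name and the KiB/s unit below are assumptions) would look much
like any other pm_qos user:

#include <linux/pm_qos.h>

static struct pm_qos_request emc_bw_req;

static void example_start_streaming(void)
{
        /* e.g. a display driver asking for 1 GiB/s worth of EMC bandwidth */
        pm_qos_add_request(&emc_bw_req, PM_QOS_MEMORY_THROUGHPUT,
                           1024 * 1024);
}

static void example_stop_streaming(void)
{
        pm_qos_remove_request(&emc_bw_req);
}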

-Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-14 Thread Thierry Reding
On Tue, May 13, 2014 at 12:09:49PM -0600, Stephen Warren wrote:
> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> > Add shared and cbus clocks to the Tegra124 clock implementation.
> 
> > diff --git a/include/dt-bindings/clock/tegra124-car.h 
> > b/include/dt-bindings/clock/tegra124-car.h
> 
> > +#define TEGRA124_CLK_C2BUS 401
> > +#define TEGRA124_CLK_C3BUS 402
> > +#define TEGRA124_CLK_GR3D_CBUS 403
> > +#define TEGRA124_CLK_GR2D_CBUS 404
> ...
> 
> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> but are more a way of SW applying policy to the clock that do exist in
> HW. As such, I'm not convinced it's a good idea to expose these clock
> IDS to DT, since DT is supposed to represent the HW, and not be
> influenced by internal SW implementation details.
> 
> Do any DTs actually need to used these new clock IDs? I don't think we
> could actually use these value in e.g. tegra124.dtsi's clocks
> properties, since these clocks don't exist in HW. Was it your intent to
> do that? If not, can't we just define these SW-internal clock IDs in the
> header inside the Tegra clock driver, so the values are invisible to DT?

I'm beginning to wonder if abusing clocks in this way is really the best
solution. From what I understand there are two problems here that are
mostly orthogonal though they're implemented using similar techniques.

The reason for introducing cbus clocks are still unclear to me. From the
cover letter of this patch series it seems like these should be
completely hidden from drivers and as such they don't belong in device
tree. Also if they are an implementation detail, why are they even
implemented as clocks? Perhaps an example use-case would help illustrate
the need for this.

As for shared clocks I'm only aware of one use-case, namely EMC scaling.
Using clocks for that doesn't seem like the best option to me. While it
can probably fix the immediate issue of choosing an appropriate
frequency for the EMC clock it isn't a complete solution for the problem
that we're trying to solve. From what I understand EMC scaling is one
part of ensuring quality of service. The current implementations of that
seems to abuse clocks (essentially one X.emc clock per X clock) to
signal the amount of memory bandwidth required by any given device. But
there are other parts to the puzzle. Latency allowance is one. The value
programmed to the latency allowance registers for example depends on the
EMC frequency.

Has anyone ever looked into using a different framework to model all of
these requirements? PM QoS looks like it might fit, but if none of the
existing frameworks have what we need, perhaps something new can be
created.

Thierry


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-13 Thread Andrew Bresticker
On Tue, May 13, 2014 at 11:09 AM, Stephen Warren  wrote:
> On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
>> Add shared and cbus clocks to the Tegra124 clock implementation.
>
>> diff --git a/include/dt-bindings/clock/tegra124-car.h 
>> b/include/dt-bindings/clock/tegra124-car.h
>
>> +#define TEGRA124_CLK_C2BUS 401
>> +#define TEGRA124_CLK_C3BUS 402
>> +#define TEGRA124_CLK_GR3D_CBUS 403
>> +#define TEGRA124_CLK_GR2D_CBUS 404
> ...
>
> I worry about this a bit. IIUC, these clocks don't actually exist in HW,
> but are more a way of SW applying policy to the clock that do exist in
> HW. As such, I'm not convinced it's a good idea to expose these clock
> IDS to DT, since DT is supposed to represent the HW, and not be
> influenced by internal SW implementation details.

The purpose of these IDs, I believe, is to be used in bindings for consumer
devices so that the driver can get() the shared clock.

> Do any DTs actually need to used these new clock IDs?

AFAIK no driver currently supports the use of these shared clocks, but
they could. In the chromium kernel, for example, we have devices which
require an "emc" clock in their bindings that the driver uses to vote on
SDRAM frequency. I guess we could avoid specifying these clocks in the DT
by creating aliases for them, but I'm not sure that's any better.
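
As a sketch, the consumer side is nothing special; the "emc" clock name
and the bandwidth-to-rate conversion below are made up for illustration:

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/err.h>

static int example_request_bandwidth(struct device *dev,
                                     unsigned long bytes_per_sec)
{
        struct clk *emc = devm_clk_get(dev, "emc");

        if (IS_ERR(emc))
                return PTR_ERR(emc);

        /* crude vote: assume a 64-bit (8-byte) wide memory interface */
        return clk_set_rate(emc, bytes_per_sec / 8);
}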

> I don't think we could actually use these value in e.g. tegra124.dtsi's clocks
> properties, since these clocks don't exist in HW.

The DT IDs are simply used to look up the struct clk corresponding to the ID
in the per-SoC clks array.  They aren't used for indexing into a hardware
register or anything like that, though they happen to currently be assigned so
that they match the corresponding clock's hardware ID.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3] clk: tegra: Implement Tegra124 shared/cbus clks

2014-05-13 Thread Stephen Warren
On 05/13/2014 08:06 AM, Peter De Schrijver wrote:
> Add shared and cbus clocks to the Tegra124 clock implementation.

> diff --git a/include/dt-bindings/clock/tegra124-car.h 
> b/include/dt-bindings/clock/tegra124-car.h

> +#define TEGRA124_CLK_C2BUS 401
> +#define TEGRA124_CLK_C3BUS 402
> +#define TEGRA124_CLK_GR3D_CBUS 403
> +#define TEGRA124_CLK_GR2D_CBUS 404
...

I worry about this a bit. IIUC, these clocks don't actually exist in HW,
but are more a way of SW applying policy to the clocks that do exist in
HW. As such, I'm not convinced it's a good idea to expose these clock
IDs to DT, since DT is supposed to represent the HW, and not be
influenced by internal SW implementation details.

Do any DTs actually need to use these new clock IDs? I don't think we
could actually use these values in e.g. tegra124.dtsi's clocks
properties, since these clocks don't exist in HW. Was it your intent to
do that? If not, can't we just define these SW-internal clock IDs in the
header inside the Tegra clock driver, so the values are invisible to DT?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

