Re: [PATCH RFC] powercap/drivers/energy_model: protocode: Add powercap energy model based

2020-09-11 Thread Daniel Lezcano


Hi Amit,

thanks for taking the time to review the series and read this long
description.

On 10/09/2020 11:48, Amit Kucheria wrote:

[ ... ]

>> +
>> +config POWERCAP_EM
>> +bool "Energy model based power capping"
>> +   depends on ENERGY_MODEL
>> +   default y
> 
> Don't make it default ;-)

Thanks for pointing this out :D

[ ... ]

>> +   pc_package = powercap_em_register(pct, "package", pc_soc,
>> + _ops, 1, _ops);
> 
> Will the soc and package hierarchy eventually be dynamically read from
> devicetree or similar and these hardcoded registration removed?

Yes, that's correct.

> For the rest of the devices, IMO, it makes sense to use the genpd
> hierarchy to reflect the powercap hierarchy. I whipped up the
> following patch to show how it might be achieved. What needs to be
> done is to now reflect the parent-child/sibling relationships of genpd
> into powercap. Initially I thought we'd need additional DT
> properties in the genpd bindings but I think we might be able to read
> the device-specific energy model data directly to populate the
> powercap constraints.

We need a way to describe the power constraint relationships between the
domains in order to populate the hierarchy of the powercap directories.

In the DT it may look like this (very roughly):

power-constraints {
	power-constraint: package {
		compatible = "power-constraint,virtual";
	};

	power-constraint: perfdomain0 {
		compatible = "power-constraint,cpu";
		device = <_l0>;
		parent = <>;
	};

	power-constraint: perfdomain1 {
		compatible = "power-constraint,cpu";
		device = <_b0>;
		parent = <>;
	};

	power-constraint: gpu {
		compatible = "power-constraint,gpu";
		parent = <>;
	};
};
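
As an illustration only (this is not part of the patch, and
powercap_em_register_node() is a made-up helper), such a binding could be
walked roughly like this to build the powercap tree:

#include <linux/errno.h>
#include <linux/of.h>

/*
 * Rough sketch: iterate over the children of the proposed
 * "power-constraints" node, look up the optional "parent" phandle and
 * register a powercap zone for each of them.
 * powercap_em_register_node() is hypothetical.
 */
static int powercap_em_parse_dt(void)
{
	struct device_node *np, *child;

	np = of_find_node_by_name(NULL, "power-constraints");
	if (!np)
		return -ENODEV;

	for_each_child_of_node(np, child) {
		struct device_node *parent;

		/* optional phandle pointing to the parent constraint node */
		parent = of_parse_phandle(child, "parent", 0);

		/* create the powercap zone below its parent, if any */
		powercap_em_register_node(child, parent);

		of_node_put(parent);
	}

	of_node_put(np);

	return 0;
}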


-- 
 Linaro.org │ Open source software for ARM SoCs



Re: [PATCH RFC] powercap/drivers/energy_model: protocode: Add powercap energy model based

2020-09-10 Thread Amit Kucheria
Hi Daniel,

On Tue, Jul 7, 2020 at 10:45 PM Daniel Lezcano wrote:
>
> In the embedded world, the complexity of the SoC leads to an
> increasing number of hotspots which need to be monitored and mitigated
> as a whole in order to prevent the temperature from going above the
> normative and legally stated 'skin temperature'.
>
> Another aspect is to sustain the performance for a given power budget:
> for example, in virtual reality the user can feel dizzy if the GPU
> performance is capped while a big CPU is processing something else.
> Or to reduce the battery charging rate because the dissipated power is
> too high compared with the power consumed by other devices.
>
> Nowadays, the current thermal daemons are abusing the thermal
> framework cooling device state to force a specific and arbitraty state

typo: arbitrary

> without taking the governor decisions into account. Given the closed
> loop of some governors, that can confuse the logic or lead directly to
> a decision conflict.
>
> As cooling device support is limited today to the CPU and the GPU,
> the thermal daemons have little control over the power dissipation of
> the system. The out-of-tree solutions hack around here and there in
> the drivers and the frameworks to gain control over the devices.
>
> The recent introduction of the energy model makes it possible to get
> power information related to a GPU or a CPU device, with limited support.
>
> Thanks to the ongoing work of Lukasz Luba:
>
>https://lkml.org/lkml/2020/5/27/406
>
> The energy model is now being improved to be generic and extended to
> all devices, giving SoC vendors the opportunity to define the device
> energy model.
>
> On the other hand, the powercap infrastructure is a perfect fit to
> define power constraints in a hierarchical way.
>
> The proposal is to use the powercap framework with the energy model in
> order to create a hierarchy of constraints that the SoC vendor can
> define, and to assign a power budget to some nodes to cap the power.
>
> Example of constraints hierarchy:
>
> Soc
>   |
>   |-- gpu
>   |
>   `-- package
> |
> |-- perfdomain0
> | |
> | |-- cpu0
> | |
> | |-- cpu1
> | |
> | |-- cpu2
> | |
> | `-- cpu3
> |
> `-- perfdomain1
>   |
>   |-- cpu4
>   |
>   `-- cpu5
>
> The leaves of the tree are the real devices, the intermediate nodes
> are virtual, aggregating the children constraints and power

Consider rephrasing as: aggregating the constraints and power
characteristics of their children.

> characteristics.
>
> For example: cpu[0-3] have 179mW max each, 'perfdomain0' has 716mW max,
> cpu[4-5] have 1130mW max each, 'perfdomain1' has 2260mW. As a result,
> 'package' has 2260 + 716 = 2976mW max.
>
> Each node have a weight on a 2^10 basis, in order to reflect the

Consider rephrasing as: node has a weight on a scale of 0 to 1024

> percentage of power distribution of the children's node. This
> percentage is used to dispatch the power limit to the children.
>
> For example: package has 2976mW max, the weigths for the children are:

typo: weights

>
>   perfdomain0: (716 * 1024) / 2976 = 246
>   perfdomain1: (2260 * 1024) / 2976 = 778
>
> If we want to apply a power limit constraint of 1500mW at the package
> level, the power limit will be distributed among the children as:
>
>   perfdomain0: (1500 * 246) / 1024 = 360mW
>   perfdomain1: (1500 * 778) / 1024 = 1140mW
>
> This simple approach allows a fair distribution of the power limit,
> but it will be replaced by a more complex mechanism where the power
> limit is dynamically adjusted depending on the power consumption of
> the different devices. This is an algorithm with automatic balancing
> of unused power: when an allocated power budget is not used by a
> device, the siblings can share this free power until the device needs
> more power.
>
> The algorithm was presented during the ELC:
>
> https://ossna2020.sched.com/event/c3Wf/ideas-for-finer-grained-control-over-your-heat-budget-amit-kucheria-daniel-lezcano-linaro
>
> Given the complexity of the code, it sounds reasonable to provide a
> first stone of the edifice allowing at least the thermal daemons to
> stop abusing the thermal framework where the primary goal is to
> protect the silicone, not cap the power.

typo: silicon

>
> However, one question remains: how do we describe the hierarchy?


> Signed-off-by: Daniel Lezcano 
> ---
>  drivers/powercap/Kconfig   |   8 +
>  drivers/powercap/Makefile  |   1 +
>  drivers/powercap/powercap_em.c | 485 +
>  include/linux/cpuhotplug.h |   1 +
>  4 files changed, 495 insertions(+)
>  create mode 100644 drivers/powercap/powercap_em.c
>
> diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
> index 

[PATCH RFC] powercap/drivers/energy_model: protocode: Add powercap energy model based

2020-07-07 Thread Daniel Lezcano
In the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature from going above the
normative and legally stated 'skin temperature'.

Another aspect is to sustain the performance for a given power budget:
for example, in virtual reality the user can feel dizzy if the GPU
performance is capped while a big CPU is processing something else.
Or to reduce the battery charging rate because the dissipated power is
too high compared with the power consumed by other devices.

Nowadays, the current thermal daemons are abusing the thermal
framework cooling device state to force a specific and arbitraty state
without taking the governor decisions into account. Given the closed
loop of some governors, that can confuse the logic or lead directly to
a decision conflict.

As cooling device support is limited today to the CPU and the GPU,
the thermal daemons have little control over the power dissipation of
the system. The out-of-tree solutions hack around here and there in
the drivers and the frameworks to gain control over the devices.

The recent introduction of the energy model makes it possible to get
power information related to a GPU or a CPU device, with limited support.

Thanks to the ongoing work of Lukasz Luba:

   https://lkml.org/lkml/2020/5/27/406

The energy model is now being improved to be generic and extended to
all devices, giving SoC vendors the opportunity to define the device
energy model.

On the other hand, the powercap infrastructure is a perfect fit to
define power constraints in a hierarchical way.

The proposal is to use the powercap framework with the energy model in
order to create a hierarchy of constraints that the SoC vendor can
define, and to assign a power budget to some nodes to cap the power.

Example of constraints hierarchy:

Soc
  |
  |-- gpu
  |
  `-- package
|
|-- perfdomain0
| |
| |-- cpu0
| |
| |-- cpu1
| |
| |-- cpu2
| |
| `-- cpu3
|
`-- perfdomain1
  |
  |-- cpu4
  |
  `-- cpu5

The leaves of the tree are the real devices, the intermediate nodes
are virtual, aggregating the children constraints and power
characteristics.

For example: cpu[0-3] have 179mW max each, 'perfdomain0' has 716mW max,
cpu[4-5] have 1130mW max each, 'perfdomain1' has 2260mW. As a result,
'package' has 2260 + 716 = 2976mW max.

Each node have a weight on a 2^10 basis, in order to reflect the
percentage of power distribution of the children's node. This
percentage is used to dispatch the power limit to the children.

For example: package has 2976mW max, the weigths for the children are:

  perfdomain0: (716 * 1024) / 2976 = 246
  perfdomain1: (2260 * 1024) / 2976 = 778

If we want to apply a power limit constraint of 1500mW at the package
level, the power limit will be distributed among the children as:

  perfdomain0: (1500 * 246) / 1024 = 360mW
  perfdomain1: (1500 * 778) / 1024 = 1140mW
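
For illustration only (this is not code from the patch and the helper
names are made up), the weight and dispatch arithmetic above can be
sketched as:

#include <stdio.h>

#define WEIGHT_SHIFT	10	/* weights are expressed on a 2^10 basis */

/* weight of a child, given its max power and its parent's max power */
static unsigned int child_weight(unsigned int child_max_mw,
				 unsigned int parent_max_mw)
{
	return (child_max_mw << WEIGHT_SHIFT) / parent_max_mw;
}

/* share of the parent's power limit dispatched to a child */
static unsigned int child_share(unsigned int limit_mw, unsigned int weight)
{
	return (limit_mw * weight) >> WEIGHT_SHIFT;
}

int main(void)
{
	unsigned int package_max = 716 + 2260;			/* 2976mW */
	unsigned int w0 = child_weight(716, package_max);	/* ~246 */
	unsigned int w1 = child_weight(2260, package_max);	/* ~778 */

	/* dispatching a 1500mW limit gives roughly 360mW and 1140mW */
	printf("perfdomain0: %umW\n", child_share(1500, w0));
	printf("perfdomain1: %umW\n", child_share(1500, w1));

	return 0;
}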

This simple approach allows a fair distribution of the power limit,
but it will be replaced by a more complex mechanism where the power
limit is dynamically adjusted depending on the power consumption of
the different devices. This is an algorithm with automatic balancing
of unused power: when an allocated power budget is not used by a
device, the siblings can share this free power until the device needs
more power.
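
The rebalancing of unused power could look roughly like the sketch
below (hypothetical, not the final algorithm, and the structure and
function names are invented):

/*
 * Hypothetical sketch of the auto balancing idea: children that do not
 * use their allocated share free up power which is redistributed to the
 * siblings requesting more than their share.
 */
struct pc_child {
	unsigned int weight;	/* 2^10 based weight */
	unsigned int request;	/* power currently requested (mW) */
	unsigned int granted;	/* power actually allocated (mW) */
};

static void pc_balance(struct pc_child *child, int nr, unsigned int limit)
{
	unsigned int slack = 0, needy_weight = 0;
	int i;

	/* first pass: grant min(share, request) and collect unused power */
	for (i = 0; i < nr; i++) {
		unsigned int share = (limit * child[i].weight) >> 10;

		if (child[i].request < share) {
			child[i].granted = child[i].request;
			slack += share - child[i].request;
		} else {
			child[i].granted = share;
			needy_weight += child[i].weight;
		}
	}

	/* second pass: hand the slack to the children wanting more */
	for (i = 0; i < nr && needy_weight; i++) {
		unsigned int extra = slack * child[i].weight / needy_weight;

		if (child[i].request <= child[i].granted)
			continue;

		if (extra > child[i].request - child[i].granted)
			extra = child[i].request - child[i].granted;

		child[i].granted += extra;
	}
}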

The algorithm was presented during the ELC:

https://ossna2020.sched.com/event/c3Wf/ideas-for-finer-grained-control-over-your-heat-budget-amit-kucheria-daniel-lezcano-linaro

Given the complexity of the code, it sounds reasonable to provide a
first stone of the edifice allowing at least the thermal daemons to
stop abusing the thermal framework where the primary goal is to
protect the silicone, not cap the power.

However, one question remains: how do we describe the hierarchy?

Signed-off-by: Daniel Lezcano 
---
 drivers/powercap/Kconfig   |   8 +
 drivers/powercap/Makefile  |   1 +
 drivers/powercap/powercap_em.c | 485 +
 include/linux/cpuhotplug.h |   1 +
 4 files changed, 495 insertions(+)
 create mode 100644 drivers/powercap/powercap_em.c

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index ebc4d4578339..57f2e9f31560 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -43,4 +43,12 @@ config IDLE_INJECT
  CPUs for power capping. Idle period can be injected
  synchronously on a set of specified CPUs or alternatively
  on a per CPU basis.
+
+config POWERCAP_EM
+bool "Energy model based power capping"
+   depends on ENERGY_MODEL
+   default y
+   help
+ This enables support for the power capping using the energy
+ model