from:"Daniel Lezcano"

[PATCH] thermal/core: Make 'forced_passive' as obsolete candidate

2020-12-08 Thread Daniel Lezcano

The passive file in sysfs forces the usage of a passive trip point set
by the userspace when a broken BIOS does not provide the mitigation
temperature for such thermal zone. The hardware evolved a lot since
2008 as a good thermal management is no longer an option.

Linux on the other side also provides now a way to load fixed ACPI
table via the option ACPI_TABLE_UPGRADE, so additionnal trip point
could be added there.

Set the option obsolete and plan to remove it, so the corresponding
code can be removed from the core code and allow more cleanups the
thermal framework deserves.

Signed-off-by: Daniel Lezcano 
---
 Documentation/ABI/obsolete/sysfs-thermal-passive | 13 +
 drivers/thermal/thermal_sysfs.c  |  2 ++
 2 files changed, 15 insertions(+)
 create mode 100644 Documentation/ABI/obsolete/sysfs-thermal-passive

diff --git a/Documentation/ABI/obsolete/sysfs-thermal-passive 
b/Documentation/ABI/obsolete/sysfs-thermal-passive
new file mode 100644
index ..2510724cc165
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-thermal-passive
@@ -0,0 +1,13 @@
+What:  /sys/class/thermal/thermal_zone*/passive
+Date:  December 2008
+KernelVersion: 2.6.28
+Contact:   Daniel Lezcano 
+Description:
+
+  The passive file in sysfs forces the usage of a passive trip point
+  set by the userspace when a broken BIOS does not provide the
+  mitigation temperature for such thermal zone. However, the Linux
+  kernel evolved a lot since 2008 as well as the hardware and it is
+  able to manage correctly the thermal envelope. It does also provide
+  a way to load fixed ACPI table via the option ACPI_TABLE_UPGRADE, so
+  additionnal trip point could be added there.
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 0866e949339b..578099b520b1 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -232,6 +232,8 @@ passive_store(struct device *dev, struct device_attribute 
*attr,
if (state && state < 1000)
return -EINVAL;
 
+   pr_warn("%s: Consider the 'passive' option obsolete\n", tz->type);
+
if (state && !tz->forced_passive) {
if (!tz->passive_delay)
tz->passive_delay = 1000;
-- 
2.25.1

Re: [PATCH] thermal/core: Emit a warning if the thermal zone is updated without ops

2020-12-08 Thread Daniel Lezcano

On 08/12/2020 15:37, Lukasz Luba wrote:
> 
> 
> On 12/8/20 1:51 PM, Daniel Lezcano wrote:
>>
>> Hi Lukasz,
>>
>> On 08/12/2020 10:36, Lukasz Luba wrote:
>>> Hi Daniel,
>>
>> [ ... ]
>>
>>>>      static void thermal_zone_device_init(struct thermal_zone_device
>>>> *tz)
>>>> @@ -553,11 +555,9 @@ void thermal_zone_device_update(struct
>>>> thermal_zone_device *tz,
>>>>    if (atomic_read(_suspend))
>>>>    return;
>>>>    -    if (!tz->ops->get_temp)
>>>> +    if (update_temperature(tz))
>>>>    return;
>>>>    -    update_temperature(tz);
>>>> -
>>>
>>> I think the patch does a bit more. Previously we continued running the
>>> code below even when the thermal_zone_get_temp() returned an error (due
>>> to various reasons). Now we stop and probably would not schedule next
>>> polling, not calling:
>>> handle_thermal_trip() and monitor_thermal_zone()
>>
>> I agree there is a change in the behavior.
>>
>>> I would left update_temperature(tz) as it was and not check the return.
>>> The function thermal_zone_get_temp() can protect itself from missing
>>> tz->ops->get_temp(), so we should be safe.
>>>
>>> What do you think?
>>
>> Does it make sense to handle the trip point if we are unable to read the
>> temperature?
>>
>> The lines following the update_temperature() are:
>>
>>   - thermal_zone_set_trips() which needs a correct tz->temperature
>>
>>   - handle_thermal_trip() which needs a correct tz->temperature to
>> compare with
>>
>>   - monitor_thermal_zone() which needs a consistent tz->passive. This one
>> is updated by the governor which is in an inconsistent state because the
>> temperature is not updated.
>>
>> The problem I see here is how the interrupt mode and the polling mode
>> are existing in the same code path.
>>
>> The interrupt mode can call thermal_notify_framework() for critical/hot
>> trip points without being followed by a monitoring. But for the other
>> trip points, the get_temp is needed.
> 
> Yes, I agree that we can bail out when there is no .get_temp() callback
> and even not schedule next polling in such case.
> But I am just not sure if we can bail out and not schedule the next
> polling, when there is .get_temp() populated and the driver returned
> an error only at that moment, e.g. indicating some internal temporary,
> issue like send queue full, so such as -EBUSY, or -EAGAIN, etc.
> The thermal_zone_get_temp() would pass the error to update_temperature()
> but we return, losing the next try. We would not check the temperature
> again.

Hmm, right. I agree with your point.

What about the following changes:

 - Add the new APIs:

   thermal_zone_device_critical(struct thermal_zone_device *tz);
 => emergency poweroff

   thermal_zone_device_hot(struct thermal_zone_device *tz);
 => userspace notification

 - Add a big fat WARN when thermal_zone_device_update is called with
.get_temp == NULL because that must not happen.

If the .get_temp is NULL it is because we only have a HOT/CRITICAL
thermal trip points where we don't care about the temperature and
governor decision, right ?





-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH] thermal/core: Emit a warning if the thermal zone is updated without ops

2020-12-08 Thread Daniel Lezcano

Hi Lukasz,

On 08/12/2020 10:36, Lukasz Luba wrote:
> Hi Daniel,

[ ... ]

>>     static void thermal_zone_device_init(struct thermal_zone_device *tz)
>> @@ -553,11 +555,9 @@ void thermal_zone_device_update(struct
>> thermal_zone_device *tz,
>>   if (atomic_read(_suspend))
>>   return;
>>   -    if (!tz->ops->get_temp)
>> +    if (update_temperature(tz))
>>   return;
>>   -    update_temperature(tz);
>> -
> 
> I think the patch does a bit more. Previously we continued running the
> code below even when the thermal_zone_get_temp() returned an error (due
> to various reasons). Now we stop and probably would not schedule next
> polling, not calling:
> handle_thermal_trip() and monitor_thermal_zone()

I agree there is a change in the behavior.

> I would left update_temperature(tz) as it was and not check the return.
> The function thermal_zone_get_temp() can protect itself from missing
> tz->ops->get_temp(), so we should be safe.
> 
> What do you think?

Does it make sense to handle the trip point if we are unable to read the
temperature?

The lines following the update_temperature() are:

 - thermal_zone_set_trips() which needs a correct tz->temperature

 - handle_thermal_trip() which needs a correct tz->temperature to
compare with

 - monitor_thermal_zone() which needs a consistent tz->passive. This one
is updated by the governor which is in an inconsistent state because the
temperature is not updated.

The problem I see here is how the interrupt mode and the polling mode
are existing in the same code path.

The interrupt mode can call thermal_notify_framework() for critical/hot
trip points without being followed by a monitoring. But for the other
trip points, the get_temp is needed.

IMHO, we should return if update_temperature() is failing.

Perhaps, it would make sense to simply prevent to register a thermal
zone if the get_temp ops is not defined.

AFAICS, if the interrupt mode without get_temp callback are for hot and
critical trip points which can be directly invoked from the sensor via a
specified callback, no thermal zone would be needed in this case.

-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

[PATCH v2 1/4] thermal/core: Rename passive_delay and polling_delay with units

2020-12-07 Thread Daniel Lezcano

The input unit used by the thermal framework is the msec but it uses
the jiffies to set the timers.

As it is stored in the thermal zone device structure, everytime the
timer is setup at each polling interval, the msecs to jiffies
conversion happens. The jiffies is the unit the thermal framework is
using, so keeping it under the jiffies instead of msecs will save some
pointless conversion.

Set the scene to directly store the delays under their jiffies
form by adding to the variable name the 'ms' suffix.

Signed-off-by: Daniel Lezcano 
---
 drivers/platform/x86/acerhdf.c   |  2 +-
 drivers/thermal/da9062-thermal.c |  4 ++--
 drivers/thermal/gov_power_allocator.c|  2 +-
 drivers/thermal/thermal_core.c   | 10 +-
 drivers/thermal/thermal_of.c | 16 
 drivers/thermal/thermal_sysfs.c  |  6 +++---
 .../thermal/ti-soc-thermal/ti-thermal-common.c   |  2 +-
 include/linux/thermal.h  |  8 
 8 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index b6aa6e5514f4..7b26f601b407 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -336,7 +336,7 @@ static void acerhdf_check_param(struct thermal_zone_device 
*thermal)
pr_notice("interval changed to: %d\n", interval);
 
if (thermal)
-   thermal->polling_delay = interval*1000;
+   thermal->polling_delay_ms = interval*1000;
 
prev_interval = interval;
}
diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c
index 4d74994f160a..ebb3d0b4a5be 100644
--- a/drivers/thermal/da9062-thermal.c
+++ b/drivers/thermal/da9062-thermal.c
@@ -95,7 +95,7 @@ static void da9062_thermal_poll_on(struct work_struct *work)
thermal_zone_device_update(thermal->zone,
   THERMAL_EVENT_UNSPECIFIED);
 
-   delay = msecs_to_jiffies(thermal->zone->passive_delay);
+   delay = msecs_to_jiffies(thermal->zone->passive_delay_ms);
queue_delayed_work(system_freezable_wq, >work, delay);
return;
}
@@ -245,7 +245,7 @@ static int da9062_thermal_probe(struct platform_device 
*pdev)
 
dev_dbg(>dev,
"TJUNC temperature polling period set at %d ms\n",
-   thermal->zone->passive_delay);
+   thermal->zone->passive_delay_ms);
 
ret = platform_get_irq_byname(pdev, "THERMAL");
if (ret < 0) {
diff --git a/drivers/thermal/gov_power_allocator.c 
b/drivers/thermal/gov_power_allocator.c
index 7a4170a0b51f..f7a663f698d4 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -258,7 +258,7 @@ static u32 pid_controller(struct thermal_zone_device *tz,
 * power being applied, slowing down the controller)
 */
d = mul_frac(tz->tzp->k_d, err - params->prev_err);
-   d = div_frac(d, tz->passive_delay);
+   d = div_frac(d, tz->passive_delay_ms);
params->prev_err = err;
 
power_range = p + i + d;
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 1bd23ff2247b..5b500d72aab4 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -317,9 +317,9 @@ static void monitor_thermal_zone(struct thermal_zone_device 
*tz)
mutex_lock(>lock);
 
if (!stop && tz->passive)
-   thermal_zone_device_set_polling(tz, tz->passive_delay);
-   else if (!stop && tz->polling_delay)
-   thermal_zone_device_set_polling(tz, tz->polling_delay);
+   thermal_zone_device_set_polling(tz, tz->passive_delay_ms);
+   else if (!stop && tz->polling_delay_ms)
+   thermal_zone_device_set_polling(tz, tz->polling_delay_ms);
else
thermal_zone_device_set_polling(tz, 0);
 
@@ -1340,8 +1340,8 @@ thermal_zone_device_register(const char *type, int trips, 
int mask,
tz->device.class = _class;
tz->devdata = devdata;
tz->trips = trips;
-   tz->passive_delay = passive_delay;
-   tz->polling_delay = polling_delay;
+   tz->passive_delay_ms = passive_delay;
+   tz->polling_delay_ms = polling_delay;
 
/* sys I/F */
/* Add nodes that are always present via .groups */
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 69ef12f852b7..c01248e48c09 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -51,8 +51,8 @@ struct __thermal_bind_params {
 
 /**
  * struct __thermal_zone - internal representation of a thermal zone
- * @passive_del

[PATCH v2 4/4] thermal/core: Remove ms based delay fields

2020-12-07 Thread Daniel Lezcano

The code does no longer use the ms unit based fields to set the
delays as they are replaced by the jiffies.

Remove them and replace their user to use the jiffies version instead.

Signed-off-by: Daniel Lezcano 
---
 drivers/platform/x86/acerhdf.c | 3 ++-
 drivers/thermal/da9062-thermal.c   | 4 ++--
 drivers/thermal/gov_power_allocator.c  | 2 +-
 drivers/thermal/thermal_core.c | 2 +-
 drivers/thermal/thermal_core.h | 2 --
 drivers/thermal/thermal_sysfs.c| 2 +-
 drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 6 --
 include/linux/thermal.h| 7 ---
 8 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 7b26f601b407..b7dbcf6be13e 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -336,7 +336,8 @@ static void acerhdf_check_param(struct thermal_zone_device 
*thermal)
pr_notice("interval changed to: %d\n", interval);
 
if (thermal)
-   thermal->polling_delay_ms = interval*1000;
+   thermal->polling_delay_jiffies =
+   msecs_to_jiffies(interval * 1000);
 
prev_interval = interval;
}
diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c
index ebb3d0b4a5be..180edec34e07 100644
--- a/drivers/thermal/da9062-thermal.c
+++ b/drivers/thermal/da9062-thermal.c
@@ -95,7 +95,7 @@ static void da9062_thermal_poll_on(struct work_struct *work)
thermal_zone_device_update(thermal->zone,
   THERMAL_EVENT_UNSPECIFIED);
 
-   delay = msecs_to_jiffies(thermal->zone->passive_delay_ms);
+   delay = thermal->zone->passive_delay_jiffies;
queue_delayed_work(system_freezable_wq, >work, delay);
return;
}
@@ -245,7 +245,7 @@ static int da9062_thermal_probe(struct platform_device 
*pdev)
 
dev_dbg(>dev,
"TJUNC temperature polling period set at %d ms\n",
-   thermal->zone->passive_delay_ms);
+   jiffies_to_msecs(thermal->zone->passive_delay_jiffies));
 
ret = platform_get_irq_byname(pdev, "THERMAL");
if (ret < 0) {
diff --git a/drivers/thermal/gov_power_allocator.c 
b/drivers/thermal/gov_power_allocator.c
index f7a663f698d4..f8c3d1e40b86 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -258,7 +258,7 @@ static u32 pid_controller(struct thermal_zone_device *tz,
 * power being applied, slowing down the controller)
 */
d = mul_frac(tz->tzp->k_d, err - params->prev_err);
-   d = div_frac(d, tz->passive_delay_ms);
+   d = div_frac(d, jiffies_to_msecs(tz->passive_delay_jiffies));
params->prev_err = err;
 
power_range = p + i + d;
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 16ef5d652d85..aff15c6b1c61 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -313,7 +313,7 @@ static void monitor_thermal_zone(struct thermal_zone_device 
*tz)
 
if (!stop && tz->passive)
thermal_zone_device_set_polling(tz, tz->passive_delay_jiffies);
-   else if (!stop && tz->polling_delay_ms)
+   else if (!stop && tz->polling_delay_jiffies)
thermal_zone_device_set_polling(tz, tz->polling_delay_jiffies);
else
thermal_zone_device_set_polling(tz, 0);
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 2c9551ed5ef8..5baa308ee7a5 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -131,7 +131,6 @@ void thermal_zone_set_trips(struct thermal_zone_device *tz);
 static inline void thermal_zone_set_passive_delay(
struct thermal_zone_device *tz, int delay_ms)
 {
-   tz->passive_delay_ms = delay_ms;
tz->passive_delay_jiffies = msecs_to_jiffies(delay_ms);
if (delay_ms > 1000)
tz->passive_delay_jiffies = 
round_jiffies(tz->passive_delay_jiffies);
@@ -140,7 +139,6 @@ static inline void thermal_zone_set_passive_delay(
 static inline void thermal_zone_set_polling_delay(
struct thermal_zone_device *tz, int delay_ms)
 {
-   tz->polling_delay_ms = delay_ms;
tz->polling_delay_jiffies = msecs_to_jiffies(delay_ms);
if (delay_ms > 1000)
tz->polling_delay_jiffies = 
round_jiffies(tz->polling_delay_jiffies);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 9598b288a0a1..8532b1dd0608 100644
--- a/drivers/thermal/thermal_sys

[PATCH v2 3/4] thermal/core: Use precomputed jiffies for the polling

2020-12-07 Thread Daniel Lezcano

The delays are also stored in jiffies based unit. Use them instead of
the ms.

Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 08c6e4e36896..16ef5d652d85 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -291,14 +291,9 @@ static int __init thermal_register_governors(void)
 static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
int delay)
 {
-   if (delay > 1000)
+   if (delay)
mod_delayed_work(system_freezable_power_efficient_wq,
->poll_queue,
-round_jiffies(msecs_to_jiffies(delay)));
-   else if (delay)
-   mod_delayed_work(system_freezable_power_efficient_wq,
->poll_queue,
-msecs_to_jiffies(delay));
+>poll_queue, delay);
else
cancel_delayed_work(>poll_queue);
 }
@@ -317,9 +312,9 @@ static void monitor_thermal_zone(struct thermal_zone_device 
*tz)
mutex_lock(>lock);
 
if (!stop && tz->passive)
-   thermal_zone_device_set_polling(tz, tz->passive_delay_ms);
+   thermal_zone_device_set_polling(tz, tz->passive_delay_jiffies);
else if (!stop && tz->polling_delay_ms)
-   thermal_zone_device_set_polling(tz, tz->polling_delay_ms);
+   thermal_zone_device_set_polling(tz, tz->polling_delay_jiffies);
else
thermal_zone_device_set_polling(tz, 0);
 
-- 
2.17.1

[PATCH v2 2/4] thermal/core: Precompute the delays from msecs to jiffies

2020-12-07 Thread Daniel Lezcano

The delays are stored in ms units and when the polling function is
called this delay is converted into jiffies at each call.

Instead of doing the conversion again and again, compute the jiffies
at init time and use the value directly when setting the polling.

Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.c  |  5 +++--
 drivers/thermal/thermal_core.h  | 18 ++
 drivers/thermal/thermal_sysfs.c |  4 ++--
 include/linux/thermal.h |  7 +++
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 5b500d72aab4..08c6e4e36896 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1340,8 +1340,9 @@ thermal_zone_device_register(const char *type, int trips, 
int mask,
tz->device.class = _class;
tz->devdata = devdata;
tz->trips = trips;
-   tz->passive_delay_ms = passive_delay;
-   tz->polling_delay_ms = polling_delay;
+
+   thermal_zone_set_passive_delay(tz, passive_delay);
+   thermal_zone_set_polling_delay(tz, polling_delay);
 
/* sys I/F */
/* Add nodes that are always present via .groups */
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 8df600fa7b79..2c9551ed5ef8 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -128,6 +128,24 @@ int thermal_build_list_of_policies(char *buf);
 /* Helpers */
 void thermal_zone_set_trips(struct thermal_zone_device *tz);
 
+static inline void thermal_zone_set_passive_delay(
+   struct thermal_zone_device *tz, int delay_ms)
+{
+   tz->passive_delay_ms = delay_ms;
+   tz->passive_delay_jiffies = msecs_to_jiffies(delay_ms);
+   if (delay_ms > 1000)
+   tz->passive_delay_jiffies = 
round_jiffies(tz->passive_delay_jiffies);
+}
+
+static inline void thermal_zone_set_polling_delay(
+   struct thermal_zone_device *tz, int delay_ms)
+{
+   tz->polling_delay_ms = delay_ms;
+   tz->polling_delay_jiffies = msecs_to_jiffies(delay_ms);
+   if (delay_ms > 1000)
+   tz->polling_delay_jiffies = 
round_jiffies(tz->polling_delay_jiffies);
+}
+
 /* sysfs I/F */
 int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
 void thermal_zone_destroy_device_groups(struct thermal_zone_device *);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index f465462d8aa1..9598b288a0a1 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -234,11 +234,11 @@ passive_store(struct device *dev, struct device_attribute 
*attr,
 
if (state && !tz->forced_passive) {
if (!tz->passive_delay_ms)
-   tz->passive_delay_ms = 1000;
+   thermal_zone_set_passive_delay(tz, 1000);
thermal_zone_device_rebind_exception(tz, "Processor",
 sizeof("Processor"));
} else if (!state && tz->forced_passive) {
-   tz->passive_delay_ms = 0;
+   thermal_zone_set_passive_delay(tz, 0);
thermal_zone_device_unbind_exception(tz, "Processor",
 sizeof("Processor"));
}
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 230d451bf335..5dd9bdb6c6ad 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -118,9 +118,14 @@ struct thermal_cooling_device {
  * @trips_disabled;bitmap for disabled trips
  * @passive_delay_ms:  number of milliseconds to wait between polls when
  * performing passive cooling.
+ * @passive_delay_jiffies: number of jiffies to wait between polls when
+ * performing passive cooling.
  * @polling_delay_ms:  number of milliseconds to wait between polls when
  * checking whether trip points have been crossed (0 for
  * interrupt driven systems)
+ * @polling_delay_jiffies: number of jiffies to wait between polls when
+ * checking whether trip points have been crossed (0 for
+ * interrupt driven systems)
  * @temperature:   current temperature.  This is only for core code,
  * drivers should use thermal_zone_get_temp() to get the
  * current temperature
@@ -161,6 +166,8 @@ struct thermal_zone_device {
unsigned long trips_disabled;   /* bitmap for disabled trips */
int passive_delay_ms;
int polling_delay_ms;
+   int passive_delay_jiffies;
+   int polling_delay_jiffies;
int temperature;
int last_temperature;
int emul_temperature;
-- 
2.17.1

[PATCH] thermal/core: Emit a warning if the thermal zone is updated without ops

2020-12-07 Thread Daniel Lezcano

The actual code is silently ignoring a thermal zone update when a
driver is requesting it without a get_temp ops set.

That looks not correct, as the caller should not have called this
function if the thermal zone is unable to read the temperature.

That makes the code less robust as the check won't detect the driver
is inconsistently using the thermal API and that does not help to
improve the framework as these circumvolutions hide the problem at the
source.

In order to detect the situation when it happens, let's add a warning
when the update is requested without the get_temp() ops set.

Any warning emitted will have to be fixed at the source of the
problem: the caller must not call thermal_zone_device_update if there
is not get_temp callback set.

As the check is done in thermal_zone_get_temperature() via the
update_temperature() function, it is pointless to have the check and
the WARN in the thermal_zone_device_update() function. Just remove the
check and let the next call to raise the warning.

Cc: Thara Gopinath 
Cc: Amit Kucheria 
Cc: linux...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 90e38cc199f4..1bd23ff2247b 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -448,17 +448,17 @@ static void handle_thermal_trip(struct 
thermal_zone_device *tz, int trip)
monitor_thermal_zone(tz);
 }
 
-static void update_temperature(struct thermal_zone_device *tz)
+static int update_temperature(struct thermal_zone_device *tz)
 {
int temp, ret;
 
ret = thermal_zone_get_temp(tz, );
if (ret) {
if (ret != -EAGAIN)
-   dev_warn(>device,
-"failed to read out thermal zone (%d)\n",
-ret);
-   return;
+   dev_warn_once(>device,
+ "failed to read out thermal zone (%d)\n",
+ ret);
+   return ret;
}
 
mutex_lock(>lock);
@@ -469,6 +469,8 @@ static void update_temperature(struct thermal_zone_device 
*tz)
trace_thermal_temperature(tz);
 
thermal_genl_sampling_temp(tz->id, temp);
+
+   return 0;
 }
 
 static void thermal_zone_device_init(struct thermal_zone_device *tz)
@@ -553,11 +555,9 @@ void thermal_zone_device_update(struct thermal_zone_device 
*tz,
if (atomic_read(_suspend))
return;
 
-   if (!tz->ops->get_temp)
+   if (update_temperature(tz))
return;
 
-   update_temperature(tz);
-
thermal_zone_set_trips(tz);
 
tz->notify_event = event;
-- 
2.17.1

Re: [PATCH v2 2/2] platform/x86/drivers/acerhdf: Check the interval value when it is set

2020-12-07 Thread Daniel Lezcano

On 07/12/2020 15:54, Hans de Goede wrote:
> Hi,
> 
> On 12/4/20 12:43 PM, Daniel Lezcano wrote:
>> On 03/12/2020 22:22, Peter Kästle wrote:
>>> 3. Dezember 2020 08:17, "Daniel Lezcano"  
>>> schrieb:
>>>
>>>> Currently the code checks the interval value when the temperature is
>>>> read which is bad for two reasons:
>>>>
>>>> - checking and setting the interval in the get_temp callback is
>>>> inaccurate and awful, that can be done when changing the value.
>>>>
>>>> - Changing the thermal zone structure internals is an abuse of the
>>>> exported structure, moreover no lock is taken here.
>>>>
>>>> The goal of this patch is to solve the first item by using the 'set'
>>>> function called when changing the interval. The check is done there
>>>> and removed from the get_temp function. If the thermal zone was not
>>>> initialized yet, the interval is not updated in this case as that will
>>>> happen in the init function when registering the thermal zone device.
>>>
>>> Thanks for your effort.  This improves the code, good finding.
>>>
>>>  
>>>> I don't have any hardware to test the changes.
>>>
>>> Tests successfully executed on my good old AOA110.
>>>
>>>
>>>> Signed-off-by: Daniel Lezcano 
>>>
>>> Acked-by: Peter Kaestle 
>>
>> Thanks for testing the changes.
>>
>> Shall pick the patches through the thermal tree ?
> 
> I can take them through the drivers/platform/x86 (pdx86) tree,
> but if you prefer to take them upstream through the thermal tree,
> then that is fine too...
> 
> Here is my ack (as pdx86 maintainer) for taking them through
> the thermal tree:
> 
> Acked-by: Hans de Goede 

Thanks. I'll take them through the thermal tree.

  -- Daniel


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

[PATCH RFD] thermal: core: Browse the trip points in reverse order and exit

2020-12-06 Thread Daniel Lezcano

When reading the code it is unclear if the loop is there to handle one
trip point or all the trip points above a certain temperature.

With the current code, the throttle function is called for every trip
point crossed and it is ambiguous if that is made on purpose.

Even digging into the history up to April, 16th 2015, the initial git
import of the acpi implementation of the loop before being converted
into a generic code, it is unclear if all trip points must be handled
or not.

When the code was intially written, was it assuming there can be only
one passive or active trip point ?

The cooling effect of the system will be very different depending on
how this loop is considered.

Even the IPA governor is filtering out multiple trip points to stick
to one value. Was the code to accomodate with a bogus loop and based
on the multiple non-critical trip points ?

Another example: arch/arm64/boot/dts/exynos/exynos5433-tmu.dtsi
defines 7 passive trip points, each of them mapped to the *same*
cooling device but with different min-max states. That means the
handle trip points loop will go through all these seven trip points
and change the state of the cooling device seven times.

On the other hand, a thermal zone can have two different cooling
devices mapped to two trip points, shall we consider having both
cooling devices acting together when the second trip point is crossed
(eg. 1st trip point: cpufreq cooling device + 2nd trip point: fan), or
the second cooling device takes over the first one ? IOW, combine the
cooling effects or not ?

Depending on the expected behavior, the loop can be done in the
reverse order and exit after processing the highest trip point.

Cc: Lukasz Luba 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Zhang Rui 
Cc: Thara Gopinath 
Cc: Amit Kucheria 
Cc: Len Brown 
Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9d6a7b7f8ab4..d745b62a63af 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -562,8 +562,9 @@ void thermal_zone_device_update(struct thermal_zone_device 
*tz,
 
tz->notify_event = event;
 
-   for (count = 0; count < tz->trips; count++)
-   handle_thermal_trip(tz, count);
+   for (count = tz->trips - 1; count >= 0; count--)
+   if (handle_thermal_trip(tz, count))
+   break;
 
monitor_thermal_zone(tz);
 }
-- 
2.17.1

Re: [PATCH v3 2/2] clocksource: arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI

2020-12-05 Thread Daniel Lezcano



Hi Marc,

On 05/12/2020 19:22, Marc Zyngier wrote:
> Hi Daniel,
> 
> On 2020-12-05 11:15, Daniel Lezcano wrote:
>> Hi Marc,
>>
>> are you fine with this patch ?
> 
> I am, although there still isn't any justification for the pos/lsb
> rework in the commit message (and calling that variable lsb is somewhat
> confusing). If you are going to apply it, please consider adding
> the additional comment below.

Ok, I will do that.

Thanks for the additional comment

  -- Daniel


>> On 04/12/2020 08:31, Keqian Zhu wrote:
>>> ARM virtual counter supports event stream, it can only trigger an event
>>> when the trigger bit (the value of CNTKCTL_EL1.EVNTI) of CNTVCT_EL0
>>> changes,
>>> so the actual period of event stream is 2^(cntkctl_evnti + 1). For
>>> example,
>>> when the trigger bit is 0, then virtual counter trigger an event for
>>> every
>>> two cycles.
> 
> "While we're at it, rework the way we compute the trigger bit position by
>  making it more obvious that when bits [n:n-1] are both set (with n being
>  the most significant bit), we pick bit (n + 1)."
> 
> With that:
> 
> Acked-by: Marc Zyngier 
> 
> Thanks,
> 
>     M.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCHv2] clocksource: dw_apb_timer_of: add error handling if no clock available

2020-12-05 Thread Daniel Lezcano

On 05/12/2020 11:55, Dinh Nguyen wrote:
> Hi Daniel,
> 
> On 12/5/20 2:50 AM, Daniel Lezcano wrote:
>> On 04/12/2020 23:39, Dinh Nguyen wrote:
>>>
>>>
>>> On 12/4/20 2:00 PM, Daniel Lezcano wrote:
>>>> On 04/12/2020 16:36, Dinh Nguyen wrote:
>>>>> commit ("b0fc70ce1f02 arm64: berlin: Select DW_APB_TIMER_OF") added
>>>>> the
>>>>> support for the dw_apb_timer into the arm64 defconfig. However, for
>>>>> some
>>>>> platforms like the Intel Stratix10 and Agilex, the clock manager
>>>>> doesn't
>>>>> get loaded until after the timer driver get loaded. Thus, the driver
>>>>> hits
>>>>> the panic "No clock nor clock-frequency property for" because it
>>>>> cannot
>>>>> properly get the clock.
>>>>>
>>>>> This patch adds the error handling needed for the timer driver so that
>>>>> the kernel can continue booting instead of just hitting the panic.
>>>>>
>>>>> Signed-off-by: Dinh Nguyen 
>>>>
>>>> Did you have time to test the different combinations ?
>>>
>>> I did test both versions and did not see any difference between the two.
>>> On both versions, the kernel was able to continue to boot after trying
>>> to probe the timer driver.
>>
>> Great, thanks!
>>
> 
> I forgot to test this on ARM 32-bit system that actually uses one of
> these timers as a clocksource. The v2 patch would fail. The return of
> PTR_ERR(timer_clk) needs an IS_ERR(timer_clk) check.
> 
> I have sent a v3.
> 
> Sorry about that.

No problem, thanks for resending.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH] drivers: thermal: Add NULL pointer check before using cooling device stats

2020-12-05 Thread Daniel Lezcano

On 04/12/2020 20:09, Manaf Meethalavalappu Pallikunhi wrote:
> There is a possible chance that some cooling device stats buffer
> allocation fails due to very high cooling device max state value.
> Later cooling device update or cooling stats sysfs will try to
> access stats data for the same cooling device. It will lead to
> NULL pointer dereference issue.
> 
> Add a NULL pointer check before accessing thermal cooling device
> stats data. It fixes the following bug
> 
> [ 26.812833] Unable to handle kernel NULL pointer dereference at virtual 
> address 0004
> [ 27.122960] Call trace:
> [ 27.122963] do_raw_spin_lock+0x18/0xe8
> [ 27.122966] _raw_spin_lock+0x24/0x30
> [ 27.128157] thermal_cooling_device_stats_update+0x24/0x98
> [ 27.128162] cur_state_store+0x88/0xb8
> [ 27.128166] dev_attr_store+0x40/0x58
> [ 27.128169] sysfs_kf_write+0x50/0x68
> [ 27.133358] kernfs_fop_write+0x12c/0x1c8
> [ 27.133362] __vfs_write+0x54/0x160
> [ 27.152297] vfs_write+0xcc/0x188
> [ 27.157132] ksys_write+0x78/0x108
> [ 27.162050] ksys_write+0xf8/0x108
> [ 27.166968] __arm_smccc_hvc+0x158/0x4b0
> [ 27.166973] __arm_smccc_hvc+0x9c/0x4b0
> [ 27.186005] el0_svc+0x8/0xc
> 
> Signed-off-by: Manaf Meethalavalappu Pallikunhi 

The only place where it can crash is when the
thermal_cooling_device_stats_update() function is called.

The other places in show*/store* in the stats directory are inaccessible
as the sysfs entry is not showed up due to the
thermal_cooling_device_stats_setup() failing.

It would have been nice if the thermal_cooling_device_stats_update() was
not called at all but I don't see how we can do that without static keys
which is overkill for a degraded mode.

I guess having the kzallocation warning in the console output is enough
to warn the user the system is working without the stats for the cooling
device. That should not prevent the system functioning.

Can you resend with the check in thermal_cooling_device_stats_update() only?

Thanks

  -- Daniel


> ---
>  drivers/thermal/thermal_sysfs.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index 473449b..a5e4855 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -827,6 +827,9 @@ void thermal_cooling_device_stats_update(struct 
> thermal_cooling_device *cdev,
>  {
>   struct cooling_dev_stats *stats = cdev->stats;
>  
> + if (!stats)
> + return;
> +
>   spin_lock(>lock);
>  
>   if (stats->state == new_state)
> @@ -848,6 +851,9 @@ static ssize_t total_trans_show(struct device *dev,
>   struct cooling_dev_stats *stats = cdev->stats;
>   int ret;
>  
> + if (!stats)
> + return -ENODEV;
> +
>   spin_lock(>lock);
>   ret = sprintf(buf, "%u\n", stats->total_trans);
>   spin_unlock(>lock);
> @@ -864,6 +870,9 @@ time_in_state_ms_show(struct device *dev, struct 
> device_attribute *attr,
>   ssize_t len = 0;
>   int i;
>  
> + if (!stats)
> + return -ENODEV;
> +
>   spin_lock(>lock);
>   update_time_in_state(stats);
>  
> @@ -882,8 +891,12 @@ reset_store(struct device *dev, struct device_attribute 
> *attr, const char *buf,
>  {
>   struct thermal_cooling_device *cdev = to_cooling_device(dev);
>   struct cooling_dev_stats *stats = cdev->stats;
> - int i, states = stats->max_states;
> + int i, states;
> +
> + if (!stats)
> + return -ENODEV;
>  
> + states = stats->max_states;
>   spin_lock(>lock);
>  
>   stats->total_trans = 0;
> @@ -907,6 +920,9 @@ static ssize_t trans_table_show(struct device *dev,
>   ssize_t len = 0;
>   int i, j;
>  
> + if (!stats)
> + return -ENODEV;
> +
>   len += snprintf(buf + len, PAGE_SIZE - len, " From  :To\n");
>   len += snprintf(buf + len, PAGE_SIZE - len, "   : ");
>   for (i = 0; i < stats->max_states; i++) {
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v3 2/2] clocksource: arm_arch_timer: Correct fault programming of CNTKCTL_EL1.EVNTI

2020-12-05 Thread Daniel Lezcano




Hi Marc,

are you fine with this patch ?


On 04/12/2020 08:31, Keqian Zhu wrote:
> ARM virtual counter supports event stream, it can only trigger an event
> when the trigger bit (the value of CNTKCTL_EL1.EVNTI) of CNTVCT_EL0 changes,
> so the actual period of event stream is 2^(cntkctl_evnti + 1). For example,
> when the trigger bit is 0, then virtual counter trigger an event for every
> two cycles.
> 
> Fixes: 037f637767a8 ("drivers: clocksource: add support for ARM architected 
> timer event stream")
> Suggested-by: Marc Zyngier 
> Signed-off-by: Keqian Zhu 
> ---
>  drivers/clocksource/arm_arch_timer.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index 777d38cb39b0..d0177824c518 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -822,15 +822,24 @@ static void arch_timer_evtstrm_enable(int divider)
>  
>  static void arch_timer_configure_evtstream(void)
>  {
> - int evt_stream_div, pos;
> + int evt_stream_div, lsb;
> +
> + /*
> +  * As the event stream can at most be generated at half the frequency
> +  * of the counter, use half the frequency when computing the divider.
> +  */
> + evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ / 2;
> +
> + /*
> +  * Find the closest power of two to the divisor. If the adjacent bit
> +  * of lsb (last set bit, starts from 0) is set, then we use (lsb + 1).
> +  */
> + lsb = fls(evt_stream_div) - 1;
> + if (lsb > 0 && (evt_stream_div & BIT(lsb - 1)))
> + lsb++;
>  
> - /* Find the closest power of two to the divisor */
> - evt_stream_div = arch_timer_rate / ARCH_TIMER_EVT_STREAM_FREQ;
> - pos = fls(evt_stream_div);
> - if (pos > 1 && !(evt_stream_div & (1 << (pos - 2
> - pos--;
>   /* enable event stream */
> - arch_timer_evtstrm_enable(min(pos, 15));
> + arch_timer_evtstrm_enable(max(0, min(lsb, 15)));
>  }
>  
>  static void arch_counter_set_user_access(void)
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCHv2] clocksource: dw_apb_timer_of: add error handling if no clock available

2020-12-05 Thread Daniel Lezcano

On 04/12/2020 23:39, Dinh Nguyen wrote:
> 
> 
> On 12/4/20 2:00 PM, Daniel Lezcano wrote:
>> On 04/12/2020 16:36, Dinh Nguyen wrote:
>>> commit ("b0fc70ce1f02 arm64: berlin: Select DW_APB_TIMER_OF") added the
>>> support for the dw_apb_timer into the arm64 defconfig. However, for some
>>> platforms like the Intel Stratix10 and Agilex, the clock manager doesn't
>>> get loaded until after the timer driver get loaded. Thus, the driver
>>> hits
>>> the panic "No clock nor clock-frequency property for" because it cannot
>>> properly get the clock.
>>>
>>> This patch adds the error handling needed for the timer driver so that
>>> the kernel can continue booting instead of just hitting the panic.
>>>
>>> Signed-off-by: Dinh Nguyen 
>>
>> Did you have time to test the different combinations ?
> 
> I did test both versions and did not see any difference between the two.
> On both versions, the kernel was able to continue to boot after trying
> to probe the timer driver.

Great, thanks!


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCHv2] clocksource: dw_apb_timer_of: add error handling if no clock available

2020-12-04 Thread Daniel Lezcano

On 04/12/2020 16:36, Dinh Nguyen wrote:
> commit ("b0fc70ce1f02 arm64: berlin: Select DW_APB_TIMER_OF") added the
> support for the dw_apb_timer into the arm64 defconfig. However, for some
> platforms like the Intel Stratix10 and Agilex, the clock manager doesn't
> get loaded until after the timer driver get loaded. Thus, the driver hits
> the panic "No clock nor clock-frequency property for" because it cannot
> properly get the clock.
> 
> This patch adds the error handling needed for the timer driver so that
> the kernel can continue booting instead of just hitting the panic.
> 
> Signed-off-by: Dinh Nguyen 

Did you have time to test the different combinations ?

> ---
> v2: address comments from Daniel Lezcano
> update commit message
> ---
>  drivers/clocksource/dw_apb_timer_of.c | 60 ++-
>  1 file changed, 41 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/clocksource/dw_apb_timer_of.c 
> b/drivers/clocksource/dw_apb_timer_of.c
> index ab3ddebe8344..809f4c9327f9 100644
> --- a/drivers/clocksource/dw_apb_timer_of.c
> +++ b/drivers/clocksource/dw_apb_timer_of.c
> @@ -14,12 +14,13 @@
>  #include 
>  #include 
>  
> -static void __init timer_get_base_and_rate(struct device_node *np,
> +static int __init timer_get_base_and_rate(struct device_node *np,
>   void __iomem **base, u32 *rate)
>  {
>   struct clk *timer_clk;
>   struct clk *pclk;
>   struct reset_control *rstc;
> + int ret;
>  
>   *base = of_iomap(np, 0);
>  
> @@ -46,55 +47,68 @@ static void __init timer_get_base_and_rate(struct 
> device_node *np,
>   pr_warn("pclk for %pOFn is present, but could not be 
> activated\n",
>   np);
>  
> + if (!of_property_read_u32(np, "clock-freq", rate) &&
> + !of_property_read_u32(np, "clock-frequency", rate))
> + return 0;
> +
>   timer_clk = of_clk_get_by_name(np, "timer");
> - if (IS_ERR(timer_clk))
> - goto try_clock_freq;
> + ret = PTR_ERR(timer_clk);
> + if (ret)
> + return ret;
>  
> - if (!clk_prepare_enable(timer_clk)) {
> - *rate = clk_get_rate(timer_clk);
> - return;
> - }
> + ret = clk_prepare_enable(timer_clk);
> + if (ret)
> + return ret;
> +
> + *rate = clk_get_rate(timer_clk);
> + if (!(*rate))
> + return -EINVAL;
>  
> -try_clock_freq:
> - if (of_property_read_u32(np, "clock-freq", rate) &&
> - of_property_read_u32(np, "clock-frequency", rate))
> - panic("No clock nor clock-frequency property for %pOFn", np);
> + return 0;
>  }
>  
> -static void __init add_clockevent(struct device_node *event_timer)
> +static int __init add_clockevent(struct device_node *event_timer)
>  {
>   void __iomem *iobase;
>   struct dw_apb_clock_event_device *ced;
>   u32 irq, rate;
> + int ret = 0;
>  
>   irq = irq_of_parse_and_map(event_timer, 0);
>   if (irq == 0)
>   panic("No IRQ for clock event timer");
>  
> - timer_get_base_and_rate(event_timer, , );
> + ret = timer_get_base_and_rate(event_timer, , );
> + if (ret)
> + return ret;
>  
>   ced = dw_apb_clockevent_init(-1, event_timer->name, 300, iobase, irq,
>rate);
>   if (!ced)
> - panic("Unable to initialise clockevent device");
> + return -EINVAL;
>  
>   dw_apb_clockevent_register(ced);
> +
> + return 0;
>  }
>  
>  static void __iomem *sched_io_base;
>  static u32 sched_rate;
>  
> -static void __init add_clocksource(struct device_node *source_timer)
> +static int __init add_clocksource(struct device_node *source_timer)
>  {
>   void __iomem *iobase;
>   struct dw_apb_clocksource *cs;
>   u32 rate;
> + int ret;
>  
> - timer_get_base_and_rate(source_timer, , );
> + ret = timer_get_base_and_rate(source_timer, , );
> + if (ret)
> + return ret;
>  
>   cs = dw_apb_clocksource_init(300, source_timer->name, iobase, rate);
>   if (!cs)
> - panic("Unable to initialise clocksource device");
> + return -EINVAL;
>  
>   dw_apb_clocksource_start(cs);
>   dw_apb_clocksource_register(cs);
> @@ -106,6 +120,8 @@ static void __init add_clocksource(struct device_node 
> *source_timer)
>*/
>   sched_io_base = iobase +

Re: [PATCH] drivers: thermal: Add NULL pointer check before using cooling device stats

2020-12-04 Thread Daniel Lezcano

On 04/12/2020 20:09, Manaf Meethalavalappu Pallikunhi wrote:
> There is a possible chance that some cooling device stats buffer
> allocation fails due to very high cooling device max state value.
> Later cooling device update or cooling stats sysfs will try to
> access stats data for the same cooling device. It will lead to
> NULL pointer dereference issue.
> 
> Add a NULL pointer check before accessing thermal cooling device
> stats data. It fixes the following bug

Why not create the 'stats' dir if the setup fails ?

> [ 26.812833] Unable to handle kernel NULL pointer dereference at virtual 
> address 0004
> [ 27.122960] Call trace:
> [ 27.122963] do_raw_spin_lock+0x18/0xe8
> [ 27.122966] _raw_spin_lock+0x24/0x30
> [ 27.128157] thermal_cooling_device_stats_update+0x24/0x98
> [ 27.128162] cur_state_store+0x88/0xb8
> [ 27.128166] dev_attr_store+0x40/0x58
> [ 27.128169] sysfs_kf_write+0x50/0x68
> [ 27.133358] kernfs_fop_write+0x12c/0x1c8
> [ 27.133362] __vfs_write+0x54/0x160
> [ 27.152297] vfs_write+0xcc/0x188
> [ 27.157132] ksys_write+0x78/0x108
> [ 27.162050] ksys_write+0xf8/0x108
> [ 27.166968] __arm_smccc_hvc+0x158/0x4b0
> [ 27.166973] __arm_smccc_hvc+0x9c/0x4b0
> [ 27.186005] el0_svc+0x8/0xc
> 
> Signed-off-by: Manaf Meethalavalappu Pallikunhi 
> ---
>  drivers/thermal/thermal_sysfs.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index 473449b..a5e4855 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -827,6 +827,9 @@ void thermal_cooling_device_stats_update(struct 
> thermal_cooling_device *cdev,
>  {
>   struct cooling_dev_stats *stats = cdev->stats;
>  
> + if (!stats)
> + return;
> +
>   spin_lock(>lock);
>  
>   if (stats->state == new_state)
> @@ -848,6 +851,9 @@ static ssize_t total_trans_show(struct device *dev,
>   struct cooling_dev_stats *stats = cdev->stats;
>   int ret;
>  
> + if (!stats)
> + return -ENODEV;
> +
>   spin_lock(>lock);
>   ret = sprintf(buf, "%u\n", stats->total_trans);
>   spin_unlock(>lock);
> @@ -864,6 +870,9 @@ time_in_state_ms_show(struct device *dev, struct 
> device_attribute *attr,
>   ssize_t len = 0;
>   int i;
>  
> + if (!stats)
> + return -ENODEV;
> +
>   spin_lock(>lock);
>   update_time_in_state(stats);
>  
> @@ -882,8 +891,12 @@ reset_store(struct device *dev, struct device_attribute 
> *attr, const char *buf,
>  {
>   struct thermal_cooling_device *cdev = to_cooling_device(dev);
>   struct cooling_dev_stats *stats = cdev->stats;
> - int i, states = stats->max_states;
> + int i, states;
> +
> + if (!stats)
> + return -ENODEV;
>  
> + states = stats->max_states;
>   spin_lock(>lock);
>  
>   stats->total_trans = 0;
> @@ -907,6 +920,9 @@ static ssize_t trans_table_show(struct device *dev,
>   ssize_t len = 0;
>   int i, j;
>  
> + if (!stats)
> + return -ENODEV;
> +
>   len += snprintf(buf + len, PAGE_SIZE - len, " From  :To\n");
>   len += snprintf(buf + len, PAGE_SIZE - len, "   : ");
>   for (i = 0; i < stats->max_states; i++) {
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v2 2/2] platform/x86/drivers/acerhdf: Check the interval value when it is set

2020-12-04 Thread Daniel Lezcano

On 03/12/2020 22:22, Peter Kästle wrote:
> 3. Dezember 2020 08:17, "Daniel Lezcano"  schrieb:
> 
>> Currently the code checks the interval value when the temperature is
>> read which is bad for two reasons:
>>
>> - checking and setting the interval in the get_temp callback is
>> inaccurate and awful, that can be done when changing the value.
>>
>> - Changing the thermal zone structure internals is an abuse of the
>> exported structure, moreover no lock is taken here.
>>
>> The goal of this patch is to solve the first item by using the 'set'
>> function called when changing the interval. The check is done there
>> and removed from the get_temp function. If the thermal zone was not
>> initialized yet, the interval is not updated in this case as that will
>> happen in the init function when registering the thermal zone device.
> 
> Thanks for your effort.  This improves the code, good finding.
> 
>  
>> I don't have any hardware to test the changes.
> 
> Tests successfully executed on my good old AOA110.
> 
> 
>> Signed-off-by: Daniel Lezcano 
> 
> Acked-by: Peter Kaestle 

Thanks for testing the changes.

Shall pick the patches through the thermal tree ?


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

[tip: timers/core] clocksource/drivers/ingenic: Fix section mismatch

2020-12-03 Thread tip-bot2 for Daniel Lezcano

The following commit has been merged into the timers/core branch of tip:

Commit-ID: 5bd7cb29eceb52e4b108917786fdbf2a2c2048ef
Gitweb:
https://git.kernel.org/tip/5bd7cb29eceb52e4b108917786fdbf2a2c2048ef
Author:Daniel Lezcano 
AuthorDate:Wed, 25 Nov 2020 11:23:45 +01:00
Committer: Daniel Lezcano 
CommitterDate: Thu, 03 Dec 2020 19:16:26 +01:00

clocksource/drivers/ingenic: Fix section mismatch

The function ingenic_tcu_get_clock() is annotated for the __init
section but it is actually called from the online cpu callback.

That will lead to a crash if a CPU is hotplugged after boot time.

Remove the __init annotation for the ingenic_tcu_get_clock()
function.

Fixes: f19d838d08fc (clocksource/drivers/ingenic: Add high resolution timer 
support for SMP/SMT)
Reported-by: kernel test robot 
Signed-off-by: Daniel Lezcano 
Reviewed-by: Paul Cercueil 
Tested-by: 周琰杰 (Zhou Yanjie) 
Link: 
https://lore.kernel.org/r/20201125102346.1816310-1-daniel.lezc...@linaro.org
---
 drivers/clocksource/ingenic-timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/ingenic-timer.c 
b/drivers/clocksource/ingenic-timer.c
index 58fd918..905fd6b 100644
--- a/drivers/clocksource/ingenic-timer.c
+++ b/drivers/clocksource/ingenic-timer.c
@@ -127,7 +127,7 @@ static irqreturn_t ingenic_tcu_cevt_cb(int irq, void 
*dev_id)
return IRQ_HANDLED;
 }
 
-static struct clk * __init ingenic_tcu_get_clock(struct device_node *np, int 
id)
+static struct clk *ingenic_tcu_get_clock(struct device_node *np, int id)
 {
struct of_phandle_args args;

[PATCH 13/13] clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI

2020-12-03 Thread Daniel Lezcano

From: Kefeng Wang 

The riscv timer is set via SBI timer call, let's make RISCV_TIMER
depends on RISCV_SBI, and it also fixes some build issue.

Fixes: d5be89a8d118 ("RISC-V: Resurrect the MMIO timer implementation for 
M-mode systems")
Signed-off-by: Kefeng Wang 
Reviewed-by: Palmer Dabbelt 
Acked-by: Palmer Dabbelt 
Signed-off-by: Daniel Lezcano 
Link: 
https://lore.kernel.org/r/20201028131230.72907-1-wangkefeng.w...@huawei.com
---
 drivers/clocksource/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 390c27cd926d..9f00b8385fd4 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -644,7 +644,7 @@ config ATCPIT100_TIMER
 
 config RISCV_TIMER
bool "Timer for the RISC-V platform" if COMPILE_TEST
-   depends on GENERIC_SCHED_CLOCK && RISCV
+   depends on GENERIC_SCHED_CLOCK && RISCV && RISCV_SBI
select TIMER_PROBE
select TIMER_OF
help
-- 
2.25.1

[PATCH 09/13] dt-bindings: timer: renesas: tmu: Document r8a774e1 bindings

2020-12-03 Thread Daniel Lezcano

From: Marian-Cristian Rotariu 

Document RZ/G2H (R8A774E1) SoC in the Renesas TMU bindings.

Signed-off-by: Marian-Cristian Rotariu 

Signed-off-by: Lad Prabhakar 
Signed-off-by: Geert Uytterhoeven 
Reviewed-by: Niklas Söderlund 
Signed-off-by: Daniel Lezcano 
Link: https://lore.kernel.org/r/20201110162014.3290109-2-geert+rene...@glider.be
---
 Documentation/devicetree/bindings/timer/renesas,tmu.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/timer/renesas,tmu.txt 
b/Documentation/devicetree/bindings/timer/renesas,tmu.txt
index 29159f4e65ab..a36cd61e74fb 100644
--- a/Documentation/devicetree/bindings/timer/renesas,tmu.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,tmu.txt
@@ -13,6 +13,7 @@ Required Properties:
 - "renesas,tmu-r8a774a1" for the r8a774A1 TMU
 - "renesas,tmu-r8a774b1" for the r8a774B1 TMU
 - "renesas,tmu-r8a774c0" for the r8a774C0 TMU
+- "renesas,tmu-r8a774e1" for the r8a774E1 TMU
 - "renesas,tmu-r8a7778" for the r8a7778 TMU
 - "renesas,tmu-r8a7779" for the r8a7779 TMU
 - "renesas,tmu-r8a77970" for the r8a77970 TMU
-- 
2.25.1

[PATCH 11/13] clocksource/drivers/cadence_ttc: Fix memory leak in ttc_setup_clockevent()

2020-12-03 Thread Daniel Lezcano

From: Yu Kuai 

If clk_notifier_register() failed, ttc_setup_clockevent() will return
without freeing 'ttcce', which will leak memory.

Fixes: 70504f311d4b ("clocksource/drivers/cadence_ttc: Convert init function to 
return error")
Reported-by: Hulk Robot 
Signed-off-by: Yu Kuai 
Signed-off-by: Daniel Lezcano 
Link: https://lore.kernel.org/r/20201116135123.2164033-1-yuku...@huawei.com
---
 drivers/clocksource/timer-cadence-ttc.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/clocksource/timer-cadence-ttc.c 
b/drivers/clocksource/timer-cadence-ttc.c
index 80e960602030..4efd0cf3b602 100644
--- a/drivers/clocksource/timer-cadence-ttc.c
+++ b/drivers/clocksource/timer-cadence-ttc.c
@@ -413,10 +413,8 @@ static int __init ttc_setup_clockevent(struct clk *clk,
ttcce->ttc.clk = clk;
 
err = clk_prepare_enable(ttcce->ttc.clk);
-   if (err) {
-   kfree(ttcce);
-   return err;
-   }
+   if (err)
+   goto out_kfree;
 
ttcce->ttc.clk_rate_change_nb.notifier_call =
ttc_rate_change_clockevent_cb;
@@ -426,7 +424,7 @@ static int __init ttc_setup_clockevent(struct clk *clk,
>ttc.clk_rate_change_nb);
if (err) {
pr_warn("Unable to register clock notifier.\n");
-   return err;
+   goto out_kfree;
}
 
ttcce->ttc.freq = clk_get_rate(ttcce->ttc.clk);
@@ -455,15 +453,17 @@ static int __init ttc_setup_clockevent(struct clk *clk,
 
err = request_irq(irq, ttc_clock_event_interrupt,
  IRQF_TIMER, ttcce->ce.name, ttcce);
-   if (err) {
-   kfree(ttcce);
-   return err;
-   }
+   if (err)
+   goto out_kfree;
 
clockevents_config_and_register(>ce,
ttcce->ttc.freq / PRESCALE, 1, 0xfffe);
 
return 0;
+
+out_kfree:
+   kfree(ttcce);
+   return err;
 }
 
 static int __init ttc_timer_probe(struct platform_device *pdev)
-- 
2.25.1

[PATCH 10/13] dt-bindings: timer: renesas: tmu: Convert to json-schema

2020-12-03 Thread Daniel Lezcano

From: Geert Uytterhoeven 

Convert the Renesas R-Mobile/R-Car Timer Unit (TMU) Device Tree binding
documentation to json-schema.

Document missing properties.
Update the example to match reality.

Signed-off-by: Geert Uytterhoeven 
Reviewed-by: Rob Herring 
Signed-off-by: Daniel Lezcano 
Reviewed-by: Niklas Söderlund 
Link: https://lore.kernel.org/r/20201110162014.3290109-3-geert+rene...@glider.be
---
 .../devicetree/bindings/timer/renesas,tmu.txt | 50 --
 .../bindings/timer/renesas,tmu.yaml   | 99 +++
 2 files changed, 99 insertions(+), 50 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/timer/renesas,tmu.txt
 create mode 100644 Documentation/devicetree/bindings/timer/renesas,tmu.yaml

diff --git a/Documentation/devicetree/bindings/timer/renesas,tmu.txt 
b/Documentation/devicetree/bindings/timer/renesas,tmu.txt
deleted file mode 100644
index a36cd61e74fb..
--- a/Documentation/devicetree/bindings/timer/renesas,tmu.txt
+++ /dev/null
@@ -1,50 +0,0 @@
-* Renesas R-Mobile/R-Car Timer Unit (TMU)
-
-The TMU is a 32-bit timer/counter with configurable clock inputs and
-programmable compare match.
-
-Channels share hardware resources but their counter and compare match value
-are independent. The TMU hardware supports up to three channels.
-
-Required Properties:
-
-  - compatible: must contain one or more of the following:
-- "renesas,tmu-r8a7740" for the r8a7740 TMU
-- "renesas,tmu-r8a774a1" for the r8a774A1 TMU
-- "renesas,tmu-r8a774b1" for the r8a774B1 TMU
-- "renesas,tmu-r8a774c0" for the r8a774C0 TMU
-- "renesas,tmu-r8a774e1" for the r8a774E1 TMU
-- "renesas,tmu-r8a7778" for the r8a7778 TMU
-- "renesas,tmu-r8a7779" for the r8a7779 TMU
-- "renesas,tmu-r8a77970" for the r8a77970 TMU
-- "renesas,tmu-r8a77980" for the r8a77980 TMU
-- "renesas,tmu" for any TMU.
-  This is a fallback for the above renesas,tmu-* entries
-
-  - reg: base address and length of the registers block for the timer module.
-
-  - interrupts: interrupt-specifier for the timer, one per channel.
-
-  - clocks: a list of phandle + clock-specifier pairs, one for each entry
-in clock-names.
-  - clock-names: must contain "fck" for the functional clock.
-
-Optional Properties:
-
-  - #renesas,channels: number of channels implemented by the timer, must be 2
-or 3 (if not specified the value defaults to 3).
-
-
-Example: R8A7779 (R-Car H1) TMU0 node
-
-   tmu0: timer@ffd8 {
-   compatible = "renesas,tmu-r8a7779", "renesas,tmu";
-   reg = <0xffd8 0x30>;
-   interrupts = <0 32 IRQ_TYPE_LEVEL_HIGH>,
-<0 33 IRQ_TYPE_LEVEL_HIGH>,
-<0 34 IRQ_TYPE_LEVEL_HIGH>;
-   clocks = <_clks R8A7779_CLK_TMU0>;
-   clock-names = "fck";
-
-   #renesas,channels = <3>;
-   };
diff --git a/Documentation/devicetree/bindings/timer/renesas,tmu.yaml 
b/Documentation/devicetree/bindings/timer/renesas,tmu.yaml
new file mode 100644
index ..c54188731a1b
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/renesas,tmu.yaml
@@ -0,0 +1,99 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/renesas,tmu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Renesas R-Mobile/R-Car Timer Unit (TMU)
+
+maintainers:
+  - Geert Uytterhoeven 
+  - Laurent Pinchart 
+
+description:
+  The TMU is a 32-bit timer/counter with configurable clock inputs and
+  programmable compare match.
+
+  Channels share hardware resources but their counter and compare match value
+  are independent. The TMU hardware supports up to three channels.
+
+properties:
+  compatible:
+items:
+  - enum:
+  - renesas,tmu-r8a7740  # R-Mobile A1
+  - renesas,tmu-r8a774a1 # RZ/G2M
+  - renesas,tmu-r8a774b1 # RZ/G2N
+  - renesas,tmu-r8a774c0 # RZ/G2E
+  - renesas,tmu-r8a774e1 # RZ/G2H
+  - renesas,tmu-r8a7778  # R-Car M1A
+  - renesas,tmu-r8a7779  # R-Car H1
+  - renesas,tmu-r8a77970 # R-Car V3M
+  - renesas,tmu-r8a77980 # R-Car V3H
+  - const: renesas,tmu
+
+  reg:
+maxItems: 1
+
+  interrupts:
+minItems: 2
+maxItems: 3
+
+  clocks:
+maxItems: 1
+
+  clock-names:
+const: fck
+
+  power-domains:
+maxItems: 1
+
+  resets:
+maxItems: 1
+
+  '#renesas,channels':
+description:
+  Number of channels implemented by the timer.
+$ref: /schemas/types.yaml#/definitions/uint32
+enum: [ 2, 3 ]
+default: 3
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - power-domains
+
+if:
+  not:
+properties:
+  compatible

[PATCH 12/13] clocksource/drivers/ingenic: Fix section mismatch

2020-12-03 Thread Daniel Lezcano

The function ingenic_tcu_get_clock() is annotated for the __init
section but it is actually called from the online cpu callback.

That will lead to a crash if a CPU is hotplugged after boot time.

Remove the __init annotation for the ingenic_tcu_get_clock()
function.

Fixes: f19d838d08fc (clocksource/drivers/ingenic: Add high resolution timer 
support for SMP/SMT)
Reported-by: kernel test robot 
Signed-off-by: Daniel Lezcano 
Reviewed-by: Paul Cercueil 
Tested-by: 周琰杰 (Zhou Yanjie) 
Link: 
https://lore.kernel.org/r/20201125102346.1816310-1-daniel.lezc...@linaro.org
---
 drivers/clocksource/ingenic-timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/ingenic-timer.c 
b/drivers/clocksource/ingenic-timer.c
index 58fd9189fab7..905fd6b163a8 100644
--- a/drivers/clocksource/ingenic-timer.c
+++ b/drivers/clocksource/ingenic-timer.c
@@ -127,7 +127,7 @@ static irqreturn_t ingenic_tcu_cevt_cb(int irq, void 
*dev_id)
return IRQ_HANDLED;
 }
 
-static struct clk * __init ingenic_tcu_get_clock(struct device_node *np, int 
id)
+static struct clk *ingenic_tcu_get_clock(struct device_node *np, int id)
 {
struct of_phandle_args args;
 
-- 
2.25.1

[PATCH 08/13] clocksource/drivers/orion: Add missing clk_disable_unprepare() on error path

2020-12-03 Thread Daniel Lezcano

From: Yang Yingliang 

After calling clk_prepare_enable(), clk_disable_unprepare() need
be called on error path.

Fixes: fbe4b3566ddc ("clocksource/drivers/orion: Convert init function...")
Reported-by: Hulk Robot 
Signed-off-by: Yang Yingliang 
Signed-off-by: Daniel Lezcano 
Link: 
https://lore.kernel.org/r/2020064706.3397156-1-yangyingli...@huawei.com
---
 drivers/clocksource/timer-orion.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/clocksource/timer-orion.c 
b/drivers/clocksource/timer-orion.c
index d01ff4181867..5101e834d78f 100644
--- a/drivers/clocksource/timer-orion.c
+++ b/drivers/clocksource/timer-orion.c
@@ -143,7 +143,8 @@ static int __init orion_timer_init(struct device_node *np)
irq = irq_of_parse_and_map(np, 1);
if (irq <= 0) {
pr_err("%pOFn: unable to parse timer1 irq\n", np);
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out_unprep_clk;
}
 
rate = clk_get_rate(clk);
@@ -160,7 +161,7 @@ static int __init orion_timer_init(struct device_node *np)
clocksource_mmio_readl_down);
if (ret) {
pr_err("Failed to initialize mmio timer\n");
-   return ret;
+   goto out_unprep_clk;
}
 
sched_clock_register(orion_read_sched_clock, 32, rate);
@@ -170,7 +171,7 @@ static int __init orion_timer_init(struct device_node *np)
  "orion_event", NULL);
if (ret) {
pr_err("%pOFn: unable to setup irq\n", np);
-   return ret;
+   goto out_unprep_clk;
}
 
ticks_per_jiffy = (clk_get_rate(clk) + HZ/2) / HZ;
@@ -183,5 +184,9 @@ static int __init orion_timer_init(struct device_node *np)
orion_delay_timer_init(rate);
 
return 0;
+
+out_unprep_clk:
+   clk_disable_unprepare(clk);
+   return ret;
 }
 TIMER_OF_DECLARE(orion_timer, "marvell,orion-timer", orion_timer_init);
-- 
2.25.1

[PATCH 07/13] clocksource/drivers/nps: Remove EZChip NPS clocksource driver

2020-12-03 Thread Daniel Lezcano

From: Vineet Gupta 

NPS platform has been removed from ARC port and there are no in-tree
users of it now. So RIP !

Cc: Daniel Lezcano 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta 
Signed-off-by: Daniel Lezcano 
Link: https://lore.kernel.org/r/20201105212210.1891598-2-vgu...@synopsys.com
---
 drivers/clocksource/Kconfig |  10 --
 drivers/clocksource/Makefile|   1 -
 drivers/clocksource/timer-nps.c | 284 
 3 files changed, 295 deletions(-)
 delete mode 100644 drivers/clocksource/timer-nps.c

diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 68b087bff59c..390c27cd926d 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -275,16 +275,6 @@ config CLKSRC_TI_32K
  This option enables support for Texas Instruments 32.768 Hz 
clocksource
  available on many OMAP-like platforms.
 
-config CLKSRC_NPS
-   bool "NPS400 clocksource driver" if COMPILE_TEST
-   depends on !PHYS_ADDR_T_64BIT
-   select CLKSRC_MMIO
-   select TIMER_OF if OF
-   help
- NPS400 clocksource support.
- It has a 64-bit counter with update rate up to 1000MHz.
- This counter is accessed via couple of 32-bit memory-mapped registers.
-
 config CLKSRC_STM32
bool "Clocksource for STM32 SoCs" if !ARCH_STM32
depends on OF && ARM && (ARCH_STM32 || COMPILE_TEST)
diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
index 1c444cc3bb44..3c75cbbf8533 100644
--- a/drivers/clocksource/Makefile
+++ b/drivers/clocksource/Makefile
@@ -56,7 +56,6 @@ obj-$(CONFIG_CLKSRC_QCOM) += timer-qcom.o
 obj-$(CONFIG_MTK_TIMER)+= timer-mediatek.o
 obj-$(CONFIG_CLKSRC_PISTACHIO) += timer-pistachio.o
 obj-$(CONFIG_CLKSRC_TI_32K)+= timer-ti-32k.o
-obj-$(CONFIG_CLKSRC_NPS)   += timer-nps.o
 obj-$(CONFIG_OXNAS_RPS_TIMER)  += timer-oxnas-rps.o
 obj-$(CONFIG_OWL_TIMER)+= timer-owl.o
 obj-$(CONFIG_MILBEAUT_TIMER)   += timer-milbeaut.o
diff --git a/drivers/clocksource/timer-nps.c b/drivers/clocksource/timer-nps.c
deleted file mode 100644
index 7b6bb0df96ae..
--- a/drivers/clocksource/timer-nps.c
+++ /dev/null
@@ -1,284 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define NPS_MSU_TICK_LOW   0xC8
-#define NPS_CLUSTER_OFFSET 8
-#define NPS_CLUSTER_NUM16
-
-/* This array is per cluster of CPUs (Each NPS400 cluster got 256 CPUs) */
-static void *nps_msu_reg_low_addr[NPS_CLUSTER_NUM] __read_mostly;
-
-static int __init nps_get_timer_clk(struct device_node *node,
-unsigned long *timer_freq,
-struct clk **clk)
-{
-   int ret;
-
-   *clk = of_clk_get(node, 0);
-   ret = PTR_ERR_OR_ZERO(*clk);
-   if (ret) {
-   pr_err("timer missing clk\n");
-   return ret;
-   }
-
-   ret = clk_prepare_enable(*clk);
-   if (ret) {
-   pr_err("Couldn't enable parent clk\n");
-   clk_put(*clk);
-   return ret;
-   }
-
-   *timer_freq = clk_get_rate(*clk);
-   if (!(*timer_freq)) {
-   pr_err("Couldn't get clk rate\n");
-   clk_disable_unprepare(*clk);
-   clk_put(*clk);
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
-s

[PATCH 06/13] dt-bindings: timer: Add new OST support for the upcoming new driver.

2020-12-03 Thread Daniel Lezcano

From: 周琰杰 (Zhou Yanjie) 

The new OST has one global timer and two or four percpu timers, so there
will be three combinations in the upcoming new OST driver: the original
GLOBAL_TIMER + PERCPU_TIMER, the new GLOBAL_TIMER + PERCPU_TIMER0/1 and
GLOBAL_TIMER + PERCPU_TIMER0/1/2/3, For this, add the macro definition
about OST_CLK_PERCPU_TIMER0/1/2/3. And in order to ensure that all the
combinations work normally, the original ABI values of OST_CLK_PERCPU_TIMER
and OST_CLK_GLOBAL_TIMER need to be exchanged to ensure that in any
combinations, the clock can be registered (by calling clk_hw_register())
from index 0.

Before this patch, OST_CLK_PERCPU_TIMER and OST_CLK_GLOBAL_TIMER are only
used in two places, one is when using "assigned-clocks" to configure the
clocks in the DTS file; the other is when registering the clocks in the
sysost driver. When the values of these two ABIs are exchanged, the ABI
value used by sysost driver when registering the clock, and the ABI value
used by DTS when configuring the clock using "assigned-clocks" will also
change accordingly. Therefore, there is no situation that causes the wrong
clock to the configured. Therefore, exchanging ABI values will not cause
errors in the existing codes when registering and configuring the clocks.

Currently, in the mainline, only X1000 and X1830 are using sysost driver,
and the upcoming X2000 will also use sysost driver. This patch has been
tested on all three SoCs and all works fine.

Tested-by: 周正 (Zhou Zheng) 
Signed-off-by: 周琰杰 (Zhou Yanjie) 
Reviewed-by: Rob Herring 
Signed-off-by: Daniel Lezcano 
Link: https://lore.kernel.org/r/20201026155842.10196-2-zhouyan...@wanyeetech.com
---
 include/dt-bindings/clock/ingenic,sysost.h | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/dt-bindings/clock/ingenic,sysost.h 
b/include/dt-bindings/clock/ingenic,sysost.h
index 9ac88e90babf..063791b01ab3 100644
--- a/include/dt-bindings/clock/ingenic,sysost.h
+++ b/include/dt-bindings/clock/ingenic,sysost.h
@@ -1,12 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * This header provides clock numbers for the ingenic,tcu DT binding.
+ * This header provides clock numbers for the Ingenic OST DT binding.
  */
 
 #ifndef __DT_BINDINGS_CLOCK_INGENIC_OST_H__
 #define __DT_BINDINGS_CLOCK_INGENIC_OST_H__
 
-#define OST_CLK_PERCPU_TIMER   0
-#define OST_CLK_GLOBAL_TIMER   1
+#define OST_CLK_PERCPU_TIMER   1
+#define OST_CLK_GLOBAL_TIMER   0
+#define OST_CLK_PERCPU_TIMER0  1
+#define OST_CLK_PERCPU_TIMER1  2
+#define OST_CLK_PERCPU_TIMER2  3
+#define OST_CLK_PERCPU_TIMER3  4
 
 #endif /* __DT_BINDINGS_CLOCK_INGENIC_OST_H__ */
-- 
2.25.1

[PATCH 04/13] clocksource/drivers/sp804: Correct clk_get_rate handle

2020-12-03 Thread Daniel Lezcano

From: Kefeng Wang 

clk_get_rate won't return negative value, correct clk_get_rate handle.

Signed-off-by: Kefeng Wang 
Signed-off-by: Daniel Lezcano 
Link: 
https://lore.kernel.org/r/20201029123317.90286-4-wangkefeng.w...@huawei.com
---
 drivers/clocksource/timer-sp804.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/clocksource/timer-sp804.c 
b/drivers/clocksource/timer-sp804.c
index d74788b47802..fcce839670cb 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -58,7 +58,6 @@ static struct sp804_clkevt sp804_clkevt[NR_TIMERS];
 
 static long __init sp804_get_clock_rate(struct clk *clk, const char *name)
 {
-   long rate;
int err;
 
if (!clk)
@@ -75,14 +74,7 @@ static long __init sp804_get_clock_rate(struct clk *clk, 
const char *name)
return err;
}
 
-   rate = clk_get_rate(clk);
-   if (rate < 0) {
-   pr_err("sp804: clock failed to get rate: %ld\n", rate);
-   clk_disable_unprepare(clk);
-   clk_put(clk);
-   }
-
-   return rate;
+   return clk_get_rate(clk);
 }
 
 static struct sp804_clkevt * __init sp804_clkevt_get(void __iomem *base)
-- 
2.25.1

[PATCH 03/13] clocksource/drivers/sp804: Use clk_prepare_enable and clk_disable_unprepare

2020-12-03 Thread Daniel Lezcano

From: Kefeng Wang 

Directly use clk_prepare_enable and clk_disable_unprepare.

Signed-off-by: Kefeng Wang 
Signed-off-by: Daniel Lezcano 
Link: 
https://lore.kernel.org/r/20201029123317.90286-3-wangkefeng.w...@huawei.com
---
 drivers/clocksource/timer-sp804.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/clocksource/timer-sp804.c 
b/drivers/clocksource/timer-sp804.c
index 22a68cb83cf3..d74788b47802 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -68,17 +68,9 @@ static long __init sp804_get_clock_rate(struct clk *clk, 
const char *name)
return PTR_ERR(clk);
}
 
-   err = clk_prepare(clk);
-   if (err) {
-   pr_err("sp804: clock failed to prepare: %d\n", err);
-   clk_put(clk);
-   return err;
-   }
-
-   err = clk_enable(clk);
+   err = clk_prepare_enable(clk);
if (err) {
pr_err("sp804: clock failed to enable: %d\n", err);
-   clk_unprepare(clk);
clk_put(clk);
return err;
}
@@ -86,8 +78,7 @@ static long __init sp804_get_clock_rate(struct clk *clk, 
const char *name)
rate = clk_get_rate(clk);
if (rate < 0) {
pr_err("sp804: clock failed to get rate: %ld\n", rate);
-   clk_disable(clk);
-   clk_unprepare(clk);
+   clk_disable_unprepare(clk);
clk_put(clk);
}
 
-- 
2.25.1

[PATCH 02/13] clocksource/drivers/sp804: Make some symbol static

2020-12-03 Thread Daniel Lezcano

From: Kefeng Wang 

drivers/clocksource/timer-sp804.c:38:31: warning: symbol 'arm_sp804_timer' was 
not declared. Should it be static?
drivers/clocksource/timer-sp804.c:47:31: warning: symbol 'hisi_sp804_timer' was 
not declared. Should it be static?
drivers/clocksource/timer-sp804.c:120:12: warning: symbol 
'sp804_clocksource_and_sched_clock_init' was not declared. Should it be static?
drivers/clocksource/timer-sp804.c:219:12: warning: symbol 
'sp804_clockevents_init' was not declared. Should it be static?

And move __initdata after the variables.

Signed-off-by: Kefeng Wang 
Signed-off-by: Daniel Lezcano 
Link: 
https://lore.kernel.org/r/20201029123317.90286-2-wangkefeng.w...@huawei.com
---
 drivers/clocksource/timer-sp804.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/clocksource/timer-sp804.c 
b/drivers/clocksource/timer-sp804.c
index db5330cc49bc..22a68cb83cf3 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -34,8 +34,7 @@
 #define HISI_TIMER_BGLOAD  0x20
 #define HISI_TIMER_BGLOAD_H0x24
 
-
-struct sp804_timer __initdata arm_sp804_timer = {
+static struct sp804_timer arm_sp804_timer __initdata = {
.load   = TIMER_LOAD,
.value  = TIMER_VALUE,
.ctrl   = TIMER_CTRL,
@@ -44,7 +43,7 @@ struct sp804_timer __initdata arm_sp804_timer = {
.width  = 32,
 };
 
-struct sp804_timer __initdata hisi_sp804_timer = {
+static struct sp804_timer hisi_sp804_timer __initdata = {
.load   = HISI_TIMER_LOAD,
.load_h = HISI_TIMER_LOAD_H,
.value  = HISI_TIMER_VALUE,
-- 
2.25.1

[PATCH 05/13] clocksource/drivers/sp804: Use pr_fmt

2020-12-03 Thread Daniel Lezcano

From: Kefeng Wang 

Add pr_fmt to prefix pr_ output.

Signed-off-by: Kefeng Wang 
Signed-off-by: Daniel Lezcano 
Link: 
https://lore.kernel.org/r/20201029123317.90286-5-wangkefeng.w...@huawei.com
---
 drivers/clocksource/timer-sp804.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/clocksource/timer-sp804.c 
b/drivers/clocksource/timer-sp804.c
index fcce839670cb..401d592e85f5 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -5,6 +5,9 @@
  *  Copyright (C) 1999 - 2003 ARM Limited
  *  Copyright (C) 2000 Deep Blue Solutions Ltd
  */
+
+#define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -63,13 +66,13 @@ static long __init sp804_get_clock_rate(struct clk *clk, 
const char *name)
if (!clk)
clk = clk_get_sys("sp804", name);
if (IS_ERR(clk)) {
-   pr_err("sp804: %s clock not found: %ld\n", name, PTR_ERR(clk));
+   pr_err("%s clock not found: %ld\n", name, PTR_ERR(clk));
return PTR_ERR(clk);
}
 
err = clk_prepare_enable(clk);
if (err) {
-   pr_err("sp804: clock failed to enable: %d\n", err);
+   pr_err("clock failed to enable: %d\n", err);
clk_put(clk);
return err;
}
@@ -218,7 +221,7 @@ static int __init sp804_clockevents_init(void __iomem 
*base, unsigned int irq,
 
if (request_irq(irq, sp804_timer_interrupt, IRQF_TIMER | IRQF_IRQPOLL,
"timer", _clockevent))
-   pr_err("%s: request_irq() failed\n", "timer");
+   pr_err("request_irq() failed\n");
clockevents_config_and_register(evt, rate, 0xf, 0x);
 
return 0;
@@ -280,7 +283,7 @@ static int __init sp804_of_init(struct device_node *np, 
struct sp804_timer *time
if (of_clk_get_parent_count(np) == 3) {
clk2 = of_clk_get(np, 1);
if (IS_ERR(clk2)) {
-   pr_err("sp804: %pOFn clock not found: %d\n", np,
+   pr_err("%pOFn clock not found: %d\n", np,
(int)PTR_ERR(clk2));
clk2 = NULL;
}
-- 
2.25.1

[PATCH 01/13] clocksource/drivers/sp804: Add static for functions such as sp804_clockevents_init()

2020-12-03 Thread Daniel Lezcano

From: Zhen Lei 

Add static for sp804_clocksource_and_sched_clock_init() and
sp804_clockevents_init(), they are only used in timer-sp804.c now.
Otherwise, the following warning will be reported:

drivers/clocksource/timer-sp804.c:68:12: warning: no previous prototype \
for 'sp804_clocksource_and_sched_clock_init' [-Wmissing-prototypes]
drivers/clocksource/timer-sp804.c:162:12: warning: no previous prototype \
for 'sp804_clockevents_init' [-Wmissing-prototypes]

Fixes: 975434f8b24a ("clocksource/drivers/sp804: Delete the leading "__" of 
some functions")
Fixes: 65f4d7ddc7b6 ("clocksource/drivers/sp804: Remove unused 
sp804_timer_disable() and timer-sp804.h")
Reported-by: kernel test robot 
Signed-off-by: Zhen Lei 
Signed-off-by: Daniel Lezcano 
Link: https://lore.kernel.org/r/20201021012259.2067-2-thunder.leiz...@huawei.com
---
 drivers/clocksource/timer-sp804.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/clocksource/timer-sp804.c 
b/drivers/clocksource/timer-sp804.c
index 6e8ad4a4ea3c..db5330cc49bc 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -117,10 +117,10 @@ static u64 notrace sp804_read(void)
return ~readl_relaxed(sched_clkevt->value);
 }
 
-int __init sp804_clocksource_and_sched_clock_init(void __iomem *base,
- const char *name,
- struct clk *clk,
- int use_sched_clock)
+static int __init sp804_clocksource_and_sched_clock_init(void __iomem *base,
+const char *name,
+struct clk *clk,
+int use_sched_clock)
 {
long rate;
struct sp804_clkevt *clkevt;
@@ -216,8 +216,8 @@ static struct clock_event_device sp804_clockevent = {
.rating = 300,
 };
 
-int __init sp804_clockevents_init(void __iomem *base, unsigned int irq,
- struct clk *clk, const char *name)
+static int __init sp804_clockevents_init(void __iomem *base, unsigned int irq,
+struct clk *clk, const char *name)
 {
struct clock_event_device *evt = _clockevent;
long rate;
-- 
2.25.1

[GIT PULL] timer drivers for v5.11

2020-12-03 Thread Daniel Lezcano

The following changes since commit b996544916429946bf4934c1c01a306d1690972c:

  tick: Get rid of tick_period (2020-11-19 10:48:29 +0100)

are available in the Git repository at:

  ssh://g...@git.linaro.org/people/daniel.lezcano/linux.git tags/timers-v5.11

for you to fetch changes up to ab3105446f1ec4e98fadfc998ee24feec271c16c:

  clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI
(2020-12-03 19:16:26 +0100)


- Add static annotation for the sp804 init functions (Zhen Lei)

- Code cleanups and error code path at init time fixes on the sp804
  (Kefen Wang)

- Add new OST timer driver device tree bindings (Zhou Yanjie)

- Remove EZChip NPS clocksource driver corresponding to the NPS
  platform which was removed from the ARC architecture (Vineet Gupta)

- Add missing clk_disable_unprepare() on error path for Orion (Yang
  Yingliang)

- Add device tree bindings documentation for Renesas r8a774e1
  (Marian-Cristian Rotariu)

- Convert Renesas TMU to json-schema (Geert Uytterhoeven)

- Fix memory leak on the error path at init time on the cadence_ttc
  driver (Yu Kuai)

- Fix section mismatch for Ingenic timer driver (Daniel Lezcano)

- Make RISCV_TIMER depends on RISCV_SBI (Kefeng Wang)


Daniel Lezcano (1):
  clocksource/drivers/ingenic: Fix section mismatch

Geert Uytterhoeven (1):
  dt-bindings: timer: renesas: tmu: Convert to json-schema

Kefeng Wang (5):
  clocksource/drivers/sp804: Make some symbol static
  clocksource/drivers/sp804: Use clk_prepare_enable and
clk_disable_unprepare
  clocksource/drivers/sp804: Correct clk_get_rate handle
  clocksource/drivers/sp804: Use pr_fmt
  clocksource/drivers/riscv: Make RISCV_TIMER depends on RISCV_SBI

Marian-Cristian Rotariu (1):
  dt-bindings: timer: renesas: tmu: Document r8a774e1 bindings

Vineet Gupta (1):
  clocksource/drivers/nps: Remove EZChip NPS clocksource driver

Yang Yingliang (1):
  clocksource/drivers/orion: Add missing clk_disable_unprepare() on
error path

Yu Kuai (1):
  clocksource/drivers/cadence_ttc: Fix memory leak in
ttc_setup_clockevent()

Zhen Lei (1):
  clocksource/drivers/sp804: Add static for functions such as
sp804_clockevents_init()

周琰杰 (Zhou Yanjie) (1):
  dt-bindings: timer: Add new OST support for the upcoming new driver.

 Documentation/devicetree/bindings/timer/renesas,tmu.txt  |  49
--
 Documentation/devicetree/bindings/timer/renesas,tmu.yaml |  99
+++
 drivers/clocksource/Kconfig  |  12 +--
 drivers/clocksource/Makefile |   1 -
 drivers/clocksource/ingenic-timer.c  |   2 +-
 drivers/clocksource/timer-cadence-ttc.c  |  18 +-
 drivers/clocksource/timer-nps.c  | 284
--
 drivers/clocksource/timer-orion.c|  11 --
 drivers/clocksource/timer-sp804.c|  49
+-
 include/dt-bindings/clock/ingenic,sysost.h   |  10 --
 10 files changed, 142 insertions(+), 393 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/timer/renesas,tmu.txt
 create mode 100644 Documentation/devicetree/bindings/timer/renesas,tmu.yaml
 delete mode 100644 drivers/clocksource/timer-nps.c

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH v2 2/5] thermal: devfreq_cooling: get a copy of device status

2020-12-03 Thread Daniel Lezcano

On 03/12/2020 16:38, Lukasz Luba wrote:
> 
> 
> On 12/3/20 1:09 PM, Daniel Lezcano wrote:
>> On 18/11/2020 13:03, Lukasz Luba wrote:
>>> Devfreq cooling needs to now the correct status of the device in order
>>> to operate. Do not rely on Devfreq last_status which might be a stale
>>> data
>>> and get more up-to-date values of the load.
>>>
>>> Devfreq framework can change the device status in the background. To
>>> mitigate this situation make a copy of the status structure and use it
>>> for internal calculations.
>>>
>>> In addition this patch adds normalization function, which also makes
>>> sure
>>> that whatever data comes from the device, it is in a sane range.
>>>
>>> Signed-off-by: Lukasz Luba 
>>> ---
>>>   drivers/thermal/devfreq_cooling.c | 52 +--
>>>   1 file changed, 43 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/thermal/devfreq_cooling.c
>>> b/drivers/thermal/devfreq_cooling.c
>>> index 659c0143c9f0..925523694462 100644
>>> --- a/drivers/thermal/devfreq_cooling.c
>>> +++ b/drivers/thermal/devfreq_cooling.c
>>> @@ -227,20 +227,46 @@ static inline unsigned long
>>> get_total_power(struct devfreq_cooling_device *dfc,
>>>  voltage);
>>>   }
>>>   +static void _normalize_load(struct devfreq_dev_status *status)
>>> +{
>>> +    /* Make some space if needed */
>>> +    if (status->busy_time > 0x) {
>>> +    status->busy_time >>= 10;
>>> +    status->total_time >>= 10;
>>> +    }
>>> +
>>> +    if (status->busy_time > status->total_time)
>>> +    status->busy_time = status->total_time;
>>
>> How the condition above is possible?
> 
> They should, be checked by the driver, but I cannot trust
> and have to check for all corner cases: (div by 0, overflow
> one of them, etc). The busy_time and total_time are unsigned long,
> which means 4B on 32bit machines.
> If these values are coming from device counters, which count every
> busy cycle and total cycles of a clock of a device running at e.g.
> 1GHz they would overflow every ~4s.

I don't think it is up to this routine to check the driver is correctly
implemented, especially at every call to get_requested_power.

If the normalization ends up by doing this kind of thing, there is
certainly something wrong in the 'status' computation to be fixed before
submitting this series.


> Normally IPA polling are 1s and 100ms, it's platform specific. But there
> are also 'empty' periods when IPA sees temperature very low and does not
> even call the .get_requested_power() callbacks for the cooling devices,
> just grants max freq to all. This is problematic. I am investigating it
> and will propose a solution for IPA soon.
> 
> I would avoid all of this if devfreq core would have default for all
> devices a reliable polling timer... Let me check some possibilities also
> for this case.

Ok, may be create an API to compute the 'idle,busy,total times' to be
used by the different the devfreq drivers and then fix the overflow in
this common place.

>>> +    status->busy_time *= 100;
>>> +    status->busy_time /= status->total_time ? : 1;
>>> +
>>> +    /* Avoid division by 0 */
>>> +    status->busy_time = status->busy_time ? : 1;
>>> +    status->total_time = 100;
>>
>> Why not base the normalization on 1024? and use an intermediate u64.
> 
> You are the 2nd reviewer who is asking this. I tried to keep 'load' as
> in range [0, 100] since we also have 'load' in cpufreq cooling in this
> range. Maybe I should switch to 1024 (Ionela was also asking for this).

Well it is common practice to compute normalization with 1024 because
the division is a bit shift and the compiler optimize the code very well
with that value.




-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH v2 3/5] thermal: devfreq_cooling: add new registration functions with Energy Model

2020-12-03 Thread Daniel Lezcano

On 18/11/2020 13:03, Lukasz Luba wrote:
> The Energy Model (EM) framework supports devices such as Devfreq. Create
> new registration functions which automatically register EM for the thermal
> devfreq_cooling devices. This patch prepares the code for coming changes
> which are going to replace old power model with the new EM.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  drivers/thermal/devfreq_cooling.c | 99 ++-
>  include/linux/devfreq_cooling.h   | 22 +++
>  2 files changed, 120 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/devfreq_cooling.c 
> b/drivers/thermal/devfreq_cooling.c
> index 925523694462..b354271742c5 100644
> --- a/drivers/thermal/devfreq_cooling.c
> +++ b/drivers/thermal/devfreq_cooling.c
> @@ -50,6 +50,8 @@ static DEFINE_IDA(devfreq_ida);
>   * @capped_state:index to cooling state with in dynamic power budget
>   * @req_max_freq:PM QoS request for limiting the maximum frequency
>   *   of the devfreq device.
> + * @em:  Energy Model for the associated Devfreq device
> + * @em_registered:   Devfreq cooling registered the EM and should free it.
>   */
>  struct devfreq_cooling_device {
>   int id;
> @@ -63,6 +65,8 @@ struct devfreq_cooling_device {
>   u32 res_util;
>   int capped_state;
>   struct dev_pm_qos_request req_max_freq;
> + struct em_perf_domain *em;

This pointer is not needed, it is in the struct device.

> + bool em_registered;

The boolean em_registered is not needed because of the test in the
function em_dev_unregister_perf_domain():

if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
return;

Logically if the 'em' was not initialized, it must be NULL, the
corresponding struct device was zero-allocated.


>  };
>  
>  static int devfreq_cooling_get_max_state(struct thermal_cooling_device *cdev,
> @@ -583,22 +587,115 @@ struct thermal_cooling_device 
> *devfreq_cooling_register(struct devfreq *df)
>  }
>  EXPORT_SYMBOL_GPL(devfreq_cooling_register);
>  
> +/**
> + * devfreq_cooling_em_register_power() - Register devfreq cooling device with
> + *   power information and attempt to register Energy Model (EM)
> + * @df:  Pointer to devfreq device.
> + * @dfc_power:   Pointer to devfreq_cooling_power.
> + * @em_cb:   Callback functions providing the data of the EM
> + *
> + * Register a devfreq cooling device and attempt to register Energy Model. 
> The
> + * available OPPs must be registered for the device.
> + *
> + * If @dfc_power is provided, the cooling device is registered with the
> + * power extensions. If @em_cb is provided it will be called for each OPP to
> + * calculate power value and cost. If @em_cb is not provided then simple 
> Energy
> + * Model is going to be used, which requires "dynamic-power-coefficient" a
> + * devicetree property.
> + */
> +struct thermal_cooling_device *
> +devfreq_cooling_em_register_power(struct devfreq *df,
> +   struct devfreq_cooling_power *dfc_power,
> +   struct em_data_callback *em_cb)
> +{
> + struct thermal_cooling_device *cdev;
> + struct devfreq_cooling_device *dfc;
> + struct device_node *np = NULL;
> + struct device *dev;
> + int nr_opp, ret;
> +
> + if (IS_ERR_OR_NULL(df))
> + return ERR_PTR(-EINVAL);
> +
> + dev = df->dev.parent;

Why the parent ?

> +
> + if (em_cb) {
> + nr_opp = dev_pm_opp_get_opp_count(dev);
> + if (nr_opp <= 0) {
> + dev_err(dev, "No valid OPPs found\n");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + ret = em_dev_register_perf_domain(dev, nr_opp, em_cb, NULL, 
> false);
> + } else {
> + ret = dev_pm_opp_of_register_em(dev, NULL);
> + }
> +
> + if (ret)
> + dev_warn(dev, "Unable to register EM for devfreq cooling device 
> (%d)\n",
> +  ret);
> +
> + if (dev->of_node)
> + np = of_node_get(dev->of_node);
> +
> + cdev = of_devfreq_cooling_register_power(np, df, dfc_power);
> +
> + if (np)
> + of_node_put(np);> +
> + if (IS_ERR_OR_NULL(cdev)) {
> + if (!ret)
> + em_dev_unregister_perf_domain(dev);
> + } else {
> + dfc = cdev->devdata;
> + dfc->em_registered = !ret;
> + }
> +
> + return cdev;
> +}
> +EXPORT_SYMBOL_GPL(devfreq_cooling_em_register_power);
> +
> +/**
> + * devfreq_cooling_em_register() - Register devfreq cooling device together
> + *   with Energy Model.
> + * @df:  Pointer to devfreq device.
> + * @em_cb:   Callback functions providing the data of the Energy Model
> + *
> + * This function attempts to register Energy Model for devfreq device and 
> then
> + * register the devfreq cooling device.
> + */
> +struct thermal_cooling_device *
> +devfreq_cooling_em_register(struct

Re: [PATCH v2 2/5] thermal: devfreq_cooling: get a copy of device status

2020-12-03 Thread Daniel Lezcano

On 18/11/2020 13:03, Lukasz Luba wrote:
> Devfreq cooling needs to now the correct status of the device in order
> to operate. Do not rely on Devfreq last_status which might be a stale data
> and get more up-to-date values of the load.
> 
> Devfreq framework can change the device status in the background. To
> mitigate this situation make a copy of the status structure and use it
> for internal calculations.
> 
> In addition this patch adds normalization function, which also makes sure
> that whatever data comes from the device, it is in a sane range.
> 
> Signed-off-by: Lukasz Luba 
> ---
>  drivers/thermal/devfreq_cooling.c | 52 +--
>  1 file changed, 43 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/thermal/devfreq_cooling.c 
> b/drivers/thermal/devfreq_cooling.c
> index 659c0143c9f0..925523694462 100644
> --- a/drivers/thermal/devfreq_cooling.c
> +++ b/drivers/thermal/devfreq_cooling.c
> @@ -227,20 +227,46 @@ static inline unsigned long get_total_power(struct 
> devfreq_cooling_device *dfc,
>  voltage);
>  }
>  
> +static void _normalize_load(struct devfreq_dev_status *status)
> +{
> + /* Make some space if needed */
> + if (status->busy_time > 0x) {
> + status->busy_time >>= 10;
> + status->total_time >>= 10;
> + }
> +
> + if (status->busy_time > status->total_time)
> + status->busy_time = status->total_time;

How the condition above is possible?

> + status->busy_time *= 100;
> + status->busy_time /= status->total_time ? : 1;
> +
> + /* Avoid division by 0 */
> + status->busy_time = status->busy_time ? : 1;
> + status->total_time = 100;

Why not base the normalization on 1024? and use an intermediate u64.

For example:

static u32 _normalize_load(struct devfreq_dev_status *status)
{
u64 load = 0;

/* Prevent divison by zero */
if (!status->busy_time)
return 0;

/*
 * Assuming status->total_time is always greater or equal
 * to status->busy_time, it can not be equal to zero because
 * of the test above
 */
load = status->busy_time * 1024;
load /= status->total_time;

/*
 * load is always [1..1024[, so it can not be truncated by a
 * u64 -> u32 coercive cast
 */
return (u32)load;
}


> +}
>  
>  static int devfreq_cooling_get_requested_power(struct thermal_cooling_device 
> *cdev,
>  u32 *power)
>  {
>   struct devfreq_cooling_device *dfc = cdev->devdata;
>   struct devfreq *df = dfc->devfreq;
> - struct devfreq_dev_status *status = >last_status;
> + struct devfreq_dev_status status;
>   unsigned long state;
> - unsigned long freq = status->current_frequency;
> + unsigned long freq;
>   unsigned long voltage;
>   u32 dyn_power = 0;
>   u32 static_power = 0;
>   int res;
>  
> + mutex_lock(>lock);
> + res = df->profile->get_dev_status(df->dev.parent, );
> + mutex_unlock(>lock);
> + if (res)
> + return res;
> +
> + freq = status.current_frequency;
> +
>   state = freq_get_state(dfc, freq);
>   if (state == THERMAL_CSTATE_INVALID) {
>   res = -EAGAIN;
> @@ -268,16 +294,18 @@ static int devfreq_cooling_get_requested_power(struct 
> thermal_cooling_device *cd
>   } else {
>   dyn_power = dfc->power_table[state];
>  
> + _normalize_load();

load = _normalize_load();

> +
>   /* Scale dynamic power for utilization */
> - dyn_power *= status->busy_time;
> - dyn_power /= status->total_time;
> + dyn_power *= status.busy_time;
> + dyn_power /= status.total_time;

/*
 * May be change dyn_power to a u64 to prevent overflow
 * when multiplied by 'load'
 */
dyn_power = (dyn_power * load) / 1024;

>   /* Get static power */
>   static_power = get_static_power(dfc, freq);
>  
>   *power = dyn_power + static_power;
>   }
>  
> - trace_thermal_power_devfreq_get_power(cdev, status, freq, *power);
> + trace_thermal_power_devfreq_get_power(cdev, , freq, *power);
>  
>   return 0;
>  fail:
> @@ -309,14 +337,20 @@ static int devfreq_cooling_power2state(struct 
> thermal_cooling_device *cdev,
>  {
>   struct devfreq_cooling_device *dfc = cdev->devdata;
>   struct devfreq *df = dfc->devfreq;
> - struct devfreq_dev_status *status = >last_status;
> - unsigned long freq = status->current_frequency;
> + struct devfreq_dev_status status;
>   unsigned long busy_time;
> + unsigned long freq;
>   s32 dyn_power;
>   u32 static_power;
>   s32 est_power;
>   int i;
>  
> + mutex_lock(>lock);
> + status =

Re: [PATCH] clocksource: dw_apb_timer_of: return EPROBE_DEFER if no clock available

2020-12-03 Thread Daniel Lezcano



Hi Dinh,

On 19/11/2020 13:12, Dinh Nguyen wrote:
> commit ("b0fc70ce1f02 arm64: berlin: Select DW_APB_TIMER_OF") added the
> support for the dw_apb_timer into the arm64 defconfig. However, for some
> platforms like the Intel Stratix10 and Agilex, the clock manager doesn't
> get probed until after the timer driver is probed. Thus, the driver hits
> the panic "No clock nor clock-frequency property for %" because it cannot
> properly get the clock.
> 
> This patch adds support for EPROBE_DEFER so the kernel can come back to
> finish probing this timer driver after the clock driver is probed.
> 
> Signed-off-by: Dinh Nguyen 

A few comments below.

> ---
>  drivers/clocksource/dw_apb_timer_of.c | 86 ---
>  1 file changed, 51 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/clocksource/dw_apb_timer_of.c 
> b/drivers/clocksource/dw_apb_timer_of.c
> index ab3ddebe8344..a8ce980c5146 100644
> --- a/drivers/clocksource/dw_apb_timer_of.c
> +++ b/drivers/clocksource/dw_apb_timer_of.c
> @@ -14,7 +14,7 @@
>  #include 
>  #include 
>  
> -static void __init timer_get_base_and_rate(struct device_node *np,
> +static int __init timer_get_base_and_rate(struct device_node *np,
>   void __iomem **base, u32 *rate)
>  {
>   struct clk *timer_clk;
> @@ -47,65 +47,77 @@ static void __init timer_get_base_and_rate(struct 
> device_node *np,
>   np);
>  
>   timer_clk = of_clk_get_by_name(np, "timer");
> - if (IS_ERR(timer_clk))
> - goto try_clock_freq;
> + if (IS_ERR(timer_clk)) {
> + if (PTR_ERR(timer_clk) != -EPROBE_DEFER) {
> + pr_err("Failed to get clock for %pOF\n", np);
> + goto try_clock_freq;
> + }
> + return PTR_ERR(timer_clk);
> + }

May be massage the changes by moving the of-rate check first

if (!of_property_read_u32(np, "clock-freq", rate) &&
!of_property_read_u32(np, "clock-frequency", rate))
return 0;

timer_clk = of_clk_get_by_name(np, "timer");

/*
 * Whatever the result, we return
 */
ret = PTR_ERR(timer_clk);
if (ret)
return ret;

ret = clk_prepare_enable(timer_clk);
if (ret)
return ret;

*rate = clk_get_rate(timer_clk);
if (!(*rate))
return -EINVAL;

return 0;

>  
>   if (!clk_prepare_enable(timer_clk)) {
>   *rate = clk_get_rate(timer_clk);
> - return;
> + return 0;
>   }
>  
>  try_clock_freq:
>   if (of_property_read_u32(np, "clock-freq", rate) &&
>   of_property_read_u32(np, "clock-frequency", rate))
>   panic("No clock nor clock-frequency property for %pOFn", np);
>
> + return 0;
>  }
>  
> -static void __init add_clockevent(struct device_node *event_timer)
> +static int __init add_clockevent(struct device_node *event_timer)
>  {
>   void __iomem *iobase;
>   struct dw_apb_clock_event_device *ced;
>   u32 irq, rate;
> + int ret = 0;
>  
>   irq = irq_of_parse_and_map(event_timer, 0);
>   if (irq == 0)
>   panic("No IRQ for clock event timer");
>  
> - timer_get_base_and_rate(event_timer, , );
> -
> - ced = dw_apb_clockevent_init(-1, event_timer->name, 300, iobase, irq,
> + ret = timer_get_base_and_rate(event_timer, , );
> + if (ret == 0) {
> + ced = dw_apb_clockevent_init(-1, event_timer->name, 300, 
> iobase, irq,
>rate);
> - if (!ced)
> - panic("Unable to initialise clockevent device");
> + if (!ced)
> + panic("Unable to initialise clockevent device");

ret = timer_get_base_and_rate(event_timer, , );
if (ret)
return ret;

ced = dw_apb_clockevent_init(-1, event_timer->name, 300,
iobase, irq, rate);
if (!ced)
return -EINVAL;

dw_apb_clockevent_register(ced);

return 0;

> - dw_apb_clockevent_register(ced);
> + dw_apb_clockevent_register(ced);
> + }
> + return ret;
>  }
>  
>  static void __iomem *sched_io_base;
>  static u32 sched_rate;
>  
> -static void __init add_clocksource(struct device_node *source_timer)
> +static int __init add_clocksource(struct device_node *source_timer)
>  {
>   void __iomem *iobase;
>   struct dw_apb_clocksource *cs;
>   u32 rate;
> -
> - timer_get_base_and_rate(source_timer, , );
> -
> - cs = dw_apb_clocksource_init(300, source_timer->name, iobase, rate);
> - if (!cs)
> - panic("Unable to initialise clocksource device");
> -
> - dw_apb_clocksource_start(cs);
> - dw_apb_clocksource_register(cs);
> -
> - /*
> -  * Fallback to use the clocksource as sched_clock if no separate
> -  * timer is found.

[PATCH v2 2/2] platform/x86/drivers/acerhdf: Check the interval value when it is set

2020-12-02 Thread Daniel Lezcano

Currently the code checks the interval value when the temperature is
read which is bad for two reasons:

 - checking and setting the interval in the get_temp callback is
   inaccurate and awful, that can be done when changing the value.

 - Changing the thermal zone structure internals is an abuse of the
   exported structure, moreover no lock is taken here.

The goal of this patch is to solve the first item by using the 'set'
function called when changing the interval. The check is done there
and removed from the get_temp function. If the thermal zone was not
initialized yet, the interval is not updated in this case as that will
happen in the init function when registering the thermal zone device.

I don't have any hardware to test the changes.

Signed-off-by: Daniel Lezcano 
---
 V2:
   - Fixed static function annotation
---
 drivers/platform/x86/acerhdf.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 19fc8ff2225c..b6aa6e5514f4 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -334,7 +334,10 @@ static void acerhdf_check_param(struct thermal_zone_device 
*thermal)
}
if (verbose)
pr_notice("interval changed to: %d\n", interval);
-   thermal->polling_delay = interval*1000;
+
+   if (thermal)
+   thermal->polling_delay = interval*1000;
+
prev_interval = interval;
}
 }
@@ -349,8 +352,6 @@ static int acerhdf_get_ec_temp(struct thermal_zone_device 
*thermal, int *t)
 {
int temp, err = 0;
 
-   acerhdf_check_param(thermal);
-
err = acerhdf_get_temp();
if (err)
return err;
@@ -823,8 +824,21 @@ MODULE_ALIAS("dmi:*:*Acer*:pnExtensa*5420*:");
 module_init(acerhdf_init);
 module_exit(acerhdf_exit);
 
+static int interval_set_uint(const char *val, const struct kernel_param *kp)
+{
+   int ret;
+
+   ret = param_set_uint(val, kp);
+   if (ret)
+   return ret;
+
+   acerhdf_check_param(thz_dev);
+
+   return 0;
+}
+
 static const struct kernel_param_ops interval_ops = {
-   .set = param_set_uint,
+   .set = interval_set_uint,
.get = param_get_uint,
 };
 
-- 
2.25.1

[PATCH v2 1/2] platform/x86/drivers/acerhdf: Use module_param_cb to set/get polling interval

2020-12-02 Thread Daniel Lezcano

The module parameter can be set by using ops to get and set the
values. The change will allow to check the correctness of the interval
value everytime it is changed instead of checking in the get_temp
function.

Signed-off-by: Daniel Lezcano 
---
 drivers/platform/x86/acerhdf.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 44b6bfbd32df..19fc8ff2225c 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -84,8 +84,6 @@ static struct platform_device *acerhdf_dev;
 
 module_param(kernelmode, uint, 0);
 MODULE_PARM_DESC(kernelmode, "Kernel mode fan control on / off");
-module_param(interval, uint, 0600);
-MODULE_PARM_DESC(interval, "Polling interval of temperature check");
 module_param(fanon, uint, 0600);
 MODULE_PARM_DESC(fanon, "Turn the fan on above this temperature");
 module_param(fanoff, uint, 0600);
@@ -824,3 +822,11 @@ MODULE_ALIAS("dmi:*:*Acer*:pnExtensa*5420*:");
 
 module_init(acerhdf_init);
 module_exit(acerhdf_exit);
+
+static const struct kernel_param_ops interval_ops = {
+   .set = param_set_uint,
+   .get = param_get_uint,
+};
+
+module_param_cb(interval, _ops, , 0600);
+MODULE_PARM_DESC(interval, "Polling interval of temperature check");
-- 
2.25.1

[PATCH 1/2] platform/x86/drivers/acerhdf: Use module_param_cb to set/get polling interval

2020-12-02 Thread Daniel Lezcano

The module parameter can be set by using ops to get and set the
values. The change will allow to check the correctness of the interval
value everytime it is changed instead of checking in the get_temp
function.

Signed-off-by: Daniel Lezcano 
---
 drivers/platform/x86/acerhdf.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 44b6bfbd32df..19fc8ff2225c 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -84,8 +84,6 @@ static struct platform_device *acerhdf_dev;
 
 module_param(kernelmode, uint, 0);
 MODULE_PARM_DESC(kernelmode, "Kernel mode fan control on / off");
-module_param(interval, uint, 0600);
-MODULE_PARM_DESC(interval, "Polling interval of temperature check");
 module_param(fanon, uint, 0600);
 MODULE_PARM_DESC(fanon, "Turn the fan on above this temperature");
 module_param(fanoff, uint, 0600);
@@ -824,3 +822,11 @@ MODULE_ALIAS("dmi:*:*Acer*:pnExtensa*5420*:");
 
 module_init(acerhdf_init);
 module_exit(acerhdf_exit);
+
+static const struct kernel_param_ops interval_ops = {
+   .set = param_set_uint,
+   .get = param_get_uint,
+};
+
+module_param_cb(interval, _ops, , 0600);
+MODULE_PARM_DESC(interval, "Polling interval of temperature check");
-- 
2.25.1

[PATCH 2/2] platform/x86/drivers/acerhdf: Check the interval value when it is set

2020-12-02 Thread Daniel Lezcano

Currently the code checks the interval value when the temperature is
read which is bad for two reasons:

 - checking and setting the interval in the get_temp callback is
   inaccurate and awful, that can be done when changing the value.

 - Changing the thermal zone structure internals is an abuse of the
   exported structure, moreover no lock is taken here.

The goal of this patch is to solve the first item by using the 'set'
function called when changing the interval. The check is done there
and removed from the get_temp function. If the thermal zone was not
initialized yet, the interval is not updated in this case as that will
happen in the init function when registering the thermal zone device.

I don't have any hardware to test the changes.

Signed-off-by: Daniel Lezcano 
---
 drivers/platform/x86/acerhdf.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c
index 19fc8ff2225c..084005841d56 100644
--- a/drivers/platform/x86/acerhdf.c
+++ b/drivers/platform/x86/acerhdf.c
@@ -334,7 +334,10 @@ static void acerhdf_check_param(struct thermal_zone_device 
*thermal)
}
if (verbose)
pr_notice("interval changed to: %d\n", interval);
-   thermal->polling_delay = interval*1000;
+
+   if (thermal)
+   thermal->polling_delay = interval*1000;
+
prev_interval = interval;
}
 }
@@ -349,8 +352,6 @@ static int acerhdf_get_ec_temp(struct thermal_zone_device 
*thermal, int *t)
 {
int temp, err = 0;
 
-   acerhdf_check_param(thermal);
-
err = acerhdf_get_temp();
if (err)
return err;
@@ -823,8 +824,19 @@ MODULE_ALIAS("dmi:*:*Acer*:pnExtensa*5420*:");
 module_init(acerhdf_init);
 module_exit(acerhdf_exit);
 
+int interval_set_uint(const char *val, const struct kernel_param *kp)
+{
+   ret = param_set_uint(val, kp);
+   if (ret)
+   return ret;
+
+   acerhdf_check_param(thz_dev);
+
+   return 0;
+}
+
 static const struct kernel_param_ops interval_ops = {
-   .set = param_set_uint,
+   .set = interval_set_uint,
.get = param_get_uint,
 };
 
-- 
2.25.1

Re: [PATCH v2 0/5] Thermal devfreq cooling improvements with Energy Model

2020-12-02 Thread Daniel Lezcano

On 18/11/2020 13:03, Lukasz Luba wrote:
> Hi all,
> 
> This patch set is a continuation of my previous work, which aimed
> to add Energy Model to all devices. This series is a follow up
> for the patches which got merged to v5.9-rc1. It aims to change
> the thermal devfreq cooling and use the Energy Model instead of
> private power table and structures. The new registration interface
> in the patch 3/5 helps to register devfreq cooling and the EM in one
> call. There is also another improvement, patch 2/5 is changing the
> way how thermal gets the device status. Now it's taken on demand
> and stored as a copy. The last patch wouldn't go through thermal tree,
> but it's here for consistency.

The patch 5/5 is reviewed by the maintainers. If they agree, I can apply
the patch with this series.


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

[PATCH 3/4] thermal/core: Use precomputed jiffies for the polling

2020-12-02 Thread Daniel Lezcano

The delays are also stored in jiffies based unit. Use them instead of
the ms.

Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 3111ca2c87a1..237480429ba9 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -291,14 +291,9 @@ static int __init thermal_register_governors(void)
 static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
int delay)
 {
-   if (delay > 1000)
+   if (delay)
mod_delayed_work(system_freezable_power_efficient_wq,
->poll_queue,
-round_jiffies(msecs_to_jiffies(delay)));
-   else if (delay)
-   mod_delayed_work(system_freezable_power_efficient_wq,
->poll_queue,
-msecs_to_jiffies(delay));
+>poll_queue, delay);
else
cancel_delayed_work(>poll_queue);
 }
@@ -317,9 +312,9 @@ static void monitor_thermal_zone(struct thermal_zone_device 
*tz)
mutex_lock(>lock);
 
if (!stop && tz->passive)
-   thermal_zone_device_set_polling(tz, tz->passive_delay_ms);
+   thermal_zone_device_set_polling(tz, tz->passive_delay_jiffies);
else if (!stop && tz->polling_delay_ms)
-   thermal_zone_device_set_polling(tz, tz->polling_delay_ms);
+   thermal_zone_device_set_polling(tz, tz->polling_delay_jiffies);
else
thermal_zone_device_set_polling(tz, 0);
 
-- 
2.17.1

[PATCH 4/4] thermal/core: Remove ms based delay fields

2020-12-02 Thread Daniel Lezcano

The code does no longer use the ms unit based fields to set the
delays as they are replaced by the jiffies.

Remove them.

Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.h  | 2 --
 drivers/thermal/thermal_sysfs.c | 2 +-
 include/linux/thermal.h | 7 ---
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 2c9551ed5ef8..5baa308ee7a5 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -131,7 +131,6 @@ void thermal_zone_set_trips(struct thermal_zone_device *tz);
 static inline void thermal_zone_set_passive_delay(
struct thermal_zone_device *tz, int delay_ms)
 {
-   tz->passive_delay_ms = delay_ms;
tz->passive_delay_jiffies = msecs_to_jiffies(delay_ms);
if (delay_ms > 1000)
tz->passive_delay_jiffies = 
round_jiffies(tz->passive_delay_jiffies);
@@ -140,7 +139,6 @@ static inline void thermal_zone_set_passive_delay(
 static inline void thermal_zone_set_polling_delay(
struct thermal_zone_device *tz, int delay_ms)
 {
-   tz->polling_delay_ms = delay_ms;
tz->polling_delay_jiffies = msecs_to_jiffies(delay_ms);
if (delay_ms > 1000)
tz->polling_delay_jiffies = 
round_jiffies(tz->polling_delay_jiffies);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 9598b288a0a1..8532b1dd0608 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -233,7 +233,7 @@ passive_store(struct device *dev, struct device_attribute 
*attr,
return -EINVAL;
 
if (state && !tz->forced_passive) {
-   if (!tz->passive_delay_ms)
+   if (!tz->passive_delay_jiffies)
thermal_zone_set_passive_delay(tz, 1000);
thermal_zone_device_rebind_exception(tz, "Processor",
 sizeof("Processor"));
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 5dd9bdb6c6ad..f23a388ded15 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -116,13 +116,8 @@ struct thermal_cooling_device {
  * @devdata:   private pointer for device private data
  * @trips: number of trip points the thermal zone supports
  * @trips_disabled;bitmap for disabled trips
- * @passive_delay_ms:  number of milliseconds to wait between polls when
- * performing passive cooling.
  * @passive_delay_jiffies: number of jiffies to wait between polls when
  * performing passive cooling.
- * @polling_delay_ms:  number of milliseconds to wait between polls when
- * checking whether trip points have been crossed (0 for
- * interrupt driven systems)
  * @polling_delay_jiffies: number of jiffies to wait between polls when
  * checking whether trip points have been crossed (0 for
  * interrupt driven systems)
@@ -164,8 +159,6 @@ struct thermal_zone_device {
void *devdata;
int trips;
unsigned long trips_disabled;   /* bitmap for disabled trips */
-   int passive_delay_ms;
-   int polling_delay_ms;
int passive_delay_jiffies;
int polling_delay_jiffies;
int temperature;
-- 
2.17.1

[PATCH 2/4] thermal/core: Precompute the jiffies

2020-12-02 Thread Daniel Lezcano

The delays are stored in ms units and when the polling function is
called this delay is converted into jiffies at each call.

Instead of doing the conversion again and again, compute the jiffies
at init time and use the value directly when setting the polling.

Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.c  |  5 +++--
 drivers/thermal/thermal_core.h  | 18 ++
 drivers/thermal/thermal_sysfs.c |  4 ++--
 include/linux/thermal.h |  7 +++
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 53f55ceca220..3111ca2c87a1 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1340,8 +1340,9 @@ thermal_zone_device_register(const char *type, int trips, 
int mask,
tz->device.class = _class;
tz->devdata = devdata;
tz->trips = trips;
-   tz->passive_delay_ms = passive_delay;
-   tz->polling_delay_ms = polling_delay;
+
+   thermal_zone_set_passive_delay(tz, passive_delay);
+   thermal_zone_set_polling_delay(tz, polling_delay);
 
/* sys I/F */
/* Add nodes that are always present via .groups */
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 8df600fa7b79..2c9551ed5ef8 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -128,6 +128,24 @@ int thermal_build_list_of_policies(char *buf);
 /* Helpers */
 void thermal_zone_set_trips(struct thermal_zone_device *tz);
 
+static inline void thermal_zone_set_passive_delay(
+   struct thermal_zone_device *tz, int delay_ms)
+{
+   tz->passive_delay_ms = delay_ms;
+   tz->passive_delay_jiffies = msecs_to_jiffies(delay_ms);
+   if (delay_ms > 1000)
+   tz->passive_delay_jiffies = 
round_jiffies(tz->passive_delay_jiffies);
+}
+
+static inline void thermal_zone_set_polling_delay(
+   struct thermal_zone_device *tz, int delay_ms)
+{
+   tz->polling_delay_ms = delay_ms;
+   tz->polling_delay_jiffies = msecs_to_jiffies(delay_ms);
+   if (delay_ms > 1000)
+   tz->polling_delay_jiffies = 
round_jiffies(tz->polling_delay_jiffies);
+}
+
 /* sysfs I/F */
 int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
 void thermal_zone_destroy_device_groups(struct thermal_zone_device *);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index f465462d8aa1..9598b288a0a1 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -234,11 +234,11 @@ passive_store(struct device *dev, struct device_attribute 
*attr,
 
if (state && !tz->forced_passive) {
if (!tz->passive_delay_ms)
-   tz->passive_delay_ms = 1000;
+   thermal_zone_set_passive_delay(tz, 1000);
thermal_zone_device_rebind_exception(tz, "Processor",
 sizeof("Processor"));
} else if (!state && tz->forced_passive) {
-   tz->passive_delay_ms = 0;
+   thermal_zone_set_passive_delay(tz, 0);
thermal_zone_device_unbind_exception(tz, "Processor",
 sizeof("Processor"));
}
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 230d451bf335..5dd9bdb6c6ad 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -118,9 +118,14 @@ struct thermal_cooling_device {
  * @trips_disabled;bitmap for disabled trips
  * @passive_delay_ms:  number of milliseconds to wait between polls when
  * performing passive cooling.
+ * @passive_delay_jiffies: number of jiffies to wait between polls when
+ * performing passive cooling.
  * @polling_delay_ms:  number of milliseconds to wait between polls when
  * checking whether trip points have been crossed (0 for
  * interrupt driven systems)
+ * @polling_delay_jiffies: number of jiffies to wait between polls when
+ * checking whether trip points have been crossed (0 for
+ * interrupt driven systems)
  * @temperature:   current temperature.  This is only for core code,
  * drivers should use thermal_zone_get_temp() to get the
  * current temperature
@@ -161,6 +166,8 @@ struct thermal_zone_device {
unsigned long trips_disabled;   /* bitmap for disabled trips */
int passive_delay_ms;
int polling_delay_ms;
+   int passive_delay_jiffies;
+   int polling_delay_jiffies;
int temperature;
int last_temperature;
int emul_temperature;
-- 
2.17.1

[PATCH 1/4] thermal/core: Rename passive_delay and polling_delay with units

2020-12-02 Thread Daniel Lezcano

Set the scene to directly store the delays under their jiffies
form. Add to the variable name the 'ms' suffix.

Signed-off-by: Daniel Lezcano 
---
 drivers/thermal/thermal_core.c  | 10 +-
 drivers/thermal/thermal_of.c|  8 
 drivers/thermal/thermal_sysfs.c |  6 +++---
 include/linux/thermal.h |  8 
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 90e38cc199f4..53f55ceca220 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -317,9 +317,9 @@ static void monitor_thermal_zone(struct thermal_zone_device 
*tz)
mutex_lock(>lock);
 
if (!stop && tz->passive)
-   thermal_zone_device_set_polling(tz, tz->passive_delay);
-   else if (!stop && tz->polling_delay)
-   thermal_zone_device_set_polling(tz, tz->polling_delay);
+   thermal_zone_device_set_polling(tz, tz->passive_delay_ms);
+   else if (!stop && tz->polling_delay_ms)
+   thermal_zone_device_set_polling(tz, tz->polling_delay_ms);
else
thermal_zone_device_set_polling(tz, 0);
 
@@ -1340,8 +1340,8 @@ thermal_zone_device_register(const char *type, int trips, 
int mask,
tz->device.class = _class;
tz->devdata = devdata;
tz->trips = trips;
-   tz->passive_delay = passive_delay;
-   tz->polling_delay = polling_delay;
+   tz->passive_delay_ms = passive_delay;
+   tz->polling_delay_ms = polling_delay;
 
/* sys I/F */
/* Add nodes that are always present via .groups */
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 69ef12f852b7..ebec4a8a8b5a 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -865,14 +865,14 @@ __init *thermal_of_build_thermal_zone(struct device_node 
*np)
pr_err("%pOFn: missing polling-delay-passive property\n", np);
goto free_tz;
}
-   tz->passive_delay = prop;
+   tz->passive_delay_ms = prop;
 
ret = of_property_read_u32(np, "polling-delay", );
if (ret < 0) {
pr_err("%pOFn: missing polling-delay property\n", np);
goto free_tz;
}
-   tz->polling_delay = prop;
+   tz->polling_delay_ms = prop;
 
/*
 * REVIST: for now, the thermal framework supports only
@@ -1085,8 +1085,8 @@ int __init of_parse_thermal_zones(void)
zone = thermal_zone_device_register(child->name, tz->ntrips,
mask, tz,
ops, tzp,
-   tz->passive_delay,
-   tz->polling_delay);
+   tz->passive_delay_ms,
+   tz->polling_delay_ms);
if (IS_ERR(zone)) {
pr_err("Failed to build %pOFn zone %ld\n", child,
   PTR_ERR(zone));
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 0866e949339b..f465462d8aa1 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -233,12 +233,12 @@ passive_store(struct device *dev, struct device_attribute 
*attr,
return -EINVAL;
 
if (state && !tz->forced_passive) {
-   if (!tz->passive_delay)
-   tz->passive_delay = 1000;
+   if (!tz->passive_delay_ms)
+   tz->passive_delay_ms = 1000;
thermal_zone_device_rebind_exception(tz, "Processor",
 sizeof("Processor"));
} else if (!state && tz->forced_passive) {
-   tz->passive_delay = 0;
+   tz->passive_delay_ms = 0;
thermal_zone_device_unbind_exception(tz, "Processor",
 sizeof("Processor"));
}
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index d07ea27e72a9..230d451bf335 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -116,9 +116,9 @@ struct thermal_cooling_device {
  * @devdata:   private pointer for device private data
  * @trips: number of trip points the thermal zone supports
  * @trips_disabled;bitmap for disabled trips
- * @passive_delay: number of milliseconds to wait between polls when
+ * @passive_delay_ms:  number of milliseconds to wait between polls when
  * performing passive cooling.
- * @polling_delay: number of milliseconds to wa

[PATCH v4 4/4] powercap/drivers/dtpm: Add CPU energy model based support

2020-12-01 Thread Daniel Lezcano

With the powercap dtpm controller, we are able to plug devices with
power limitation features in the tree.

The following patch introduces the CPU power limitation based on the
energy model and the performance states.

The power limitation is done at the performance domain level. If some
CPUs are unplugged, the corresponding power will be subtracted from
the performance domain total power.

It is up to the platform to initialize the dtpm tree and add the CPU.

Here is an example to create a simple tree with one root node called
"pkg" and the CPU's performance domains.

static int dtpm_register_pkg(struct dtpm_descr *descr)
{
struct dtpm *pkg;
int ret;

pkg = dtpm_alloc(NULL);
if (!pkg)
return -ENOMEM;

ret = dtpm_register(descr->name, pkg, descr->parent);
if (ret)
return ret;

return dtpm_register_cpu(pkg);
}

static struct dtpm_descr descr = {
.name = "pkg",
.init = dtpm_register_pkg,
};
DTPM_DECLARE(descr);

Cc: Thara Gopinath 
Cc: Lina Iyer 
Cc: Ram Chandrasekar 
Cc: Zhang Rui 
Signed-off-by: Daniel Lezcano 
---
 drivers/powercap/Kconfig|   7 +
 drivers/powercap/Makefile   |   1 +
 drivers/powercap/dtpm_cpu.c | 257 
 include/linux/cpuhotplug.h  |   1 +
 include/linux/dtpm.h|   2 +
 5 files changed, 268 insertions(+)
 create mode 100644 drivers/powercap/dtpm_cpu.c

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index cc1953bd8bed..20b4325c6161 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -49,4 +49,11 @@ config DTPM
help
  This enables support for the power capping for the dynamic
  thermal power management userspace engine.
+
+config DTPM_CPU
+   bool "Add CPU power capping based on the energy model"
+   depends on DTPM && ENERGY_MODEL
+   help
+ This enables support for CPU power limitation based on
+ energy model.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 6482ac52054d..fabcf388a8d3 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_DTPM) += dtpm.o
+obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
new file mode 100644
index ..6933c783c6b4
--- /dev/null
+++ b/drivers/powercap/dtpm_cpu.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The DTPM CPU is based on the energy model. It hooks the CPU in the
+ * DTPM tree which in turns update the power number by propagating the
+ * power number from the CPU energy model information to the parents.
+ *
+ * The association between the power and the performance state, allows
+ * to set the power of the CPU at the OPP granularity.
+ *
+ * The CPU hotplug is supported and the power numbers will be updated
+ * if a CPU is hot plugged / unplugged.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct dtpm *__parent;
+
+static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu);
+
+struct dtpm_cpu {
+   struct freq_qos_request qos_req;
+   int cpu;
+};
+
+/*
+ * When a new CPU is inserted at hotplug or boot time, add the power
+ * contribution and update the dtpm tree.
+ */
+static int power_add(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+   u64 power_min, power_max;
+
+   power_min = em->table[0].power;
+   power_min *= MICROWATT_PER_MILLIWATT;
+   power_min += dtpm->power_min;
+
+   power_max = em->table[em->nr_perf_states - 1].power;
+   power_max *= MICROWATT_PER_MILLIWATT;
+   power_max += dtpm->power_max;
+
+   return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+/*
+ * When a CPU is unplugged, remove its power contribution from the
+ * dtpm tree.
+ */
+static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+   u64 power_min, power_max;
+
+   power_min = em->table[0].power;
+   power_min *= MICROWATT_PER_MILLIWATT;
+   power_min = dtpm->power_min - power_min;
+
+   power_max = em->table[em->nr_perf_states - 1].power;
+   power_max *= MICROWATT_PER_MILLIWATT;
+   power_max = dtpm->power_max - power_max;
+
+   return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
+{
+   struct dtpm_cpu *dtpm_cpu = dtpm->private;
+   struct em_perf_domain *pd;
+   struct cpumask cpus;
+   unsigned long freq;
+   u64 power;
+   int i, nr_cpus;
+
+   pd = em_cpu_get(dtpm_cpu->cpu);
+

[PATCH v4 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management

2020-12-01 Thread Daniel Lezcano

On the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature to go above the
normative and legally stated 'skin temperature'.

Another aspect is to sustain the performance for a given power budget,
for example virtual reality where the user can feel dizziness if the
GPU performance is capped while a big CPU is processing something
else. Or reduce the battery charging because the dissipated power is
too high compared with the power consumed by other devices.

The userspace is the most adequate place to dynamically act on the
different devices by limiting their power given an application
profile: it has the knowledge of the platform.

These userspace daemons are in charge of the Dynamic Thermal Power
Management (DTPM).

Nowadays, the dtpm daemons are abusing the thermal framework as they
act on the cooling device state to force a specific and arbitrary
state without taking care of the governor decisions. Given the closed
loop of some governors that can confuse the logic or directly enter in
a decision conflict.

As the number of cooling device support is limited today to the CPU
and the GPU, the dtpm daemons have little control on the power
dissipation of the system. The out of tree solutions are hacking
around here and there in the drivers, in the frameworks to have
control on the devices. The common solution is to declare them as
cooling devices.

There is no unification of the power limitation unit, opaque states
are used.

This patch provides a way to create a hierarchy of constraints using
the powercap framework. The devices which are registered as power
limit-able devices are represented in this hierarchy as a tree. They
are linked together with intermediate nodes which are just there to
propagate the constraint to the children.

The leaves of the tree are the real devices, the intermediate nodes
are virtual, aggregating the children constraints and power
characteristics.

Each node have a weight on a 2^10 basis, in order to reflect the
percentage of power distribution of the children's node. This
percentage is used to dispatch the power limit to the children.

The weight is computed against the max power of the siblings.

This simple approach allows to do a fair distribution of the power
limit.

Cc: Thara Gopinath 
Cc: Lina Iyer 
Cc: Ram Chandrasekar 
Cc: Zhang Rui 
Signed-off-by: Daniel Lezcano 
Reviewed-by: Lukasz Luba 
---
 drivers/powercap/Kconfig  |   6 +
 drivers/powercap/Makefile |   1 +
 drivers/powercap/dtpm.c   | 470 ++
 include/asm-generic/vmlinux.lds.h |  11 +
 include/linux/dtpm.h  |  75 +
 5 files changed, 563 insertions(+)
 create mode 100644 drivers/powercap/dtpm.c
 create mode 100644 include/linux/dtpm.h

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index bc228725346b..cc1953bd8bed 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -43,4 +43,10 @@ config IDLE_INJECT
  CPUs for power capping. Idle period can be injected
  synchronously on a set of specified CPUs or alternatively
  on a per CPU basis.
+
+config DTPM
+   bool "Power capping for Dynamic Thermal Power Management"
+   help
+ This enables support for the power capping for the dynamic
+ thermal power management userspace engine.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 7255c94ec61c..6482ac52054d 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_DTPM) += dtpm.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
diff --git a/drivers/powercap/dtpm.c b/drivers/powercap/dtpm.c
new file mode 100644
index ..d8f431d87555
--- /dev/null
+++ b/drivers/powercap/dtpm.c
@@ -0,0 +1,470 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The powercap based Dynamic Thermal Power Management framework
+ * provides to the userspace a consistent API to set the power limit
+ * on some devices.
+ *
+ * DTPM defines the functions to create a tree of constraints. Each
+ * parent node is a virtual description of the aggregation of the
+ * children. It propagates the constraints set at its level to its
+ * children and collect the children power information. The leaves of
+ * the tree are the real devices which have the ability to get their
+ * current power consumption and set their power limit.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DTPM_POWER_LIMIT_FLAG BIT(0)
+
+static const char *constraint_name[] = {
+   "Instantaneous",
+};
+
+static DEFINE_MU

[PATCH v4 2/4] Documentation/powercap/dtpm: Add documentation for dtpm

2020-12-01 Thread Daniel Lezcano

The dynamic thermal and power management is a technique to dynamically
adjust the power consumption of different devices in order to ensure a
global thermal constraint.

An userspace daemon is usually monitoring the temperature and the
power to take immediate action on the device.

The DTPM framework provides an unified API to userspace to act on the
power.

Document this framework.

Cc: Thara Gopinath 
Cc: Lina Iyer 
Cc: Ram Chandrasekar 
Cc: Zhang Rui 
Cc: Jonathan Corbet 
Signed-off-by: Daniel Lezcano 
---
 Documentation/power/index.rst |   1 +
 Documentation/power/powercap/dtpm.rst | 210 ++
 2 files changed, 211 insertions(+)
 create mode 100644 Documentation/power/powercap/dtpm.rst

diff --git a/Documentation/power/index.rst b/Documentation/power/index.rst
index ced8a8007434..a0f5244fb427 100644
--- a/Documentation/power/index.rst
+++ b/Documentation/power/index.rst
@@ -30,6 +30,7 @@ Power Management
 userland-swsusp
 
 powercap/powercap
+powercap/dtpm
 
 regulator/consumer
 regulator/design
diff --git a/Documentation/power/powercap/dtpm.rst 
b/Documentation/power/powercap/dtpm.rst
new file mode 100644
index ..ca095ef4b887
--- /dev/null
+++ b/Documentation/power/powercap/dtpm.rst
@@ -0,0 +1,210 @@
+==
+Dynamic Thermal Power Management framework
+==
+
+On the embedded world, the complexity of the SoC leads to an
+increasing number of hotspots which need to be monitored and mitigated
+as a whole in order to prevent the temperature to go above the
+normative and legally stated 'skin temperature'.
+
+Another aspect is to sustain the performance for a given power budget,
+for example virtual reality where the user can feel dizziness if the
+performance is capped while a big CPU is processing something else. Or
+reduce the battery charging because the dissipated power is too high
+compared with the power consumed by other devices.
+
+The userspace is the most adequate place to dynamically act on the
+different devices by limiting their power given an application
+profile: it has the knowledge of the platform.
+
+The Dynamic Thermal Power Management (DTPM) is a technique acting on
+the device power by limiting and/or balancing a power budget among
+different devices.
+
+The DTPM framework provides an unified interface to act on the
+device power.
+
+Overview
+
+
+The DTPM framework relies on the powercap framework to create the
+powercap entries in the sysfs directory and implement the backend
+driver to do the connection with the power manageable device.
+
+The DTPM is a tree representation describing the power constraints
+shared between devices, not their physical positions.
+
+The nodes of the tree are a virtual description aggregating the power
+characteristics of the children nodes and their power limitations.
+
+The leaves of the tree are the real power manageable devices.
+
+For instance::
+
+  SoC
+   |
+   `-- pkg
+   |
+   |-- pd0 (cpu0-3)
+   |
+   `-- pd1 (cpu4-5)
+
+The pkg power will be the sum of pd0 and pd1 power numbers::
+
+  SoC (400mW - 3100mW)
+   |
+   `-- pkg (400mW - 3100mW)
+   |
+   |-- pd0 (100mW - 700mW)
+   |
+   `-- pd1 (300mW - 2400mW)
+
+When the nodes are inserted in the tree, their power characteristics are 
propagated to the parents::
+
+  SoC (600mW - 5900mW)
+   |
+   |-- pkg (400mW - 3100mW)
+   ||
+   ||-- pd0 (100mW - 700mW)
+   ||
+   |`-- pd1 (300mW - 2400mW)
+   |
+   `-- pd2 (200mW - 2800mW)
+
+Each node have a weight on a 2^10 basis reflecting the percentage of power 
consumption along the siblings::
+
+  SoC (w=1024)
+   |
+   |-- pkg (w=538)
+   ||
+   ||-- pd0 (w=231)
+   ||
+   |`-- pd1 (w=794)
+   |
+   `-- pd2 (w=486)
+
+   Note the sum of weights at the same level are equal to 1024.
+
+When a power limitation is applied to a node, then it is distributed along the 
children given their weights. For example, if we set a power limitation of 
3200mW at the 'SoC' root node, the resulting tree will be::
+
+  SoC (w=1024) <--- power_limit = 3200mW
+   |
+   |-- pkg (w=538) --> power_limit = 1681mW
+   ||
+   ||-- pd0 (w=231) --> power_limit = 378mW
+   ||
+   |`-- pd1 (w=794) --> power_limit = 1303mW
+   |
+   `-- pd2 (w=486) --> power_limit = 1519mW
+
+
+Flat description
+
+
+A root node is created and it is the parent of all the nodes. This
+description is the simplest one and it is supposed to give to
+userspace a flat representation of all the devices supporting the
+power limitation without any power limitation distribution.
+
+Hierarchical description
+
+
+The different devices supporting the power limitation are represented
+hierarchically. There is one root node, all intermediate nodes are
+grouping the child nodes which can be intermediate nodes also o

[PATCH V4 0/4] powercap/dtpm: Add the DTPM framework

2020-12-01 Thread Daniel Lezcano

The density of components greatly increased the last decade bringing a
numerous number of heating sources which are monitored by more than 20
sensors on recent SoC. The skin temperature, which is the case
temperature of the device, must stay below approximately 45°C in order
to comply with the legal requirements.

The skin temperature is managed as a whole by an user space daemon,
which is catching the current application profile, to allocate a power
budget to the different components where the resulting heating effect
will comply with the skin temperature constraint.

This technique is called the Dynamic Thermal Power Management.

The Linux kernel does not provide any unified interface to act on the
power of the different devices. Currently, the thermal framework is
changed to export artificially the performance states of different
devices via the cooling device software component with opaque values.
This change is done regardless of the in-kernel logic to mitigate the
temperature. The user space daemon uses all the available knobs to act
on the power limit and those differ from one platform to another.

This series provides a Dynamic Thermal Power Management framework to
provide an unified way to act on the power of the devices.

Changelog:
 V4:
  - Changed fine grain spinlocks by global tree mutex lock
- Dropped tested by tag from Lukasz
  - Fixed rollback routine in dtpm_cpu
  - Checked freq_qos_request_active() when releasing the dtpm_cpu node
 V3:
  - Fixed power-limit computation in addition with the hotplugging
  - Improved the encapsulation
  - Added specific ops for the leaves of the tree
  - Simplified API and self-encapsulation
  - Fixed documentation and generated it to check the content
 V2:
  - Fixed indentation
  - Fixed typos in comments
  - Fixed missing kfree for dtpm_cpu
  - Capitalize letters in the Kconfig description
  - Reduced name description
  - Stringified section name
  - Added more debug traces in the code
  - Removed duplicate initialization in the dtpm cpu

Daniel Lezcano (4):
  units: Add Watt units
  Documentation/powercap/dtpm: Add documentation for dtpm
  powercap/drivers/dtpm: Add API for dynamic thermal power management
  powercap/drivers/dtpm: Add CPU energy model based support

 Documentation/power/index.rst |   1 +
 Documentation/power/powercap/dtpm.rst | 210 
 drivers/powercap/Kconfig  |  13 +
 drivers/powercap/Makefile |   2 +
 drivers/powercap/dtpm.c   | 470 ++
 drivers/powercap/dtpm_cpu.c   | 257 ++
 include/asm-generic/vmlinux.lds.h |  11 +
 include/linux/cpuhotplug.h|   1 +
 include/linux/dtpm.h  |  77 +
 include/linux/units.h |   4 +
 10 files changed, 1046 insertions(+)
 create mode 100644 Documentation/power/powercap/dtpm.rst
 create mode 100644 drivers/powercap/dtpm.c
 create mode 100644 drivers/powercap/dtpm_cpu.c
 create mode 100644 include/linux/dtpm.h

Cc: Thara Gopinath 
Cc: Lina Iyer 
Cc: Ram Chandrasekar 
Cc: Zhang Rui 
Cc: Lukasz Luba 

--
2.17.1

[PATCH v4 1/4] units: Add Watt units

2020-12-01 Thread Daniel Lezcano

As there are the temperature units, let's add the Watt macros definition.

Cc: Thara Gopinath 
Cc: Lina Iyer 
Cc: Ram Chandrasekar 
Cc: Zhang Rui 
Cc: Lukasz Luba 
Signed-off-by: Daniel Lezcano 
Reviewed-by: Lukasz Luba 
---
 include/linux/units.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/units.h b/include/linux/units.h
index aaf716364ec3..92c234e71cab 100644
--- a/include/linux/units.h
+++ b/include/linux/units.h
@@ -4,6 +4,10 @@

 #include 

+#define MILLIWATT_PER_WATT 1000L
+#define MICROWATT_PER_MILLIWATT1000L
+#define MICROWATT_PER_WATT 100L
+
 #define ABSOLUTE_ZERO_MILLICELSIUS -273150

 static inline long milli_kelvin_to_millicelsius(long t)
--
2.17.1

Re: [RFC 1/2] dt-bindings: thermal: sprd: Add virtual thermal documentation

2020-11-30 Thread Daniel Lezcano

On 30/11/2020 10:03, gao yunxiao wrote:
> Hi Daniel
> 
> Thank you for your the new information
> 
> I have a question trouble to you
> We should choose which per-core thermal zone as the IPA's input
> reference temperature in the current kernel version? thank you.

Can you give a pointer to a DT describing your hardware ?



> On 27/11/2020, Lukasz Luba  wrote:
>>
>>
>> On 11/27/20 1:26 PM, Daniel Lezcano wrote:
>>>
>>> Hi Lukasz,
>>>
>>> On 27/11/2020 10:27, Lukasz Luba wrote:
>>>>
>>>>
>>>> On 11/27/20 8:35 AM, gao.yunxi...@gmail.com wrote:
>>>>> From: "jeson.gao" 
>>>>>
>>>>> virtual thermal node definition description in dts file
>>>>>
>>>>> Signed-off-by: jeson.gao 
>>>>> ---
>>>
>>> [ ... ]
>>>
>>>> It's coming back. There were attempts to solve this problem.
>>>> Javi tried to solved this using hierarchical thermal zones [1].
>>>> It was even agreed (IIRC during LPC) but couldn't continue. Then Eduardo
>>>> was going to continue this (last message at [3]). Unfortunately,
>>>> development stopped.
>>>>
>>>> I also have out-of-tree similar implementation for my Odroid-xu4,
>>>> which does no have an 'SoC' sensor, but have CPU sensors and needs
>>>> some aggregation function to get temperature.
>>>>
>>>> I can pick up Javi's patches and continue 'hierarchical thermal zones'
>>>> approach.
>>>>
>>>> Javi, Daniel, Rui what do you think?
>>>
>>> I already worked on the hierarchical thermal zones and my opinion is
>>> that fits not really well.
>>>
>>> We want to define a new feature because the thermal framework is built
>>> on the 1:1 relationship between a governor and a thermal zone.
>>>
>>> Practically speaking, we want to mitigate two thermal zones from one
>>> governor, especially here the IPA governor.
>>>
>>> The DTPM framework is being implemented to solve that by providing an
>>> automatic power rebalancing between the power manageable capable devices.
>>>
>>> In our case, the IPA would stick on the 'sustainable-power' resulting on
>>> the aggregation of the two performance domains and set the power limit
>>> on the parent node. The automatic power rebalancing will ensure maximum
>>> throughput between the two performance domains instead of capping the
>>> whole.
>>>
>>>
>>
>> Make sense. Thank you for sharing valuable opinion.
>>
>> Regards,
>> Lukasz
>>


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH 1/3] thermal: core: Add indication for userspace usage

2020-11-29 Thread Daniel Lezcano



[Added Srinivas]

On 28/11/2020 18:54, Kai-Heng Feng wrote:
> We are seeing thermal shutdown on Intel based mobile workstations, the
> shutdown happens during the first trip handle in
> thermal_zone_device_register():
> kernel: thermal thermal_zone15: critical temperature reached (101 C), 
> shutting down
> 
> However, we shouldn't do a thermal shutdown here, since
> 1) We may want to use a dedicated daemon, Intel's thermald in this case,
> to handle thermal shutdown.
> 
> 2) For ACPI based system, _CRT doesn't mean shutdown unless it's inside
> ThermalZone. ACPI Spec, 11.4.4 _CRT (Critical Temperature):
> "... If this object it present under a device, the device’s driver
> evaluates this object to determine the device’s critical cooling
> temperature trip point. This value may then be used by the device’s
> driver to program an internal device temperature sensor trip point."
> 
> So a "critical trip" here merely means we should take a more aggressive
> cooling method.

Well, actually it is stated before:

"This object, when defined under a thermal zone, returns the critical
temperature at which OSPM must shutdown the system".

That is what does the thermal subsystem, no ?

> So add an indication to let thermal core know it should leave thermal
> device to userspace to handle.

You may want to check the 'HOT' trip point and then use the notification
mechanism to get notified in userspace and take action from there (eg.
offline some CPUs).

> Signed-off-by: Kai-Heng Feng 
> ---
>  drivers/thermal/thermal_core.c | 3 +++
>  include/linux/thermal.h| 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index c6d74bc1c90b..6561e3767529 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1477,6 +1477,9 @@ thermal_zone_device_register(const char *type, int 
> trips, int mask,
>   goto unregister;
>   }
>  
> + if (tz->tzp && tz->tzp->userspace)
> + thermal_zone_device_disable(tz);
> +
>   mutex_lock(_list_lock);
>   list_add_tail(>node, _tz_list);
>   mutex_unlock(_list_lock);
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index d07ea27e72a9..e8e8fac78fc8 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -247,6 +247,8 @@ struct thermal_zone_params {
>*/
>   bool no_hwmon;
>  
> + bool userspace;
> +
>   int num_tbps;   /* Number of tbp entries */
>   struct thermal_bind_params *tbp;
>  
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [RFC 1/2] dt-bindings: thermal: sprd: Add virtual thermal documentation

2020-11-27 Thread Daniel Lezcano

Hi Lukasz,

On 27/11/2020 10:27, Lukasz Luba wrote:
> 
> 
> On 11/27/20 8:35 AM, gao.yunxi...@gmail.com wrote:
>> From: "jeson.gao" 
>>
>> virtual thermal node definition description in dts file
>>
>> Signed-off-by: jeson.gao 
>> ---

[ ... ]

> It's coming back. There were attempts to solve this problem.
> Javi tried to solved this using hierarchical thermal zones [1].
> It was even agreed (IIRC during LPC) but couldn't continue. Then Eduardo
> was going to continue this (last message at [3]). Unfortunately,
> development stopped.
> 
> I also have out-of-tree similar implementation for my Odroid-xu4,
> which does no have an 'SoC' sensor, but have CPU sensors and needs
> some aggregation function to get temperature.
> 
> I can pick up Javi's patches and continue 'hierarchical thermal zones'
> approach.
> 
> Javi, Daniel, Rui what do you think?

I already worked on the hierarchical thermal zones and my opinion is
that fits not really well.

We want to define a new feature because the thermal framework is built
on the 1:1 relationship between a governor and a thermal zone.

Practically speaking, we want to mitigate two thermal zones from one
governor, especially here the IPA governor.

The DTPM framework is being implemented to solve that by providing an
automatic power rebalancing between the power manageable capable devices.

In our case, the IPA would stick on the 'sustainable-power' resulting on
the aggregation of the two performance domains and set the power limit
on the parent node. The automatic power rebalancing will ensure maximum
throughput between the two performance domains instead of capping the whole.

-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v3 4/4] powercap/drivers/dtpm: Add CPU energy model based support

2020-11-26 Thread Daniel Lezcano



Hi Lukasz,


On 26/11/2020 11:06, Lukasz Luba wrote:
> Hi Daniel,
> 
> On 11/23/20 9:42 PM, Daniel Lezcano wrote:
>> With the powercap dtpm controller, we are able to plug devices with
>> power limitation features in the tree.
>>
> 
> [snip]
> 
>> +
>> +static void pd_release(struct dtpm *dtpm)
>> +{
>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>> +
> 
> Maybe it's worth to add:
> --->8
> if (freq_qos_request_active(_cpu->qos_req))
> freq_qos_remove_request(_cpu->qos_req);
> ---8<---
> 
> If we are trying to unregister dtpm in error path due to freq_qos
> registration failure, a warning would be emitted from freq_qos.

Ah yes, good point.

>> +    freq_qos_remove_request(_cpu->qos_req);
>> +    kfree(dtpm_cpu);
>> +}
> 
> [snip]
> 
>> +
>> +static int cpuhp_dtpm_cpu_online(unsigned int cpu)
>> +{
>> +    struct dtpm *dtpm;
>> +    struct dtpm_cpu *dtpm_cpu;
>> +    struct cpufreq_policy *policy;
>> +    struct em_perf_domain *pd;
>> +    char name[CPUFREQ_NAME_LEN];
>> +    int ret;
>> +
>> +    policy = cpufreq_cpu_get(cpu);
>> +
>> +    if (!policy)
>> +    return 0;
>> +
>> +    pd = em_cpu_get(cpu);
>> +    if (!pd)
>> +    return -EINVAL;
>> +
>> +    dtpm = per_cpu(dtpm_per_cpu, cpu);
>> +    if (dtpm)
>> +    return power_add(dtpm, pd);
>> +
>> +    dtpm = dtpm_alloc(_ops);
>> +    if (!dtpm)
>> +    return -EINVAL;
>> +
>> +    dtpm_cpu = kzalloc(sizeof(dtpm_cpu), GFP_KERNEL);
>> +    if (!dtpm_cpu) {
>> +    kfree(dtpm);
>> +    return -ENOMEM;
>> +    }
>> +
>> +    dtpm->private = dtpm_cpu;
>> +    dtpm_cpu->cpu = cpu;
>> +
>> +    for_each_cpu(cpu, policy->related_cpus)
>> +    per_cpu(dtpm_per_cpu, cpu) = dtpm;
>> +
>> +    sprintf(name, "cpu%d", dtpm_cpu->cpu);
>> +
>> +    ret = dtpm_register(name, dtpm, __parent);
>> +    if (ret)
>> +    goto out_kfree_dtpm_cpu;
>> +
>> +    ret = power_add(dtpm, pd);
>> +    if (ret)
>> +    goto out_power_sub;
> 
> Shouldn't we call dtpm_unregister() instead?
> The dtpm_unregister() would remove the zone, which IIUC we
> are currently missing.
> 
>> +
>> +    ret = freq_qos_add_request(>constraints,
>> +   _cpu->qos_req, FREQ_QOS_MAX,
>> +   pd->table[pd->nr_perf_states - 1].frequency);
>> +    if (ret)
>> +    goto out_dtpm_unregister;
> 
> Could this trigger different steps, starting from out_power_sub_v2
> below?
> 
>> +
>> +    return 0;
>> +
>> +out_dtpm_unregister:
>> +    dtpm_unregister(dtpm);
>> +    dtpm_cpu = NULL; /* Already freed by the release ops */
>> +out_power_sub:
>> +    power_sub(dtpm, pd);
> 
> I would change the order of these two above into something like:

Ok, I'll revisit the rollback routine.

> out_power_sub_v2:
> power_sub(dtpm, pd);
> out_dtpm_unregister_v2:
> dtpm_unregister(dtpm);
> dtpm_cpu = NULL;
> 
>> +out_kfree_dtpm_cpu:
>> +    for_each_cpu(cpu, policy->related_cpus)
>> +    per_cpu(dtpm_per_cpu, cpu) = NULL;
>> +    kfree(dtpm_cpu);
>> +
>> +    return ret;
>> +}
> 
> IIUC power_sub() would decrement the power and set it to 0 for that
> dtmp, then the dtpm_unregister() would also try to decrement the power,
> but by the value of 0. So it should be safe.

Right.


Thanks for the review


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH v4 0/3] Improve the estimations in Intelligent Power Allocation

2020-11-26 Thread Daniel Lezcano

On 26/11/2020 15:02, Lukasz Luba wrote:

[ ... ]

>>> changed via sysfs.
>>>
>>> Could you take it please? It should apply smoothly in your tree.
>>
>> Actually, I'm waiting for Ionela and Dietmar ack.
>>
>>
> 
> Are they maintainers of this file that you need their ACKs?
> Maybe I should drop mine then.

Ok let me clarify :)

In general when someone comments on the changes, I usually wait for the
consensus before merging the patches. If the persons who commented
before are unresponsive, depending on the context and my perception, I
apply the changes or not.

I'm giving the opportunity to Ionela to review the series again as she
commented the previous version (and gave a reviewed-by). I thought also
Dietmar commented the series but apparently I was wrong.

As you stated, you are the maintainer of the subsystem, so if there are
no acked-by or reviewed-by, the series will be applied anyway soon.

Meanwhile, they are in the 'testing' branch of the tree, the antechamber
of 'linux-next' and 'next' ;)

-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v4 0/3] Improve the estimations in Intelligent Power Allocation

2020-11-26 Thread Daniel Lezcano

On 26/11/2020 13:49, Lukasz Luba wrote:
> Hi Daniel,
> 
> On 11/24/20 4:10 PM, Lukasz Luba wrote:
>> Hi all,
>>
>> The Intelligent Power Allocation (IPA) estimates the needed
>> coefficients for
>> internal algorithm. It can also estimate the sustainable power value
>> when the
>> DT has not provided one. Fix the 'k_i' coefficient which might be to big
>> related to the other values, when the sustainable power is in an abstract
>> scale. Do the estimation of sustainable power only once and avoid
>> expensive
>> calculation every time the IPA is called. Do the estimation of PID
>> constants
>> when there was user update via sysfs to sustainable power.
>>
>> The patch set should apply on top next-20201124
>>
>> Changes:
>> v4:
>> - added new function get_sustainable_power() which handles use cases
>>    when the value should be estimated again or simply returned
>> - added sustainable_power in the power_allocator_params to track if there
>>    was a change to sustainable_power by the user via sysfs
>> - addressed Daniel's comments that sustainable power set via sysfs should
>>    trigger PID coefficients estimation
>> - removed 'force' argument from estimate_pid_constants() and make it
>> ready
>>    for updates due to new value for sust. power from sysfs
>> - abandoned the design from v3 with a single function responsible for
>>    estimation both sust. power and PID const. requested by Ionela
>> v3 [1]:
>> - changed estimate_pid_constants to estimate_tzp_constants and related
>> comments
>> - estimate the PID coefficients always together with sust. power
>> - added print indicating that we are estimating sust. power and PID
>> const.
>> - don't use local variable 'sustainable_power'
>>
>> Regards,
>> Lukasz Luba
>>
>> [1]
>> https://lore.kernel.org/lkml/20201009135850.14727-1-lukasz.l...@arm.com/
>>
>> Lukasz Luba (3):
>>    thermal: power allocator: change the 'k_i' coefficient estimation
>>    thermal: power allocator: refactor sustainable power estimation
>>    thermal: power allocator: change the 'k_*' always in
>>  estimate_pid_constants()
>>
>>   drivers/thermal/gov_power_allocator.c | 76 +--
>>   1 file changed, 49 insertions(+), 27 deletions(-)
>>
> 
> Gentle ping. This is a self contained change to only power allocator
> file. It addresses also your requirement regarding sustainable_power
> changed via sysfs.
> 
> Could you take it please? It should apply smoothly in your tree.

Actually, I'm waiting for Ionela and Dietmar ack.


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: amlogic_thermal: Add hwmon support

2020-11-25 Thread Daniel Lezcano

Hi

Thanks for your patch but exactly the same patch was submitted and
merged [1]

  -- Daniel

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/commit/?h=thermal/next=cb68a8580e2086fad38597af4c60d39de8df0cde

On 25/11/2020 17:24, Dongjin Kim wrote:
> Expose Amlogic thermal as HWMON devices.
> 
>   $ sensors
>   cpu_thermal-virtual-0
>   Adapter: Virtual device
>   temp1:+32.2 C  (crit = +110.0 C)
> 
>   ddr_thermal-virtual-0
>   Adapter: Virtual device
>   temp1:+33.4 C  (crit = +110.0 C)
> 
> Signed-off-by: Dongjin Kim 
> ---
>  drivers/thermal/amlogic_thermal.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/thermal/amlogic_thermal.c 
> b/drivers/thermal/amlogic_thermal.c
> index ccb1fe18e993..2fce96c32586 100644
> --- a/drivers/thermal/amlogic_thermal.c
> +++ b/drivers/thermal/amlogic_thermal.c
> @@ -29,6 +29,7 @@
>  #include 
>  
>  #include "thermal_core.h"
> +#include "thermal_hwmon.h"
>  
>  #define TSENSOR_CFG_REG1 0x4
>   #define TSENSOR_CFG_REG1_RSET_VBG   BIT(12)
> @@ -291,6 +292,9 @@ static int amlogic_thermal_probe(struct platform_device 
> *pdev)
>   if (ret)
>   return ret;
>  
> + if (devm_thermal_add_hwmon_sysfs(pdata->tzd))
> + dev_warn(>dev, "failed to add hwmon sysfs attributes\n");
> +
>   ret = amlogic_thermal_enable(pdata);
>  
>   return ret;
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

[PATCH] clocksource/drivers/ingenic: Fix section mismatch

2020-11-25 Thread Daniel Lezcano

The function ingenic_tcu_get_clock() is annotated for the __init
section but it is actually called from the online cpu callback.

That will lead to a crash if a CPU is hotplugged after boot time.

Remove the __init annotatation for the ingenic_tcu_get_clock()
function.

Fixes: f19d838d08fc (clocksource/drivers/ingenic: Add high resolution timer 
support for SMP/SMT)
Reported-by: kernel test robot 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/ingenic-timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/ingenic-timer.c 
b/drivers/clocksource/ingenic-timer.c
index 58fd9189fab7..905fd6b163a8 100644
--- a/drivers/clocksource/ingenic-timer.c
+++ b/drivers/clocksource/ingenic-timer.c
@@ -127,7 +127,7 @@ static irqreturn_t ingenic_tcu_cevt_cb(int irq, void 
*dev_id)
return IRQ_HANDLED;
 }
 
-static struct clk * __init ingenic_tcu_get_clock(struct device_node *np, int 
id)
+static struct clk *ingenic_tcu_get_clock(struct device_node *np, int id)
 {
struct of_phandle_args args;
 
-- 
2.25.1

[PATCH V2 0/4] powercap/dtpm: Add the DTPM framework

2020-11-23 Thread Daniel Lezcano

The density of components greatly increased the last decade bringing a
numerous number of heating sources which are monitored by more than 20
sensors on recent SoC. The skin temperature, which is the case
temperature of the device, must stay below approximately 45°C in order
to comply with the legal requirements.

The skin temperature is managed as a whole by an user space daemon,
which is catching the current application profile, to allocate a power
budget to the different components where the resulting heating effect
will comply with the skin temperature constraint.

This technique is called the Dynamic Thermal Power Management.

The Linux kernel does not provide any unified interface to act on the
power of the different devices. Currently, the thermal framework is
changed to export artificially the performance states of different
devices via the cooling device software component with opaque values.
This change is done regardless of the in-kernel logic to mitigate the
temperature. The user space daemon uses all the available knobs to act
on the power limit and those differ from one platform to another.

This series provides a Dynamic Thermal Power Management framework to
provide an unified way to act on the power of the devices.

Changelog:
 V3:
  - Fixed power-limit computation in addition with the hotplugging
  - Improved the encapsulation
  - Added specific ops for the leaves of the tree
  - Simplified API and self-encapsulation
  - Fixed documentation and generated it to check the content
 V2:
  - Fixed indentation
  - Fixed typos in comments
  - Fixed missing kfree for dtpm_cpu
  - Capitalize letters in the Kconfig description
  - Reduced name description
  - Stringified section name
  - Added more debug traces in the code
  - Removed duplicate initialization in the dtpm cpu

Daniel Lezcano (4):
  units: Add Watt units
  Documentation/powercap/dtpm: Add documentation for dtpm
  powercap/drivers/dtpm: Add API for dynamic thermal power management
  powercap/drivers/dtpm: Add CPU energy model based support

 Documentation/power/index.rst |   1 +
 Documentation/power/powercap/dtpm.rst | 210 
 drivers/powercap/Kconfig  |  13 +
 drivers/powercap/Makefile |   2 +
 drivers/powercap/dtpm.c   | 441 ++
 drivers/powercap/dtpm_cpu.c   | 252 +++
 include/asm-generic/vmlinux.lds.h |  11 +
 include/linux/cpuhotplug.h|   1 +
 include/linux/dtpm.h  |  78 +
 include/linux/units.h |   4 +
 10 files changed, 1013 insertions(+)
 create mode 100644 Documentation/power/powercap/dtpm.rst
 create mode 100644 drivers/powercap/dtpm.c
 create mode 100644 drivers/powercap/dtpm_cpu.c
 create mode 100644 include/linux/dtpm.h

--
2.17.1

[PATCH v3 4/4] powercap/drivers/dtpm: Add CPU energy model based support

2020-11-23 Thread Daniel Lezcano

With the powercap dtpm controller, we are able to plug devices with
power limitation features in the tree.

The following patch introduces the CPU power limitation based on the
energy model and the performance states.

The power limitation is done at the performance domain level. If some
CPUs are unplugged, the corresponding power will be subtracted from
the performance domain total power.

It is up to the platform to initialize the dtpm tree and add the CPU.

Here is an example to create a simple tree with one root node called
"pkg" and the CPU's performance domains.

static int dtpm_register_pkg(struct dtpm_descr *descr)
{
struct dtpm *pkg;
int ret;

pkg = dtpm_alloc(NULL);
if (!pkg)
return -ENOMEM;

ret = dtpm_register(descr->name, pkg, descr->parent);
if (ret)
return ret;

return dtpm_register_cpu(pkg);
}

static struct dtpm_descr descr = {
.name = "pkg",
.init = dtpm_register_pkg,
};
DTPM_DECLARE(descr);

Signed-off-by: Daniel Lezcano 
---
 drivers/powercap/Kconfig|   7 +
 drivers/powercap/Makefile   |   1 +
 drivers/powercap/dtpm_cpu.c | 252 
 include/linux/cpuhotplug.h  |   1 +
 include/linux/dtpm.h|   2 +
 5 files changed, 263 insertions(+)
 create mode 100644 drivers/powercap/dtpm_cpu.c

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index cc1953bd8bed..20b4325c6161 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -49,4 +49,11 @@ config DTPM
help
  This enables support for the power capping for the dynamic
  thermal power management userspace engine.
+
+config DTPM_CPU
+   bool "Add CPU power capping based on the energy model"
+   depends on DTPM && ENERGY_MODEL
+   help
+ This enables support for CPU power limitation based on
+ energy model.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 6482ac52054d..fabcf388a8d3 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_DTPM) += dtpm.o
+obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
new file mode 100644
index ..81f0996dd8ad
--- /dev/null
+++ b/drivers/powercap/dtpm_cpu.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The DTPM CPU is based on the energy model. It hooks the CPU in the
+ * DTPM tree which in turns update the power number by propagating the
+ * power number from the CPU energy model information to the parents.
+ *
+ * The association between the power and the performance state, allows
+ * to set the power of the CPU at the OPP granularity.
+ *
+ * The CPU hotplug is supported and the power numbers will be updated
+ * if a CPU is hot plugged / unplugged.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct dtpm *__parent;
+
+static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu);
+
+struct dtpm_cpu {
+   struct freq_qos_request qos_req;
+   int cpu;
+};
+
+/*
+ * When a new CPU is inserted at hotplug or boot time, add the power
+ * contribution and update the dtpm tree.
+ */
+static int power_add(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+   u64 power_min, power_max;
+
+   power_min = em->table[0].power;
+   power_min *= MICROWATT_PER_MILLIWATT;
+   power_min += dtpm->power_min;
+
+   power_max = em->table[em->nr_perf_states - 1].power;
+   power_max *= MICROWATT_PER_MILLIWATT;
+   power_max += dtpm->power_max;
+
+   return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+/*
+ * When a CPU is unplugged, remove its power contribution from the
+ * dtpm tree.
+ */
+static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+   u64 power_min, power_max;
+
+   power_min = em->table[0].power;
+   power_min *= MICROWATT_PER_MILLIWATT;
+   power_min = dtpm->power_min - power_min;
+
+   power_max = em->table[em->nr_perf_states - 1].power;
+   power_max *= MICROWATT_PER_MILLIWATT;
+   power_max = dtpm->power_max - power_max;
+
+   return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
+{
+   struct dtpm_cpu *dtpm_cpu = dtpm->private;
+   struct em_perf_domain *pd;
+   struct cpumask cpus;
+   unsigned long freq;
+   u64 power;
+   int i, nr_cpus;
+
+   pd = em_cpu_get(dtpm_cpu->cpu);
+
+   cpumask_and(, cpu_online_mask, to_cpumask(pd->cpus));
+

[PATCH v3 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management

2020-11-23 Thread Daniel Lezcano

On the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature to go above the
normative and legally stated 'skin temperature'.

Another aspect is to sustain the performance for a given power budget,
for example virtual reality where the user can feel dizziness if the
GPU performance is capped while a big CPU is processing something
else. Or reduce the battery charging because the dissipated power is
too high compared with the power consumed by other devices.

The userspace is the most adequate place to dynamically act on the
different devices by limiting their power given an application
profile: it has the knowledge of the platform.

These userspace daemons are in charge of the Dynamic Thermal Power
Management (DTPM).

Nowadays, the dtpm daemons are abusing the thermal framework as they
act on the cooling device state to force a specific and arbitrary
state without taking care of the governor decisions. Given the closed
loop of some governors that can confuse the logic or directly enter in
a decision conflict.

As the number of cooling device support is limited today to the CPU
and the GPU, the dtpm daemons have little control on the power
dissipation of the system. The out of tree solutions are hacking
around here and there in the drivers, in the frameworks to have
control on the devices. The common solution is to declare them as
cooling devices.

There is no unification of the power limitation unit, opaque states
are used.

This patch provides a way to create a hierarchy of constraints using
the powercap framework. The devices which are registered as power
limit-able devices are represented in this hierarchy as a tree. They
are linked together with intermediate nodes which are just there to
propagate the constraint to the children.

The leaves of the tree are the real devices, the intermediate nodes
are virtual, aggregating the children constraints and power
characteristics.

Each node have a weight on a 2^10 basis, in order to reflect the
percentage of power distribution of the children's node. This
percentage is used to dispatch the power limit to the children.

The weight is computed against the max power of the siblings.

This simple approach allows to do a fair distribution of the power
limit.

Signed-off-by: Daniel Lezcano 
Reviewed-by: Lukasz Luba 
Tested-by: Lukasz Luba 
---
 drivers/powercap/Kconfig  |   6 +
 drivers/powercap/Makefile |   1 +
 drivers/powercap/dtpm.c   | 441 ++
 include/asm-generic/vmlinux.lds.h |  11 +
 include/linux/dtpm.h  |  76 +
 5 files changed, 535 insertions(+)
 create mode 100644 drivers/powercap/dtpm.c
 create mode 100644 include/linux/dtpm.h

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index bc228725346b..cc1953bd8bed 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -43,4 +43,10 @@ config IDLE_INJECT
  CPUs for power capping. Idle period can be injected
  synchronously on a set of specified CPUs or alternatively
  on a per CPU basis.
+
+config DTPM
+   bool "Power capping for Dynamic Thermal Power Management"
+   help
+ This enables support for the power capping for the dynamic
+ thermal power management userspace engine.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 7255c94ec61c..6482ac52054d 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_DTPM) += dtpm.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
diff --git a/drivers/powercap/dtpm.c b/drivers/powercap/dtpm.c
new file mode 100644
index ..a4d1f6fd6c0a
--- /dev/null
+++ b/drivers/powercap/dtpm.c
@@ -0,0 +1,441 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The powercap based Dynamic Thermal Power Management framework
+ * provides to the userspace a consistent API to set the power limit
+ * on some devices.
+ *
+ * DTPM defines the functions to create a tree of constraints. Each
+ * parent node is a virtual description of the aggregation of the
+ * children. It propagates the constraints set at its level to its
+ * children and collect the children power information. The leaves of
+ * the tree are the real devices which have the ability to get their
+ * current power consumption and set their power limit.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DTPM_POWER_LIMIT_FLAG BIT(0)
+
+static const char *constraint_name[] = {
+   "Instantaneous",
+};
+
+static struct powercap_control_type *pct;
+static struct dtpm *root;
+

[PATCH v3 2/4] Documentation/powercap/dtpm: Add documentation for dtpm

2020-11-23 Thread Daniel Lezcano

The dynamic thermal and power management is a technique to dynamically
adjust the power consumption of different devices in order to ensure a
global thermal constraint.

An userspace daemon is usually monitoring the temperature and the
power to take immediate action on the device.

The DTPM framework provides an unified API to userspace to act on the
power.

Document this framework.

Signed-off-by: Daniel Lezcano 
---
 Documentation/power/index.rst |   1 +
 Documentation/power/powercap/dtpm.rst | 210 ++
 2 files changed, 211 insertions(+)
 create mode 100644 Documentation/power/powercap/dtpm.rst

diff --git a/Documentation/power/index.rst b/Documentation/power/index.rst
index ced8a8007434..a0f5244fb427 100644
--- a/Documentation/power/index.rst
+++ b/Documentation/power/index.rst
@@ -30,6 +30,7 @@ Power Management
 userland-swsusp
 
 powercap/powercap
+powercap/dtpm
 
 regulator/consumer
 regulator/design
diff --git a/Documentation/power/powercap/dtpm.rst 
b/Documentation/power/powercap/dtpm.rst
new file mode 100644
index ..ca095ef4b887
--- /dev/null
+++ b/Documentation/power/powercap/dtpm.rst
@@ -0,0 +1,210 @@
+==
+Dynamic Thermal Power Management framework
+==
+
+On the embedded world, the complexity of the SoC leads to an
+increasing number of hotspots which need to be monitored and mitigated
+as a whole in order to prevent the temperature to go above the
+normative and legally stated 'skin temperature'.
+
+Another aspect is to sustain the performance for a given power budget,
+for example virtual reality where the user can feel dizziness if the
+performance is capped while a big CPU is processing something else. Or
+reduce the battery charging because the dissipated power is too high
+compared with the power consumed by other devices.
+
+The userspace is the most adequate place to dynamically act on the
+different devices by limiting their power given an application
+profile: it has the knowledge of the platform.
+
+The Dynamic Thermal Power Management (DTPM) is a technique acting on
+the device power by limiting and/or balancing a power budget among
+different devices.
+
+The DTPM framework provides an unified interface to act on the
+device power.
+
+Overview
+
+
+The DTPM framework relies on the powercap framework to create the
+powercap entries in the sysfs directory and implement the backend
+driver to do the connection with the power manageable device.
+
+The DTPM is a tree representation describing the power constraints
+shared between devices, not their physical positions.
+
+The nodes of the tree are a virtual description aggregating the power
+characteristics of the children nodes and their power limitations.
+
+The leaves of the tree are the real power manageable devices.
+
+For instance::
+
+  SoC
+   |
+   `-- pkg
+   |
+   |-- pd0 (cpu0-3)
+   |
+   `-- pd1 (cpu4-5)
+
+The pkg power will be the sum of pd0 and pd1 power numbers::
+
+  SoC (400mW - 3100mW)
+   |
+   `-- pkg (400mW - 3100mW)
+   |
+   |-- pd0 (100mW - 700mW)
+   |
+   `-- pd1 (300mW - 2400mW)
+
+When the nodes are inserted in the tree, their power characteristics are 
propagated to the parents::
+
+  SoC (600mW - 5900mW)
+   |
+   |-- pkg (400mW - 3100mW)
+   ||
+   ||-- pd0 (100mW - 700mW)
+   ||
+   |`-- pd1 (300mW - 2400mW)
+   |
+   `-- pd2 (200mW - 2800mW)
+
+Each node have a weight on a 2^10 basis reflecting the percentage of power 
consumption along the siblings::
+
+  SoC (w=1024)
+   |
+   |-- pkg (w=538)
+   ||
+   ||-- pd0 (w=231)
+   ||
+   |`-- pd1 (w=794)
+   |
+   `-- pd2 (w=486)
+
+   Note the sum of weights at the same level are equal to 1024.
+
+When a power limitation is applied to a node, then it is distributed along the 
children given their weights. For example, if we set a power limitation of 
3200mW at the 'SoC' root node, the resulting tree will be::
+
+  SoC (w=1024) <--- power_limit = 3200mW
+   |
+   |-- pkg (w=538) --> power_limit = 1681mW
+   ||
+   ||-- pd0 (w=231) --> power_limit = 378mW
+   ||
+   |`-- pd1 (w=794) --> power_limit = 1303mW
+   |
+   `-- pd2 (w=486) --> power_limit = 1519mW
+
+
+Flat description
+
+
+A root node is created and it is the parent of all the nodes. This
+description is the simplest one and it is supposed to give to
+userspace a flat representation of all the devices supporting the
+power limitation without any power limitation distribution.
+
+Hierarchical description
+
+
+The different devices supporting the power limitation are represented
+hierarchically. There is one root node, all intermediate nodes are
+grouping the child nodes which can be intermediate nodes also or real
+devices.
+
+The intermediate nodes aggregate the power information and allows to
+set the power limit g

[PATCH v3 1/4] units: Add Watt units

2020-11-23 Thread Daniel Lezcano

As there are the temperature units, let's add the Watt macros definition.

Signed-off-by: Daniel Lezcano 
Reviewed-by: Lukasz Luba 
---
 include/linux/units.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/units.h b/include/linux/units.h
index aaf716364ec3..92c234e71cab 100644
--- a/include/linux/units.h
+++ b/include/linux/units.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define MILLIWATT_PER_WATT 1000L
+#define MICROWATT_PER_MILLIWATT1000L
+#define MICROWATT_PER_WATT 100L
+
 #define ABSOLUTE_ZERO_MILLICELSIUS -273150
 
 static inline long milli_kelvin_to_millicelsius(long t)
-- 
2.17.1

[GIT PULL] thermal fixes for v5.10-rc5

2020-11-19 Thread Daniel Lezcano

The following changes since commit f8394f232b1eab649ce2df5c5f15b0e528c92091:

  Linux 5.10-rc3 (2020-11-08 16:10:16 -0800)

are available in the Git repository at:


ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git
tags/thermal-v5.10-rc5

for you to fetch changes up to b98467fe96d2415836d154ecfe1cd389bf4147b5:

  thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430
(2020-11-12 12:30:29 +0100)


- Disable the CPU PM notifier for OMAP4430 for suspend in order to
  prevent wrong temperature leading to a critical shutdown (Peter
  Ujfalusi)


Peter Ujfalusi (1):
  thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430

 drivers/thermal/ti-soc-thermal/ti-bandgap.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v2 2/4] Documentation/powercap/dtpm: Add documentation for dtpm

2020-11-17 Thread Daniel Lezcano

On 17/11/2020 17:08, Jonathan Corbet wrote:
> On Mon, 16 Nov 2020 16:26:47 +0100
> Daniel Lezcano  wrote:
> 
>> The dynamic thermal and power management is a technique to dynamically
>> adjust the power consumption of different devices in order to ensure a
>> global thermal constraint.
>>
>> An userspace daemon is usually monitoring the temperature and the
>> power to take immediate action on the device.
>>
>> The DTPM framework provides an unified API to userspace to act on the
>> power.
>>
>> Document this framework.
> 
> It's always refreshing to see documentation show up with a new feature! :)
> 
> That said, it's clear that you haven't built the docs with this new
> material.  There's a couple of little things I would ask you to do...

Thanks for the review, I'll take care of your comments and generate the
documentation to check the expected result.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH v2 4/4] powercap/drivers/dtpm: Add CPU energy model based support

2020-11-17 Thread Daniel Lezcano

On 17/11/2020 14:15, Lukasz Luba wrote:
> Hi Daniel,
> 
> Only one small comment regarding the setup of 'power_limit'.
> 
> On 11/16/20 3:26 PM, Daniel Lezcano wrote:
>> With the powercap dtpm controller, we are able to plug devices with
>> power limitation features in the tree.
>>
>> The following patch introduces the CPU power limitation based on the
>> energy model and the performance states.
>>
>> The power limitation is done at the performance domain level. If some
>> CPUs are unplugged, the corresponding power will be subtracted from
>> the performance domain total power.
>>
>> It is up to the platform to initialize the dtpm tree and add the CPU.
>>

[ ... ]

>> +
>> +    dtpm = per_cpu(dtpm_per_cpu, cpu);
>> +    if (dtpm)
>> +    return power_add(dtpm, pd);
> 
> The dtpm->power_limit is not incremented in this path, when a new
> CPU joins the cluster.
> Is it correct?

Yes, you are right, there is something missing here. It does not change
the behavior of the power capping, but the value will be inconsistent in
the tree.

> Or maybe we need something like:
> -->8-
>     if (dtpm) {
>     ret = power_add(dtpm, pd);
>     if (!ret)
>     dtpm->power_limit = dtpm->power_max;
>     return ret;
>     }
> 8<---
> 
> The power_max should be updated after successful power_add().
> It would disturb user set value in power_limit, though (described
> below).
> 
> 
>> +
>> +    dtpm = dtpm_alloc();
>> +    if (!dtpm)
>> +    return -EINVAL;
>> +
>> +    dtpm_cpu = kzalloc(sizeof(dtpm_cpu), GFP_KERNEL);
>> +    if (!dtpm_cpu) {
>> +    kfree(dtpm);
>> +    return -ENOMEM;
>> +    }
>> +
>> +    dtpm->private = dtpm_cpu;
>> +    dtpm_cpu->cpu = cpu;
>> +
>> +    for_each_cpu(cpu, policy->related_cpus)
>> +    per_cpu(dtpm_per_cpu, cpu) = dtpm;
>> +
>> +    ret = power_add(dtpm, pd);
>> +    if (ret)
>> +    goto out_kfree_dtpm_cpu;
>> +
>> +    dtpm->power_limit = dtpm->power_max;
> 
> Here, the power_limit will be set only once with power_max
> for a single CPU. I am not sure, but maybe we can simple say:
> 
> dtpm->power_limit = dtpm->power_max * cpumask_weight(policy->related_cpus)
> 
> an avoid touching it later (?)
> 
> Because this function can be called in runtime, when the power_limit
> was already set by userspace, the hotpluging in/out/in... CPU shouldn't
> change this limit.

Hmm, I have to think about it because the power_limit is always less or
equal to power_max.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH] thermal: Fix NULL pointer dereference issue

2020-11-17 Thread Daniel Lezcano

On 17/11/2020 12:27, Zhang Rui wrote:
> On Tue, 2020-11-17 at 09:57 +0100, Daniel Lezcano wrote:
>> On 17/11/2020 08:18, Zhang Rui wrote:
>>> On Mon, 2020-11-16 at 21:59 +0530, Mukesh Ojha wrote:
>>>> Cooling stats variable inside
>>>> thermal_cooling_device_stats_update()
>>>> can get NULL. We should add a NULL check on stat inside for
>>>> sanity.
>>>>
>>>> Signed-off-by: Mukesh Ojha 
>>>> ---
>>>>  drivers/thermal/thermal_sysfs.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/thermal/thermal_sysfs.c
>>>> b/drivers/thermal/thermal_sysfs.c
>>>> index a6f371f..f52708f 100644
>>>> --- a/drivers/thermal/thermal_sysfs.c
>>>> +++ b/drivers/thermal/thermal_sysfs.c
>>>> @@ -754,6 +754,9 @@ void
>>>> thermal_cooling_device_stats_update(struct
>>>> thermal_cooling_device *cdev,
>>>>  {
>>>>struct cooling_dev_stats *stats = cdev->stats;
>>>>  
>>>> +  if (!stats)
>>>> +  return;
>>>> +
>>>
>>> May I know in which case stats can be NULL?
>>> The only possibility I can see is that cdev->ops->get_max_state()
>>> fails
>>> in cooling_device_stats_setup(), right?
>>
>> A few lines below, the allocation could fail.
>>
>> stats = kzalloc(var, GFP_KERNEL);
>> if (!stats)
>> return;
>>
>> Some drivers define themselves as a cooling device state to let the
>> userspace to act on their power. The screen brightness is one example
>> with a cdev with 1024 states, the resulting stats table to be
>> allocated
>> is very big and the kzalloc is prone to fail.
>>
> Oh, right.
> As we're not going to fix the cdev, so I think we do need this patch,
> right?

If the allocation fails at this level if initialization there is clearly
something wrong. I'm wondering if it would make sense to report back an
error and make thermal_cooling_device_register to fail.

Having an allocation failing and silently ignore it sounds like not very
robust IMO.




-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH V4] clocksource/drivers/cadence_ttc: fix memory leak in ttc_setup_clockevent()

2020-11-17 Thread Daniel Lezcano

On 16/11/2020 14:51, Yu Kuai wrote:
> If clk_notifier_register() failed, ttc_setup_clockevent() will return
> without freeing 'ttcce', which will leak memory.
> 
> Fixes: 70504f311d4b ("clocksource/drivers/cadence_ttc: Convert init function 
> to return error")
> Reported-by: Hulk Robot 
> Signed-off-by: Yu Kuai 
> ---

Applied, thanks


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: Fix NULL pointer dereference issue

2020-11-17 Thread Daniel Lezcano

On 17/11/2020 08:18, Zhang Rui wrote:
> On Mon, 2020-11-16 at 21:59 +0530, Mukesh Ojha wrote:
>> Cooling stats variable inside thermal_cooling_device_stats_update()
>> can get NULL. We should add a NULL check on stat inside for sanity.
>>
>> Signed-off-by: Mukesh Ojha 
>> ---
>>  drivers/thermal/thermal_sysfs.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/thermal/thermal_sysfs.c
>> b/drivers/thermal/thermal_sysfs.c
>> index a6f371f..f52708f 100644
>> --- a/drivers/thermal/thermal_sysfs.c
>> +++ b/drivers/thermal/thermal_sysfs.c
>> @@ -754,6 +754,9 @@ void thermal_cooling_device_stats_update(struct
>> thermal_cooling_device *cdev,
>>  {
>>  struct cooling_dev_stats *stats = cdev->stats;
>>  
>> +if (!stats)
>> +return;
>> +
> May I know in which case stats can be NULL?
> The only possibility I can see is that cdev->ops->get_max_state() fails
> in cooling_device_stats_setup(), right?

A few lines below, the allocation could fail.

stats = kzalloc(var, GFP_KERNEL);
if (!stats)
return;

Some drivers define themselves as a cooling device state to let the
userspace to act on their power. The screen brightness is one example
with a cdev with 1024 states, the resulting stats table to be allocated
is very big and the kzalloc is prone to fail.

> thanks,
> rui
> 
>>  spin_lock(>lock);
>>  
>>  if (stats->state == new_state)
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

[PATCH V2 0/4] powercap/dtpm: Add the DTPM framework

2020-11-16 Thread Daniel Lezcano

The density of components greatly increased the last decade bringing a
numerous number of heating sources which are monitored by more than 20
sensors on recent SoC. The skin temperature, which is the case
temperature of the device, must stay below approximately 45°C in order
to comply with the legal requirements.

The skin temperature is managed as a whole by an user space daemon,
which is catching the current application profile, to allocate a power
budget to the different components where the resulting heating effect
will comply with the skin temperature constraint.

This technique is called the Dynamic Thermal Power Management.

The Linux kernel does not provide any unified interface to act on the
power of the different devices. Currently, the thermal framework is
changed to export artificially the performance states of different
devices via the cooling device software component with opaque values.
This change is done regardless of the in-kernel logic to mitigate the
temperature. The user space daemon uses all the available knobs to act
on the power limit and those differ from one platform to another.

This series provides a Dynamic Thermal Power Management framework to
provide an unified way to act on the power of the devices.

Changelog:
 V2:
  - Fixed indentation
  - Fixed typos in comments
  - Fixed missing kfree for dtpm_cpu
  - Capitalize letters in the Kconfig description
  - Reduced name description
  - Stringified section name
  - Added more debug traces in the code
  - Removed duplicate initialization in the dtpm cpu

Daniel Lezcano (4):
  units: Add Watt units
  Documentation/powercap/dtpm: Add documentation for dtpm
  powercap/drivers/dtpm: Add API for dynamic thermal power management
  powercap/drivers/dtpm: Add CPU energy model based support

 Documentation/power/powercap/dtpm.rst | 222 +
 drivers/powercap/Kconfig  |  13 +
 drivers/powercap/Makefile |   2 +
 drivers/powercap/dtpm.c   | 436 ++
 drivers/powercap/dtpm_cpu.c   | 282 +
 include/asm-generic/vmlinux.lds.h |  11 +
 include/linux/cpuhotplug.h|   1 +
 include/linux/dtpm.h  |  75 +
 include/linux/units.h |   4 +
 9 files changed, 1046 insertions(+)
 create mode 100644 Documentation/power/powercap/dtpm.rst
 create mode 100644 drivers/powercap/dtpm.c
 create mode 100644 drivers/powercap/dtpm_cpu.c
 create mode 100644 include/linux/dtpm.h

-- 
2.17.1

[PATCH v2 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management

2020-11-16 Thread Daniel Lezcano

On the embedded world, the complexity of the SoC leads to an
increasing number of hotspots which need to be monitored and mitigated
as a whole in order to prevent the temperature to go above the
normative and legally stated 'skin temperature'.

Another aspect is to sustain the performance for a given power budget,
for example virtual reality where the user can feel dizziness if the
GPU performance is capped while a big CPU is processing something
else. Or reduce the battery charging because the dissipated power is
too high compared with the power consumed by other devices.

The userspace is the most adequate place to dynamically act on the
different devices by limiting their power given an application
profile: it has the knowledge of the platform.

These userspace daemons are in charge of the Dynamic Thermal Power
Management (DTPM).

Nowadays, the dtpm daemons are abusing the thermal framework as they
act on the cooling device state to force a specific and arbitrary
state without taking care of the governor decisions. Given the closed
loop of some governors that can confuse the logic or directly enter in
a decision conflict.

As the number of cooling device support is limited today to the CPU
and the GPU, the dtpm daemons have little control on the power
dissipation of the system. The out of tree solutions are hacking
around here and there in the drivers, in the frameworks to have
control on the devices. The common solution is to declare them as
cooling devices.

There is no unification of the power limitation unit, opaque states
are used.

This patch provides a way to create a hierarchy of constraints using
the powercap framework. The devices which are registered as power
limit-able devices are represented in this hierarchy as a tree. They
are linked together with intermediate nodes which are just there to
propagate the constraint to the children.

The leaves of the tree are the real devices, the intermediate nodes
are virtual, aggregating the children constraints and power
characteristics.

Each node have a weight on a 2^10 basis, in order to reflect the
percentage of power distribution of the children's node. This
percentage is used to dispatch the power limit to the children.

The weight is computed against the max power of the siblings.

This simple approach allows to do a fair distribution of the power
limit.

Signed-off-by: Daniel Lezcano 
Reviewed-by: Lukasz Luba 
Tested-by: Lukasz Luba 
---
 drivers/powercap/Kconfig  |   6 +
 drivers/powercap/Makefile |   1 +
 drivers/powercap/dtpm.c   | 436 ++
 include/asm-generic/vmlinux.lds.h |  11 +
 include/linux/dtpm.h  |  72 +
 5 files changed, 526 insertions(+)
 create mode 100644 drivers/powercap/dtpm.c
 create mode 100644 include/linux/dtpm.h

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index bc228725346b..cc1953bd8bed 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -43,4 +43,10 @@ config IDLE_INJECT
  CPUs for power capping. Idle period can be injected
  synchronously on a set of specified CPUs or alternatively
  on a per CPU basis.
+
+config DTPM
+   bool "Power capping for Dynamic Thermal Power Management"
+   help
+ This enables support for the power capping for the dynamic
+ thermal power management userspace engine.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 7255c94ec61c..6482ac52054d 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_DTPM) += dtpm.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
diff --git a/drivers/powercap/dtpm.c b/drivers/powercap/dtpm.c
new file mode 100644
index ..a45c56b1c532
--- /dev/null
+++ b/drivers/powercap/dtpm.c
@@ -0,0 +1,436 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The powercap based Dynamic Thermal Power Management framework
+ * provides to the userspace a consistent API to set the power limit
+ * on some devices.
+ *
+ * DTPM defines the functions to create a tree of constraints. Each
+ * parent node is a virtual description of the aggregation of the
+ * children. It propagates the constraints set at its level to its
+ * children and collect the children power information. The leaves of
+ * the tree are the real devices which have the ability to get their
+ * current power consumption and set their power limit.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const char *constraint_name[] = {
+   "Instantaneous",
+};
+
+static struct powercap_control_type *pct;
+static struct dtpm *root;
+
+static int get_time_window_us(struct po

[PATCH v2 4/4] powercap/drivers/dtpm: Add CPU energy model based support

2020-11-16 Thread Daniel Lezcano

With the powercap dtpm controller, we are able to plug devices with
power limitation features in the tree.

The following patch introduces the CPU power limitation based on the
energy model and the performance states.

The power limitation is done at the performance domain level. If some
CPUs are unplugged, the corresponding power will be subtracted from
the performance domain total power.

It is up to the platform to initialize the dtpm tree and add the CPU.

Here is an example to create a simple tree with one root node called
"pkg" and the CPU's performance domains.

static int dtpm_register_pkg(struct dtpm_descr *descr)
{
struct dtpm *pkg;
int ret;

pkg = dtpm_alloc();
if (!pkg)
return -ENOMEM;

ret = dtpm_register_parent(descr->name, pkg, descr->parent);
if (ret)
return ret;

return dtpm_register_cpu(pkg);
}

static struct dtpm_descr descr = {
.name = "pkg",
.init = dtpm_register_pkg,
};
DTPM_DECLARE(descr);

Signed-off-by: Daniel Lezcano 
---
 drivers/powercap/Kconfig|   7 +
 drivers/powercap/Makefile   |   1 +
 drivers/powercap/dtpm_cpu.c | 282 
 include/linux/cpuhotplug.h  |   1 +
 include/linux/dtpm.h|   3 +
 5 files changed, 294 insertions(+)
 create mode 100644 drivers/powercap/dtpm_cpu.c

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index cc1953bd8bed..20b4325c6161 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -49,4 +49,11 @@ config DTPM
help
  This enables support for the power capping for the dynamic
  thermal power management userspace engine.
+
+config DTPM_CPU
+   bool "Add CPU power capping based on the energy model"
+   depends on DTPM && ENERGY_MODEL
+   help
+ This enables support for CPU power limitation based on
+ energy model.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 6482ac52054d..fabcf388a8d3 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_DTPM) += dtpm.o
+obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
 obj-$(CONFIG_POWERCAP) += powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o
diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
new file mode 100644
index ..6bff5f27d891
--- /dev/null
+++ b/drivers/powercap/dtpm_cpu.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2020 Linaro Limited
+ *
+ * Author: Daniel Lezcano 
+ *
+ * The DTPM CPU is based on the energy model. It hooks the CPU in the
+ * DTPM tree which in turns update the power number by propagating the
+ * power number from the CPU energy model information to the parents.
+ *
+ * The association between the power and the performance state, allows
+ * to set the power of the CPU at the OPP granularity.
+ *
+ * The CPU hotplug is supported and the power numbers will be updated
+ * if a CPU is hot plugged / unplugged.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct dtpm *__parent;
+
+static DEFINE_PER_CPU(struct dtpm *, dtpm_per_cpu);
+
+struct dtpm_cpu {
+   struct freq_qos_request qos_req;
+   int cpu;
+};
+
+/*
+ * When a new CPU is inserted at hotplug or boot time, add the power
+ * contribution and update the dtpm tree.
+ */
+static int power_add(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+   u64 power_min, power_max;
+
+   power_min = em->table[0].power;
+   power_min *= MICROWATT_PER_MILLIWATT;
+   power_min += dtpm->power_min;
+
+   power_max = em->table[em->nr_perf_states - 1].power;
+   power_max *= MICROWATT_PER_MILLIWATT;
+   power_max += dtpm->power_max;
+
+   return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+/*
+ * When a CPU is unplugged, remove its power contribution from the
+ * dtpm tree.
+ */
+static int power_sub(struct dtpm *dtpm, struct em_perf_domain *em)
+{
+   u64 power_min, power_max;
+
+   power_min = em->table[0].power;
+   power_min *= MICROWATT_PER_MILLIWATT;
+   power_min = dtpm->power_min - power_min;
+
+   power_max = em->table[em->nr_perf_states - 1].power;
+   power_max *= MICROWATT_PER_MILLIWATT;
+   power_max = dtpm->power_max - power_max;
+
+   return dtpm_update_power(dtpm, power_min, power_max);
+}
+
+static int set_pd_power_limit(struct powercap_zone *pcz, int cid,
+ u64 power_limit)
+{
+   struct dtpm *dtpm = to_dtpm(pcz);
+   struct dtpm_cpu *dtpm_cpu = dtpm->private;
+   struct em_perf_domain *pd;
+   struct cpumask cpus;
+   unsigned long freq;
+   u64 power;
+   int i, nr_cpus;
+
+   spin_lock(>lock);

[PATCH v2 2/4] Documentation/powercap/dtpm: Add documentation for dtpm

2020-11-16 Thread Daniel Lezcano

The dynamic thermal and power management is a technique to dynamically
adjust the power consumption of different devices in order to ensure a
global thermal constraint.

An userspace daemon is usually monitoring the temperature and the
power to take immediate action on the device.

The DTPM framework provides an unified API to userspace to act on the
power.

Document this framework.

Signed-off-by: Daniel Lezcano 
---
 Documentation/power/powercap/dtpm.rst | 222 ++
 1 file changed, 222 insertions(+)
 create mode 100644 Documentation/power/powercap/dtpm.rst

diff --git a/Documentation/power/powercap/dtpm.rst 
b/Documentation/power/powercap/dtpm.rst
new file mode 100644
index ..ce11cf183994
--- /dev/null
+++ b/Documentation/power/powercap/dtpm.rst
@@ -0,0 +1,222 @@
+==
+Dynamic Thermal Power Management framework
+==
+
+On the embedded world, the complexity of the SoC leads to an
+increasing number of hotspots which need to be monitored and mitigated
+as a whole in order to prevent the temperature to go above the
+normative and legally stated 'skin temperature'.
+
+Another aspect is to sustain the performance for a given power budget,
+for example virtual reality where the user can feel dizziness if the
+performance is capped while a big CPU is processing something else. Or
+reduce the battery charging because the dissipated power is too high
+compared with the power consumed by other devices.
+
+The userspace is the most adequate place to dynamically act on the
+different devices by limiting their power given an application
+profile: it has the knowledge of the platform.
+
+The Dynamic Thermal Power Management (DTPM) is a technique acting on
+the device power by limiting and/or balancing a power budget among
+different devices.
+
+The DTPM framework provides an unified interface to act on the
+device power.
+
+===
+1. Overview
+===
+
+The DTPM framework relies on the powercap framework to create the
+powercap entries in the sysfs directory and implement the backend
+driver to do the connection with the power manageable device.
+
+The DTPM is a tree representation describing the power constraints
+shared between devices, not their physical positions.
+
+The nodes of the tree are a virtual description aggregating the power
+characteristics of the children nodes and their power limitations.
+
+The leaves of the tree are the real power manageable devices.
+
+For instance:
+
+  SoC
+   |
+   `-- pkg
+   |
+   |-- pd0 (cpu0-3)
+   |
+   `-- pd1 (cpu4-5)
+
+* The pkg power will be the sum of pd0 and pd1 power numbers.
+
+  SoC (400mW - 3100mW)
+   |
+   `-- pkg (400mW - 3100mW)
+   |
+   |-- pd0 (100mW - 700mW)
+   |
+   `-- pd1 (300mW - 2400mW)
+
+* When the nodes are inserted in the tree, their power characteristics
+  are propagated to the parents.
+
+  SoC (600mW - 5900mW)
+   |
+   |-- pkg (400mW - 3100mW)
+   ||
+   ||-- pd0 (100mW - 700mW)
+   ||
+   |`-- pd1 (300mW - 2400mW)
+   |
+   `-- pd2 (200mW - 2800mW)
+
+* Each node have a weight on a 2^10 basis reflecting the percentage of
+  power consumption along the siblings.
+
+  SoC (w=1024)
+   |
+   |-- pkg (w=538)
+   ||
+   ||-- pd0 (w=231)
+   ||
+   |`-- pd1 (w=794)
+   |
+   `-- pd2 (w=486)
+
+   Note the sum of weights at the same level are equal to 1024.
+
+* When a power limitation is applied to a node, then it is distributed
+  along the children given their weights. For example, if we set a
+  power limitation of 3200mW at the 'SoC' root node, the resulting
+  tree will be.
+
+  SoC (w=1024) <--- power_limit = 3200mW
+   |
+   |-- pkg (w=538) --> power_limit = 1681mW
+   ||
+   ||-- pd0 (w=231) --> power_limit = 378mW
+   ||
+   |`-- pd1 (w=794) --> power_limit = 1303mW
+   |
+   `-- pd2 (w=486) --> power_limit = 1519mW
+
+
+1.1 Flat description
+
+
+A root node is created and it is the parent of all the nodes. This
+description is the simplest one and it is supposed to give to
+userspace a flat representation of all the devices supporting the
+power limitation without any power limitation distribution.
+
+
+1.2 Hierarchical description
+
+
+The different devices supporting the power limitation are represented
+hierarchically. There is one root node, all intermediate nodes are
+grouping the child nodes which can be intermediate nodes also or real
+devices.
+
+The intermediate nodes aggregate the power information and allows to
+set the power limit given the weight of the nodes.
+
+
+2. Userspace API
+
+
+As stated in the overview, the DTPM framework is built on top of the
+powercap framework. Thus the sysfs interface is the same, please refer
+to the powercap documentation for further details.

[PATCH v2 1/4] units: Add Watt units

2020-11-16 Thread Daniel Lezcano

As there are the temperature units, let's add the Watt macros definition.

Signed-off-by: Daniel Lezcano 
Reviewed-by: Lukasz Luba 
---
 include/linux/units.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/units.h b/include/linux/units.h
index aaf716364ec3..92c234e71cab 100644
--- a/include/linux/units.h
+++ b/include/linux/units.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define MILLIWATT_PER_WATT 1000L
+#define MICROWATT_PER_MILLIWATT1000L
+#define MICROWATT_PER_WATT 100L
+
 #define ABSOLUTE_ZERO_MILLICELSIUS -273150
 
 static inline long milli_kelvin_to_millicelsius(long t)
-- 
2.17.1

Re: [PATCH] thermal: amlogic: Add hwmon support

2020-11-16 Thread Daniel Lezcano

On 15/11/2020 20:06, Martin Blumenstingl wrote:
> Many monitoring tools read the CPU temperature using the hwmon
> interface. Expose the thermal sensors on Amlogic boards as hwmon
> devices.
> 
> Without this lm_sensors' "sensors" tool does not find any temperature
> sensors. Now it prints:
>   cpu_thermal-virtual-0
>   Adapter: Virtual device
>   temp1:+44.7 C  (crit = +110.0 C)
> 
>   ddr_thermal-virtual-0
>   Adapter: Virtual device
>   temp1:+45.9 C  (crit = +110.0 C)
> 
> Signed-off-by: Martin Blumenstingl 
> ---

Applied, thanks


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: intel_pch_thermal: Add PCI ids for Lewisburg PCH.

2020-11-16 Thread Daniel Lezcano

On 13/11/2020 21:49, Andres Freund wrote:
> I noticed that I couldn't read the PCH temperature on my workstation
> (C620 series chipset, w/ 2x Xeon Gold 5215 CPUs) directly, but had to go
> through IPMI. Looking at the data sheet, it looks to me like the
> existing intel PCH thermal driver should work without changes for
> Lewisburg.
> 
> I suspect there's some other PCI IDs missing. But I hope somebody at
> Intel would have an easier time figuring that out than I...
> 
> Cc: Daniel Lezcano 
> Cc: Srinivas Pandruvada 
> Cc: Tushar Dave 
> Cc: Zhang Rui 
> Cc: linux...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Link: 
> https://lore.kernel.org/lkml/20200115184415.1726953-1-and...@anarazel.de/
> Signed-off-by: Andres Freund 
> ---

Applied, thanks

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH v5 0/2] dt-bindings: timer: renesas: tmu: Document r8a774e1 and

2020-11-12 Thread Daniel Lezcano

On 10/11/2020 17:20, Geert Uytterhoeven wrote:
>   Hi Daniel, Thomas,
> 
> This patch series picks up missing Device Tree binding updates for the
> Renesas Timer Unit (TMU), and converts the bindings to json-schema.
> 
> Thanks for applying!
> 
> Geert Uytterhoeven (1):
>   dt-bindings: timer: renesas: tmu: Convert to json-schema
> 
> Marian-Cristian Rotariu (1):
>   dt-bindings: timer: renesas: tmu: Document r8a774e1 bindings
> 
>  .../devicetree/bindings/timer/renesas,tmu.txt | 49 -
>  .../bindings/timer/renesas,tmu.yaml   | 99 +++
>  2 files changed, 99 insertions(+), 49 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/timer/renesas,tmu.txt
>  create mode 100644 Documentation/devicetree/bindings/timer/renesas,tmu.yaml

Applied, thanks


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v2 05/17] clocksource/hyperv: use MSR-based access if running as root

2020-11-12 Thread Daniel Lezcano

On 12/11/2020 12:24, Wei Liu wrote:
> On Thu, Nov 12, 2020 at 10:56:17AM +0100, Daniel Lezcano wrote:
>> On 05/11/2020 17:58, Wei Liu wrote:
>>> Signed-off-by: Wei Liu 

Acked-by: Daniel Lezcano 

>>> ---
>>
>> I would like to apply this patch but the changelog is too short (one line).
>>
>> Please add a small paragraph (no need to resend just answer here, I will
>> amend the log myself.
> 
> Please don't apply this to your tree. It is dependent on a previous
> patch. I expect this series to go through the hyperv tree.
> 
> I will add in this small paragraph in a later version:
> 
> When Linux runs as the root partition, the setup required for TSC page
> is different. Luckily Linux also has access to the MSR based
> clocksource. We can just disable the TSC page clocksource if Linux is
> the root partition.
> 
> If you're happy with the description, I would love to have an ack from
> you. I will funnel the patch via the hyperv tree.
> 
> Wei.
> 


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH] docs: thermal: time_in_state is displayed in msec and not usertime

2020-11-12 Thread Daniel Lezcano

On 10/11/2020 11:43, Viresh Kumar wrote:
> The sysfs stats for cooling devices shows the time_in_state in msec,
> remove the unwanted usertime comment.
> 
> Signed-off-by: Viresh Kumar 
> ---

Applied, thanks


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: intel_pch_thermal: Add PCI ids for Lewisburg PCH.

2020-11-12 Thread Daniel Lezcano

On 28/10/2020 21:21, Andres Freund wrote:
> Hi,
> 
> On 2020-01-16 11:41:34 -0800, Srinivas Pandruvada wrote:
>> On Thu, 2020-01-16 at 10:42 -0800, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2020-01-16 05:53:13 -0800, Srinivas Pandruvada wrote:
 On Wed, 2020-01-15 at 10:44 -0800, Andres Freund wrote:
> I noticed that I couldn't read the PCH temperature on my
> workstation
> (C620 series chipset, w/ 2x Xeon Gold 5215 CPUs) directly, but
> had to
> go
> through IPMI. Looking at the data sheet, it looks to me like the
> existing intel PCH thermal driver should work without changes for
> Lewisburg.
 Does the temperature reading match with what you read via IPMI?
>>>
>>> It does:
>>>
>>> root@awork3:~# ipmitool sdr|grep ^PCH
>>> PCH Temp | 58 degrees C  | ok
>>>
>>> andres@awork3:~$ cat /sys/class/thermal/thermal_zone0/type
>>> pch_lewisburg
>>> andres@awork3:~$ cat /sys/class/thermal/thermal_zone0/temp
>>> 58000
>>>
>>> And if I generate some load, it rises for both:
>>> root@awork3:~# ipmitool sdr|grep ^PCH
>>> PCH Temp | 60 degrees C  | ok
>>> andres@awork3:~$ cat /sys/class/thermal/thermal_zone0/temp
>>> 6
>>>
>> Thanks for the test.
>>
>> Rui can add his ACK.
> 
> Ping? Looks like this got lost somewhere?

It does no longer apply, is it possible to do a respin ?

Thanks

  -- Daniel


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: ti-soc-thermal: Disable the CPU PM notifier for OMAP4430

2020-11-12 Thread Daniel Lezcano

On 03/11/2020 07:42, Peter Ujfalusi wrote:
> Eduardo, Keerthy,
> 
> On 29/10/2020 12.51, Tony Lindgren wrote:
>> * Peter Ujfalusi  [201029 10:03]:
>>> Disabling the notifier fixes the random shutdowns on OMAP4430 (ES2.0 and 
>>> ES2.1)
>>> but it does not cause any issues on OMAP4460 (PandaES) or OMAP3630 
>>> (BeagleXM).
>>> Tony's duovero with OMAP4430 ES2.3 did not ninja-shutdown, but he also have
>>> constant and steady stream of:
>>> thermal thermal_zone0: failed to read out thermal zone (-5)
>>
>> Works for me and I've verified duovero still keeps hitting core ret idle:
> 
> Can you pick this one up for 5.10 to make omap4430-sdp to be usable (to
> not shut down randomly).
> The regression was introduced in 5.10-rc1.
> 
>> Tested-by: Tony Lindgren 

Applied as a fix for v5.10-rc



-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: intel_pch_thermal: Add PCI ids for Lewisburg PCH.

2020-11-12 Thread Daniel Lezcano

On 28/10/2020 21:21, Andres Freund wrote:
> Hi,
> 
> On 2020-01-16 11:41:34 -0800, Srinivas Pandruvada wrote:
>> On Thu, 2020-01-16 at 10:42 -0800, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2020-01-16 05:53:13 -0800, Srinivas Pandruvada wrote:
 On Wed, 2020-01-15 at 10:44 -0800, Andres Freund wrote:
> I noticed that I couldn't read the PCH temperature on my
> workstation
> (C620 series chipset, w/ 2x Xeon Gold 5215 CPUs) directly, but
> had to
> go
> through IPMI. Looking at the data sheet, it looks to me like the
> existing intel PCH thermal driver should work without changes for
> Lewisburg.
 Does the temperature reading match with what you read via IPMI?
>>>
>>> It does:
>>>
>>> root@awork3:~# ipmitool sdr|grep ^PCH
>>> PCH Temp | 58 degrees C  | ok
>>>
>>> andres@awork3:~$ cat /sys/class/thermal/thermal_zone0/type
>>> pch_lewisburg
>>> andres@awork3:~$ cat /sys/class/thermal/thermal_zone0/temp
>>> 58000
>>>
>>> And if I generate some load, it rises for both:
>>> root@awork3:~# ipmitool sdr|grep ^PCH
>>> PCH Temp | 60 degrees C  | ok
>>> andres@awork3:~$ cat /sys/class/thermal/thermal_zone0/temp
>>> 6
>>>
>> Thanks for the test.
>>
>> Rui can add his ACK.
> 
> Ping? Looks like this got lost somewhere?

Waiting for Rui's ack :)


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v2] thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed

2020-11-12 Thread Daniel Lezcano

On 06/11/2020 10:22, zhuguangqin...@gmail.com wrote:
> From: Zhuguangqing 
> 
> If state has not changed successfully and we updated cpufreq_state,
> next time when the new state is equal to cpufreq_state (not changed
> successfully last time), we will return directly and miss a
> freq_qos_update_request() that should have been.
> 
> Fixes: 5130802ddbb1 ("thermal: cpu_cooling: Switch to QoS requests for freq 
> limits")
> Cc: v5.4+  # v5.4+
> Signed-off-by: Zhuguangqing 
> Acked-by: Viresh Kumar 
> ---

Applied, thanks


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [Resend][PATCH] drivers/thermal: fix potential memleak in error branch

2020-11-12 Thread Daniel Lezcano

On 10/11/2020 09:10, Bernard wrote:
> Function __thermal_cooling_device_register, when device_register
> failed, cdev is not free after error value return, this may
> bring in potential memleak.
> 
> Signed-off-by: Bernard Zhao 
> ---
>  drivers/thermal/thermal_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 3d1e0033bf3e..e4bee15dfa1f 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1125,6 +1125,7 @@ __thermal_cooling_device_register(struct device_node 
> *np,
>   if (result) {
>   ida_simple_remove(_cdev_ida, cdev->id);
>   put_device(>device);
> + kfree(cdev);
>   return ERR_PTR(result);
>   }

Please fix the function with the proper error path and the labels.

Thanks

  -- Daniel


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [Resend][PATCH] drivers/thermal: cleanup coding style a bit

2020-11-12 Thread Daniel Lezcano

On 10/11/2020 09:10, Bernard wrote:
> Function thermal_add_hwmon_sysfs, hwmon will be NULL when
> new_hwmon_device = 0, so there is no need to check, kfree will
> handle NULL point.
> 
> Signed-off-by: Bernard Zhao 
> ---

Applied, thanks



-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v2] thermal: sun8i: Use bitmap API instead of open code

2020-11-12 Thread Daniel Lezcano

On 09/11/2020 12:46, Frank Lee wrote:
> From: Yangtao Li 
> 
> The bitmap_* API is the standard way to access data in the bitfield.
> So convert irq_ack to return an unsigned long, and make things to use
> bitmap API.
> 
> Signed-off-by: Yangtao Li 
> ---
> v2:
> Make irq_ack to return an unsigned long
> ---

Applied, thanks

-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH V2] clocksource/drivers/cadence_ttc: fix memory leak in ttc_setup_clockevent()

2020-11-12 Thread Daniel Lezcano

On 11/11/2020 02:16, Yu Kuai wrote:
> If clk_notifier_register() failed, ttc_setup_clockevent() will return
> without freeing 'ttcce', which will leak memory.
> 
> Fixes: 70504f311d4b ("clocksource/drivers/cadence_ttc: Convert init function 
> to return error")
> Reported-by: Hulk Robot 
> Signed-off-by: Yu Kuai 
> Reviewed-by: Michal Simek 
> ---
>  drivers/clocksource/timer-cadence-ttc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/clocksource/timer-cadence-ttc.c 
> b/drivers/clocksource/timer-cadence-ttc.c
> index 80e960602030..32b9560ce408 100644
> --- a/drivers/clocksource/timer-cadence-ttc.c
> +++ b/drivers/clocksource/timer-cadence-ttc.c
> @@ -426,6 +426,7 @@ static int __init ttc_setup_clockevent(struct clk *clk,
>   >ttc.clk_rate_change_nb);
>   if (err) {
>   pr_warn("Unable to register clock notifier.\n");
> + kfree(ttcce);
>   return err;
>   }

Please fix the error path by adding a label (eg. out_kfree) and jump to
it in case of error everywhere in the function.

out_kfree:
kfree(ttcce);

return err;


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v2 05/17] clocksource/hyperv: use MSR-based access if running as root

2020-11-12 Thread Daniel Lezcano

On 05/11/2020 17:58, Wei Liu wrote:
> Signed-off-by: Wei Liu 
> ---

I would like to apply this patch but the changelog is too short (one line).

Please add a small paragraph (no need to resend just answer here, I will
amend the log myself.

>  drivers/clocksource/hyperv_timer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/clocksource/hyperv_timer.c 
> b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..269a691bd2c4 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
>   if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>   return false;
>  
> + if (hv_root_partition)
> + return false;
> +
>   hv_read_reference_counter = read_hv_clock_tsc;
>   phys_addr = virt_to_phys(hv_get_tsc_page());
>  
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH 1/3] drivers/clocksource: Remove EZChip NPS clocksource driver

2020-11-12 Thread Daniel Lezcano

On 05/11/2020 22:22, Vineet Gupta wrote:
> NPS platform has been removed from ARC port and there are no in-tree
> users of it now. So RIP !
> 
> Cc: Daniel Lezcano 
> Cc: Thomas Gleixner 
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Vineet Gupta 
> ---

Applied, thanks!


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management

2020-11-10 Thread Daniel Lezcano

On 10/11/2020 12:05, Lukasz Luba wrote:
> 
> Actually I've found one issue when I have been trying to clean
> my testing branch with modified scmi-cpufreq.c.

IMO, those errors are not the dtpm framework fault but the scmi-cpufreq.

You should add a component in the drivers/powercap which does the glue
between the scmi-cpufreq and the dtpm. No stub will be needed in this
case as the component will depend on CONFIG_DTPM.

-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management

2020-11-10 Thread Daniel Lezcano



Hi Lukasz,

thanks for the review

On 10/11/2020 10:59, Lukasz Luba wrote:

[ ... ]

>> +/* Init section thermal table */
>> +extern struct dtpm_descr *__dtpm_table[];
>> +extern struct dtpm_descr *__dtpm_table_end[];
>> +
>> +#define DTPM_TABLE_ENTRY(name)    \
>> +    static typeof(name) *__dtpm_table_entry_##name    \
>> +    __used __section(__dtpm_table) = 
> 
> I had to change the section name to string, to pass compilation:
> __used __section("__dtpm_table") = 
> I don't know if it's my compiler or configuration.

Actually, it is:

commit 33def8498fdde180023444b08e12b72a9efed41d
Author: Joe Perches 
Date:   Wed Oct 21 19:36:07 2020 -0700

treewide: Convert macro and uses of __section(foo) to __section("foo")

Your change is correct, I've noticed it a few days ago when rebasing the
series.

> I've tried to register this DTPM in scmi-cpufreq.c with macro
> proposed in patch 4/4 commit message, but I might missed some
> important includes there...
> 
>> +
>> +#define DTPM_DECLARE(name)    DTPM_TABLE_ENTRY(name)
>> +
>> +#define for_each_dtpm_table(__dtpm)    \
>> +    for (__dtpm = __dtpm_table;    \
>> + __dtpm < __dtpm_table_end;    \
>> + __dtpm++)
>> +
>> +static inline struct dtpm *to_dtpm(struct powercap_zone *zone)
>> +{
>> +    return container_of(zone, struct dtpm, zone);
>> +}
>> +
>> +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max);
>> +
>> +int dtpm_release_zone(struct powercap_zone *pcz);
>> +
>> +struct dtpm *dtpm_alloc(void);
>> +
>> +void dtpm_unregister(struct dtpm *dtpm);
>> +
>> +int dtpm_register_parent(const char *name, struct dtpm *dtpm,
>> + struct dtpm *parent);
>> +
>> +int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm
>> *parent,
>> +  struct powercap_zone_ops *ops, int nr_constraints,
>> +  struct powercap_zone_constraint_ops *const_ops);
>> +#endif
>>
> 
> Minor comment. This new framework deserves more debug prints, especially
> in registration/unregistration paths. I had to put some, to test it.
> But it can be done later as well, after it gets into mainline.

Ok, I will add some debug traces.

> I have also run different hotplug stress tests to check this tree
> locking. The userspace process constantly reading these values, while
> the last CPU in the cluster was going on/off and node was detaching.
> I haven't seen any problems, but the tree wasn't so deep.
> Everything was calculated properly, no error, null pointers, etc.

Great! thank you very much for this test

> Apart from the spelling minor issues and the long constraint name, LGTM
> 
> Reviewed-by: Lukasz Luba 
> Tested-by: Lukasz Luba 

Thanks for the review

  -- Daniel


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH 1/5] cpuidle: Remove pointless stub

2020-11-05 Thread Daniel Lezcano



Hi Rafael,

On 05/11/2020 15:14, Rafael J. Wysocki wrote:

[ ... ]

>> [ ... ]
>>
>>> Applied (this patch alone) as 5.10-rc material with some minor edits
>>> in the changelog, thanks!
>>
>> Does it mean you disagree the other patches? Or are you waiting for more
>> comments?
> 
> Actually, both.
> 
> My primary concern regarding the modularization of cpuidle governors
> is that they really belong to the core kernel.  They access the
> internals thereof and the behavior of the idle loop in general depends
> on what they do.
> 
> OTOH IMV a potential benefit resulting from allowing them to be
> modular is very small, at least from the mainline perspective.
> 
> IOW this case is very similar to the modularization of the schedutil
> cpufreq governor which we decided not to do for similar reasons.

Yes, I recall this discussion.

The purpose is to be more consistent with the governors in the other
frameworks which can be module. But as you mentioned it, the benefit is
small, so I'm fine to drop the series.

Thanks

  -- Daniel

-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: Rearrange thermal_zone_device_set_polling()

2020-11-05 Thread Daniel Lezcano

On 05/11/2020 10:13, Viresh Kumar wrote:
> Rearrange thermal_zone_device_set_polling() to make it more readable and
> reduce duplicate code.
> 
> Signed-off-by: Viresh Kumar 
> ---

Hi Viresh,

I have a series where this function is reworked and conflicts with your
changes. The delay is converted into jiffies at init time and no
conversion happen in this function anymore.

Do you mind if we discard this patch ?


>  drivers/thermal/thermal_core.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index c6d74bc1c90b..7dfab370a369 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -291,16 +291,17 @@ static int __init thermal_register_governors(void)
>  static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
>   int delay)
>  {
> - if (delay > 1000)
> - mod_delayed_work(system_freezable_power_efficient_wq,
> -  >poll_queue,
> -  round_jiffies(msecs_to_jiffies(delay)));
> - else if (delay)
> + if (delay) {
> + int time = msecs_to_jiffies(delay);
> +
> + if (delay > 1000)
> + time = round_jiffies(time);
> +
>   mod_delayed_work(system_freezable_power_efficient_wq,
> -  >poll_queue,
> -  msecs_to_jiffies(delay));
> - else
> +  >poll_queue, time);
> + } else {
>   cancel_delayed_work(>poll_queue);
> + }
>  }
>  
>  static inline bool should_stop_polling(struct thermal_zone_device *tz)
> 


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support

2020-11-04 Thread Daniel Lezcano

On 04/11/2020 11:57, Lukasz Luba wrote:

[ ... ]

> Let me go again through the patches and then I will add my reviewed by.
> 
> I will also run LTP hotplug or LISA hotplug torture on this tree,
> just to check it's fine.

Ah yes, good idea. Thanks for doing that.


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support

2020-11-04 Thread Daniel Lezcano



Hi Lukasz,


On 23/10/2020 15:27, Lukasz Luba wrote:
> Hi Daniel,
> 
> 
> On 10/6/20 1:20 PM, Daniel Lezcano wrote:
>> With the powercap dtpm controller, we are able to plug devices with
>> power limitation features in the tree.
>>
>> The following patch introduces the CPU power limitation based on the
>> energy model and the performance states.
>>
>> The power limitation is done at the performance domain level. If some
>> CPUs are unplugged, the corresponding power will be substracted from
>> the performance domain total power.
>>
>> It is up to the platform to initialize the dtpm tree and add the CPU.
>>
>> Here is an example to create a simple tree with one root node called
>> "pkg" and the cpu's performance domains.

[ ... ]

>> +static int set_pd_power_limit(struct powercap_zone *pcz, int cid,
>> +  u64 power_limit)
>> +{
>> +    struct dtpm *dtpm = to_dtpm(pcz);
>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>> +    struct em_perf_domain *pd;
>> +    unsigned long freq;
>> +    int i, nr_cpus;
>> +
>> +    spin_lock(>lock);
>> +
>> +    power_limit = clamp_val(power_limit, dtpm->power_min,
>> dtpm->power_max);
>> +
>> +    pd = em_cpu_get(dtpm_cpu->cpu);
>> +
>> +    nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>> +
>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>> +
>> +    u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
>> +
>> +    if ((power * nr_cpus) > power_limit)
> 
> We have one node in that DTPM hierarchy tree, which represents all CPUs
> which are in 'related_cpus' mask. I saw below that we just remove the
> node in hotplug.

The last CPU hotplugged will remove the node.

> I have put a comment below asking if we could change the registration,
> which will affect power calculation.
> 
> 
>> +    break;
>> +    }
>> +
>> +    freq = pd->table[i - 1].frequency;
>> +
>> +    freq_qos_update_request(_cpu->qos_req, freq);
>> +
>> +    dtpm->power_limit = power_limit;
>> +
>> +    spin_unlock(>lock);
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64
>> *data)
>> +{
>> +    struct dtpm *dtpm = to_dtpm(pcz);
>> +
>> +    spin_lock(>lock);
>> +    *data = dtpm->power_max;
>> +    spin_unlock(>lock);
>> +
>> +    return 0;
>> +}
>> +
>> +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw)
>> +{
>> +    struct dtpm *dtpm = to_dtpm(pcz);
>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>> +    struct em_perf_domain *pd;
>> +    unsigned long freq;
>> +    int i, nr_cpus;
>> +
>> +    freq = cpufreq_quick_get(dtpm_cpu->cpu);
>> +    pd = em_cpu_get(dtpm_cpu->cpu);
>> +    nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>> +
>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>> +
>> +    if (pd->table[i].frequency < freq)
>> +    continue;
>> +
>> +    *power_uw = pd->table[i].power *
>> +    MICROWATT_PER_MILLIWATT * nr_cpus;
> 
> Same here, we have 'nr_cpus'.
> 
>> +
>> +    return 0;
>> +    }
>> +
>> +    return -EINVAL;
>> +}
>> +
>> +static int cpu_release_zone(struct powercap_zone *pcz)
>> +{
>> +    struct dtpm *dtpm = to_dtpm(pcz);
>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>> +
>> +    freq_qos_remove_request(_cpu->qos_req);
>> +
>> +    return dtpm_release_zone(pcz);
>> +}
>> +
>> +static struct powercap_zone_constraint_ops pd_constraint_ops = {
>> +    .set_power_limit_uw = set_pd_power_limit,
>> +    .get_power_limit_uw = get_pd_power_limit,
>> +};
>> +
>> +static struct powercap_zone_ops pd_zone_ops = {
>> +    .get_power_uw = get_pd_power_uw,
>> +    .release = cpu_release_zone,
>> +};
>> +
>> +static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
>> +{
>> +    struct cpufreq_policy *policy;
>> +    struct em_perf_domain *pd;
>> +    struct dtpm *dtpm;
>> +
>> +    policy = cpufreq_cpu_get(cpu);
>> +
>> +    if (!policy)
>> +    return 0;
>> +
>> +    pd = em_cpu_get(cpu);
>> +    if (!pd)
>> +    return -EINVAL;
>> +
>> +    dtpm = per_cpu(dtpm_per_cpu, cpu);
>> +
>> +    power_sub(dtpm, pd);
>> +
>> +    if (cpumask_w

Re: [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management

2020-11-03 Thread Daniel Lezcano



Hi Lukasz,

thanks for the review and the comments.

On 23/10/2020 12:29, Lukasz Luba wrote:
> Hi Daniel,

[ ... ]

>> +
>> +config DTPM
>> +    bool "Power capping for dynamic thermal power management"
> 
> Maybe starting with capital letters: Dynamic Thermal Power Management?

Ok, noted.

[ ... ]

>> +++ b/drivers/powercap/dtpm.c
>> @@ -0,0 +1,430 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright 2020 Linaro Limited
>> + *
>> + * Author: Daniel Lezcano 
>> + *
>> + * The powercap based Dynamic Thermal Power Management framework
>> + * provides to the userspace a consistent API to set the power limit
>> + * on some devices.
>> + *
>> + * DTPM defines the functions to create a tree of constraints. Each
>> + * parent node is a virtual description of the aggregation of the
>> + * children. It propagates the constraints set at its level to its
>> + * children and collect the children power infomation. The leaves of
> 
> s/infomation/information/

Ok, thanks

[ ... ]

>> +static struct powercap_control_type *pct;
>> +static struct dtpm *root;
> 
> I wonder if it safe to have the tree without a global lock for it, like
> mutex tree_lock ?
> I have put some comments below when the code traverses the tree.

The mutex is a heavy lock and the its purpose is to allow the current
process to be preempted while the spinlock is very fast without preemption.

Putting in place a single lock will simplify the code but I'm not sure
it is worth as it could be a contention. It would be simpler to switch
to a big lock than the opposite.

[ ... ]

>> +static void dtpm_rebalance_weight(void)
>> +{
>> +    __dtpm_rebalance_weight(root);
>> +}
>> +
>> +static void dtpm_sub_power(struct dtpm *dtpm)
>> +{
>> +    struct dtpm *parent = dtpm->parent;
>> +
>> +    while (parent) {
> 
> I am not sure if it is safe for a corner case when the
> nodes are removing from bottom to top. We don't hold a tree
> lock, so these two (above line and below) operations might
> be split/preempted and 'parent' freed before taking the lock.
> Is it possible? (Note: I might missed something like double
> locking using this local node spinlock).

The parent can not be freed until it has children, the check is done in
the release node function.

>> +    spin_lock(>lock);
>> +    parent->power_min -= dtpm->power_min;
>> +    parent->power_max -= dtpm->power_max;
>> +    spin_unlock(>lock);
>> +    parent = parent->parent;
>> +    }
>> +
>> +    dtpm_rebalance_weight();
>> +}
>> +
>> +static void dtpm_add_power(struct dtpm *dtpm)
>> +{
>> +    struct dtpm *parent = dtpm->parent;
>> +
>> +    while (parent) {
> 
> Similar here?
> 
>> +    spin_lock(>lock);
>> +    parent->power_min += dtpm->power_min;
>> +    parent->power_max += dtpm->power_max;
>> +    spin_unlock(>lock);
>> +    parent = parent->parent;
>> +    }
>> +
>> +    dtpm_rebalance_weight();
>> +}
>> +
>> +/**
>> + * dtpm_update_power - Update the power on the dtpm
>> + * @dtpm: a pointer to a dtpm structure to update
>> + * @power_min: a u64 representing the new power_min value
>> + * @power_max: a u64 representing the new power_max value
>> + *
>> + * Function to update the power values of the dtpm node specified in
>> + * parameter. These new values will be propagated to the tree.
>> + *
>> + * Return: zero on success, -EINVAL if the values are inconsistent
>> + */
>> +int dtpm_update_power(struct dtpm *dtpm, u64 power_min, u64 power_max)
>> +{
>> +    if (power_min == dtpm->power_min && power_max == dtpm->power_max)
>> +    return 0;
>> +
>> +    if (power_max < power_min)
>> +    return -EINVAL;
>> +
>> +    dtpm_sub_power(dtpm);
>> +    spin_lock(>lock);
>> +    dtpm->power_min = power_min;
>> +    dtpm->power_max = power_max;
>> +    spin_unlock(>lock);
>> +    dtpm_add_power(dtpm);
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * dtpm_release_zone - Cleanup when the node is released
>> + * @pcz: a pointer to a powercap_zone structure
>> + *
>> + * Do some housecleaning and update the weight on the tree. The
>> + * release will be denied if the node has children. This function must
>> + * be called by the specific release callback of the different
>> + * backends.
>> + *
>> + * Return: 0 on success, -EBUSY

Re: [PATCH 1/3] dt-bindings: thermal: mediatek: make resets property optional

2020-10-27 Thread Daniel Lezcano

On 21/10/2020 18:42, Fabien Parent wrote:
> MT8516 Thermal IP does not support reset. Make the resets property
> optional in order to be able to support MT8516 SoC.
> 
> Signed-off-by: Fabien Parent 
> ---

Applied, thanks




-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH] thermal: rcar: replace spin_lock_irqsave by spin_lock in hard IRQ

2020-10-27 Thread Daniel Lezcano

On 27/10/2020 02:06, Tian Tao wrote:
> On RT or even on mainline with 'threadirqs' on the command line all
> interrupts which are not explicitly requested with IRQF_NO_THREAD
> run their handlers in thread context. The same applies to soft interrupts.
> That means they are subject to the normal scheduler rules and no other
> code is going to acquire that lock from hard interrupt context either,
> so the irqsave() here is pointless in all cases.
> 
> Signed-off-by: Tian Tao 

Applied, thanks


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

Re: [PATCH v2] drivers/thermal: optimize the for circle to run a bit fast

2020-10-27 Thread Daniel Lezcano

On 27/10/2020 02:37, Bernard Zhao wrote:
> Function thermal_zone_device_register, in the for circle, if the
> first if branch set the count bit in tz->trips_disabled, there is
> no need to set in the other if branch again.
> This change is to make the code run a bit fast and readable.
> 
> Signed-off-by: Bernard Zhao 
> 
> Changes since V1:
> *make the code more clear and readable
> 
> Link for V1:
> *https://lore.kernel.org/patchwork/patch/1324507/
> ---

Applied with the patch description massaged.

https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/commit/?h=testing=37b2539e63d6570c9ee51b1d48bdecb334df367d


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog

< 1 2 3 4 5 6 7 8 9 10 >

301 - 400 of 8072 matches

Mail list logo