Re: [PATCH V2 5/7] thermal/drivers/cpu_cooling: Add idle cooling device documentation

2018-03-08 Thread Daniel Thompson
On Thu, Mar 08, 2018 at 09:59:49AM +0100, Pavel Machek wrote:
> Hi!
> 
> > >> +Under certain circumstances, the SoC reaches a temperature exceeding
> > >> +the allocated power budget or the maximum temperature limit. The
> > > 
> > > I don't understand. Power budget is in W, temperature is in
> > > kelvin. Temperature can't exceed power budget AFAICT.
> > 
> > Yes, it is badly worded. Is the following better ?
> > 
> > "
> > Under certain circumstances a SoC can reach the maximum temperature
> > limit or is unable to stabilize the temperature around a temperature
> > control.
> > 
> > When the SoC has to stabilize the temperature, the kernel can act on a
> > cooling device to mitigate the dissipated power.
> > 
> > When the maximum temperature is reached and to prevent a catastrophic
> > situation a radical decision must be taken to reduce the temperature
> > under the critical threshold, that impacts the performance.
> > 
> > "
> 
> Actually... if hardware is expected to protect itself, I'd tone it
> down. No need to be all catastrophic and critical... But yes, better.

Makes sense. For a thermally overcommitted but passively cooled device 
work close to max operating temperature it is not a critical situation 
requiring a radical reaction, it is normal operation.

Put another way, it would severely bogus to attach KERN_CRITICAL 
messages to reaching the cooling threshold.


Daniel.


> > > Critical here, critical there. I have trouble following
> > > it. Theoretically hardware should protect itself, because you don't
> > > want kernel bug to damage your CPU?
> > 
> > There are several levels of protection. The first level is mitigating
> > the temperature from the kernel, then in the temperature sensor a reset
> > line will trigger the reboot of the CPUs. Usually it is a register where
> > you write the maximum temperature, from the driver itself. I never tried
> > to write 1000°C in this register and see if I can burn the board.
> > 
> > I know some boards have another level of thermal protection in the
> > hardware itself and some other don't.
> > 
> > In any case, from a kernel point of view, it is a critical situation as
> > we are about to hard reboot the system and in this case it is preferable
> > to drop drastically the performance but give the opportunity to the
> > system to run in a degraded mode.
> 
> Agreed you want to keep going. In ACPI world, we shutdown when
> critical trip point is reached, so this is somehow confusing.
> 
> > >> +Solutions:
> > >> +--
> > >> +
> > >> +If we can remove the static and the dynamic leakage for a specific
> > >> +duration in a controlled period, the SoC temperature will
> > >> +decrease. Acting at the idle state duration or the idle cycle
> > > 
> > > "should" decrease? If you are in bad environment..
> > 
> > No, it will decrease in any case because of the static leakage drop. The
> > bad environment will impact the speed of this decrease.
> 
> I meant... if ambient temperature is 105C, there's not much you can do
> to cool system down :-).
> 
> > >> +Idle Injection:
> > >> +---
> > >> +
> > >> +The base concept of the idle injection is to force the CPU to go to an
> > >> +idle state for a specified time each control cycle, it provides
> > >> +another way to control CPU power and heat in addition to
> > >> +cpufreq. Ideally, if all CPUs of a cluster inject idle synchronously,
> > >> +this cluster can get into the deepest idle state and achieve minimum
> > >> +power consumption, but that will also increase system response latency
> > >> +if we inject less than cpuidle latency.
> > > 
> > > I don't understand last sentence.
> > 
> > Is it better ?
> > 
> > "Ideally, if all CPUs, belonging to the same cluster, inject their idle
> > cycle synchronously, the cluster can reach its power down state with a
> > minimum power consumption and static leakage drop. However, these idle
> > cycles injection will add extra latencies as the CPUs will have to
> > wakeup from a deep sleep state."
> 
> Extra comma "CPUs , belonging". But yes, better.
> 
> > Thanks!
> 
> You are welcome. Best regards,
>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 5/7] thermal/drivers/cpu_cooling: Add idle cooling device documentation

2018-03-08 Thread Pavel Machek
Hi!

> >> +Under certain circumstances, the SoC reaches a temperature exceeding
> >> +the allocated power budget or the maximum temperature limit. The
> > 
> > I don't understand. Power budget is in W, temperature is in
> > kelvin. Temperature can't exceed power budget AFAICT.
> 
> Yes, it is badly worded. Is the following better ?
> 
> "
> Under certain circumstances a SoC can reach the maximum temperature
> limit or is unable to stabilize the temperature around a temperature
> control.
> 
> When the SoC has to stabilize the temperature, the kernel can act on a
> cooling device to mitigate the dissipated power.
> 
> When the maximum temperature is reached and to prevent a catastrophic
> situation a radical decision must be taken to reduce the temperature
> under the critical threshold, that impacts the performance.
> 
> "

Actually... if hardware is expected to protect itself, I'd tone it
down. No need to be all catastrophic and critical... But yes, better.

> > Critical here, critical there. I have trouble following
> > it. Theoretically hardware should protect itself, because you don't
> > want kernel bug to damage your CPU?
> 
> There are several levels of protection. The first level is mitigating
> the temperature from the kernel, then in the temperature sensor a reset
> line will trigger the reboot of the CPUs. Usually it is a register where
> you write the maximum temperature, from the driver itself. I never tried
> to write 1000°C in this register and see if I can burn the board.
> 
> I know some boards have another level of thermal protection in the
> hardware itself and some other don't.
> 
> In any case, from a kernel point of view, it is a critical situation as
> we are about to hard reboot the system and in this case it is preferable
> to drop drastically the performance but give the opportunity to the
> system to run in a degraded mode.

Agreed you want to keep going. In ACPI world, we shutdown when
critical trip point is reached, so this is somehow confusing.

> >> +Solutions:
> >> +--
> >> +
> >> +If we can remove the static and the dynamic leakage for a specific
> >> +duration in a controlled period, the SoC temperature will
> >> +decrease. Acting at the idle state duration or the idle cycle
> > 
> > "should" decrease? If you are in bad environment..
> 
> No, it will decrease in any case because of the static leakage drop. The
> bad environment will impact the speed of this decrease.

I meant... if ambient temperature is 105C, there's not much you can do
to cool system down :-).

> >> +Idle Injection:
> >> +---
> >> +
> >> +The base concept of the idle injection is to force the CPU to go to an
> >> +idle state for a specified time each control cycle, it provides
> >> +another way to control CPU power and heat in addition to
> >> +cpufreq. Ideally, if all CPUs of a cluster inject idle synchronously,
> >> +this cluster can get into the deepest idle state and achieve minimum
> >> +power consumption, but that will also increase system response latency
> >> +if we inject less than cpuidle latency.
> > 
> > I don't understand last sentence.
> 
> Is it better ?
> 
> "Ideally, if all CPUs, belonging to the same cluster, inject their idle
> cycle synchronously, the cluster can reach its power down state with a
> minimum power consumption and static leakage drop. However, these idle
> cycles injection will add extra latencies as the CPUs will have to
> wakeup from a deep sleep state."

Extra comma "CPUs , belonging". But yes, better.

> Thanks!

You are welcome. Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH V2 5/7] thermal/drivers/cpu_cooling: Add idle cooling device documentation

2018-03-07 Thread Daniel Lezcano
On 07/03/2018 00:19, Pavel Machek wrote:
> Hi!

Hi Pavel,

thanks for taking the time to review the documentation.

>> --- /dev/null
>> +++ b/Documentation/thermal/cpu-idle-cooling.txt
>> @@ -0,0 +1,165 @@
>> +
>> +Situation:
>> +--
>> +
> 
> Can we have some real header here? Also if this is .rst, maybe it
> should be marked so?

Ok, I will fix it.

>> +Under certain circumstances, the SoC reaches a temperature exceeding
>> +the allocated power budget or the maximum temperature limit. The
> 
> I don't understand. Power budget is in W, temperature is in
> kelvin. Temperature can't exceed power budget AFAICT.

Yes, it is badly worded. Is the following better ?

"
Under certain circumstances a SoC can reach the maximum temperature
limit or is unable to stabilize the temperature around a temperature
control.

When the SoC has to stabilize the temperature, the kernel can act on a
cooling device to mitigate the dissipated power.

When the maximum temperature is reached and to prevent a catastrophic
situation a radical decision must be taken to reduce the temperature
under the critical threshold, that impacts the performance.

"

>> +former must be mitigated to stabilize the SoC temperature around the
>> +temperature control using the defined cooling devices, the latter
> 
> later?
> 
>> +catastrophic situation where a radical decision must be taken to
>> +reduce the temperature under the critical threshold, that can impact
>> +the performances.
> 
> performance.
> 
>> +Another situation is when the silicon reaches a certain temperature
>> +which continues to increase even if the dynamic leakage is reduced to
>> +its minimum by clock gating the component. The runaway phenomena will
>> +continue with the static leakage and only powering down the component,
>> +thus dropping the dynamic and static leakage will allow the component
>> +to cool down. This situation is critical.
> 
> Critical here, critical there. I have trouble following
> it. Theoretically hardware should protect itself, because you don't
> want kernel bug to damage your CPU?

There are several levels of protection. The first level is mitigating
the temperature from the kernel, then in the temperature sensor a reset
line will trigger the reboot of the CPUs. Usually it is a register where
you write the maximum temperature, from the driver itself. I never tried
to write 1000°C in this register and see if I can burn the board.

I know some boards have another level of thermal protection in the
hardware itself and some other don't.

In any case, from a kernel point of view, it is a critical situation as
we are about to hard reboot the system and in this case it is preferable
to drop drastically the performance but give the opportunity to the
system to run in a degraded mode.

>> +Last but not least, the system can ask for a specific power budget but
>> +because of the OPP density, we can only choose an OPP with a power
>> +budget lower than the requested one and underuse the CPU, thus losing
>> +performances. In other words, one OPP under uses the CPU with a
> 
> performance.
> 
>> +lesser than the power budget and the next OPP exceed the power budget,
>> +an intermediate OPP could have been used if it were present.
> 
> was.
> 
>> +Solutions:
>> +--
>> +
>> +If we can remove the static and the dynamic leakage for a specific
>> +duration in a controlled period, the SoC temperature will
>> +decrease. Acting at the idle state duration or the idle cycle
> 
> "should" decrease? If you are in bad environment..

No, it will decrease in any case because of the static leakage drop. The
bad environment will impact the speed of this decrease.

>> +The Operating Performance Point (OPP) density has a great influence on
>> +the control precision of cpufreq, however different vendors have a
>> +plethora of OPP density, and some have large power gap between OPPs,
>> +that will result in loss of performance during thermal control and
>> +loss of power in other scenes.
> 
> scene seems to be wrong word here.

yes, 'scenario' will be better :)

>> +At a specific OPP, we can assume injecting idle cycle on all CPUs,
> 
> Extra comma?
> 
>> +Idle Injection:
>> +---
>> +
>> +The base concept of the idle injection is to force the CPU to go to an
>> +idle state for a specified time each control cycle, it provides
>> +another way to control CPU power and heat in addition to
>> +cpufreq. Ideally, if all CPUs of a cluster inject idle synchronously,
>> +this cluster can get into the deepest idle state and achieve minimum
>> +power consumption, but that will also increase system response latency
>> +if we inject less than cpuidle latency.
> 
> I don't understand last sentence.

Is it better ?

"Ideally, if all CPUs, belonging to the same cluster, inject their idle
cycle synchronously, the cluster can reach its power down state with a
minimum power consumption and static leakage drop. However, these idle
cycles injection will add extra latencies as the CP

Re: [PATCH V2 5/7] thermal/drivers/cpu_cooling: Add idle cooling device documentation

2018-03-06 Thread Pavel Machek
Hi!

> --- /dev/null
> +++ b/Documentation/thermal/cpu-idle-cooling.txt
> @@ -0,0 +1,165 @@
> +
> +Situation:
> +--
> +

Can we have some real header here? Also if this is .rst, maybe it
should be marked so?

> +Under certain circumstances, the SoC reaches a temperature exceeding
> +the allocated power budget or the maximum temperature limit. The

I don't understand. Power budget is in W, temperature is in
kelvin. Temperature can't exceed power budget AFAICT.

> +former must be mitigated to stabilize the SoC temperature around the
> +temperature control using the defined cooling devices, the latter

later?

> +catastrophic situation where a radical decision must be taken to
> +reduce the temperature under the critical threshold, that can impact
> +the performances.

performance.

> +Another situation is when the silicon reaches a certain temperature
> +which continues to increase even if the dynamic leakage is reduced to
> +its minimum by clock gating the component. The runaway phenomena will
> +continue with the static leakage and only powering down the component,
> +thus dropping the dynamic and static leakage will allow the component
> +to cool down. This situation is critical.

Critical here, critical there. I have trouble following
it. Theoretically hardware should protect itself, because you don't
want kernel bug to damage your CPU?

> +Last but not least, the system can ask for a specific power budget but
> +because of the OPP density, we can only choose an OPP with a power
> +budget lower than the requested one and underuse the CPU, thus losing
> +performances. In other words, one OPP under uses the CPU with a

performance.

> +lesser than the power budget and the next OPP exceed the power budget,
> +an intermediate OPP could have been used if it were present.

was.

> +Solutions:
> +--
> +
> +If we can remove the static and the dynamic leakage for a specific
> +duration in a controlled period, the SoC temperature will
> +decrease. Acting at the idle state duration or the idle cycle

"should" decrease? If you are in bad environment..

> +The Operating Performance Point (OPP) density has a great influence on
> +the control precision of cpufreq, however different vendors have a
> +plethora of OPP density, and some have large power gap between OPPs,
> +that will result in loss of performance during thermal control and
> +loss of power in other scenes.

scene seems to be wrong word here.

> +At a specific OPP, we can assume injecting idle cycle on all CPUs,

Extra comma?

> +Idle Injection:
> +---
> +
> +The base concept of the idle injection is to force the CPU to go to an
> +idle state for a specified time each control cycle, it provides
> +another way to control CPU power and heat in addition to
> +cpufreq. Ideally, if all CPUs of a cluster inject idle synchronously,
> +this cluster can get into the deepest idle state and achieve minimum
> +power consumption, but that will also increase system response latency
> +if we inject less than cpuidle latency.

I don't understand last sentence.

> +The mitigation begins with a maximum period value which decrease

decreases?
  
> +more cooling effect is requested. When the period duration is equal
> to
> +the idle duration, then we are in a situation the platform can’t
> +dissipate the heat enough and the mitigation fails. In this case

fast enough?

> +situation is considered critical and there is nothing to do. The idle

Nothing to do? Maybe power the system down?

> +The idle injection duration value must comply with the constraints:
> +
> +- It is lesser or equal to the latency we tolerate when the mitigation

less ... than the latency

> +Minimum period
> +--
> +
> +The idle injection duration being fixed, it is obvious the minimum
> +period can’t be lesser than that, otherwise we will be scheduling the

less.

> +Practically, if the running power is lesses than the targeted power,

less.

> +However, in this demonstration we ignore three aspects:
> +
> + * The static leakage is not defined here, we can introduce it in the
> +   equation but assuming it will be zero most of the time as it is

, but?

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


[PATCH V2 5/7] thermal/drivers/cpu_cooling: Add idle cooling device documentation

2018-02-21 Thread Daniel Lezcano
Provide some documentation for the idle injection cooling effect in
order to let people to understand the rational of the approach for the
idle injection CPU cooling device.

Signed-off-by: Daniel Lezcano 
---
 Documentation/thermal/cpu-idle-cooling.txt | 165 +
 1 file changed, 165 insertions(+)
 create mode 100644 Documentation/thermal/cpu-idle-cooling.txt

diff --git a/Documentation/thermal/cpu-idle-cooling.txt 
b/Documentation/thermal/cpu-idle-cooling.txt
new file mode 100644
index 000..29fc651
--- /dev/null
+++ b/Documentation/thermal/cpu-idle-cooling.txt
@@ -0,0 +1,165 @@
+
+Situation:
+--
+
+Under certain circumstances, the SoC reaches a temperature exceeding
+the allocated power budget or the maximum temperature limit. The
+former must be mitigated to stabilize the SoC temperature around the
+temperature control using the defined cooling devices, the latter is a
+catastrophic situation where a radical decision must be taken to
+reduce the temperature under the critical threshold, that can impact
+the performances.
+
+Another situation is when the silicon reaches a certain temperature
+which continues to increase even if the dynamic leakage is reduced to
+its minimum by clock gating the component. The runaway phenomena will
+continue with the static leakage and only powering down the component,
+thus dropping the dynamic and static leakage will allow the component
+to cool down. This situation is critical.
+
+Last but not least, the system can ask for a specific power budget but
+because of the OPP density, we can only choose an OPP with a power
+budget lower than the requested one and underuse the CPU, thus losing
+performances. In other words, one OPP under uses the CPU with a power
+lesser than the power budget and the next OPP exceed the power budget,
+an intermediate OPP could have been used if it were present.
+
+Solutions:
+--
+
+If we can remove the static and the dynamic leakage for a specific
+duration in a controlled period, the SoC temperature will
+decrease. Acting at the idle state duration or the idle cycle
+injection period, we can mitigate the temperature by modulating the
+power budget.
+
+The Operating Performance Point (OPP) density has a great influence on
+the control precision of cpufreq, however different vendors have a
+plethora of OPP density, and some have large power gap between OPPs,
+that will result in loss of performance during thermal control and
+loss of power in other scenes.
+
+At a specific OPP, we can assume injecting idle cycle on all CPUs,
+belonging to the same cluster, with a duration greater than the
+cluster idle state target residency, we drop the static and the
+dynamic leakage for this period (modulo the energy needed to enter
+this state). So the sustainable power with idle cycles has a linear
+relation with the OPP’s sustainable power and can be computed with a
+coefficient similar to:
+
+   Power(IdleCycle) = Coef x Power(OPP)
+
+Idle Injection:
+---
+
+The base concept of the idle injection is to force the CPU to go to an
+idle state for a specified time each control cycle, it provides
+another way to control CPU power and heat in addition to
+cpufreq. Ideally, if all CPUs of a cluster inject idle synchronously,
+this cluster can get into the deepest idle state and achieve minimum
+power consumption, but that will also increase system response latency
+if we inject less than cpuidle latency.
+
+ ^
+ |
+ |
+ |---   ---   ---
+ |___|_|___|_|___|___
+
+  <->
+   idle  <>
+  running
+
+With the fixed idle injection duration, we can give a value which is
+an acceptable performance drop off or latency when we reach a specific
+temperature and we begin to mitigate by varying the Idle injection
+period.
+
+The mitigation begins with a maximum period value which decrease when
+more cooling effect is requested. When the period duration is equal to
+the idle duration, then we are in a situation the platform can’t
+dissipate the heat enough and the mitigation fails. In this case the
+situation is considered critical and there is nothing to do. The idle
+injection duration must be changed by configuration and until we reach
+the cooling effect, otherwise an additionnal cooling device must be
+used or ultimately decrease the SoC performance by dropping the
+highest OPP point of the SoC.
+
+The idle injection duration value must comply with the constraints:
+
+- It is lesser or equal to the latency we tolerate when the mitigation
+  begins. It is platform dependent and will depend on the user
+  experience, reactivity vs performance trade off we want. This value
+  should be specified.
+
+- It is greater than the idle state’s target residency we want to go
+  for thermal mitigation, otherwise we end up consuming more energy.
+
+Minimum period
+--
+
+The idle injection duration being fixed, it