RE: [RFC/RFT][PATCH 2/4] cpufreq: intel_pstate: Change P-state selection algorithm for Core

2016-09-07 Thread Doug Smythies
On 2016.09.02 18:02 Rafael J. Wysocki wrote:

...[cut]...

> This includes an IIR filter on top of the load-based P-state selection,
> but the filter is applied to the non-boosted case only (otherwise it
> defeats the point of the boost) and I used a slightly different raw gain
> value.

The different gain value, 12.5% instead 10%, does come at a cost of some
energy. Although we are finding inconsistencies in the test results.
(I estimated about 2.2% energy cost, for my 20% SpecPower simulator test,
and scaling off of a simple graph I did of energy vs gain with the previous
version).

...[cut]...
> + intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
> + target = clamp_val(target, int_tofp(min_perf), int_tofp(max_perf));
> + sample->target = fp_toint(target + (1 << (FRAC_BITS-1)));
> + return sample->target;
> +}
> +

In my earlier proposed versions, it was very much on purpose that it
was keeping the pseudo floating point filtered target. Excerpt:

+   cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
+   cpu->sample.target + scaled_gain *
+   unfiltered_target, int_tofp(100));
+   /*
+* Clamp the filtered value.
+*/
+   intel_pstate_get_min_max(cpu, &min_perf, &max_perf);
+   if (cpu->sample.target < int_tofp(min_perf))
+   cpu->sample.target = int_tofp(min_perf);
+   if (cpu->sample.target > int_tofp(max_perf))
+   cpu->sample.target = int_tofp(max_perf);
+
+   return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));

Why? To prevent a lock up scenario where, depending on the processor
and the gain settings, the target pstate would never kick over to the
next value. i.e. if it only increased 1/3 of a pstate per iteration
as the filter approached its steady state value. While this condition
did occur in my older proposed implementations, with my processor it
doesn't seem to with this implementation. I didn't theoretically check
other processors.

Another side effect of this change is effectively a further increase
in the gain setting, and thus more energy being given back. This was
determined by looking at step function load response times, as opposed
to math analysis. (I can make pretty graphs if you want.)

The purpose of this e-mail just to make us aware of the tradeoffs,
not to imply it should change.

... Doug




[RFC/RFT][PATCH 2/4] cpufreq: intel_pstate: Change P-state selection algorithm for Core

2016-09-02 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

The PID-base P-state selection algorithm used by intel_pstate for
Core processors is based on very weak foundations.  Namely, its
decisions are mostly based on the values of the APERF and MPERF
feedback registers and it only estimates the actual utilization to
check if it is not extremely low (in order to avoid getting stuck
in the highest P-state in that case).

Since it generally causes the CPU P-state to ramp up quickly, it
leads to satisfactory performance, but the metric used by it is only
really valid when the CPU changes P-states by itself (ie. in the turbo
range) and if the P-state value set by the driver is treated by the
CPU as the upper limit on turbo P-states selected by it.

As a result, the only case when P-states are reduced by that
algorithm is when the CPU has just come out of idle, but in that
particular case it would have been better to bump up the P-state
instead.  That causes some benchmarks to behave erratically and
attempts to improve the situation lead to excessive energy
consumption, because they make the CPU stay in very high P-states
almost all the time.

Consequently, the only viable way to fix that is to replace the
erroneous algorithm entirely with a new one.

To that end, notice that setting the P-state proportional to the
actual CPU utilization (measured with the help of MPERF and TSC)
generally leads to reasonable behavior, but it does not reflect
the "performance boosting" nature of the current P-state
selection algorithm.  It may be made more similar to that
algorithm, though, by adding iowait boosting to it.

Specifically, if the P-state is bumped up to the maximum after
receiving the SCHED_CPUFREQ_IOWAIT flag via cpufreq_update_util(),
it will allow tasks that were previously waiting on I/O to get the
full capacity of the CPU when they are ready to process data again
and that should lead to the desired performance increase overall
without sacrificing too much energy.

However, the utilization-based method of target P-state selection
may cause the resultant target P-state to oscillate which generally
leads to excessive consumption of energy, so apply an Infinite
Impulse Response filter on top of it to dampen those osciallations
and make it more energy-efficient (thanks to Doug Smythies for this
idea).

Use the approach as described in intel_pstate for Core processors.

Original-by: Srinivas Pandruvada 
Suggested-by: Doug Smythies 
Signed-off-by: Rafael J. Wysocki 
---

This includes an IIR filter on top of the load-based P-state selection,
but the filter is applied to the non-boosted case only (otherwise it
defeats the point of the boost) and I used a slightly different raw gain
value.

Thanks,
Rafael

---
 drivers/cpufreq/intel_pstate.c |   81 +++--
 1 file changed, 79 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -98,6 +98,7 @@ static inline u64 div_ext_fp(u64 x, u64
  * @tsc:   Difference of time stamp counter between last and
  * current sample
  * @time:  Current time from scheduler
+ * @target:Target P-state
  *
  * This structure is used in the cpudata structure to store performance sample
  * data for choosing next P State.
@@ -109,6 +110,7 @@ struct sample {
u64 mperf;
u64 tsc;
u64 time;
+   int target;
 };
 
 /**
@@ -181,6 +183,8 @@ struct _pid {
  * @cpu:   CPU number for this instance data
  * @update_util:   CPUFreq utility callback information
  * @update_util_set:   CPUFreq utility callback is set
+ * @iowait_boost:  iowait-related boost fraction
+ * @last_update:   Time of the last update.
  * @pstate:Stores P state limits for this CPU
  * @vid:   Stores VID limits for this CPU
  * @pid:   Stores PID parameters for this CPU
@@ -206,6 +210,7 @@ struct cpudata {
struct vid_data vid;
struct _pid pid;
 
+   u64 last_update;
u64 last_sample_time;
u64 prev_aperf;
u64 prev_mperf;
@@ -216,6 +221,7 @@ struct cpudata {
struct acpi_processor_performance acpi_perf_data;
bool valid_pss_table;
 #endif
+   unsigned int iowait_boost;
 };
 
 static struct cpudata **all_cpu_data;
@@ -229,6 +235,7 @@ static struct cpudata **all_cpu_data;
  * @p_gain_pct:PID proportional gain
  * @i_gain_pct:PID integral gain
  * @d_gain_pct:PID derivative gain
+ * @boost_iowait:  Whether or not to use iowait boosting.
  *
  * Stores per CPU model static PID configuration data.
  */
@@ -240,6 +247,7 @@ struct pstate_adjust_policy {
int p_gain_pct;
int d_gain_pct;
int i_gain_pct;
+   bool boost_iowait;
 };
 
 /**
@@ -277,6 +285,7 @@ struct c