[RFC][PATCH] cpufreq: governor: Change the calculation of load for deferred updates

2016-11-17 Thread Stratos Karafotis
Commit 18b46abd0009 ("cpufreq: governor: Be friendly towards latency-
sensitive bursty workloads"), introduced a method to copy the calculated
load from the previous sampling period in case of a deferred timer
(update).

This helps on bursty workloads but generally coping the load for the
previous measurement could be arbitrary, because of the possibly different
nature of the new workload.

Instead of coping the load from the previous period we can calculate the
load considering that between the two samples, the busy time is comparable
to one sampling period. Thus:

 busy = time_elapsed - idle_time

and

 load = 100 * busy / sampling_rate;

Also, remove the 'unlikely' hint because it seems that a deferred update
is a very common case on most modern systems.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/cpufreq_governor.c | 37 +++--
 drivers/cpufreq/cpufreq_governor.h |  7 +--
 2 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index 0196467..8675180 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -163,25 +163,17 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 * calls, so the previous load value can be used then.
 */
load = j_cdbs->prev_load;
-   } else if (unlikely(time_elapsed > 2 * sampling_rate &&
-   j_cdbs->prev_load)) {
+   } else if (time_elapsed > 2 * sampling_rate) {
+   unsigned int busy, periods;
+
/*
 * If the CPU had gone completely idle and a task has
 * just woken up on this CPU now, it would be unfair to
 * calculate 'load' the usual way for this elapsed
 * time-window, because it would show near-zero load,
 * irrespective of how CPU intensive that task actually
-* was. This is undesirable for latency-sensitive bursty
-* workloads.
-*
-* To avoid this, reuse the 'load' from the previous
-* time-window and give this task a chance to start with
-* a reasonably high CPU frequency. However, that
-* shouldn't be over-done, lest we get stuck at a high
-* load (high frequency) for too long, even when the
-* current system load has actually dropped down, so
-* clear prev_load to guarantee that the load will be
-* computed again next time.
+* was. This is undesirable, so we can safely consider
+* that the CPU is busy only in the last sampling period
 *
 * Detecting this situation is easy: the governor's
 * utilization update handler would not have run during
@@ -189,8 +181,16 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 * 'time_elapsed' (as compared to the sampling rate)
 * indicates this scenario.
 */
-   load = j_cdbs->prev_load;
-   j_cdbs->prev_load = 0;
+   busy = time_elapsed - idle_time;
+   if (busy > sampling_rate)
+   load = 100;
+   else
+   load = 100 * busy / sampling_rate;
+   j_cdbs->prev_load = load;
+
+   periods = time_elapsed / sampling_rate;
+   if (periods < idle_periods)
+   idle_periods = periods;
} else {
if (time_elapsed >= idle_time) {
load = 100 * (time_elapsed - idle_time) / 
time_elapsed;
@@ -215,13 +215,6 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
j_cdbs->prev_load = load;
}

-   if (time_elapsed > 2 * sampling_rate) {
-   unsigned int periods = time_elapsed / sampling_rate;
-
-   if (periods < idle_periods)
-   idle_periods = periods;
-   }
-
if (load > max_load)
max_load = load;
}
diff --git a/drivers/cpufreq/cpufreq_governor.h 
b/drivers/cpufreq/cpufreq_governor.h
index f5717ca..e068a31 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -114,12 +114,7 @@ struct cpu_dbs_info {

[PATCH v2] cpufreq: conservative: Fix comment explaining frequency updates

2016-11-16 Thread Stratos Karafotis
The original comment about the frequency increase to maximum is wrong.

Both increase and decrease happen at steps.

Signed-off-by: Stratos Karafotis 
---
 -> v2
Remove a trailing space

 drivers/cpufreq/cpufreq_conservative.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index a48b724..7522ec6 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -55,8 +55,8 @@ static inline unsigned int get_freq_step(struct cs_dbs_tuners 
*cs_tuners,
  * sampling_down_factor, we check, if current idle time is more than 80%
  * (default), then we try to decrease frequency
  *
- * Any frequency increase takes it to the maximum frequency. Frequency 
reduction
- * happens at minimum steps of 5% (default) of maximum frequency
+ * Frequency updates happen at minimum steps of 5% (default) of maximum
+ * frequency
  */
 static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
 {
-- 
2.7.4


[PATCH] cpufreq: conservative: Fix comment explaining frequency updates

2016-11-16 Thread Stratos Karafotis
The original comment about the frequency increase to maximum is wrong.

Both increase and decrease happen at steps.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/cpufreq_conservative.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index a48b724..7522ec6 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -55,8 +55,8 @@ static inline unsigned int get_freq_step(struct cs_dbs_tuners 
*cs_tuners,
  * sampling_down_factor, we check, if current idle time is more than 80%
  * (default), then we try to decrease frequency
  *
- * Any frequency increase takes it to the maximum frequency. Frequency 
reduction
- * happens at minimum steps of 5% (default) of maximum frequency
+ * Frequency updates happen at minimum steps of 5% (default) of maximum
+ * frequency
  */
 static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
 {
-- 
2.7.4


[PATCH v4] cpufreq: conservative: Decrease frequency faster when the update deferred

2016-11-16 Thread Stratos Karafotis
   [007]    672.823095: cpu_frequency: state=160 cpu_id=7

WITH

 -0 [007] dN..  4380.928009: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4380.949767: cpu_frequency: state=200 cpu_id=7
kworker/7:2-399   [007]   4380.969765: cpu_frequency: state=220 cpu_id=7
kworker/7:2-399   [007]   4381.009766: cpu_frequency: state=250 cpu_id=7
kworker/7:2-399   [007]   4381.029767: cpu_frequency: state=260 cpu_id=7
kworker/7:2-399   [007]   4381.049769: cpu_frequency: state=280 cpu_id=7
kworker/7:2-399   [007]   4381.069769: cpu_frequency: state=300 cpu_id=7
kworker/7:2-399   [007]   4381.089771: cpu_frequency: state=310 cpu_id=7
kworker/7:2-399   [007]   4381.109772: cpu_frequency: state=340 cpu_id=7
kworker/7:2-399   [007]   4381.129773: cpu_frequency: state=3401000 cpu_id=7
 -0 [007] d...  4428.226159: cpu_idle: state=1 cpu_id=7
 -0 [007] d...  4428.226176: cpu_idle: state=4294967295 cpu_id=7
 -0 [007] d...  4428.226181: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.227177: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.551640: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.649239: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.649268: cpu_frequency: state=280 cpu_id=7
 -0 [007] d...  4428.649278: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.689856: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.799542: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.801683: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.801748: cpu_frequency: state=170 cpu_id=7
 -0 [007] d...  4428.801761: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.806545: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4429.051880: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4429.086240: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4429.086293: cpu_frequency: state=160 cpu_id=7

Without the patch the CPU dropped to min frequency after 3.2s
With the patch applied the CPU dropped to min frequency after 0.86s

Signed-off-by: Stratos Karafotis 
Acked-by: Viresh Kumar 
---
 v3 -> v4
- Fix format issues
- Ack by Viresh Kumar

 v2 -> v3
- cpufreq_conservative.c: move the calculation below the block that check
the limits
- Calculate the freq_step only once
- Fix the bug introduced in dbs_update() because of wrong estimation of
'if' conditions

 v1 -> v2
- Use correct terminology in change log
- Change the member variable name from 'deferred_periods' to 'idle_periods'
- Fix format issue

 drivers/cpufreq/cpufreq_conservative.c | 22 +++---
 drivers/cpufreq/cpufreq_governor.c | 12 +++-
 drivers/cpufreq/cpufreq_governor.h |  1 +
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 0681fcf..a48b724 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -66,6 +66,7 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)
struct dbs_data *dbs_data = policy_dbs->dbs_data;
struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
unsigned int load = dbs_update(policy);
+   unsigned int freq_step;

/*
 * break out if we 'cannot' reduce the speed as the user might
@@ -82,6 +83,23 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)
if (requested_freq > policy->max || requested_freq < policy->min)
requested_freq = policy->cur;

+   freq_step = get_freq_step(cs_tuners, policy);
+
+   /*
+* Decrease requested_freq one freq_step for each idle period that
+* we didn't update the frequency.
+*/
+   if (policy_dbs->idle_periods < UINT_MAX) {
+   unsigned int freq_steps = policy_dbs->idle_periods * freq_step;
+
+   if (requested_freq > freq_steps)
+   requested_freq -= freq_steps;
+   else
+   requested_freq = policy->min;
+
+   policy_dbs->idle_periods = UINT_MAX;
+   }
+
/* Check for frequency increase */
if (load > dbs_data->up_threshold) {
dbs_info->down_skip = 0;
@@ -90,7 +108,7 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)
if (requested_freq == policy->max)
goto out;

-   requested_freq += get_freq_step(cs_tuners, policy);
+   requested_freq += freq_step;
if (requested_freq > policy->max)
requested_freq = policy->max;

@@ -106,14 +124,12 @@ static unsigned int cs_dbs_update(struct

[PATCH v3] cpufreq: conservative: Decrease frequency faster when the update deferred

2016-11-15 Thread Stratos Karafotis
   [007]    672.823095: cpu_frequency: state=160 cpu_id=7

WITH

 -0 [007] dN..  4380.928009: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4380.949767: cpu_frequency: state=200 cpu_id=7
kworker/7:2-399   [007]   4380.969765: cpu_frequency: state=220 cpu_id=7
kworker/7:2-399   [007]   4381.009766: cpu_frequency: state=250 cpu_id=7
kworker/7:2-399   [007]   4381.029767: cpu_frequency: state=260 cpu_id=7
kworker/7:2-399   [007]   4381.049769: cpu_frequency: state=280 cpu_id=7
kworker/7:2-399   [007]   4381.069769: cpu_frequency: state=300 cpu_id=7
kworker/7:2-399   [007]   4381.089771: cpu_frequency: state=310 cpu_id=7
kworker/7:2-399   [007]   4381.109772: cpu_frequency: state=340 cpu_id=7
kworker/7:2-399   [007]   4381.129773: cpu_frequency: state=3401000 cpu_id=7
 -0 [007] d...  4428.226159: cpu_idle: state=1 cpu_id=7
 -0 [007] d...  4428.226176: cpu_idle: state=4294967295 cpu_id=7
 -0 [007] d...  4428.226181: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.227177: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.551640: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.649239: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.649268: cpu_frequency: state=280 cpu_id=7
 -0 [007] d...  4428.649278: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.689856: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.799542: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.801683: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.801748: cpu_frequency: state=170 cpu_id=7
 -0 [007] d...  4428.801761: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.806545: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4429.051880: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4429.086240: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4429.086293: cpu_frequency: state=160 cpu_id=7

Without the patch the CPU dropped to min frequency after 3.2s
With the patch applied the CPU dropped to min frequency after 0.86s

Signed-off-by: Stratos Karafotis 
---
 v2 -> v3
- cpufreq_conservative.c: move the calculation below the block that check
the limits
- Calculate the freq_step only once
- Fix the bug introduced in dbs_update() because of wrong estimation of
'if' conditions

 v1 -> v2
- Use correct terminology in change log
- Change the member variable name from 'deferred_periods' to 'idle_periods'
- Fix format issue

 drivers/cpufreq/cpufreq_conservative.c | 21 ++---
 drivers/cpufreq/cpufreq_governor.c |  9 -
 drivers/cpufreq/cpufreq_governor.h |  1 +
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 0681fcf..808cc4d 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -66,6 +66,7 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)
struct dbs_data *dbs_data = policy_dbs->dbs_data;
struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
unsigned int load = dbs_update(policy);
+   unsigned int freq_step;

/*
 * break out if we 'cannot' reduce the speed as the user might
@@ -82,6 +83,22 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)
if (requested_freq > policy->max || requested_freq < policy->min)
requested_freq = policy->cur;

+   freq_step = get_freq_step(cs_tuners, policy);
+
+   /*
+* Decrease requested_freq one freq_step for each idle period that
+* we didn't update the frequency.
+*/
+   if (policy_dbs->idle_periods < UINT_MAX) {
+   unsigned int freq_steps = policy_dbs->idle_periods * freq_step;
+
+   if (requested_freq > freq_steps)
+   requested_freq -= freq_steps;
+   else
+   requested_freq = policy->min;
+   policy_dbs->idle_periods = UINT_MAX;
+   }
+
/* Check for frequency increase */
if (load > dbs_data->up_threshold) {
dbs_info->down_skip = 0;
@@ -90,7 +107,7 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)
if (requested_freq == policy->max)
goto out;

-   requested_freq += get_freq_step(cs_tuners, policy);
+   requested_freq += freq_step;
if (requested_freq > policy->max)
requested_freq = policy->max;

@@ -106,14 +123,12 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)

/* Check for frequency decrease */
if (load < cs_tuners->do

Re: [PATCH v2] cpufreq: conservative: Decrease frequency faster when the update deferred

2016-11-14 Thread Stratos Karafotis


On 15/11/2016 12:09 πμ, Rafael J. Wysocki wrote:
> On Mon, Nov 14, 2016 at 10:59 PM, Rafael J. Wysocki  wrote:
>> On Mon, Nov 14, 2016 at 10:46 PM, Stratos Karafotis
>>  wrote:
>>>
>>>
>>> On 14/11/2016 10:44 μμ, Rafael J. Wysocki wrote:
>>>> On Sat, Nov 12, 2016 at 10:04 PM, Stratos Karafotis
>>>>  wrote:
>>>>> Conservative governor changes the CPU frequency in steps.
>>>>> That means that if a CPU runs at max frequency, it will need several
>>>>> sampling periods to return to min frequency when the workload
>>>>> is finished.
>>>>>
>>>>> If the update function that calculates the load and target frequency
>>>>> is deferred, the governor might need even more time to decrease the
>>>>> frequency.
>>>>>
>>>>> This may have impact to power consumption and after all conservative
>>>>> should decrease the frequency if there is no workload at every sampling
>>>>> rate.
>>>>>
>>>>> To resolve the above issue calculate the number of sampling periods
>>>>> that the update is deferred. Considering that for each sampling period
>>>>> conservative should drop the frequency by a freq_step because the
>>>>> CPU was idle apply the proper subtraction to requested frequency.
>>>>>
>>>>> Below, the kernel trace with and without this patch. First an
>>>>> intensive workload is applied on a specific CPU. Then the workload
>>>>> is removed and the CPU goes to idle.
>>>>>
>>>>> WITHOUT
>>>>>
>>>>>  -0 [007] dN..   620.329153: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.350857: cpu_frequency: state=170 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.370856: cpu_frequency: state=190 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.390854: cpu_frequency: state=210 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.411853: cpu_frequency: state=220 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.432854: cpu_frequency: state=240 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.453854: cpu_frequency: state=260 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.494856: cpu_frequency: state=290 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.515856: cpu_frequency: state=310 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.536858: cpu_frequency: state=330 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    620.557857: cpu_frequency: state=3401000 
>>>>> cpu_id=7
>>>>>  -0 [007] d...   669.591363: cpu_idle: state=4 cpu_id=7
>>>>>  -0 [007] d...   669.591939: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>>  -0 [007] d...   669.591980: cpu_idle: state=4 cpu_id=7
>>>>>  -0 [007] dN..   669.591989: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>> ...
>>>>>  -0 [007] d...   670.201224: cpu_idle: state=4 cpu_id=7
>>>>>  -0 [007] d...   670.221975: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    670.222016: cpu_frequency: state=330 
>>>>> cpu_id=7
>>>>>  -0 [007] d...   670.222026: cpu_idle: state=4 cpu_id=7
>>>>>  -0 [007] d...   670.234964: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>> ...
>>>>>  -0 [007] d...   670.801251: cpu_idle: state=4 cpu_id=7
>>>>>  -0 [007] d...   671.236046: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    671.236073: cpu_frequency: state=310 
>>>>> cpu_id=7
>>>>>  -0 [007] d...   671.236112: cpu_idle: state=4 cpu_id=7
>>>>>  -0 [007] d...   671.393437: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>> ...
>>>>>  -0 [007] d...   671.401277: cpu_idle: state=4 cpu_id=7
>>>>>  -0 [007] d...   671.404083: cpu_idle: state=4294967295 
>>>>> cpu_id=7
>>>>> kworker/7:2-556   [007]    671.404111: cpu_frequency: stat

Re: [PATCH v2] cpufreq: conservative: Decrease frequency faster when the update deferred

2016-11-14 Thread Stratos Karafotis


On 14/11/2016 10:44 μμ, Rafael J. Wysocki wrote:
> On Sat, Nov 12, 2016 at 10:04 PM, Stratos Karafotis
>  wrote:
>> Conservative governor changes the CPU frequency in steps.
>> That means that if a CPU runs at max frequency, it will need several
>> sampling periods to return to min frequency when the workload
>> is finished.
>>
>> If the update function that calculates the load and target frequency
>> is deferred, the governor might need even more time to decrease the
>> frequency.
>>
>> This may have impact to power consumption and after all conservative
>> should decrease the frequency if there is no workload at every sampling
>> rate.
>>
>> To resolve the above issue calculate the number of sampling periods
>> that the update is deferred. Considering that for each sampling period
>> conservative should drop the frequency by a freq_step because the
>> CPU was idle apply the proper subtraction to requested frequency.
>>
>> Below, the kernel trace with and without this patch. First an
>> intensive workload is applied on a specific CPU. Then the workload
>> is removed and the CPU goes to idle.
>>
>> WITHOUT
>>
>>  -0 [007] dN..   620.329153: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.350857: cpu_frequency: state=170 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.370856: cpu_frequency: state=190 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.390854: cpu_frequency: state=210 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.411853: cpu_frequency: state=220 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.432854: cpu_frequency: state=240 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.453854: cpu_frequency: state=260 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.494856: cpu_frequency: state=290 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.515856: cpu_frequency: state=310 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.536858: cpu_frequency: state=330 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.557857: cpu_frequency: state=3401000 
>> cpu_id=7
>>  -0 [007] d...   669.591363: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   669.591939: cpu_idle: state=4294967295 
>> cpu_id=7
>>  -0 [007] d...   669.591980: cpu_idle: state=4 cpu_id=7
>>  -0 [007] dN..   669.591989: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   670.201224: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   670.221975: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    670.222016: cpu_frequency: state=330 
>> cpu_id=7
>>  -0 [007] d...   670.222026: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   670.234964: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   670.801251: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.236046: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    671.236073: cpu_frequency: state=310 
>> cpu_id=7
>>  -0 [007] d...   671.236112: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.393437: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   671.401277: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.404083: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    671.404111: cpu_frequency: state=290 
>> cpu_id=7
>>  -0 [007] d...   671.404125: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.404974: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   671.501180: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.995414: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    671.995459: cpu_frequency: state=280 
>> cpu_id=7
>>  -0 [007] d...   671.995469: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.996287: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   672.001305: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   672.078374: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    672.078410: cpu_frequency: state=260 
>> cpu_id=7
>>  -0 [007] d...   672.078419: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   672.158020: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    672.158040: cpu_frequency: state=240 

[PATCH v2] cpufreq: conservative: Decrease frequency faster when the update deferred

2016-11-12 Thread Stratos Karafotis
   [007]    672.823095: cpu_frequency: state=160 cpu_id=7

WITH

 -0 [007] dN..  4380.928009: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4380.949767: cpu_frequency: state=200 cpu_id=7
kworker/7:2-399   [007]   4380.969765: cpu_frequency: state=220 cpu_id=7
kworker/7:2-399   [007]   4381.009766: cpu_frequency: state=250 cpu_id=7
kworker/7:2-399   [007]   4381.029767: cpu_frequency: state=260 cpu_id=7
kworker/7:2-399   [007]   4381.049769: cpu_frequency: state=280 cpu_id=7
kworker/7:2-399   [007]   4381.069769: cpu_frequency: state=300 cpu_id=7
kworker/7:2-399   [007]   4381.089771: cpu_frequency: state=310 cpu_id=7
kworker/7:2-399   [007]   4381.109772: cpu_frequency: state=340 cpu_id=7
kworker/7:2-399   [007]   4381.129773: cpu_frequency: state=3401000 cpu_id=7
 -0 [007] d...  4428.226159: cpu_idle: state=1 cpu_id=7
 -0 [007] d...  4428.226176: cpu_idle: state=4294967295 cpu_id=7
 -0 [007] d...  4428.226181: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.227177: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.551640: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.649239: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.649268: cpu_frequency: state=280 cpu_id=7
 -0 [007] d...  4428.649278: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.689856: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.799542: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.801683: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.801748: cpu_frequency: state=170 cpu_id=7
 -0 [007] d...  4428.801761: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.806545: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4429.051880: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4429.086240: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4429.086293: cpu_frequency: state=160 cpu_id=7

Without the patch the CPU dropped to min frequency after 3.2s
With the patch applied the CPU dropped to min frequency after 0.86s

Signed-off-by: Stratos Karafotis 
---
 v1 -> v2
- Use correct terminology in change log
- Change the member variable name from 'deferred_periods' to 'idle_periods'
- Fix format issue

 drivers/cpufreq/cpufreq_conservative.c | 14 +-
 drivers/cpufreq/cpufreq_governor.c | 18 +-
 drivers/cpufreq/cpufreq_governor.h |  1 +
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index fa5ece3..d787772 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -73,7 +73,19 @@ static unsigned int cs_dbs_update(struct cpufreq_policy 
*policy)
 */
if (cs_tuners->freq_step == 0)
goto out;
-
+   /*
+* Decrease requested_freq for each idle period that we didn't
+* update the frequency
+*/
+   if (policy_dbs->idle_periods < UINT_MAX) {
+   unsigned int freq_target = policy_dbs->idle_periods *
+   get_freq_target(cs_tuners, policy);
+   if (requested_freq > freq_target)
+   requested_freq -= freq_target;
+   else
+   requested_freq = policy->min;
+   policy_dbs->idle_periods = UINT_MAX;
+   }
/*
 * If requested_freq is out of range, it is likely that the limits
 * changed in the meantime, so fall back to current frequency in that
diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index 3729474..1bc7137 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -117,7 +117,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
struct policy_dbs_info *policy_dbs = policy->governor_data;
struct dbs_data *dbs_data = policy_dbs->dbs_data;
unsigned int ignore_nice = dbs_data->ignore_nice_load;
-   unsigned int max_load = 0;
+   unsigned int max_load = 0, idle_periods = UINT_MAX;
unsigned int sampling_rate, io_busy, j;

/*
@@ -163,8 +163,12 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 * calls, so the previous load value can be used then.
 */
load = j_cdbs->prev_load;
-   } else if (unlikely(time_elapsed > 2 * sampling_rate &&
-   j_cdbs->prev_load)) {
+   } else if (unlikely(time_elapsed > 2 * sampling_rate)) {
+   unsigned int periods = time_elapsed / sampling_rate;
+
+   if (

Re: [Resend][PATCH] cpufreq: conservative: Decrease frequency faster when the timer deferred

2016-11-10 Thread Stratos Karafotis


On 10/11/2016 02:13 πμ, Rafael J. Wysocki wrote:
> On Wed, Nov 9, 2016 at 6:55 AM, Viresh Kumar  wrote:
>> On 08-11-16, 21:25, Stratos Karafotis wrote:
>>> But this is the supposed behaviour of conservative governor. We want
>>> the CPU to increase the frequency in steps. The patch just resets
>>> the frequency to a lower frequency in case of idle.
>>>
>>> For argument's sake, let's assume that the governor timer is never
>>> deferred and runs every sampling period even on completely idle CPU.
>>
>> There are no timers now :)
>>
>>> And let's assume, for example, a burst load that runs every 100ms
>>> for 20ms. The default sampling rate is also 20ms.
>>> What would conservative do in case of that burst load? It would
>>> increase the frequency by one freq step after 20ms and then it would
>>> decrease the frequency 4 times by one frequency step. Most probably
>>> on the next burst load, the CPU will run on min frequency.
>>>
>>> I agree that maybe this is not ideal for performance but maybe this is
>>> how we want conservative governor to work (lazily increase and decrease
>>> frequency).
>>
>> Idle periods are already accounted for while calculating system load by 
>> legacy
>> governors.
>>
>> And the more and more I think about this, I am inclined towards your patch.
>> Maybe in a bit different form and commit log.
>>
>> If we see how the governors were written initially, there were no deferred
>> timers. And so even if CPUs were idle, we will wake up to adjust the step.
>>
>> Even if we want to make the behavior similar to that, then also we should
>> account of missed sampling periods both while decreasing or increasing
>> frequencies.
>>
>> @Rafael: What do you think ?
> 
> It looks like the issue with the conservative governor is real, but
> I'm a bit concerned about adding things to use by one particular
> governor only to cpufreq_governor.c.

I think the code is minimum and I didn't find a way to do this
calculation in cpufreq_conservative.c. We also use code in
cpufreq_governor.c that it's only specific to ondemand (io_busy).

If you can give me a hint about how to implement this logic in
cpufreq_conservative I would appreciate it.

> Apart from the timer-related terminology that is not applicable any
> more, of course.

I will correct the terminology if the logic is accepted.


Regards,
Stratos



Re: [Resend][PATCH] cpufreq: conservative: Decrease frequency faster when the timer deferred

2016-11-09 Thread Stratos Karafotis


On 09/11/2016 07:55 πμ, Viresh Kumar wrote:
> On 08-11-16, 21:25, Stratos Karafotis wrote:
>> But this is the supposed behaviour of conservative governor. We want
>> the CPU to increase the frequency in steps. The patch just resets
>> the frequency to a lower frequency in case of idle.
>>
>> For argument's sake, let's assume that the governor timer is never
>> deferred and runs every sampling period even on completely idle CPU.
> 
> There are no timers now :)
> 
>> And let's assume, for example, a burst load that runs every 100ms
>> for 20ms. The default sampling rate is also 20ms.
>> What would conservative do in case of that burst load? It would
>> increase the frequency by one freq step after 20ms and then it would
>> decrease the frequency 4 times by one frequency step. Most probably
>> on the next burst load, the CPU will run on min frequency.
>>
>> I agree that maybe this is not ideal for performance but maybe this is
>> how we want conservative governor to work (lazily increase and decrease
>> frequency).
> 
> Idle periods are already accounted for while calculating system load by legacy
> governors.
> 
> And the more and more I think about this, I am inclined towards your patch.
> Maybe in a bit different form and commit log.
> 
> If we see how the governors were written initially, there were no deferred
> timers. And so even if CPUs were idle, we will wake up to adjust the step.
> 
> Even if we want to make the behavior similar to that, then also we should
> account of missed sampling periods both while decreasing or increasing
> frequencies.

I have already tested a patch that tries to account the missed sampling
periods in load calculation and I will submit it, but I thought that
conservative should drop the frequency after the timer (or the update)
is deferred. This is the reason I first submitted this patch.


Regards,
Stratos


Re: [Resend][PATCH] cpufreq: conservative: Decrease frequency faster when the timer deferred

2016-11-08 Thread Stratos Karafotis
On 08/11/2016 10:32 πμ, Viresh Kumar wrote:
> On 8 November 2016 at 12:49, Stratos Karafotis  wrote:
>> I think we shouldn't. That's why the patch first decreases the frequency
>> by n freq steps (where n the number of deferred periods).
>> Then the normal processing takes place.
> 
> The problem that I see is that the new algorithm will reduce the
> frequency even if we are
> on a ramp up phase.
> 
> For example consider this case:
> 
> - We have a special load running, that runs in bursts. i.e. runs for
> some time, lets the CPU idle
> then and then again runs.
> 
> - To run the load properly, we need to ramp up the frequency
> 
> - But the new algorithm can make the frequency stagnant in this case.
> i.e. because of the idle
> period you may want to decrease the frequency by delta A and then the
> regular algorithm may
> want to increase it by same delta A.
> 
> That's why I was asking to adopt this only in the ramp down path.
> 

But this is the supposed behaviour of conservative governor. We want
the CPU to increase the frequency in steps. The patch just resets
the frequency to a lower frequency in case of idle.

For argument's sake, let's assume that the governor timer is never
deferred and runs every sampling period even on completely idle CPU.
And let's assume, for example, a burst load that runs every 100ms
for 20ms. The default sampling rate is also 20ms.
What would conservative do in case of that burst load? It would
increase the frequency by one freq step after 20ms and then it would
decrease the frequency 4 times by one frequency step. Most probably
on the next burst load, the CPU will run on min frequency.

I agree that maybe this is not ideal for performance but maybe this is
how we want conservative governor to work (lazily increase and decrease
frequency).


Regards,
Stratos


Re: [Resend][PATCH] cpufreq: conservative: Decrease frequency faster when the timer deferred

2016-11-07 Thread Stratos Karafotis
Hi,

Thanks for reviewing.


On 07/11/2016 08:12 πμ, Viresh Kumar wrote:
> For the record, I have never got the original mail with this subject.

I'm sorry for inconvenience. It seems that I had an issue on my mail
server.

> On 06-11-16, 11:19, Stratos Karafotis wrote:
>> Conservative governor changes the CPU frequency in steps.
>> That means that if a CPU runs at max frequency, it will need several
>> sampling periods to return at min frequency when the workload
>> is finished.
>>
>> If the timer that calculates the load and target frequency is deferred,
>> the governor might need even more time to decrease the frequency.
>>
>> This may have impact to power consumption and after all conservative
>> should decrease the frequency if there is no workload every sampling
>> rate.
>>
>> To resolve the above issue calculate the number of sampling periods
>> that the timer deferred. Considering that for each sampling period
>> conservative should drop the frequency by a freq_step because the
>> CPU was idle apply the proper subtraction to requested frequency.
>>
>> Below, the kernel trace with and without this patch. First an
>> intensive workload is applied on a specific CPU. Then the workload
>> is removed and the CPU goes to idle.
>>
>> WITHOUT
>> ---
>>  -0 [007] dN..   620.329153: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.350857: cpu_frequency: state=170 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.370856: cpu_frequency: state=190 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.390854: cpu_frequency: state=210 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.411853: cpu_frequency: state=220 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.432854: cpu_frequency: state=240 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.453854: cpu_frequency: state=260 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.494856: cpu_frequency: state=290 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.515856: cpu_frequency: state=310 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.536858: cpu_frequency: state=330 
>> cpu_id=7
>> kworker/7:2-556   [007]    620.557857: cpu_frequency: state=3401000 
>> cpu_id=7
>>  -0 [007] d...   669.591363: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   669.591939: cpu_idle: state=4294967295 
>> cpu_id=7
>>  -0 [007] d...   669.591980: cpu_idle: state=4 cpu_id=7
>>  -0 [007] dN..   669.591989: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   670.201224: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   670.221975: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    670.222016: cpu_frequency: state=330 
>> cpu_id=7
>>  -0 [007] d...   670.222026: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   670.234964: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   670.801251: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.236046: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    671.236073: cpu_frequency: state=310 
>> cpu_id=7
>>  -0 [007] d...   671.236112: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.393437: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   671.401277: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.404083: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    671.404111: cpu_frequency: state=290 
>> cpu_id=7
>>  -0 [007] d...   671.404125: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.404974: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   671.501180: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.995414: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    671.995459: cpu_frequency: state=280 
>> cpu_id=7
>>  -0 [007] d...   671.995469: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   671.996287: cpu_idle: state=4294967295 
>> cpu_id=7
>> ...
>>  -0 [007] d...   672.001305: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d...   672.078374: cpu_idle: state=4294967295 
>> cpu_id=7
>> kworker/7:2-556   [007]    672.078410: cpu_frequency: state=260 
>> cpu_id=7
>>  -0 [007] d...   672.078419: cpu_idle: state=4 cpu_id=7
>>  -0 [007] d..

[Resend][PATCH] cpufreq: conservative: Decrease frequency faster when the timer deferred

2016-11-06 Thread Stratos Karafotis
]    672.823095: cpu_frequency: state=160 cpu_id=7

WITH

 -0 [007] dN..  4380.928009: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4380.949767: cpu_frequency: state=200 cpu_id=7
kworker/7:2-399   [007]   4380.969765: cpu_frequency: state=220 cpu_id=7
kworker/7:2-399   [007]   4381.009766: cpu_frequency: state=250 cpu_id=7
kworker/7:2-399   [007]   4381.029767: cpu_frequency: state=260 cpu_id=7
kworker/7:2-399   [007]   4381.049769: cpu_frequency: state=280 cpu_id=7
kworker/7:2-399   [007]   4381.069769: cpu_frequency: state=300 cpu_id=7
kworker/7:2-399   [007]   4381.089771: cpu_frequency: state=310 cpu_id=7
kworker/7:2-399   [007]   4381.109772: cpu_frequency: state=340 cpu_id=7
kworker/7:2-399   [007]   4381.129773: cpu_frequency: state=3401000 cpu_id=7
 -0 [007] d...  4428.226159: cpu_idle: state=1 cpu_id=7
 -0 [007] d...  4428.226176: cpu_idle: state=4294967295 cpu_id=7
 -0 [007] d...  4428.226181: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.227177: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.551640: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.649239: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.649268: cpu_frequency: state=280 cpu_id=7
 -0 [007] d...  4428.649278: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.689856: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.799542: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.801683: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.801748: cpu_frequency: state=170 cpu_id=7
 -0 [007] d...  4428.801761: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.806545: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4429.051880: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4429.086240: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4429.086293: cpu_frequency: state=160 cpu_id=7

Without the patch the CPU dropped to min frequency after 3.2s
With the patch applied the CPU dropped to min frequency after 0.86s

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/cpufreq_conservative.c |  9 +
 drivers/cpufreq/cpufreq_governor.c | 18 +-
 drivers/cpufreq/cpufreq_governor.h |  1 +
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 1347589..07dac72 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -74,6 +74,15 @@ static unsigned int cs_dbs_timer(struct cpufreq_policy 
*policy)
if (cs_tuners->freq_step == 0)
goto out;

+   if (policy_dbs->deferred_periods < UINT_MAX) {
+   unsigned int freq_target = policy_dbs->deferred_periods *
+   get_freq_target(cs_tuners, policy);
+   if (requested_freq > freq_target)
+   requested_freq -= freq_target;
+   else
+   requested_freq = policy->min;
+   policy_dbs->deferred_periods = UINT_MAX;
+   }
/*
 * If requested_freq is out of range, it is likely that the limits
 * changed in the meantime, so fall back to current frequency in that
diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index 642dd0f..0d498a0 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -117,7 +117,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
struct policy_dbs_info *policy_dbs = policy->governor_data;
struct dbs_data *dbs_data = policy_dbs->dbs_data;
unsigned int ignore_nice = dbs_data->ignore_nice_load;
-   unsigned int max_load = 0;
+   unsigned int max_load = 0, deferred_periods = UINT_MAX;
unsigned int sampling_rate, io_busy, j;

/*
@@ -163,8 +163,13 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 * calls, so the previous load value can be used then.
 */
load = j_cdbs->prev_load;
-   } else if (unlikely(time_elapsed > 2 * sampling_rate &&
-   j_cdbs->prev_load)) {
+   } else if (unlikely(time_elapsed > 2 * sampling_rate)) {
+   unsigned int periods = time_elapsed / sampling_rate;
+
+   if (periods < deferred_periods)
+   deferred_periods = periods;
+
+   if (j_cdbs->prev_load) {
/*
 * If the CPU had gone completely idle and a task has
 * just woken up on this CPU now, it would be unfair to
@@ -189,8 +194,9 

[PATCH] cpufreq: conservative: Decrease frequency faster when the timer deferred

2016-10-23 Thread Stratos Karafotis
]    672.823095: cpu_frequency: state=160 cpu_id=7

WITH

 -0 [007] dN..  4380.928009: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4380.949767: cpu_frequency: state=200 cpu_id=7
kworker/7:2-399   [007]   4380.969765: cpu_frequency: state=220 cpu_id=7
kworker/7:2-399   [007]   4381.009766: cpu_frequency: state=250 cpu_id=7
kworker/7:2-399   [007]   4381.029767: cpu_frequency: state=260 cpu_id=7
kworker/7:2-399   [007]   4381.049769: cpu_frequency: state=280 cpu_id=7
kworker/7:2-399   [007]   4381.069769: cpu_frequency: state=300 cpu_id=7
kworker/7:2-399   [007]   4381.089771: cpu_frequency: state=310 cpu_id=7
kworker/7:2-399   [007]   4381.109772: cpu_frequency: state=340 cpu_id=7
kworker/7:2-399   [007]   4381.129773: cpu_frequency: state=3401000 cpu_id=7
 -0 [007] d...  4428.226159: cpu_idle: state=1 cpu_id=7
 -0 [007] d...  4428.226176: cpu_idle: state=4294967295 cpu_id=7
 -0 [007] d...  4428.226181: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.227177: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.551640: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.649239: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.649268: cpu_frequency: state=280 cpu_id=7
 -0 [007] d...  4428.649278: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.689856: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4428.799542: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.801683: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4428.801748: cpu_frequency: state=170 cpu_id=7
 -0 [007] d...  4428.801761: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4428.806545: cpu_idle: state=4294967295 cpu_id=7
...
 -0 [007] d...  4429.051880: cpu_idle: state=4 cpu_id=7
 -0 [007] d...  4429.086240: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007]   4429.086293: cpu_frequency: state=160 cpu_id=7

Without the patch the CPU dropped to min frequency after 3.2s
With the patch applied the CPU dropped to min frequency after 0.86s

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/cpufreq_conservative.c |  9 +
 drivers/cpufreq/cpufreq_governor.c | 18 +-
 drivers/cpufreq/cpufreq_governor.h |  1 +
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 1347589..07dac72 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -74,6 +74,15 @@ static unsigned int cs_dbs_timer(struct cpufreq_policy 
*policy)
if (cs_tuners->freq_step == 0)
goto out;

+   if (policy_dbs->deferred_periods < UINT_MAX) {
+   unsigned int freq_target = policy_dbs->deferred_periods *
+   get_freq_target(cs_tuners, policy);
+   if (requested_freq > freq_target)
+   requested_freq -= freq_target;
+   else
+   requested_freq = policy->min;
+   policy_dbs->deferred_periods = UINT_MAX;
+   }
/*
 * If requested_freq is out of range, it is likely that the limits
 * changed in the meantime, so fall back to current frequency in that
diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index 642dd0f..0d498a0 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -117,7 +117,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
struct policy_dbs_info *policy_dbs = policy->governor_data;
struct dbs_data *dbs_data = policy_dbs->dbs_data;
unsigned int ignore_nice = dbs_data->ignore_nice_load;
-   unsigned int max_load = 0;
+   unsigned int max_load = 0, deferred_periods = UINT_MAX;
unsigned int sampling_rate, io_busy, j;

/*
@@ -163,8 +163,13 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 * calls, so the previous load value can be used then.
 */
load = j_cdbs->prev_load;
-   } else if (unlikely(time_elapsed > 2 * sampling_rate &&
-   j_cdbs->prev_load)) {
+   } else if (unlikely(time_elapsed > 2 * sampling_rate)) {
+   unsigned int periods = time_elapsed / sampling_rate;
+
+   if (periods < deferred_periods)
+   deferred_periods = periods;
+
+   if (j_cdbs->prev_load) {
/*
 * If the CPU had gone completely idle and a task has
 * just woken up on this CPU now, it would be unfair to
@@ -189,8 +194,9 

Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes

2016-09-19 Thread Stratos Karafotis
On Mon, Sep 19, 2016 at 7:16 PM, Andreas Herrmann  wrote:
> On Fri, Sep 16, 2016 at 09:58:42PM +0300, Stratos Karafotis wrote:
>> Hi,
>>
>> [ I 'm resending this message, because I think some recipients didn't receive
>> it. ]
>>
>> On 16/09/2016 12:47 μμ, Andreas Herrmann wrote:
>> > On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
>> >> On 01-09-16, 15:21, Andreas Herrmann wrote:
>> >>> On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
>> >
>> >>>> I am _really_ worried about such hacks in drivers to negate the effect 
>> >>>> of
>> a
>> >>>> patch, that was actually good.
>> >>>
>> >>>> Did you try to increase the sampling period of ondemand governor to see 
>> >>>> if
>> that
>> >>>> helps without this patch.
>> >>>
>> >>> With an older kernel I've modified transition_latency of the driver
>> >>> which in turn is used to calculate the sampling rate.
>> >
>> >> Naah, that isn't what I was looking for, sorry
>> >
>> >> To explain it a bit more, this is what the patch did.
>> >
>> >> Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
>> >> F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
>> >> sampling rate and system load tries to change the frequency of the
>> >> underlying hardware and select one of those.
>> >
>> >> Before the original patch came in, F2 and F3 were never getting
>> >> selected and the system was stuck in F1 for a long time.
>> >
>> > I think this is not a general statement. Such a behaviour is not
>> > common to all systems. Before commit 6393d6a target frequency was
>> > based on
>> >
>> >freq_next = load * policy->cpuinfo.max_freq / 100;
>> >
>> > F2 would have been selected if
>> >
>> >   load = F2 * 100 / F7
>> >
>> > If F2 was not seen it can mean
>> >
>> > (1) either the load value was not hit in practice during monitoring of
>> > a certain workload
>> >
>> > (2) or the calculated load value (in integer representation) would
>> > select F1 or F3 (there is no corresponding integer value that
>> > would select F2)
>> >
>> > E.g. for the Intel i7-3770 system mentioned in commit message for
>> > 6393d6a I think a load value of 49 should have selected 170 which
>> > is not shown in the provided frequency table.
>>
>> I think this is not true, because before the specific patch the relation
>> of frequency selection was CPUFREQ_RELATION_L. This is the reason
>> that a load value of 49 with a freq_next 1666490 would have a
>> target frequency 160.
>
> Hmm...
> CPUFREQ_RELATION_L should select "lowest frequency at or above target"
> being 170 in this case. Otherwise (if it would select "highest
> frequency at or below target") this would imply that load values of
> 50, 51, 52 should select 170 which would contradict what was
> written in commit message of 6393d6a1.

Yes, you are right. I'm sorry for the noise.

> In any case probability of seeing such load values and thus selecting
> a frequency of 170 is quite low. So I fully understand why the
> patch was introduced.

>> > What essentially changed was how load values are mapped to target
>> > frequencies. For the HP system (min_freq=120, max_freq=280)
>> > that I used in my tests, the old code would create following mapping:
>> >
>> > load | freq_next | used target frequency
>> > 
>> > 0  0120
>> > 10 28   120
>> > 20 56   120
>> > 30 84   120
>> > 40 112  120
>> > 42 1176000  120
>> > 43 1204000  1204000
>> > 50 140  140
>> > 60 168  168
>> > 70 196  196
>> > 80 224  224
>> > 90 252  252
>> > 100280  280
>> >
>> > The new code (introduced with commit 6393d6a) changed the mapping as
>> > follows:
>> >
>> > load | freq_next | used target frequency
>> > 
>> > 0  120  120
>> > 10 136136

Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes

2016-09-16 Thread Stratos Karafotis
Hi,

On 16/09/2016 12:47 μμ, Andreas Herrmann wrote:
> On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
>> On 01-09-16, 15:21, Andreas Herrmann wrote:
>>> On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
> 
 I am _really_ worried about such hacks in drivers to negate the effect of a
 patch, that was actually good.
>>>
 Did you try to increase the sampling period of ondemand governor to see if 
 that
 helps without this patch.
>>>
>>> With an older kernel I've modified transition_latency of the driver
>>> which in turn is used to calculate the sampling rate.
> 
>> Naah, that isn't what I was looking for, sorry :(
> 
>> To explain it a bit more, this is what the patch did.
> 
>> Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
>> F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
>> sampling rate and system load tries to change the frequency of the
>> underlying hardware and select one of those.
> 
>> Before the original patch came in, F2 and F3 were never getting
>> selected and the system was stuck in F1 for a long time.
> 
> I think this is not a general statement. Such a behaviour is not
> common to all systems. Before commit 6393d6a target frequency was
> based on
> 
>freq_next = load * policy->cpuinfo.max_freq / 100;
> 
> F2 would have been selected if
> 
>   load = F2 * 100 / F7
> 
> If F2 was not seen it can mean
> 
> (1) either the load value was not hit in practice during monitoring of
> a certain workload
> 
> (2) or the calculated load value (in integer representation) would
> select F1 or F3 (there is no corresponding integer value that
> would select F2)
> 
> E.g. for the Intel i7-3770 system mentioned in commit message for
> 6393d6a I think a load value of 49 should have selected 170 which
> is not shown in the provided frequency table.

I think this is not true, because before the specific patch the relation
of frequency selection was CPUFREQ_RELATION_L. This is the reason
that a load value of 49 with a freq_next 1666490 would have a
target frequency 160.

> 
> What essentially changed was how load values are mapped to target
> frequencies. For the HP system (min_freq=120, max_freq=280)
> that I used in my tests, the old code would create following mapping:
> 
> load | freq_next | used target frequency
> 
> 0  0120
> 10 28   120
> 20 56   120
> 30 84   120
> 40 112  120
> 42 1176000  120
> 43 1204000  1204000
> 50 140  140
> 60 168  168
> 70 196  196
> 80 224  224
> 90 252  252
> 100280  280
> 
> The new code (introduced with commit 6393d6a) changed the mapping as
> follows:
> 
> load | freq_next | used target frequency
> 
> 0  120  120
> 10 136136
> 20 152152
> 30 168168
> 40 184184
> 42 18720001872000
> 43 18880001888000
> 50 200200
> 60 216216
> 70 232232
> 80 248248
> 90 264264
> 100280280
> 
> My patch creates a third mapping. It basically ensures that up to a
> load value of 42 the minimum frequency is used.
> 
>> Which will decrease the performance for that period of time as we
>> should have switched to a higher frequency really.
> 
> I am not sure whether it's really useful for all systems using
> ondemand governor to increase frequency above min_freq even if load is
> just above 0. Of course expectation is that performance will be equal
> or better than before. But how overall power consumption changes
> depends on the hardware and its power saving capabilites.
> 
>   ---8<---
> 
>>> My understanding is that the original commit was tested with certain
>>> combinations of hardware and cpufreq-drivers and the claim was that
>>> for those (two?) tested combinations performance increased and power
>>> consumption was lower. So I am not so sure what to expect from all
>>> other cpufreq-driver/hardware combinations.
> 
>> It was principally the right thing to do IMO. And I don't think any
>> other hardware should get affected badly. At the max, the tuning needs
>> to be made a bit better.
> 
>   ---8<---
> 
> It seems that the decision how to best map load values to target
> frequencies is kind of hardware specific.
> 
> Maybe a solution to this is that the cpufreq driver should be able to
> provide a mapping function to overwrite the current default
> calculation.
> 

I'm not familiar with ppc-cpufreq drive but maybe patch 6393d6 just
uncovered an "issue" that was already existed but only on higher loads.

Because, w

Re: [PATCH 0/2] cpufreq: ondemand: Eliminate the deadband effect

2014-07-23 Thread Stratos Karafotis

On 07/23/2014 02:50 AM, Rafael J. Wysocki wrote:

On Monday, June 30, 2014 07:59:32 PM Stratos Karafotis wrote:

Hi all,

This patchset changes slightly the calculation of target frequency to
eliminate the deadband effect (explained in patch 2 changelog) that it
seems to slow down the CPU in low and medium loads.

Patch 1 introduces a new relation (RELATION_C) for the next frequency
selection, which chooses the closest frequency to target.

Patch 2 is the actual change to ondemand governor.

You may find graphs with the 'deadband' effect and benchmark results:
https://docs.google.com/spreadsheets/d/16kDBh5lyc6YvBnoS1hUa1t2O38z0xrWvaEj5XtJ8auw/edit#gid=2072493052

Thanks


Stratos Karafotis (2):
   cpufreq: Introduce new relation for freq selection
   cpufreq: ondemand: Eliminate the deadband effect in low loads

  drivers/cpufreq/cpufreq_ondemand.c | 11 +++
  drivers/cpufreq/freq_table.c   | 12 +++-
  include/linux/cpufreq.h|  1 +
  3 files changed, 19 insertions(+), 5 deletions(-)


OK, I've queued up these two patches for 3.17.  We'll see if anyone sees
any problems related to them.  Thanks!



Thank you!

Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] cpufreq: ondemand: Eliminate the deadband effect

2014-07-20 Thread Stratos Karafotis
On 21/07/2014 12:51 πμ, Pavel Machek wrote:
> Hi!
> 
>> Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait
>> (Android smartphone).
>> Benchmarks on Intel i7 shows a performance improvement on low and medium
>> work loads with lower power consumption. Specifics:
>>
>> Phoronix Linux Kernel Compilation 3.1:
>> Time: -0.40%, energy: -0.07%
>> Phoronix Apache:
>> Time: -4.98%, energy: -2.35%
>> Phoronix FFMPEG:
>> Time: -6.29%, energy: -4.02%
>
> Hmm. Intel i7 should be race-to-idle machine. So basically rule like
> if (load > 0) go to max frequency else go to lowest frequency would do
> the right thing in your test, right?

 I don't think that "if (load > 0) go to max" will work even on i7.
 For low load this will have impact on energy consumption.
>>>
>>> Are you sure? CPU frequency should not matter on idle CPU.
>>
>> Even on a totally idle CPU there will be a small impact because of leakage
>> current (thanks to Dirk Brandewie for this info).
> 
> Are you sure? IIRC Intel cpus will automatically lower CPU frequency
> in deep C states..

I'm sorry. I don't know further details about the leakage current
in deeper C states.

>> This simple test on a nearly idle system shows this:
>>
>> [root@albert cpufreq]# for CPUFREQ in 
>> /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || 
>> continue; echo -n performance > $CPUFREQ; done
>> [root@albert cpufreq]# 
>> /home/stratosk/kernels/linux-pm/tools/power/x86/turbostat/turbostat -J sleep 
>> 20
>> Core CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz SMI  CPU%c1  CPU%c3  
>> CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7   Pkg_J   
>> Cor_J   GFX_J   time
>>-   -   20.0627123392   00.300.00   
>> 99.630.00  34  348.090.00   81.940.00  380.41   
>> 14.511.64   20.00
>>0   0   00.0218913392   00.090.00   
>> 99.880.00  34  348.090.00   81.940.00  380.41   
>> 14.511.64   20.00
>>0   4   10.0430063392   00.07
>>1   1   10.0425013392   00.620.00   
>> 99.330.00  34
>>1   5   00.0123463392   00.66
>>2   2   00.0119963392   00.440.00   
>> 99.550.00  34
>>2   6   40.1822783392   00.26
>>3   3   50.1534493392   00.070.01   
>> 99.770.00  34
>>3   7   00.0118393392   00.21
>> 20.000899 sec
>> [root@albert cpufreq]# ^C
>> [root@albert cpufreq]# for CPUFREQ in 
>> /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || 
>> continue; echo -n ondemand > $CPUFREQ; done
>> [root@albert cpufreq]# 
>> /home/stratosk/kernels/linux-pm/tools/power/x86/turbostat/turbostat -J sleep 
>> 20
>> Core CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz SMI  CPU%c1  CPU%c3  
>> CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7   Pkg_J   
>> Cor_J   GFX_J   time
>>-   -   20.0916933392   00.350.01   
>> 99.550.00  35  368.330.00   84.310.00  377.68   
>> 12.231.15   20.00
>>0   0   10.0816033392   00.130.00   
>> 99.790.00  35  368.330.00   84.310.00  377.68   
>> 12.231.15   20.00
>>0   4   10.0816463392   00.13
>>1   1   10.0616473392   00.660.00   
>> 99.280.00  35
>>1   5   00.0116113392   00.71
>>2   2   00.0216173392   00.500.02   
>> 99.460.00  35
>>2   6   40.2217643392   00.30
>>3   3   40.2517013392   00.070.00   
>> 99.680.00  35
>>3   7   00.0116023392   00.31
>> 20.001580 sec
>>
>>
>> So, for low loads the impact will be higher.
> 
> So it seems ondemand saves cca 1% of energy?

Yes, in this small test, on my nearly "idle" system.


Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] cpufreq: ondemand: Eliminate the deadband effect

2014-07-13 Thread Stratos Karafotis
Hi Doug,

On 12/07/2014 06:45 μμ, Doug Smythies wrote:
> 
> On 2014.07.30 10:00 Stratos Karafotis wrote:
> 
>> This patchset changes slightly the calculation of target frequency to
>> eliminate the deadband effect (explained in patch 2 changelog) that it
>> seems to slow down the CPU in low and medium loads.
>>
>> Patch 1 introduces a new relation (RELATION_C) for the next frequency
>> selection, which chooses the closest frequency to target.
>>
>> Patch 2 is the actual change to ondemand governor.
> 
>> You may find graphs with the 'deadband' effect and benchmark results:
>> https://docs.google.com/spreadsheets/d/16kDBh5lyc6YvBnoS1hUa1t2O38z0xrWvaEj5XtJ8auw/edit#gid=2072493052
> 
> I did the same benchmark tests before (without) and after (with) this patch 
> set on my i7-2600K system.
> I added the results, which are similar to Stratos', under a new "benchmark" 
> tab on the spreadsheet.

Thank you very much for your benchmarks!

Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] cpufreq: ondemand: Eliminate the deadband effect

2014-07-11 Thread Stratos Karafotis
On 11/07/2014 09:34 μμ, Pavel Machek wrote:
> On Fri 2014-07-11 20:29:57, Stratos Karafotis wrote:
>> Hi Pavel!
>>
>> On 11/07/2014 07:57 μμ, Pavel Machek wrote:
>>> Hi!
>>>
>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait
>>>> (Android smartphone).
>>>> Benchmarks on Intel i7 shows a performance improvement on low and medium
>>>> work loads with lower power consumption. Specifics:
>>>>
>>>> Phoronix Linux Kernel Compilation 3.1:
>>>> Time: -0.40%, energy: -0.07%
>>>> Phoronix Apache:
>>>> Time: -4.98%, energy: -2.35%
>>>> Phoronix FFMPEG:
>>>> Time: -6.29%, energy: -4.02%
>>>
>>> Hmm. Intel i7 should be race-to-idle machine. So basically rule like
>>> if (load > 0) go to max frequency else go to lowest frequency would do
>>> the right thing in your test, right?
>>
>> I don't think that "if (load > 0) go to max" will work even on i7.
>> For low load this will have impact on energy consumption.
> 
> Are you sure? CPU frequency should not matter on idle CPU.

Even on a totally idle CPU there will be a small impact because of leakage
current (thanks to Dirk Brandewie for this info).

This simple test on a nearly idle system shows this:

[root@albert cpufreq]# for CPUFREQ in 
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || 
continue; echo -n performance > $CPUFREQ; done
[root@albert cpufreq]# 
/home/stratosk/kernels/linux-pm/tools/power/x86/turbostat/turbostat -J sleep 20
Core CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz SMI  CPU%c1  CPU%c3  
CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7   Pkg_J   Cor_J  
 GFX_J   time
   -   -   20.0627123392   00.300.00   
99.630.00  34  348.090.00   81.940.00  380.41   14.51   
 1.64   20.00
   0   0   00.0218913392   00.090.00   
99.880.00  34  348.090.00   81.940.00  380.41   14.51   
 1.64   20.00
   0   4   10.0430063392   00.07
   1   1   10.0425013392   00.620.00   
99.330.00  34
   1   5   00.0123463392   00.66
   2   2   00.0119963392   00.440.00   
99.550.00  34
   2   6   40.1822783392   00.26
   3   3   50.1534493392   00.070.01   
99.770.00  34
   3   7   00.0118393392   00.21
20.000899 sec
[root@albert cpufreq]# ^C
[root@albert cpufreq]# for CPUFREQ in 
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $CPUFREQ ] || 
continue; echo -n ondemand > $CPUFREQ; done
[root@albert cpufreq]# 
/home/stratosk/kernels/linux-pm/tools/power/x86/turbostat/turbostat -J sleep 20
Core CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz SMI  CPU%c1  CPU%c3  
CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7   Pkg_J   Cor_J  
 GFX_J   time
   -   -   20.0916933392   00.350.01   
99.550.00  35  368.330.00   84.310.00  377.68   12.23   
 1.15   20.00
   0   0   10.0816033392   00.130.00   
99.790.00  35  368.330.00   84.310.00  377.68   12.23   
 1.15   20.00
   0   4   10.0816463392   00.13
   1   1   10.0616473392   00.660.00   
99.280.00  35
   1   5   00.0116113392   00.71
   2   2   00.0216173392   00.500.02   
99.460.00  35
   2   6   40.2217643392   00.30
   3   3   40.2517013392   00.070.00   
99.680.00  35
   3   7   00.0116023392   00.31
20.001580 sec


So, for low loads the impact will be higher.
This is the reason that the intel_pstate driver don't use 'performance'
and try to request a low P state when there is no load.

> (Can you try to modify your code and rerun for example the apache
> test?)

Yes, I can do the apache test if the above example is not enough.

>>> So... should we do that, or do we need better benchmark?
>>
>> I'm sorry. I'm not sure I understood exactly what do you mean by "better
>> benchmark".
> 
> I believe that any increase of frequency in frequency will make the
> benchmarks you qouted better (on i7). Actually, you can probably just
> select performance governor...?

Maybe in benchmarks where the CPU load is high. But definitely not, in mp3
decoding and idle system 

Re: [PATCH 2/2] cpufreq: ondemand: Eliminate the deadband effect

2014-07-11 Thread Stratos Karafotis
Hi Pavel!

On 11/07/2014 07:57 μμ, Pavel Machek wrote:
> Hi!
> 
>> Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait
>> (Android smartphone).
>> Benchmarks on Intel i7 shows a performance improvement on low and medium
>> work loads with lower power consumption. Specifics:
>>
>> Phoronix Linux Kernel Compilation 3.1:
>> Time: -0.40%, energy: -0.07%
>> Phoronix Apache:
>> Time: -4.98%, energy: -2.35%
>> Phoronix FFMPEG:
>> Time: -6.29%, energy: -4.02%
> 
> Hmm. Intel i7 should be race-to-idle machine. So basically rule like
> if (load > 0) go to max frequency else go to lowest frequency would do
> the right thing in your test, right?

I don't think that "if (load > 0) go to max" will work even on i7.
For low load this will have impact on energy consumption.
On my tests, a simple mp3 decoding (very low load on my machine) have no
difference with and without this patch.

> So... should we do that, or do we need better benchmark?

I'm sorry. I'm not sure I understood exactly what do you mean by "better
benchmark".
Of course, we should do as many benchmarks as we can.
I usually do these 5 sets of benchmarks on my i7 that IMHO give a good
indication about the changes in different CPU loads.

1) Linux kernel compilation (about 85% busy CPU)
2) Apache (about 32% busy CPU)
3) ffmpeg (about 24% busy CPU)
4) mp3 decoding (about 0.3% CPU)
5) Idle system (about 0.06% CPU)

The patch was also tested on a Android smartphone (kernel 3.4). The kernel
distributed to 1000+ users.
Unfortunately I have no benchmarks, but no regressions reported on
consumption. Actually, there reports for better performance and
lower power consumption, but of course we can't rely on these reports. :)


Thanks for your comments!

Stratos


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/2] cpufreq: ondemand: Eliminate the deadband effect

2014-06-30 Thread Stratos Karafotis
Hi all,

This patchset changes slightly the calculation of target frequency to
eliminate the deadband effect (explained in patch 2 changelog) that it
seems to slow down the CPU in low and medium loads.

Patch 1 introduces a new relation (RELATION_C) for the next frequency
selection, which chooses the closest frequency to target.

Patch 2 is the actual change to ondemand governor.

You may find graphs with the 'deadband' effect and benchmark results:
https://docs.google.com/spreadsheets/d/16kDBh5lyc6YvBnoS1hUa1t2O38z0xrWvaEj5XtJ8auw/edit#gid=2072493052

Thanks


Stratos Karafotis (2):
  cpufreq: Introduce new relation for freq selection
  cpufreq: ondemand: Eliminate the deadband effect in low loads

 drivers/cpufreq/cpufreq_ondemand.c | 11 +++
 drivers/cpufreq/freq_table.c   | 12 +++-
 include/linux/cpufreq.h|  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] cpufreq: ondemand: Eliminate the deadband effect

2014-06-30 Thread Stratos Karafotis
Currently, ondemand calculates the target frequency proportional to load
using the formula:
Target frequency = C * load
where C = policy->cpuinfo.max_freq / 100

Though, in many cases, the minimum available frequency is pretty high and
the above calculation introduces a dead band from load 0 to
100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target
frequency is always calculated to less than policy->cpuinfo.min_freq and
the minimum frequency is selected.

For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 160
and the policy->cpuinfo.max_freq = 340 (without turbo). Thus, the CPU
starts to scale up at a load above 47.
On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000
and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale
at load above 25.

Change the calculation of target frequency to eliminate the above effect using
the formula:

Target frequency = A + B * load
where A = policy->cpuinfo.min_freq and
  B = (policy->cpuinfo.max_freq - policy->cpuinfo->min_freq) / 100

This will map load values 0 to 100 linearly to cpuinfo.min_freq to
cpuinfo.max_freq.

Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the
closest frequency in frequency_table. This is necessary to avoid selection
of minimum frequency only when load equals to 0. It will also help for selection
of frequencies using a more 'fair' criterion.

Tables below show the difference in selected frequency for specific values
of load without and with this patch. On Intel i7-3770 @ 3.40GHz:
Without With
LoadTarget  SelectedTarget  Selected
0   0   160 160 160
5   170050  160 1690050 170
10  340100  160 1780100 170
15  510150  160 1870150 190
20  680200  160 1960200 200
25  850250  160 2050250 210
30  1020300 160 2140300 210
35  1190350 160 2230350 220
40  1360400 160 2320400 240
45  1530450 160 2410450 240
50  1700500 190 2500500 250
55  1870550 190 2590550 260
60  2040600 210 2680600 260
65  2210650 240 2770650 280
70  2380700 240 2860700 280
75  2550750 260 2950750 300
80  2720800 280 3040800 300
85  2890850 290 3130850 310
90  3060900 310 3220900 330
95  3230950 330 3310950 330
100 3401000 3401000 3401000 3401000

On ARM quad core 1500MHz Krait:
Without With
LoadTarget  SelectedTarget  Selected
0   0   384000  384000  384000
5   75600   384000  440400  486000
10  151200  384000  496800  486000
15  226800  384000  553200  594000
20  302400  384000  609600  594000
25  378000  384000  666000  702000
30  453600  486000  722400  702000
35  529200  594000  778800  81
40  604800  702000  835200  81
45  680400  702000  891600  918000
50  756000  81  948000  918000
55  831600  918000  1004400 1026000
60  907200  918000  1060800 1026000
65  982800  1026000 1117200 1134000
70  1058400 1134000 1173600 1134000
75  1134000 1134000 123 1242000
80  1209600 1242000 1286400 1242000
85  1285200 135 1342800 135
90  1360800 1458000 1399200 135
95  1436400 1458000 1455600 1458000
100 1512000 1512000 1512000 1512000

Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait
(Android smartphone).
Benchmarks on Intel i7 shows a performance improvement on low and medium
work loads with lower power consumption. Specifics:

Phoronix Linux Kernel Compilation 3.1:
Time: -0.40%, energy: -0.07%
Phoronix Apache:
Time: -4.98%, energy: -2.35%
Phoronix FFMPEG:
Time: -6.29%, energy: -4.02%

Also, running mp3 decoding (very low load) shows no differences with and
without this patch.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/cpufreq_ondemand.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c 
b/drivers/cpufreq/cpufreq_ondemand.c
index 18d4091..ad3f38f 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -170,21 +170,24 @@ static void od_check_cpu(int cpu, unsigned int load)
dbs_freq_increase(policy, policy->max);
} else {
/* Calculate the next frequency proportional to load */
-   unsigned int freq_next;
-   freq_next = load * policy->

[PATCH 1/2] cpufreq: Introduce new relation for freq selection

2014-06-30 Thread Stratos Karafotis
Introduce CPUFREQ_RELATION_C for frequency selection.
It selects the frequency with the minimum euclidean distance to target.
In case of equal distance between 2 frequencies, it will select the
greater frequency.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/freq_table.c | 12 +++-
 include/linux/cpufreq.h  |  1 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 1632981..df14766 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -117,7 +117,7 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
.frequency = 0,
};
struct cpufreq_frequency_table *pos;
-   unsigned int freq, i = 0;
+   unsigned int freq, diff, i = 0;
 
pr_debug("request for target %u kHz (relation: %u) for cpu %u\n",
target_freq, relation, policy->cpu);
@@ -127,6 +127,7 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
suboptimal.frequency = ~0;
break;
case CPUFREQ_RELATION_L:
+   case CPUFREQ_RELATION_C:
optimal.frequency = ~0;
break;
}
@@ -168,6 +169,15 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
}
}
break;
+   case CPUFREQ_RELATION_C:
+   diff = abs(freq - target_freq);
+   if (diff < optimal.frequency ||
+   (diff == optimal.frequency &&
+freq > table[optimal.driver_data].frequency)) {
+   optimal.frequency = diff;
+   optimal.driver_data = i;
+   }
+   break;
}
}
if (optimal.driver_data > i) {
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index ec4112d..00fad91 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -176,6 +176,7 @@ static inline void disable_cpufreq(void) { }
 
 #define CPUFREQ_RELATION_L 0  /* lowest frequency at or above target */
 #define CPUFREQ_RELATION_H 1  /* highest frequency below or at target */
+#define CPUFREQ_RELATION_C 2  /* closest frequency to target */
 
 struct freq_attr {
struct attribute attr;
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] cpufreq: intel_pstate: Avoid duplicate call of intel_pstate_get_scaled_busy

2014-06-14 Thread Stratos Karafotis
On 14/06/2014 06:45 μμ, Doug Smythies wrote:
> I am sorry to be late chiming in on this one.
> 
> On 2014.06.10 09:27 Stratos Karafotis wrote:
>> On 10/06/2014 07:05 μμ, Dirk Brandewie wrote:
>> On 06/09/2014 02:00 PM, Stratos Karafotis wrote:
>>> Store busy_scaled value to avoid to duplicate call of
>>> intel_pstate_get_scaled_busy on every sampling interval.
>>>
>>>
>>> The second call *only* happens if the tracepoint is being used otherwise
>>> the whole function call to  trace_pstate_sample() is a noop.
> 
>> Yes, I'm sorry, I forgot to add this in my changelog. I have written this
>> in cover letter.
>> I made this change mostly to support patch 3/7.
> 
>>> This makes the code less readable IMHO the reader is left wondering
>>> how cpu->sample.busy_scaled was set in intel_pstate_adjust_busy_pstate()
>>>
> 
>> I agree that the the original code is more readable. If we don't care
>> about the small overhead when tracing is on and forget patch 3/7,
>> of course the original code is by far better.
> 
> Actually, when reading the code, I found it odd to call the function
> twice.
> 
> However by far the much more important issue here, in my opinion,
> is that if one is using the tracepoint stuff, then the second call
> to intel_pstate_get_scaled_busy can give a different result than
> the first call. Why? Because "cpu->pstate.current_pstate" may have
> changed between the two calls.
> 
> In the end the user (me in this case) of the tracepoint stuff can
> end up pulling (what's left of) their hair out and going around in
> circles attempting to figure out why doing the so simple math by
> hand doesn't seem to agree with the tracepoint data.

:)

> As a side note: I am now pulling the tracepoint data into a
> spreadsheet and calculating what "scaled" should be myself.
> 

I think you are right. Tracepoint data might be inconsistent.
I will re-submit this patch in v2 series, updating the changelog.

Thanks for pointing this out!

Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: intel_pstate: Fix rounding of core_pct

2014-06-13 Thread Stratos Karafotis
On 13/06/2014 09:49 πμ, Doug Smythies wrote:
> On 2014.06.12 13:03 Rafael J. Wysocki wrote:
>> On Thursday, June 12, 2014 05:35:59 PM Stratos Karafotis wrote:
>>> On 12/06/2014 12:15 πμ, Doug Smythies wrote:
>>>>
>>>>
>>>> On 2014.06.11 13:20 Stratos Karafotis wrote:
>>>>> On 11/06/2014 06:02 μμ, Doug Smythies wrote:
>>>>>>
>>>>>> On 2104.06.11 07:08 Stratos Karafotis wrote:
>>>>>>> On 11/06/2014 04:41 μμ, Doug Smythies wrote:
>>>>>>>
>>>>>>> No.
>>>>>
>>>>>>> The intent was only ever to round properly the pseudo floating point 
>>>>>>> result of the divide.
>>>>>>> It was much more important (ugh, well 4 times more) when FRACBITS was 
>>>>>>> still 6, which also got changed to 8 in a recent patch.
>>>>>>>
>>>>>>
>>>>>> Are you sure?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> This rounding was very recently added.
>>>>>>> As far as I can understand, I don't see the meaning of this rounding, 
>>>>>>> as is.
>>>>>>> Even if FRAC_BITS was 6, I think it would have almost no improvement in
>>>>>>> calculations.
>>>>>>
>>>>>> Note: I had not seen this e-mail when I wrote a few minutes ago:
>>>>>>
>>>>>> You may be correct.
>>>>>> If Dirk agrees, I will re-analyse the entire driver for rounding effects 
>>>>>> soon.
>>>>>> When FRACBITS was 6 there were subtle cases where the driver would get 
>>>>>> stuck, and not make a final pstate change, with the default PID gains.
>>>>>> Other things have changed, and the analysis needs to be re-done.
>>>>>>
>>>>
>>>>> Could you please elaborate a little bit more what we need these 2 lines 
>>>>> below?
>>>>>
>>>>>if ((rem << 1) >= int_tofp(sample->mperf))
>>>>>core_pct += 1;
>>>>>
>>>>> Because nothing is mentioned for them in commit's changelog.
>>>>> Do we need to round core_pct or not?
>>>>> Because if we try to round it, I think this patch should work.
>>>>
>>>> As mentioned originally, they are there just to round the pseudo floating 
>>>> number, not the integer portion only.
>>>> Let us bring back the very numbers you originally gave and work through it.
>>>>
>>>> aperf = 5024
>>>> mperf = 10619
>>>>
>>>> core_pct = 47.31142292%
>>>> or 47 and 79.724267 256ths
>>>> or to the closest kept fractional part 47 and 80 256ths
>>>> or 12112 as a pseudo float.
>>>> The maximum error with this rounding will be 1 part in 512 and symmetric 
>>>> instead of the 1 part in 256 always in one direction without.
>>>>
>>>> Now if FRACBITS was still 6:
>>>> core_pct = 47.31142292%
>>>> or 47 and 19.931 64ths
>>>> or to the closest kept fractional part 47 and 20 64ths
>>>> or 3028 as a pseudo float.
>>>> The maximum error with this rounding will be 1 part in 128 and symmetric 
>>>> instead of the 1 part in 64 (1.6% !!!) always in one direction without.
>>>>
>>>> Hope this helps.
>>>>
>>>
>>> Yes, it helps. Thanks a lot!
>>>
>>> But please note that the maximum error without this rounding will be 1.6% 
>>> _only_
>>> in fractional part. In the real number it will be much smaller:
> 
> Fair comment. Thanks.
> 
>>>
>>> 47.19 instead of 47.20
>>>
>>> And using FRAC_BITS 8:
>>>
>>> 47.79 instead of 47.80
>>>
> 
> I really wouldn't write it that way, as I find it misleading. It is really 47 
> and 19 256ths...
> Anyway, I think we all understand.
> 
>>> This is a 0.0002% difference. I can't see how this is can affect the 
>>> calculations
>>> even with FRAC_BITS 6.
> 
> O.K. The solution is overkill and div_u64 could have been used instead of 
> div_u64_rem.
> On my list, it is the lowest of priorities.
> 
>>>
>>> Another thing is that this algorithm generally is used to round to the
>>> nearest integer. I'm not sure if it's valid to appl

Re: [PATCH] cpufreq: intel_pstate: Fix rounding of core_pct

2014-06-13 Thread Stratos Karafotis
On 13/06/2014 04:48 μμ, Dirk Brandewie wrote:
> On 06/12/2014 01:03 PM, Rafael J. Wysocki wrote:
>> On Thursday, June 12, 2014 05:35:59 PM Stratos Karafotis wrote:
>>> On 12/06/2014 12:15 πμ, Doug Smythies wrote:
>>>>
>>>>
>>>> -Original Message-
>>>> From: Stratos Karafotis [mailto:strat...@semaphore.gr]
>>>> Sent: June-11-2014 13:20
>>>> To: Doug Smythies
>>>> Cc: linux...@vger.kernel.org; linux-kernel@vger.kernel.org; 
>>>> r...@rjwysocki.net; viresh.ku...@linaro.org; dirk.j.brande...@intel.com
>>>> Subject: Re: [PATCH] cpufreq: intel_pstate: Fix rounding of core_pct
>>>>
>>>> On 2014.06.11 13:20 Stratos Karafotis wrote:
>>>>> On 11/06/2014 06:02 μμ, Doug Smythies wrote:
>>>>>>
>>>>>> On 2104.06.11 07:08 Stratos Karafotis wrote:
>>>>>>> On 11/06/2014 04:41 μμ, Doug Smythies wrote:
>>>>>>>
>>>>>>> No.
>>>>>>
>>>>>>> The intent was only ever to round properly the pseudo floating point 
>>>>>>> result of the divide.
>>>>>>> It was much more important (ugh, well 4 times more) when FRACBITS was 
>>>>>>> still 6, which also got changed to 8 in a recent patch.
>>>>>>>
>>>>>>
>>>>>> Are you sure?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> This rounding was very recently added.
>>>>>>> As far as I can understand, I don't see the meaning of this rounding, 
>>>>>>> as is.
>>>>>>> Even if FRAC_BITS was 6, I think it would have almost no improvement in
>>>>>>> calculations.
>>>>>>
>>>>>> Note: I had not seen this e-mail when I wrote a few minutes ago:
>>>>>>
>>>>>> You may be correct.
>>>>>> If Dirk agrees, I will re-analyse the entire driver for rounding effects 
>>>>>> soon.
>>>>>> When FRACBITS was 6 there were subtle cases where the driver would get 
>>>>>> stuck, and not make a final pstate change, with the default PID gains.
>>>>>> Other things have changed, and the analysis needs to be re-done.
>>>>>>
>>>>
>>>>> Could you please elaborate a little bit more what we need these 2 lines 
>>>>> below?
>>>>>
> 
> Sorry for being MIA on this thread I have been up to my eyeballs.
> 
>>>>> if ((rem << 1) >= int_tofp(sample->mperf))
>>>>> core_pct += 1;
> 
> The rounding should have been
>core_pct += (1 << (FRAC_BITS-1));
> Since core_pct is is in fixeded point notation at this point. Adding .5 to
> core_pct to round up.
> 
> As Stratos pointed out the the current code only adds 1/256 to core_pct
> 
> Since core_pct_busy stays in fixed point through out the rest of the
> calculations ans we only do the rounding when the PID is returning an
> int I think we can safely remove these two lines.
> 

Please let me know if you want me to send a new patch for this (after the merge
window). Or will you or Doug handle this?


>> Depending on the original reason, it may or may not be.
>>
>> In theory, it may help reduce numerical drift resulting from rounding always 
>> in
>> one direction only, but I'm not really sure if that matters here.
>>
>> Doug seems to have carried out full analysis, though.
>>
>> Rafael
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 

Thank you all, for your comments!

Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: intel_pstate: Fix rounding of core_pct

2014-06-12 Thread Stratos Karafotis
On 12/06/2014 12:15 πμ, Doug Smythies wrote:
> 
> 
> -Original Message-
> From: Stratos Karafotis [mailto:strat...@semaphore.gr] 
> Sent: June-11-2014 13:20
> To: Doug Smythies
> Cc: linux...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> r...@rjwysocki.net; viresh.ku...@linaro.org; dirk.j.brande...@intel.com
> Subject: Re: [PATCH] cpufreq: intel_pstate: Fix rounding of core_pct
> 
> On 2014.06.11 13:20 Stratos Karafotis wrote:
>> On 11/06/2014 06:02 μμ, Doug Smythies wrote:
>>>
>>> On 2104.06.11 07:08 Stratos Karafotis wrote:
>>>> On 11/06/2014 04:41 μμ, Doug Smythies wrote:
>>>>
>>>> No.
>>>
>>>> The intent was only ever to round properly the pseudo floating point 
>>>> result of the divide.
>>>> It was much more important (ugh, well 4 times more) when FRACBITS was 
>>>> still 6, which also got changed to 8 in a recent patch.
>>>>
>>>
>>> Are you sure?
>>>
>>> Yes.
>>>
>>>> This rounding was very recently added.
>>>> As far as I can understand, I don't see the meaning of this rounding, as 
>>>> is.
>>>> Even if FRAC_BITS was 6, I think it would have almost no improvement in
>>>> calculations.
>>>
>>> Note: I had not seen this e-mail when I wrote a few minutes ago:
>>>
>>> You may be correct.
>>> If Dirk agrees, I will re-analyse the entire driver for rounding effects 
>>> soon.
>>> When FRACBITS was 6 there were subtle cases where the driver would get 
>>> stuck, and not make a final pstate change, with the default PID gains.
>>> Other things have changed, and the analysis needs to be re-done.
>>>
> 
>> Could you please elaborate a little bit more what we need these 2 lines 
>> below?
>>
>>if ((rem << 1) >= int_tofp(sample->mperf))
>>core_pct += 1;
>>
>> Because nothing is mentioned for them in commit's changelog.
>> Do we need to round core_pct or not?
>> Because if we try to round it, I think this patch should work.
> 
> As mentioned originally, they are there just to round the pseudo floating 
> number, not the integer portion only.
> Let us bring back the very numbers you originally gave and work through it.
> 
> aperf = 5024
> mperf = 10619
> 
> core_pct = 47.31142292%
> or 47 and 79.724267 256ths
> or to the closest kept fractional part 47 and 80 256ths
> or 12112 as a pseudo float.
> The maximum error with this rounding will be 1 part in 512 and symmetric 
> instead of the 1 part in 256 always in one direction without.
> 
> Now if FRACBITS was still 6:
> core_pct = 47.31142292%
> or 47 and 19.931 64ths
> or to the closest kept fractional part 47 and 20 64ths
> or 3028 as a pseudo float.
> The maximum error with this rounding will be 1 part in 128 and symmetric 
> instead of the 1 part in 64 (1.6% !!!) always in one direction without.
> 
> Hope this helps.
> 

Yes, it helps. Thanks a lot!

But please note that the maximum error without this rounding will be 1.6% _only_
in fractional part. In the real number it will be much smaller:

47.19 instead of 47.20

And using FRAC_BITS 8:

47.79 instead of 47.80

This is a 0.0002% difference. I can't see how this is can affect the 
calculations
even with FRAC_BITS 6.

Another thing is that this algorithm generally is used to round to the
nearest integer. I'm not sure if it's valid to apply it for the rounding of
the fractional part of fixed point number.

Please see below the algorithm (with numbers of the specific example presented
as real):

aperf = 5024
mperf = 10619

aperf * 100 / mperf = 47.31142292

core_pct = 47
rem = 3307

if (3307 * 2) >= 10619
core_pct = core_pct + 0.004

The original algorithm adds 1 to the quotient to round the integer part of the
division. In the specific example we add 0.004 to round the fractional part.

Fortunately, as you mentioned, this does not change the final calculation
considering that we do _not_ want an integer rounding.

IMHO, If we need integer rounding we also need this patch, otherwise
we can safely remove this rounding.


Thanks,
Stratos




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: intel_pstate: Fix rounding of core_pct

2014-06-11 Thread Stratos Karafotis
On 11/06/2014 06:02 μμ, Doug Smythies wrote:
> 
> On 2104.06.11 07:08 Stratos Karafotis wrote:
>> On 11/06/2014 04:41 μμ, Doug Smythies wrote:
>>
>> No.
>>
>> The intent was only ever to round properly the pseudo floating point result 
>> of the divide.
>> It was much more important (ugh, well 4 times more) when FRACBITS was still 
>> 6, which also got changed to 8 in a recent patch.
>>
> 
> Are you sure?
> 
> Yes.
> 
>> This rounding was very recently added.
>> As far as I can understand, I don't see the meaning of this rounding, as is.
>> Even if FRAC_BITS was 6, I think it would have almost no improvement in
>> calculations.
> 
> Note: I had not seen this e-mail when I wrote a few minutes ago:
> 
> You may be correct.
> If Dirk agrees, I will re-analyse the entire driver for rounding effects soon.
> When FRACBITS was 6 there were subtle cases where the driver would get stuck, 
> and not make a final pstate change, with the default PID gains.
> Other things have changed, and the analysis needs to be re-done.
> 

Could you please elaborate a little bit more what we need these 2 lines below?

if ((rem << 1) >= int_tofp(sample->mperf))
core_pct += 1;

Because nothing is mentioned for them in commit's changelog.
Do we need to round core_pct or not?
Because if we try to round it, I think this patch should work.

Thanks,
Stratos


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: intel_pstate: Fix rounding of core_pct

2014-06-11 Thread Stratos Karafotis
On 11/06/2014 04:41 μμ, Doug Smythies wrote:
> 
> On 2014.06.11 05:34 Stratos Karafotis wrote:
> 
>> Local variable core_pct holds fixed point values.
>> When we round it we add "1" to core_pct. This has almost
>> no effect.
>>
>> So, add int_toftp(1) to core_pct when rounding.
>>
>> For example, in a given sample point (values taken from
>> tracepoint) with:
>> aperf = 5024
>> mperf = 10619
>>
>> the core_pct is (before rounding):
>> core_pct = 12111
>> fp_toint(core_pct) = 47
>>
>> After rounding:
>> core_pct = 12112
>> fp_toint(core_pct) = 47
>>
>> After rounding with int_toftp(1):
>> core_pct = 12367
>> fp_toint(core_pct) = 48
>>
>> Signed-off-by: Stratos Karafotis 
>> ---
>>
>> Hi Rafael,
>>
>> I'm sorry for submitting again in merge window, but
>> I thought that maybe we need this fix for 3.16.
>>
>>
>> drivers/cpufreq/intel_pstate.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index 4e7f492..dd80aa2 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -564,7 +564,7 @@ static inline void intel_pstate_calc_busy(struct cpudata 
>> *cpu)
>>  core_pct = div_u64_rem(core_pct, int_tofp(sample->mperf), &rem);
>>
>>  if ((rem << 1) >= int_tofp(sample->mperf))
>> -core_pct += 1;
>> +core_pct += int_tofp(1);
>>
>>  sample->freq = fp_toint(
>>  mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
>> -- 
>> 1.9.3
> 
> No.
> 
> The intent was only ever to round properly the pseudo floating point result 
> of the divide.
> It was much more important (ugh, well 4 times more) when FRACBITS was still 
> 6, which also got changed to 8 in a recent patch.
> 

Are you sure? This rounding was very recently added.
As far as I can understand, I don't see the meaning of this rounding, as is.
Even if FRAC_BITS was 6, I think it would have almost no improvement in
calculations.


Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: intel_pstate: Fix rounding of core_pct

2014-06-11 Thread Stratos Karafotis
Local variable core_pct holds fixed point values.
When we round it we add "1" to core_pct. This has almost
no effect.

So, add int_toftp(1) to core_pct when rounding.

For example, in a given sample point (values taken from
tracepoint) with:
aperf = 5024
mperf = 10619

the core_pct is (before rounding):
core_pct = 12111
fp_toint(core_pct) = 47

After rounding:
core_pct = 12112
fp_toint(core_pct) = 47

After rounding with int_toftp(1):
core_pct = 12367
fp_toint(core_pct) = 48

Signed-off-by: Stratos Karafotis 
---

Hi Rafael,

I'm sorry for submitting again in merge window, but
I thought that maybe we need this fix for 3.16.


 drivers/cpufreq/intel_pstate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4e7f492..dd80aa2 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -564,7 +564,7 @@ static inline void intel_pstate_calc_busy(struct cpudata 
*cpu)
core_pct = div_u64_rem(core_pct, int_tofp(sample->mperf), &rem);
 
if ((rem << 1) >= int_tofp(sample->mperf))
-   core_pct += 1;
+   core_pct += int_tofp(1);
 
sample->freq = fp_toint(
mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/7] cpufreq: intel_pstate: Trivial code cleanup

2014-06-10 Thread Stratos Karafotis
On 11/06/2014 12:38 πμ, Rafael J. Wysocki wrote:
> On Wednesday, June 11, 2014 12:02:09 AM Stratos Karafotis wrote:
>> On 10/06/2014 11:43 μμ, Rafael J. Wysocki wrote:
>>> On Tuesday, June 10, 2014 11:14:53 PM Stratos Karafotis wrote:
>>>> On 10/06/2014 11:17 μμ, Rafael J. Wysocki wrote:
>>>>> On Tuesday, June 10, 2014 10:26:44 AM Dirk Brandewie wrote:
>>>>>> On 06/10/2014 08:31 AM, Rafael J. Wysocki wrote:
>>>>>>> On Tuesday, June 10, 2014 08:12:48 AM Dirk Brandewie wrote:
>>>>>>>> On 06/09/2014 02:01 PM, Stratos Karafotis wrote:
>>>>>>>>> Remove unnecessary blank lines.
>>>>>>>>> Remove unnecessary parentheses.
>>>>>>>>> Remove unnecessary braces.
>>>>>>>>> Put the code in one line where possible.
>>>>>>>>> Add blank lines after variable declarations.
>>>>>>>>> Alignment to open parenthesis.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't have an issue with this patch in general but I would rather
>>>>>>>> the cleanup be done when there is a functional change in the given
>>>>>>>> hunk of code otherwise you are setting up a fence for 
>>>>>>>> stable/backporters
>>>>>>>> of functional changes in the future.
>>>>>>>
>>>>>>> I actually prefer separate cleanups so as to avoid doing multiple things
>>>>>>> in one patch.
>>>>>>>
>>>>>>> Rafael
>>>>>>>
>>>>>> I don't have strong feelings either way I was just trying to be kind
>>>>>> to the maintainers of distro kernels.
>>>>>
>>>>> And mixing fixes with cleanups in one patch doesn't do any good to them.
>>>>>
>>>>> Trust me, I used to work for a distro. :-)
>>>>>
>>>>
>>>> So, should I proceed and split the patch or drop it? :)
>>>
>>> I'm not sure why you'd want to split it?
>>
>> Forgive me, but I'm totally confused. I asked because you mentioned that
>> you prefer separate cleanups.
> 
> That was in a reply to Dirk who suggested doing cleanups along with
> fixes (or at least I understood what he said this way).
> 
> I tried to explain why I didn't think that this was a good idea.
> 
>> So, my question was if you want me to separate this patch into more (one
>> per change) or entirely drop it (because it would cause problems to 
>> backporters
>> or maintainers).
> 
> Cleanups are generally OK, but it's better to do one kind of a cleanup
> per patch.  Like whitespace fixes in one patch, cleanup of expressions in
> another.
> 

OK, thanks for the clarification! I will do it in separate patches.

>>
>>> That said you're changing things that are intentional.  For example,
>>> the
>>>
>>> if (acpi_disabled
>>> || ...)
>>>
>>> is.  And the result of (a * 100) / b may generally be different from
>>> a * 100 / b for integers (if the division is carried out first).
>>
>> I thought that (a * 100) / b is always equivalent to a * 100 / b.
> 
> I'm not actually sure if that's guaranteed by C standards.  It surely
> wasn't some time ago (when there was no formal C standard).
>

I think it is, according to C precedence table.
But, anyway my motivation to the specific cleanup was the different style
in the same block code:

limits.min_perf_pct = (policy->min * 100) / policy->cpuinfo.max_freq;
...
limits.max_policy_pct = policy->max * 100 / policy->cpuinfo.max_freq;

Thanks again!
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/7] cpufreq: intel_pstate: Trivial code cleanup

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 11:43 μμ, Rafael J. Wysocki wrote:
> On Tuesday, June 10, 2014 11:14:53 PM Stratos Karafotis wrote:
>> On 10/06/2014 11:17 μμ, Rafael J. Wysocki wrote:
>>> On Tuesday, June 10, 2014 10:26:44 AM Dirk Brandewie wrote:
>>>> On 06/10/2014 08:31 AM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, June 10, 2014 08:12:48 AM Dirk Brandewie wrote:
>>>>>> On 06/09/2014 02:01 PM, Stratos Karafotis wrote:
>>>>>>> Remove unnecessary blank lines.
>>>>>>> Remove unnecessary parentheses.
>>>>>>> Remove unnecessary braces.
>>>>>>> Put the code in one line where possible.
>>>>>>> Add blank lines after variable declarations.
>>>>>>> Alignment to open parenthesis.
>>>>>>>
>>>>>>
>>>>>> I don't have an issue with this patch in general but I would rather
>>>>>> the cleanup be done when there is a functional change in the given
>>>>>> hunk of code otherwise you are setting up a fence for stable/backporters
>>>>>> of functional changes in the future.
>>>>>
>>>>> I actually prefer separate cleanups so as to avoid doing multiple things
>>>>> in one patch.
>>>>>
>>>>> Rafael
>>>>>
>>>> I don't have strong feelings either way I was just trying to be kind
>>>> to the maintainers of distro kernels.
>>>
>>> And mixing fixes with cleanups in one patch doesn't do any good to them.
>>>
>>> Trust me, I used to work for a distro. :-)
>>>
>>
>> So, should I proceed and split the patch or drop it? :)
> 
> I'm not sure why you'd want to split it?

Forgive me, but I'm totally confused. I asked because you mentioned that
you prefer separate cleanups.
So, my question was if you want me to separate this patch into more (one
per change) or entirely drop it (because it would cause problems to backporters
or maintainers).

> That said you're changing things that are intentional.  For example,
> the
> 
>   if (acpi_disabled
>   || ...)
> 
> is.  And the result of (a * 100) / b may generally be different from
> a * 100 / b for integers (if the division is carried out first).

I thought that (a * 100) / b is always equivalent to a * 100 / b.


Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/7] cpufreq: intel_pstate: Trivial code cleanup

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 11:17 μμ, Rafael J. Wysocki wrote:
> On Tuesday, June 10, 2014 10:26:44 AM Dirk Brandewie wrote:
>> On 06/10/2014 08:31 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, June 10, 2014 08:12:48 AM Dirk Brandewie wrote:
>>>> On 06/09/2014 02:01 PM, Stratos Karafotis wrote:
>>>>> Remove unnecessary blank lines.
>>>>> Remove unnecessary parentheses.
>>>>> Remove unnecessary braces.
>>>>> Put the code in one line where possible.
>>>>> Add blank lines after variable declarations.
>>>>> Alignment to open parenthesis.
>>>>>
>>>>
>>>> I don't have an issue with this patch in general but I would rather
>>>> the cleanup be done when there is a functional change in the given
>>>> hunk of code otherwise you are setting up a fence for stable/backporters
>>>> of functional changes in the future.
>>>
>>> I actually prefer separate cleanups so as to avoid doing multiple things
>>> in one patch.
>>>
>>> Rafael
>>>
>> I don't have strong feelings either way I was just trying to be kind
>> to the maintainers of distro kernels.
> 
> And mixing fixes with cleanups in one patch doesn't do any good to them.
> 
> Trust me, I used to work for a distro. :-)
> 

So, should I proceed and split the patch or drop it? :)

Stratos


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/7] cpufreq: intel_pstate: Simplify code in intel_pstate_adjust_busy_pstate

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 08:07 μμ, Dirk Brandewie wrote:
> On 06/10/2014 07:51 AM, Stratos Karafotis wrote:
>> On 10/06/2014 08:27 πμ, Viresh Kumar wrote:
>>> On 10 June 2014 02:30, Stratos Karafotis  wrote:
>>>> Simplify the code by removing the inline functions
>>>> pstate_increase and pstate_decrease and use directly the
>>>> intel_pstate_set_pstate.
>>>>
> 
> Doesn't apply without your scaled_busy change spin this patch with
> out the scaled_busy change and explain the change more fully in the
> commit message to cover Viresh's question and I am good with this change.
> 
> 

OK, I will.

Thanks,
Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] cpufreq: intel_pstate: Add debugfs file stats

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 08:05 μμ, Dirk Brandewie wrote:
> On 06/10/2014 09:21 AM, Stratos Karafotis wrote:
>> On 10/06/2014 06:47 μμ, Dirk Brandewie wrote:
>>> On 06/09/2014 02:00 PM, Stratos Karafotis wrote:
>>>> Add stats file in debugfs under driver's parent directory
>>>> (pstate_snb) which counts the time in nsecs per requested
>>>> P state and the number of times the specific state
>>>> was requested.
>>>>
>>>> The file presents the statistics per logical CPU in the
>>>> following format. The time is displayed in msecs:
>>>>
>>>
>>> NAK
>>>
>>> This adds significantly to the memory footprint to gather information
>>> that is available by post processing the perf tracepoint information.
>>> The increase isn't horrible on single socket desktop processor machines
>>> but gets big with server class machines.  One vendor I have talked to 
>>> considers
>>> a machine with 1024 cpus to be a SMALL machine.
>>>
>>
>> If I am not wrong the sizeof pstate_stat is 20B. On my CPU with 20 P states, 
>> we
>> need 400B per logical CPU (3200B total in my desktop) plus 64B for stats 
>> pointers.
>>
>> In your example this would need about 400KB - 500KB?
>> Is it too much for 1024 a CPUs system?
> 
> For something that will likely not be used IMO yes.
> 
>>
>> I think it's a useful piece of info that we can have it directly without
>> post processing tracepoint.
>> Is it acceptable to conditionally compile it with a new CONFIG option?
> 
> 
> I can see where the information could be useful but the set of people
> that would find it useful is very small.  Having information about residency 
> since boot is interesting but just barely.  This file will encourage people
> to build tools/scripts that rely on this file and they will complain bitterly
> if/when it changes or goes away so you would be creating a defacto ABI in
> debugfs.
> 
> 
> This functionality will *not* be supportable in up coming processors where HWP
> is being used.  See section 14.4 of the current SDM vol. 3 
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
> 

I will drop this patch in v2.
Thanks a lot for your comments and your time!

Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] cpufreq: intel_pstate: Avoid duplicate call of intel_pstate_get_scaled_busy

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 07:05 μμ, Dirk Brandewie wrote:
> On 06/09/2014 02:00 PM, Stratos Karafotis wrote:
>> Store busy_scaled value to avoid to duplicate call of
>> intel_pstate_get_scaled_busy on every sampling interval.
>>
> 
> The second call *only* happens if the tracepoint is being used otherwise
> the whole function call to  trace_pstate_sample() is a noop.

Yes, I'm sorry, I forgot to add this in my changelog. I have written this
in cover letter.
I made this change mostly to support patch 3/7.

> This makes the code less readable IMHO the reader is left wondering
> how cpu->sample.busy_scaled was set in intel_pstate_adjust_busy_pstate()
> 

I agree that the the original code is more readable. If we don't care
about the small overhead when tracing is on and forget patch 3/7,
of course the original code is by far better.


>> Also, rename the function to intel_pstate_calc_scaled_busy.
>>
>> Signed-off-by: Stratos Karafotis 
>> ---
>>   drivers/cpufreq/intel_pstate.c | 12 ++--
>>   1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index 4e7f492..31e2ae5 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -55,6 +55,7 @@ static inline int32_t div_fp(int32_t x, int32_t y)
>>
>>   struct sample {
>>   int32_t core_pct_busy;
>> +int32_t busy_scaled;
>>   u64 aperf;
>>   u64 mperf;
>>   int freq;
>> @@ -604,7 +605,7 @@ static inline void intel_pstate_set_sample_time(struct 
>> cpudata *cpu)
>>   mod_timer_pinned(&cpu->timer, jiffies + delay);
>>   }
>>
>> -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
>> +static inline void intel_pstate_calc_scaled_busy(struct cpudata *cpu)
>>   {
>>   int32_t core_busy, max_pstate, current_pstate, sample_ratio;
>>   u32 duration_us;
>> @@ -624,20 +625,19 @@ static inline int32_t 
>> intel_pstate_get_scaled_busy(struct cpudata *cpu)
>>   core_busy = mul_fp(core_busy, sample_ratio);
>>   }
>>
>> -return core_busy;
>> +cpu->sample.busy_scaled = core_busy;
>>   }
>>
>>   static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
>>   {
>> -int32_t busy_scaled;
>>   struct _pid *pid;
>>   signed int ctl = 0;
>>   int steps;
>>
>>   pid = &cpu->pid;
>> -busy_scaled = intel_pstate_get_scaled_busy(cpu);
>> +intel_pstate_calc_scaled_busy(cpu);
>>
>> -ctl = pid_calc(pid, busy_scaled);
>> +ctl = pid_calc(pid, cpu->sample.busy_scaled);
>>
>>   steps = abs(ctl);
>>
>> @@ -659,7 +659,7 @@ static void intel_pstate_timer_func(unsigned long __data)
>>   intel_pstate_adjust_busy_pstate(cpu);
>>
>>   trace_pstate_sample(fp_toint(sample->core_pct_busy),
>> -fp_toint(intel_pstate_get_scaled_busy(cpu)),
>> +fp_toint(sample->busy_scaled),
>>   cpu->pstate.current_pstate,
>>   sample->mperf,
>>   sample->aperf,
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] cpufreq: intel_pstate: Add debugfs file stats

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 06:47 μμ, Dirk Brandewie wrote:
> On 06/09/2014 02:00 PM, Stratos Karafotis wrote:
>> Add stats file in debugfs under driver's parent directory
>> (pstate_snb) which counts the time in nsecs per requested
>> P state and the number of times the specific state
>> was requested.
>>
>> The file presents the statistics per logical CPU in the
>> following format. The time is displayed in msecs:
>>
> 
> NAK
> 
> This adds significantly to the memory footprint to gather information
> that is available by post processing the perf tracepoint information.
> The increase isn't horrible on single socket desktop processor machines
> but gets big with server class machines.  One vendor I have talked to 
> considers
> a machine with 1024 cpus to be a SMALL machine.
> 

If I am not wrong the sizeof pstate_stat is 20B. On my CPU with 20 P states, we
need 400B per logical CPU (3200B total in my desktop) plus 64B for stats 
pointers.

In your example this would need about 400KB - 500KB?
Is it too much for 1024 a CPUs system?

I think it's a useful piece of info that we can have it directly without
post processing tracepoint.
Is it acceptable to conditionally compile it with a new CONFIG option?


Thanks,
Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/7] cpufreq: intel_pstate: Simplify code in intel_pstate_adjust_busy_pstate

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 08:27 πμ, Viresh Kumar wrote:
> On 10 June 2014 02:30, Stratos Karafotis  wrote:
>> Simplify the code by removing the inline functions
>> pstate_increase and pstate_decrease and use directly the
>> intel_pstate_set_pstate.
>>
>> Signed-off-by: Stratos Karafotis 
>> ---
>>  drivers/cpufreq/intel_pstate.c | 26 +++---
>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index 3a49269..26a0262 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -588,21 +588,6 @@ static void intel_pstate_set_pstate(struct cpudata 
>> *cpu, int pstate)
>> pstate_funcs.set(cpu, pstate);
>>  }
>>
>> -static inline void intel_pstate_pstate_increase(struct cpudata *cpu, int 
>> steps)
>> -{
>> -   int target;
>> -   target = cpu->pstate.current_pstate + steps;
>> -
>> -   intel_pstate_set_pstate(cpu, target);
>> -}
>> -
>> -static inline void intel_pstate_pstate_decrease(struct cpudata *cpu, int 
>> steps)
>> -{
>> -   int target;
>> -   target = cpu->pstate.current_pstate - steps;
>> -   intel_pstate_set_pstate(cpu, target);
>> -}
>> -
>>  static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
>>  {
>> cpu->pstate.min_pstate = pstate_funcs.get_min();
>> @@ -695,20 +680,15 @@ static inline void 
>> intel_pstate_calc_scaled_busy(struct cpudata *cpu)
>>  static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
>>  {
>> struct _pid *pid;
>> -   signed int ctl = 0;
>> -   int steps;
>> +   signed int ctl;
>>
>> pid = &cpu->pid;
>> intel_pstate_calc_scaled_busy(cpu);
>>
>> ctl = pid_calc(pid, cpu->sample.busy_scaled);
>>
>> -   steps = abs(ctl);
>> -
>> -   if (ctl < 0)
>> -   intel_pstate_pstate_increase(cpu, steps);
>> -   else
>> -   intel_pstate_pstate_decrease(cpu, steps);
>> +   /* Negative values of ctl increase the pstate and vice versa */
>> +   intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
>>  }
> 
> I am not very good at this driver but there is some obvious functional
> change here. Earlier we used to pass
> 'cpu->pstate.current_pstate {-|+} steps' and now you are doing '-ctl' only
> 

The original code is:   

if (ctl < 0)
intel_pstate_pstate_increase(cpu, steps);
else
intel_pstate_pstate_decrease(cpu, steps);

Without inlines functions intel_pstate_pstate_increase() and
intel_pstate_pstate_decrease() we get:

if (ctl < 0)
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate + 
steps);
else
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - 
steps);


But steps = abs(ctl), so:

if (ctl < 0)
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate + 
abs(ctl));
else
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - 
abs(ctl));

By definition, abs(ctl) = ctl if ctl >= 0, -ctl if ctl < 0. Thus:

if (ctl < 0)
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate + 
(-ctl));
else
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);

And:
if (ctl < 0)
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
else
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);

Finally remove the unnecessary if statement.
intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);

So, this is equivalent with the original code.

Thanks,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/7] cpufreq: intel_pstate: Remove redundant includes

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 08:29 πμ, Viresh Kumar wrote:
> On 10 June 2014 02:30, Stratos Karafotis  wrote:
>> Also put them in alphabetical order.
>>
>> Signed-off-by: Stratos Karafotis 
>> ---
>>  drivers/cpufreq/intel_pstate.c | 17 ++---
>>  1 file changed, 2 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index 26a0262..d4f0518 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -10,26 +10,13 @@
>>   * of the License.
>>   */
>>
>> -#include 
>> -#include 
>> -#include 
>> -#include 
>> -#include 
>> -#include 
>> -#include 
>> -#include 
>> -#include 
>> +#include 
>>  #include 
>>  #include 
>> -#include 
>> -#include 
>> -#include 
>>  #include 
>> -#include 
>> +#include 
>>  #include 
>>
>> -#include 
>> -#include 
>>  #include 
> 
> As a rule, header files for all the symbols directly used by a file must
> be included directly by the file and must not depend on indirect inclusions.
> 
> So even if it compiles, its the wrong thing to do. Though you can obviously
> remove the headers which aren't used.
> 

I didn't know this. I will drop this patch. I'm sorry for the noise.

Thanks for your comments!
Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/7] cpufreq: intel_pstate: Trivial code cleanup

2014-06-10 Thread Stratos Karafotis
On 10/06/2014 12:22 πμ, Joe Perches wrote:
> On Tue, 2014-06-10 at 00:01 +0300, Stratos Karafotis wrote:
>> Remove unnecessary braces.
> 
> []
> 
>> @@ -204,20 +203,16 @@ static inline void intel_pstate_busy_pid_reset(struct 
>> cpudata *cpu)
> 
>>  static inline void intel_pstate_reset_all_pid(void)
>>  {
>>  unsigned int cpu;
>> -for_each_online_cpu(cpu) {
>> +
>> +for_each_online_cpu(cpu)
>>  if (all_cpu_data[cpu])
>>  intel_pstate_busy_pid_reset(all_cpu_data[cpu]);
>> -}
> 
> It's pretty traditional to keep the braces here
> as it generally makes it clearer for the reader.
> 
>   for (...) {
>   if (foo)
>   bar();
>   }
> 
> is generally used over
> 
>   for (...)
>   if (foo)
>   bar();
> 
> Just like using
> 
>   if (foo) {
>   /* commment */
>   bar();
>   }

OK, I will revert these changes in v2.

>> @@ -748,15 +744,14 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
> []
>> -pr_info("Intel pstate controlling: cpu %d\n", cpunum);
>> +pr_info("Intel pstate controlling: CPU %d\n", cpunum);
> 
> cpu is very slightly preferred lower case.
> 
> $ git grep -E -i '^[^"]*"[^"]*\bcpu\b'|grep -w -i -o cpu | sort |uniq -c | 
> sort -rn
>2705 cpu
>2084 CPU
>  17 Cpu
> 

Although, I believe that the term 'CPU' is more appropriate, I'll revert this
as the majority and Dirk prefer it. :)

Thanks for your comments!
Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/7] cpufreq: intel_pstate: Make intel_pstate_kobject local

2014-06-09 Thread Stratos Karafotis
On 10/06/2014 12:07 πμ, David Rientjes wrote:
> On Tue, 10 Jun 2014, Stratos Karafotis wrote:
> 
>> Since we never remove sysfs entry, we can make the intel_pstate_kobject
>> local.
>>
> 
> For even more savings, this function and 
> intel_pstate_debug_expose_params() can be annotated with __init and freed 
> after bootstrap.

Thanks for your comment, David!
I will do this in v2.


Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/7] cpufreq: intel_pstate: Remove duplicate CPU ID check

2014-06-09 Thread Stratos Karafotis
We check the CPU ID during driver init. There is no need
to do it again per logical CPU initialization.

So, remove the duplicate check.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index aebd457..4e7f492 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -691,14 +691,8 @@ MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids);
 
 static int intel_pstate_init_cpu(unsigned int cpunum)
 {
-
-   const struct x86_cpu_id *id;
struct cpudata *cpu;
 
-   id = x86_match_cpu(intel_pstate_cpu_ids);
-   if (!id)
-   return -ENODEV;
-
all_cpu_data[cpunum] = kzalloc(sizeof(struct cpudata), GFP_KERNEL);
if (!all_cpu_data[cpunum])
return -ENOMEM;
-- 
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/7] cpufreq: intel_pstate: Add debugfs file stats

2014-06-09 Thread Stratos Karafotis
Add stats file in debugfs under driver's parent directory
(pstate_snb) which counts the time in nsecs per requested
P state and the number of times the specific state
was requested.

The file presents the statistics per logical CPU in the
following format. The time is displayed in msecs:

CPU0
P-stateTime Count
 16 4882777 23632
 17   21210   174
 18  549781  3300
 19   51171   461
 20   35487   394
 21   18173   219
 22   13752   258
 236048   172
 247754   177
 254587   151
 265465   162
 27143247
 28 86354
 29144850
 30103047
 31147262
 32222168
 33186960
 34214070
 39   85446  3803

...

The file can be used for debugging but also for monitoring
various system workloads.

Also, make the debugfs_parent local as we never remove
the driver's debugfs files.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 80 +-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 31e2ae5..3a49269 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -86,6 +86,12 @@ struct _pid {
int32_t last_err;
 };
 
+struct pstate_stat {
+   int pstate;
+   u64 time;
+   u64 count;
+};
+
 struct cpudata {
int cpu;
 
@@ -99,6 +105,7 @@ struct cpudata {
u64 prev_aperf;
u64 prev_mperf;
struct sample sample;
+   struct pstate_stat *stats;
 };
 
 static struct cpudata **all_cpu_data;
@@ -256,9 +263,59 @@ static struct pid_param pid_files[] = {
{NULL, NULL}
 };
 
-static struct dentry *debugfs_parent;
+static inline unsigned int stats_state_index(struct cpudata *cpu, int pstate)
+{
+   if (pstate <= cpu->pstate.max_pstate)
+   return pstate - cpu->pstate.min_pstate;
+   else
+   return cpu->pstate.max_pstate - cpu->pstate.min_pstate + 1;
+}
+
+static int stats_debug_show(struct seq_file *m, void *unused)
+{
+   struct cpudata *cpu;
+   int i, j, cnt;
+
+   get_online_cpus();
+   for_each_online_cpu(i) {
+   if (all_cpu_data[i])
+   cpu = all_cpu_data[i];
+   else
+   continue;
+
+   seq_printf(m, "CPU%u\n", i);
+   seq_puts(m, "P-stateTime Count\n");
+
+   cnt = cpu->pstate.max_pstate - cpu->pstate.min_pstate + 2;
+   for (j = 0; j < cnt; j++)
+   seq_printf(m, "%7u %11llu %9llu\n",
+  cpu->stats[j].pstate,
+  cpu->stats[j].time / USEC_PER_MSEC,
+  cpu->stats[j].count);
+
+   seq_puts(m, "\n");
+   }
+   put_online_cpus();
+
+   return 0;
+}
+
+static int stats_debug_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, stats_debug_show, inode->i_private);
+}
+
+static const struct file_operations fops_stats_pstate = {
+   .open   = stats_debug_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+   .owner  = THIS_MODULE,
+};
+
 static void intel_pstate_debug_expose_params(void)
 {
+   struct dentry *debugfs_parent;
int i = 0;
 
debugfs_parent = debugfs_create_dir("pstate_snb", NULL);
@@ -270,6 +327,8 @@ static void intel_pstate_debug_expose_params(void)
&fops_pid_param);
i++;
}
+   debugfs_create_file("stats", S_IRUSR | S_IRGRP, debugfs_parent, NULL,
+   &fops_stats_pstate);
 }
 
 /** debugfs end /
@@ -610,6 +669,7 @@ static inline void intel_pstate_calc_scaled_busy(struct 
cpudata *cpu)
int32_t core_busy, max_pstate, current_pstate, sample_ratio;
u32 duration_us;
u32 sample_time;
+   unsigned int i;
 
core_busy = cpu->sample.core_pct_busy;
max_pstate = int_tofp(cpu->pstate.max_pstate);
@@ -626,6 +686,10 @@ static inline void intel_pstate_calc_scaled_busy(struct 
cpudata *cpu)
}
 
cpu->sample.busy_scaled = core_busy;
+
+   i = stats_state_index(cpu, cpu->pstate.current_pstate);
+   cpu->stats[i].time += duration_us;
+   cpu->stats[i].count++;
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
@@ -692,6 +756,7 @@ MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids);
 stati

[PATCH 7/7] cpufreq: intel_pstate: Make intel_pstate_kobject local

2014-06-09 Thread Stratos Karafotis
Since we never remove sysfs entry, we can make the intel_pstate_kobject
local.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index fa44f0f..9533fff 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -387,10 +387,10 @@ static struct attribute *intel_pstate_attributes[] = {
 static struct attribute_group intel_pstate_attr_group = {
.attrs = intel_pstate_attributes,
 };
-static struct kobject *intel_pstate_kobject;
 
 static void intel_pstate_sysfs_expose_params(void)
 {
+   struct kobject *intel_pstate_kobject;
int rc;
 
intel_pstate_kobject = kobject_create_and_add("intel_pstate",
-- 
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/7] cpufreq: intel_pstate: Trivial code cleanup

2014-06-09 Thread Stratos Karafotis
Remove unnecessary blank lines.
Remove unnecessary parentheses.
Remove unnecessary braces.
Put the code in one line where possible.
Add blank lines after variable declarations.
Alignment to open parenthesis.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 96 --
 1 file changed, 45 insertions(+), 51 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index d4f0518..fa44f0f 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -142,7 +142,7 @@ static struct perf_limits limits = {
 };
 
 static inline void pid_reset(struct _pid *pid, int setpoint, int busy,
-   int deadband, int integral) {
+int deadband, int integral) {
pid->setpoint = setpoint;
pid->deadband  = deadband;
pid->integral  = int_tofp(integral);
@@ -161,7 +161,6 @@ static inline void pid_i_gain_set(struct _pid *pid, int 
percent)
 
 static inline void pid_d_gain_set(struct _pid *pid, int percent)
 {
-
pid->d_gain = div_fp(int_tofp(percent), int_tofp(100));
 }
 
@@ -192,9 +191,9 @@ static signed int pid_calc(struct _pid *pid, int32_t busy)
 
result = pterm + mul_fp(pid->integral, pid->i_gain) + dterm;
if (result >= 0)
-   result = result + (1 << (FRAC_BITS-1));
+   result += 1 << (FRAC_BITS-1);
else
-   result = result - (1 << (FRAC_BITS-1));
+   result -= 1 << (FRAC_BITS-1);
return (signed int)fp_toint(result);
 }
 
@@ -204,20 +203,16 @@ static inline void intel_pstate_busy_pid_reset(struct 
cpudata *cpu)
pid_d_gain_set(&cpu->pid, pid_params.d_gain_pct);
pid_i_gain_set(&cpu->pid, pid_params.i_gain_pct);
 
-   pid_reset(&cpu->pid,
-   pid_params.setpoint,
-   100,
-   pid_params.deadband,
-   0);
+   pid_reset(&cpu->pid, pid_params.setpoint, 100, pid_params.deadband, 0);
 }
 
 static inline void intel_pstate_reset_all_pid(void)
 {
unsigned int cpu;
-   for_each_online_cpu(cpu) {
+
+   for_each_online_cpu(cpu)
if (all_cpu_data[cpu])
intel_pstate_busy_pid_reset(all_cpu_data[cpu]);
-   }
 }
 
 /** debugfs begin /
@@ -227,13 +222,13 @@ static int pid_param_set(void *data, u64 val)
intel_pstate_reset_all_pid();
return 0;
 }
+
 static int pid_param_get(void *data, u64 *val)
 {
*val = *(u32 *)data;
return 0;
 }
-DEFINE_SIMPLE_ATTRIBUTE(fops_pid_param, pid_param_get,
-   pid_param_set, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(fops_pid_param, pid_param_get, pid_param_set, 
"%llu\n");
 
 struct pid_param {
char *name;
@@ -310,8 +305,8 @@ static void intel_pstate_debug_expose_params(void)
return;
while (pid_files[i].name) {
debugfs_create_file(pid_files[i].name, 0660,
-   debugfs_parent, pid_files[i].value,
-   &fops_pid_param);
+   debugfs_parent, pid_files[i].value,
+   &fops_pid_param);
i++;
}
debugfs_create_file("stats", S_IRUSR | S_IRGRP, debugfs_parent, NULL,
@@ -329,10 +324,11 @@ static void intel_pstate_debug_expose_params(void)
}
 
 static ssize_t store_no_turbo(struct kobject *a, struct attribute *b,
-   const char *buf, size_t count)
+ const char *buf, size_t count)
 {
unsigned int input;
int ret;
+
ret = sscanf(buf, "%u", &input);
if (ret != 1)
return -EINVAL;
@@ -342,10 +338,11 @@ static ssize_t store_no_turbo(struct kobject *a, struct 
attribute *b,
 }
 
 static ssize_t store_max_perf_pct(struct kobject *a, struct attribute *b,
-   const char *buf, size_t count)
+ const char *buf, size_t count)
 {
unsigned int input;
int ret;
+
ret = sscanf(buf, "%u", &input);
if (ret != 1)
return -EINVAL;
@@ -353,14 +350,16 @@ static ssize_t store_max_perf_pct(struct kobject *a, 
struct attribute *b,
limits.max_sysfs_pct = clamp_t(int, input, 0 , 100);
limits.max_perf_pct = min(limits.max_policy_pct, limits.max_sysfs_pct);
limits.max_perf = div_fp(int_tofp(limits.max_perf_pct), int_tofp(100));
+
return count;
 }
 
 static ssize_t store_min_perf_pct(struct kobject *a, struct attribute *b,
-   const char *buf, size_t count)
+ const char *buf, size_t count)
 {
unsigned int input;
  

[PATCH 5/7] cpufreq: intel_pstate: Remove redundant includes

2014-06-09 Thread Stratos Karafotis
Also put them in alphabetical order.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 26a0262..d4f0518 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -10,26 +10,13 @@
  * of the License.
  */
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
+#include 
 #include 
 #include 
-#include 
-#include 
-#include 
 #include 
-#include 
+#include 
 #include 
 
-#include 
-#include 
 #include 
 
 #define BYT_RATIOS 0x66a
-- 
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/7] cpufreq: intel_pstate: Avoid duplicate call of intel_pstate_get_scaled_busy

2014-06-09 Thread Stratos Karafotis
Store busy_scaled value to avoid to duplicate call of
intel_pstate_get_scaled_busy on every sampling interval.

Also, rename the function to intel_pstate_calc_scaled_busy.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4e7f492..31e2ae5 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -55,6 +55,7 @@ static inline int32_t div_fp(int32_t x, int32_t y)
 
 struct sample {
int32_t core_pct_busy;
+   int32_t busy_scaled;
u64 aperf;
u64 mperf;
int freq;
@@ -604,7 +605,7 @@ static inline void intel_pstate_set_sample_time(struct 
cpudata *cpu)
mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
 
-static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
+static inline void intel_pstate_calc_scaled_busy(struct cpudata *cpu)
 {
int32_t core_busy, max_pstate, current_pstate, sample_ratio;
u32 duration_us;
@@ -624,20 +625,19 @@ static inline int32_t intel_pstate_get_scaled_busy(struct 
cpudata *cpu)
core_busy = mul_fp(core_busy, sample_ratio);
}
 
-   return core_busy;
+   cpu->sample.busy_scaled = core_busy;
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
-   int32_t busy_scaled;
struct _pid *pid;
signed int ctl = 0;
int steps;
 
pid = &cpu->pid;
-   busy_scaled = intel_pstate_get_scaled_busy(cpu);
+   intel_pstate_calc_scaled_busy(cpu);
 
-   ctl = pid_calc(pid, busy_scaled);
+   ctl = pid_calc(pid, cpu->sample.busy_scaled);
 
steps = abs(ctl);
 
@@ -659,7 +659,7 @@ static void intel_pstate_timer_func(unsigned long __data)
intel_pstate_adjust_busy_pstate(cpu);
 
trace_pstate_sample(fp_toint(sample->core_pct_busy),
-   fp_toint(intel_pstate_get_scaled_busy(cpu)),
+   fp_toint(sample->busy_scaled),
cpu->pstate.current_pstate,
sample->mperf,
sample->aperf,
-- 
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/7] cpufreq: intel_pstate: Simplify code in intel_pstate_adjust_busy_pstate

2014-06-09 Thread Stratos Karafotis
Simplify the code by removing the inline functions
pstate_increase and pstate_decrease and use directly the
intel_pstate_set_pstate.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 3a49269..26a0262 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -588,21 +588,6 @@ static void intel_pstate_set_pstate(struct cpudata *cpu, 
int pstate)
pstate_funcs.set(cpu, pstate);
 }
 
-static inline void intel_pstate_pstate_increase(struct cpudata *cpu, int steps)
-{
-   int target;
-   target = cpu->pstate.current_pstate + steps;
-
-   intel_pstate_set_pstate(cpu, target);
-}
-
-static inline void intel_pstate_pstate_decrease(struct cpudata *cpu, int steps)
-{
-   int target;
-   target = cpu->pstate.current_pstate - steps;
-   intel_pstate_set_pstate(cpu, target);
-}
-
 static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
 {
cpu->pstate.min_pstate = pstate_funcs.get_min();
@@ -695,20 +680,15 @@ static inline void intel_pstate_calc_scaled_busy(struct 
cpudata *cpu)
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
struct _pid *pid;
-   signed int ctl = 0;
-   int steps;
+   signed int ctl;
 
pid = &cpu->pid;
intel_pstate_calc_scaled_busy(cpu);
 
ctl = pid_calc(pid, cpu->sample.busy_scaled);
 
-   steps = abs(ctl);
-
-   if (ctl < 0)
-   intel_pstate_pstate_increase(cpu, steps);
-   else
-   intel_pstate_pstate_decrease(cpu, steps);
+   /* Negative values of ctl increase the pstate and vice versa */
+   intel_pstate_set_pstate(cpu, cpu->pstate.current_pstate - ctl);
 }
 
 static void intel_pstate_timer_func(unsigned long __data)
-- 
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/7] cpufreq: intel_pstate: Debugfs file addition and cleanups

2014-06-09 Thread Stratos Karafotis
Hi all,

Below some work on intel_pstate driver.
Mostly a cleanup (patches 1/7, 5/7, 6/7, 7/7), some code simplification
(patch 4/7) and the addition of stats file in debugfs that presents
driver's statistics (patch 3/7).

Please note that patch 3/7 depends on 2/7 (which also simplifies calculation
if tracing is on).


Thanks!

Stratos Karafotis (7):
  cpufreq: intel_pstate: Remove duplicate CPU ID check
  cpufreq: intel_pstate: Avoid duplicate call of
intel_pstate_get_scaled_busy
  cpufreq: intel_pstate: Add debugfs file stats
  cpufreq: intel_pstate: Simplify code in
intel_pstate_adjust_busy_pstate
  cpufreq: intel_pstate: Remove redundant includes
  cpufreq: intel_pstate: Trivial code cleanup
  cpufreq: intel_pstate: Make intel_pstate_kobject local

 drivers/cpufreq/intel_pstate.c | 237 +++--
 1 file changed, 135 insertions(+), 102 deletions(-)

-- 
1.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: intel_pstate: Remove unused member name of cpudata

2014-05-20 Thread Stratos Karafotis
Although, a value is assigned to member name of struct cpudata,
it is never used.

We can safely remove it.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 24a534a..a6d5afa 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -88,8 +88,6 @@ struct _pid {
 struct cpudata {
int cpu;
 
-   char name[64];
-
struct timer_list timer;
 
struct pstate_data pstate;
@@ -544,8 +542,6 @@ static inline void intel_pstate_pstate_decrease(struct 
cpudata *cpu, int steps)
 
 static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
 {
-   sprintf(cpu->name, "Intel 2nd generation core");
-
cpu->pstate.min_pstate = pstate_funcs.get_min();
cpu->pstate.max_pstate = pstate_funcs.get_max();
cpu->pstate.turbo_pstate = pstate_funcs.get_turbo();
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-16 Thread Stratos Karafotis
Hi all!

On 12/05/2014 11:30 μμ, Stratos Karafotis wrote:
> On 09/05/2014 05:56 μμ, Stratos Karafotis wrote:
>> Hi Dirk,
>>
>> On 08/05/2014 11:52 μμ, Dirk Brandewie wrote:
>>> On 05/05/2014 04:57 PM, Stratos Karafotis wrote:
>>>> Currently the driver calculates the next pstate proportional to
>>>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>>>
>>>> Using the scaled load (core_busy) to calculate the next pstate
>>>> is not always correct, because there are cases that the load is
>>>> independent from current pstate. For example, a tight 'for' loop
>>>> through many sampling intervals will cause a load of 100% in
>>>> every pstate.
>>>>
>>>> So, change the above method and calculate the next pstate with
>>>> the assumption that the next pstate should not depend on the
>>>> current pstate. The next pstate should only be proportional
>>>> to measured load. Use the linear function to calculate the load:
>>>>
>>>> Next P-state = A + B * load
>>>>
>>>> where A = min_state and B = (max_pstate - min_pstate) / 100
>>>> If turbo is enabled the B = (turbo_pstate - min_pstate) / 100
>>>> The load is calculated using the kernel time functions.
>>>>
>>
>> Thank you very much for your comments and for your time to test my patch!
>>
>>
>>>
>>> This will hurt your power numbers under "normal" conditions where you
>>> are not running a performance workload. Consider the following:
>>>
>>>1. The system is idle, all core at min P state and utilization is low 
>>> say < 10%
>>>2. You run something that drives the load as seen by the kernel to 100%
>>>   which scaled by the current P state.
>>>
>>> This would cause the P state to go from min -> max in one step.  Which is
>>> what you want if you are only looking at a single core.  But this will also
>>> drag every core in the package to the max P state as well.  This would be 
>>> fine
>>
>> I think, this will also happen using the original driver (before your
>> new patch 4/5), after some sampling intervals.
>>
>>
>>> if the power vs frequency cure was linear all the cores would finish
>>> their work faster and go idle sooner (race to halt) and maybe spend
>>> more time in a deeper C state which dwarfs the amount of power we can
>>> save by controlling P states. Unfortunately this is *not* the case, 
>>> power vs frequency curve is non-linear and get very steep in the turbo
>>> range.  If it were linear there would be no reason to have P state
>>> control you could select the highest P state and walk away.
>>>
>>> Being conservative on the way up and aggressive on way down give you
>>> the best power efficiency on non-benchmark loads.  Most benchmarks
>>> are pretty useless for measuring power efficiency (unless they were
>>> designed for it) since they are measuring how fast something can be
>>> done which is measuring the efficiency at max performance.
>>>
>>> The performance issues you pointed out were caused by commit 
>>> fcb6a15c intel_pstate: Take core C0 time into account for core busy 
>>> calculation
>>> and the ensuing problem is caused. These have been fixed in the patch set
>>>
>>>https://lkml.org/lkml/2014/5/8/574
>>>
>>> The performance comparison between before/after this patch set, your patch
>>> and ondemand/acpi_cpufreq is available at:
>>> http://openbenchmarking.org/result/1405085-PL-C0200965993
>>> ffmpeg was added to the set of benchmarks because there was a regression
>>> reported against this benchmark as well.
>>> https://bugzilla.kernel.org/show_bug.cgi?id=75121
>>
>> Of course, I agree generally with your comments above. But I believe that
>> the we should scale the core as soon as we measure high load. 
>>
>> I tested your new patches and I confirm your benchmarks. But I think
>> they are against the above theory (at least on low loads).
>> With the new patches I get increased frequencies even on an idle system.
>> Please compare the results below.
>>
>> With your latest patches during a mp3 decoding (a non-benchmark load)
>> the energy consumption increased to 5187.52 J from 5036.57 J (almost 3%).
>>
>>
>> Thanks again,
>> Stratos
>>
> 
> I would like to explain a little bit further the logic behin

[PATCH RESEND] cpufreq: Break out early when frequency equals target_freq

2014-05-14 Thread Stratos Karafotis
Many drivers keep frequencies in frequency table in ascending
or descending order. When governor tries to change to policy->min
or policy->max respectively then the cpufreq_frequency_table_target
could return on first iteration. This will save some iteration cycles.

So, break out early when a frequency in cpufreq_frequency_table
equals to target one.

Testing this during kernel compilation using ondemand governor
with a frequency table in ascending order, the
cpufreq_frequency_table_target returned early on the first
iteration at about 30% of times called.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/freq_table.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 8e518c6..1632981 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -137,9 +137,13 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
i = pos - table;
if ((freq < policy->min) || (freq > policy->max))
continue;
+   if (freq == target_freq) {
+   optimal.driver_data = i;
+   break;
+   }
switch (relation) {
case CPUFREQ_RELATION_H:
-   if (freq <= target_freq) {
+   if (freq < target_freq) {
if (freq >= optimal.frequency) {
optimal.frequency = freq;
optimal.driver_data = i;
@@ -152,7 +156,7 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
}
break;
case CPUFREQ_RELATION_L:
-   if (freq >= target_freq) {
+   if (freq > target_freq) {
if (freq <= optimal.frequency) {
optimal.frequency = freq;
optimal.driver_data = i;
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-13 Thread Stratos Karafotis
On 13/05/2014 12:59 πμ, Yuyang Du wrote:
 Maybe, in some cases yes. But not always. 
 For example, please consider a CPU running a tight "for" loop in 100MHz 
 for a couple of seconds. This produces a load of 100%. 
 It will produce the same load (100%) in any other frequency. 
>>>
>>> Still fundamentally wrong, because you are not making a fair 
>>> comparison ("load" in 100MHz vs. any other freq). 
>>>
>>
>> I'm sorry, I didn't understand you. What do you mean it's not fair?
>>
>> In the above example (considering a CPU with min freq 100MHz and max freq 
>> 1000Mhz) a load of 100% should also be 100 in other next frequency.
>>
>> If we scale the load we will calculate the load in 100Mhz to 10%. I believe 
>> that this is not true.
> 
> The amount of work @100MHz is the same as the amount of work @1000MHZ, in your
> example? Put another way, your proposed method does not do any extra better,
> but do worse in other cases (what if @1000MHz, the load drops to 10%).
> 
> That said, your case cannot be used against associating freq with load. That 
> also
> said, by associating freq with load, we will finally get highest freq as well
> (in your case).
> 
> Yuyang
> 

[I rewrite my last post, because I think something happened with my email server
and the message haven't delivered properly]

I mean that if a CPU was busy 100% at 100MHz it would be most probably (or we
should consider that would be) busy 100% at 1000MHz.

We don't know the amount of load in next sampling period. We also
don't know the type of load. A mathematical calculation that started in
previous sampling period and kept the CPU 100% busy, will most probably
keep the CPU also 100% busy in the next sampling period.

Scaling the load will be wrong in this case.

Of course, I don't say that the "amount" of load in these 2 periods are the 
same.

If @1000Mhz the load drops to 10%, the proposed method will select as target 
freq
190MHz.


Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-12 Thread Stratos Karafotis
On 12/05/2014 11:01 μμ, Yuyang Du wrote:
> On Tue, May 13, 2014 at 06:59:42AM +0300, Stratos Karafotis wrote:
>> Hi,
>>
>> On 12/05/2014 10:34 μμ, Yuyang Du wrote:
>>> On Mon, May 12, 2014 at 11:30:03PM +0300, Stratos Karafotis wrote:
>>>> On 09/05/2014 05:56 μμ, Stratos Karafotis wrote:
>>>>
>>>> Next performance state = min_perf + (max_perf - min_perf) * load / 100
>>>>
>>> Hi,
>>>
>>> This formula is fundamentally broken. You need to associate the load with 
>>> its
>>> frequency.
>>
>> Could you please explain why is it broken? I think the load should be
>> independent from the current frequency.
> 
> Why independent? The load not (somewhat) determined by that?
> 
> 

Maybe, in some cases yes. But not always.
For example, please consider a CPU running a tight "for" loop in 100MHz
for a couple of seconds. This produces a load of 100%.
It will produce the same load (100%) in any other frequency.


Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-12 Thread Stratos Karafotis
Hi,

On 12/05/2014 10:34 μμ, Yuyang Du wrote:
> On Mon, May 12, 2014 at 11:30:03PM +0300, Stratos Karafotis wrote:
>> On 09/05/2014 05:56 μμ, Stratos Karafotis wrote:
>>
>> Next performance state = min_perf + (max_perf - min_perf) * load / 100
>>
> Hi,
> 
> This formula is fundamentally broken. You need to associate the load with its
> frequency.

Could you please explain why is it broken? I think the load should be
independent from the current frequency.

Thanks,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-12 Thread Stratos Karafotis
On 09/05/2014 05:56 μμ, Stratos Karafotis wrote:
> Hi Dirk,
> 
> On 08/05/2014 11:52 μμ, Dirk Brandewie wrote:
>> On 05/05/2014 04:57 PM, Stratos Karafotis wrote:
>>> Currently the driver calculates the next pstate proportional to
>>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>>
>>> Using the scaled load (core_busy) to calculate the next pstate
>>> is not always correct, because there are cases that the load is
>>> independent from current pstate. For example, a tight 'for' loop
>>> through many sampling intervals will cause a load of 100% in
>>> every pstate.
>>>
>>> So, change the above method and calculate the next pstate with
>>> the assumption that the next pstate should not depend on the
>>> current pstate. The next pstate should only be proportional
>>> to measured load. Use the linear function to calculate the load:
>>>
>>> Next P-state = A + B * load
>>>
>>> where A = min_state and B = (max_pstate - min_pstate) / 100
>>> If turbo is enabled the B = (turbo_pstate - min_pstate) / 100
>>> The load is calculated using the kernel time functions.
>>>
> 
> Thank you very much for your comments and for your time to test my patch!
> 
> 
>>
>> This will hurt your power numbers under "normal" conditions where you
>> are not running a performance workload. Consider the following:
>>
>>1. The system is idle, all core at min P state and utilization is low say 
>> < 10%
>>2. You run something that drives the load as seen by the kernel to 100%
>>   which scaled by the current P state.
>>
>> This would cause the P state to go from min -> max in one step.  Which is
>> what you want if you are only looking at a single core.  But this will also
>> drag every core in the package to the max P state as well.  This would be 
>> fine
> 
> I think, this will also happen using the original driver (before your
> new patch 4/5), after some sampling intervals.
> 
> 
>> if the power vs frequency cure was linear all the cores would finish
>> their work faster and go idle sooner (race to halt) and maybe spend
>> more time in a deeper C state which dwarfs the amount of power we can
>> save by controlling P states. Unfortunately this is *not* the case, 
>> power vs frequency curve is non-linear and get very steep in the turbo
>> range.  If it were linear there would be no reason to have P state
>> control you could select the highest P state and walk away.
>>
>> Being conservative on the way up and aggressive on way down give you
>> the best power efficiency on non-benchmark loads.  Most benchmarks
>> are pretty useless for measuring power efficiency (unless they were
>> designed for it) since they are measuring how fast something can be
>> done which is measuring the efficiency at max performance.
>>
>> The performance issues you pointed out were caused by commit 
>> fcb6a15c intel_pstate: Take core C0 time into account for core busy 
>> calculation
>> and the ensuing problem is caused. These have been fixed in the patch set
>>
>>https://lkml.org/lkml/2014/5/8/574
>>
>> The performance comparison between before/after this patch set, your patch
>> and ondemand/acpi_cpufreq is available at:
>> http://openbenchmarking.org/result/1405085-PL-C0200965993
>> ffmpeg was added to the set of benchmarks because there was a regression
>> reported against this benchmark as well.
>> https://bugzilla.kernel.org/show_bug.cgi?id=75121
> 
> Of course, I agree generally with your comments above. But I believe that
> the we should scale the core as soon as we measure high load. 
> 
> I tested your new patches and I confirm your benchmarks. But I think
> they are against the above theory (at least on low loads).
> With the new patches I get increased frequencies even on an idle system.
> Please compare the results below.
> 
> With your latest patches during a mp3 decoding (a non-benchmark load)
> the energy consumption increased to 5187.52 J from 5036.57 J (almost 3%).
> 
> 
> Thanks again,
> Stratos
> 

I would like to explain a little bit further the logic behind this patch.

The patch is based on the following assumptions (some of them are pretty
obvious but please let me mention them):

1) We define the load of the CPU as the percentage of sampling period that
CPU was busy (not idle), as measured by the kernel.

2) It's not possible to predict (with accuracy) the load of a CPU in future
sampling periods.

3) The load in the next sampling interval is most probable t

Re: [PATCH 4/5] intel_pstate: Remove C0 tracking

2014-05-11 Thread Stratos Karafotis
Hi,

On 12/05/2014 05:14 πμ, Stratos Karafotis wrote:
> From: Dirk Brandewie 
> 
> Commit fcb6a15c intel_pstate: Take core C0 time into account for core busy
> introduced a regression referenced below.  The issue with "lockup"
> after suspend that this commit was addressing is now dealt with in the
> suspend path.
> 
> References:
>https://bugzilla.kernel.org/show_bug.cgi?id=66581
>https://bugzilla.kernel.org/show_bug.cgi?id=75121
> 
> Reported-by: Doug Smythies 
> Signed-off-by: Dirk Brandewie 
> ---
>  drivers/cpufreq/intel_pstate.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index bb20881..4c26faf 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -59,7 +59,6 @@ struct sample {
> int32_t core_pct_busy;
> u64 aperf;
> u64 mperf;
> -   unsigned long long tsc;
> int freq;
>  };
> 
> @@ -100,7 +99,6 @@ struct cpudata {
> 
> u64 prev_aperf;
> u64 prev_mperf;
> -   unsigned long long prev_tsc;
> struct sample sample;
>  };
> 
> @@ -561,46 +559,37 @@ static inline void intel_pstate_calc_busy(struct
> cpudata *cpu,
> struct sample *sample)
>  {
> int32_t core_pct;
> -   int32_t c0_pct;
> 
> core_pct = div_fp(int_tofp((sample->aperf)),
> int_tofp((sample->mperf)));
> core_pct = mul_fp(core_pct, int_tofp(100));
> FP_ROUNDUP(core_pct);
> 
> -   c0_pct = div_fp(int_tofp(sample->mperf), int_tofp(sample->tsc));
> -
> sample->freq = fp_toint(
> mul_fp(int_tofp(cpu->pstate.max_pstate * 1000), core_pct));
> 
> -   sample->core_pct_busy = mul_fp(core_pct, c0_pct);
> +   sample->core_pct_busy = core_pct;
>  }
> 
>  static inline void intel_pstate_sample(struct cpudata *cpu)
>  {
> u64 aperf, mperf;
> -   unsigned long long tsc;
> 
> rdmsrl(MSR_IA32_APERF, aperf);
> rdmsrl(MSR_IA32_MPERF, mperf);
> -   tsc = native_read_tsc();
> 
> aperf = aperf >> FRAC_BITS;
> mperf = mperf >> FRAC_BITS;
> -   tsc = tsc >> FRAC_BITS;
> 
> cpu->sample.aperf = aperf;
> cpu->sample.mperf = mperf;
> -   cpu->sample.tsc = tsc;
> cpu->sample.aperf -= cpu->prev_aperf;
> cpu->sample.mperf -= cpu->prev_mperf;
> -   cpu->sample.tsc -= cpu->prev_tsc;
> 
> intel_pstate_calc_busy(cpu, &cpu->sample);
> 
> cpu->prev_aperf = aperf;
> cpu->prev_mperf = mperf;
> -   cpu->prev_tsc = tsc;
>  }
> 
>  static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
> --
> 1.9.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

With this patch, my CPU (Core i7-3770 @ 3.90GHz) seems to never use lowest
frequencies. Even on an idle system I get always ~2GHz. Normally,
on an idle system it used to be 1.6GHz.
On very small loads (mp3 decoding) the CPU goes up to 2.7G GHz (it used to
be 1.6GHz)

Reverting, this patch on my local build, the problem is resolved.


Thanks,
Stratos Karafotis

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] cpufreq: powernow-k8: Suppress checkpatch warnings

2014-05-11 Thread Stratos Karafotis
Suppress the following checkpatch.pl warnings:

- WARNING: Prefer pr_err(... to printk(KERN_ERR ...
- WARNING: Prefer pr_info(... to printk(KERN_INFO ...
- WARNING: Prefer pr_warn(... to printk(KERN_WARNING ...
- WARNING: quoted string split across lines
- WARNING: please, no spaces at the start of a line

Also, define the pr_fmt macro instead of PFX for the module name.

Signed-off-by: Stratos Karafotis 
---

Changes v1 -> v2
- Use pr_err_once instead of printk_once
- Change missing_pss_msg to macro (because pr_err_once
doesn't compile otherwise)
- Put one pr_err message in a single line instead of two
- Ignore "line over 80 characters" warnings
- Change the word "Fix" in the subject of the patch to
"Suppress" as the patch doesn't really fix anything

 drivers/cpufreq/powernow-k8.c | 180 +-
 drivers/cpufreq/powernow-k8.h |   2 +-
 2 files changed, 74 insertions(+), 108 deletions(-)

diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index 1b6ae6b..f9ce7e4 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -27,6 +27,8 @@
  *  power and thermal data sheets, (e.g. 30417.pdf, 30430.pdf, 43375.pdf)
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -45,7 +47,6 @@
 #include 
 #include 
 
-#define PFX "powernow-k8: "
 #define VERSION "version 2.20.00"
 #include "powernow-k8.h"
 
@@ -161,7 +162,7 @@ static int write_new_fid(struct powernow_k8_data *data, u32 
fid)
u32 i = 0;
 
if ((fid & INVALID_FID_MASK) || (data->currvid & INVALID_VID_MASK)) {
-   printk(KERN_ERR PFX "internal error - overflow on fid write\n");
+   pr_err("internal error - overflow on fid write\n");
return 1;
}
 
@@ -175,9 +176,7 @@ static int write_new_fid(struct powernow_k8_data *data, u32 
fid)
do {
wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
if (i++ > 100) {
-   printk(KERN_ERR PFX
-   "Hardware error - pending bit very stuck - "
-   "no further pstate changes possible\n");
+   pr_err("Hardware error - pending bit very stuck - no 
further pstate changes possible\n");
return 1;
}
} while (query_current_values_with_pending_wait(data));
@@ -185,15 +184,13 @@ static int write_new_fid(struct powernow_k8_data *data, 
u32 fid)
count_off_irt(data);
 
if (savevid != data->currvid) {
-   printk(KERN_ERR PFX
-   "vid change on fid trans, old 0x%x, new 0x%x\n",
-   savevid, data->currvid);
+   pr_err("vid change on fid trans, old 0x%x, new 0x%x\n",
+  savevid, data->currvid);
return 1;
}
 
if (fid != data->currfid) {
-   printk(KERN_ERR PFX
-   "fid trans failed, fid 0x%x, curr 0x%x\n", fid,
+   pr_err("fid trans failed, fid 0x%x, curr 0x%x\n", fid,
data->currfid);
return 1;
}
@@ -209,7 +206,7 @@ static int write_new_vid(struct powernow_k8_data *data, u32 
vid)
int i = 0;
 
if ((data->currfid & INVALID_FID_MASK) || (vid & INVALID_VID_MASK)) {
-   printk(KERN_ERR PFX "internal error - overflow on vid write\n");
+   pr_err("internal error - overflow on vid write\n");
return 1;
}
 
@@ -223,23 +220,19 @@ static int write_new_vid(struct powernow_k8_data *data, 
u32 vid)
do {
wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
if (i++ > 100) {
-   printk(KERN_ERR PFX "internal error - pending bit "
-   "very stuck - no further pstate "
-   "changes possible\n");
+   pr_err("internal error - pending bit very stuck - no 
further pstate changes possible\n");
return 1;
}
} while (query_current_values_with_pending_wait(data));
 
if (savefid != data->currfid) {
-   printk(KERN_ERR PFX "fid changed on vid trans, old "
-   "0x%x new 0x%x\n",
-  savefid, data->currfid);
+   pr_err("fid changed on vid trans, old 0x%x new 0x%x\n",
+   savefid, data->currfid);
return 1;
}
 
if (vid != data->currvid) {
-   

Re: [RFC PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-09 Thread Stratos Karafotis
Hi Dirk,

On 08/05/2014 11:52 μμ, Dirk Brandewie wrote:
> On 05/05/2014 04:57 PM, Stratos Karafotis wrote:
>> Currently the driver calculates the next pstate proportional to
>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>
>> Using the scaled load (core_busy) to calculate the next pstate
>> is not always correct, because there are cases that the load is
>> independent from current pstate. For example, a tight 'for' loop
>> through many sampling intervals will cause a load of 100% in
>> every pstate.
>>
>> So, change the above method and calculate the next pstate with
>> the assumption that the next pstate should not depend on the
>> current pstate. The next pstate should only be proportional
>> to measured load. Use the linear function to calculate the load:
>>
>> Next P-state = A + B * load
>>
>> where A = min_state and B = (max_pstate - min_pstate) / 100
>> If turbo is enabled the B = (turbo_pstate - min_pstate) / 100
>> The load is calculated using the kernel time functions.
>>

Thank you very much for your comments and for your time to test my patch!


> 
> This will hurt your power numbers under "normal" conditions where you
> are not running a performance workload. Consider the following:
> 
>1. The system is idle, all core at min P state and utilization is low say 
> < 10%
>2. You run something that drives the load as seen by the kernel to 100%
>   which scaled by the current P state.
> 
> This would cause the P state to go from min -> max in one step.  Which is
> what you want if you are only looking at a single core.  But this will also
> drag every core in the package to the max P state as well.  This would be fine

I think, this will also happen using the original driver (before your
new patch 4/5), after some sampling intervals.


> if the power vs frequency cure was linear all the cores would finish
> their work faster and go idle sooner (race to halt) and maybe spend
> more time in a deeper C state which dwarfs the amount of power we can
> save by controlling P states. Unfortunately this is *not* the case, 
> power vs frequency curve is non-linear and get very steep in the turbo
> range.  If it were linear there would be no reason to have P state
> control you could select the highest P state and walk away.
> 
> Being conservative on the way up and aggressive on way down give you
> the best power efficiency on non-benchmark loads.  Most benchmarks
> are pretty useless for measuring power efficiency (unless they were
> designed for it) since they are measuring how fast something can be
> done which is measuring the efficiency at max performance.
> 
> The performance issues you pointed out were caused by commit 
> fcb6a15c intel_pstate: Take core C0 time into account for core busy 
> calculation
> and the ensuing problem is caused. These have been fixed in the patch set
> 
>https://lkml.org/lkml/2014/5/8/574
> 
> The performance comparison between before/after this patch set, your patch
> and ondemand/acpi_cpufreq is available at:
> http://openbenchmarking.org/result/1405085-PL-C0200965993
> ffmpeg was added to the set of benchmarks because there was a regression
> reported against this benchmark as well.
> https://bugzilla.kernel.org/show_bug.cgi?id=75121

Of course, I agree generally with your comments above. But I believe that
the we should scale the core as soon as we measure high load. 

I tested your new patches and I confirm your benchmarks. But I think
they are against the above theory (at least on low loads).
With the new patches I get increased frequencies even on an idle system.
Please compare the results below.

With your latest patches during a mp3 decoding (a non-benchmark load)
the energy consumption increased to 5187.52 J from 5036.57 J (almost 3%).


Thanks again,
Stratos



With my patch
-
[root@albert ~]# 
/home/stratosk/kernels/linux-pm/tools/power/x86/turbostat/turbostat -i 60
Core CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz SMI  CPU%c1  CPU%c3  
CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt 
GFXWatt 
   -   -   10.0616453392   00.260.00   
99.670.00  32  320.000.000.000.00   20.182.00   
 0.02
   0   0   20.1016233392   00.630.01   
99.260.00  32  320.000.000.000.00   20.182.00   
 0.02
   0   4   00.0116183392   00.72
   1   1   10.0316183392   00.030.00   
99.940.00  27
   1   5   00.0116063392   00.05
   2   2   00.0216353392   00.280.0

Re: [PATCH] cpufreq: Break out early when frequency equals target_freq

2014-05-08 Thread Stratos Karafotis
On 29/04/2014 11:28 μμ, Stratos Karafotis wrote:
> Many drivers keep frequencies in frequency table in ascending
> or descending order. When governor tries to change to policy->min
> or policy->max respectively then the cpufreq_frequency_table_target
> could return on first iteration. This will save some iteration cycles.
> 
> So, break out early when a frequency in cpufreq_frequency_table
> equals to target one.
> 
> Testing this during kernel compilation using ondemand governor
> with a frequency table in ascending order, the
> cpufreq_frequency_table_target returned early on the first
> iteration at about 30% of times called.
> 
> Signed-off-by: Stratos Karafotis 
> ---
>  drivers/cpufreq/freq_table.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
> index 8e518c6..1632981 100644
> --- a/drivers/cpufreq/freq_table.c
> +++ b/drivers/cpufreq/freq_table.c
> @@ -137,9 +137,13 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
> *policy,
>   i = pos - table;
>   if ((freq < policy->min) || (freq > policy->max))
>   continue;
> + if (freq == target_freq) {
> + optimal.driver_data = i;
> + break;
> + }
>   switch (relation) {
>   case CPUFREQ_RELATION_H:
> - if (freq <= target_freq) {
> + if (freq < target_freq) {
>   if (freq >= optimal.frequency) {
>   optimal.frequency = freq;
>   optimal.driver_data = i;
> @@ -152,7 +156,7 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
> *policy,
>   }
>   break;
>   case CPUFREQ_RELATION_L:
> - if (freq >= target_freq) {
> + if (freq > target_freq) {
>   if (freq <= optimal.frequency) {
>   optimal.frequency = freq;
>   optimal.driver_data = i;
> 

Gentle reminder.

Thanks,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: Fix build error on some platforms that use cpufreq_for_each_*

2014-05-07 Thread Stratos Karafotis
On platforms that use cpufreq_for_each_* macros, build fails if
CONFIG_CPU_FREQ=n, e.g. ARM/shmobile/koelsch/non-multiplatform:

drivers/built-in.o: In function `clk_round_parent':
clkdev.c:(.text+0xcf168): undefined reference to `cpufreq_next_valid'
drivers/built-in.o: In function `clk_rate_table_find':
clkdev.c:(.text+0xcf820): undefined reference to `cpufreq_next_valid'
make[3]: *** [vmlinux] Error 1

Fix this making cpufreq_next_valid function inline and move it to
cpufreq.h.

Reported-by: Geert Uytterhoeven 
Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/cpufreq.c | 11 ---
 include/linux/cpufreq.h   | 11 +--
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index bfe82b6..a05c921 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -237,17 +237,6 @@ void cpufreq_cpu_put(struct cpufreq_policy *policy)
 }
 EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
 
-bool cpufreq_next_valid(struct cpufreq_frequency_table **pos)
-{
-   while ((*pos)->frequency != CPUFREQ_TABLE_END)
-   if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID)
-   return true;
-   else
-   (*pos)++;
-   return false;
-}
-EXPORT_SYMBOL_GPL(cpufreq_next_valid);
-
 /*
  *EXTERNALLY AFFECTING FREQUENCY CHANGES *
  */
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 9d803b5..3f45889 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -489,8 +489,15 @@ static inline void dev_pm_opp_free_cpufreq_table(struct 
device *dev,
 }
 #endif
 
-
-bool cpufreq_next_valid(struct cpufreq_frequency_table **pos);
+static inline bool cpufreq_next_valid(struct cpufreq_frequency_table **pos)
+{
+   while ((*pos)->frequency != CPUFREQ_TABLE_END)
+   if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID)
+   return true;
+   else
+   (*pos)++;
+   return false;
+}
 
 /*
  * cpufreq_for_each_entry -iterate over a cpufreq_frequency_table
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 0/8] Introduce new cpufreq helper macros

2014-05-07 Thread Stratos Karafotis
Hi Rafael,

On 07/05/2014 04:13 μμ, Rafael J. Wysocki wrote:
> On Wednesday, May 07, 2014 10:53:16 AM Viresh Kumar wrote:
>> On 6 May 2014 23:25, Stratos Karafotis  wrote:
>>> My bad. I'm sorry for this. :(
>>>
>>> Rafael,
>>> A solution could be to make cpufreq_next_valid an inline function in 
>>> cpufreq.h,
>>> but as Viresh mentioned this would be very inefficient because of multiple 
>>> copies.
>>
>> That statement was true when we didn't had this problem..
>>
>>> So, maybe it's better to revert the 2 patches that don't depend on 
>>> CONFIG_CPU_FREQ:
>>>
>>> 4229e1c61a4a ("sh: clk: Use cpufreq_for_each_valid_entry macro for 
>>> iteration") and
>>> 04ae58645afa ("irda: sh_sir: Use cpufreq_for_each_valid_entry macro for 
>>> iteration").
>>
>> This doesn't look right. It can happen to some other drivers as well in 
>> future.
>> So, there are two solutions I can think of:
>> 1. move cpufreq_next_valid and rename it to __cpufreq_next_valid(). Also 
>> make it
>> inline. Then create two versions of cpufreq_next_valid(), one inlined (only 
>> when
>> CONFIG_CPU_FREQ=n) and other one in cpufreq.c (non- inlined)..
>>
>> But probably that would be called ugly by some people :)
>>
>> 2. Make cpufreq_next_valid() inline and forget about extra space it takes :)
>>
>> @Rafel: Let me know which one you like :)
> 
> 2.
> 
> 

Do you want me to resend the entire patch set or only patch 1/8?


Thanks,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 0/8] Introduce new cpufreq helper macros

2014-05-06 Thread Stratos Karafotis
Hi all,

On 06/05/2014 06:24 μμ, Geert Uytterhoeven wrote:
> Hi Stratos,
> 
> On Wed, Apr 30, 2014 at 12:26 AM, Rafael J. Wysocki  
> wrote:
>> On Tuesday, April 29, 2014 07:05:17 PM Stratos Karafotis wrote:
>>> On 29/04/2014 07:17 πμ, Viresh Kumar wrote:
>>>> On 26 April 2014 01:45, Stratos Karafotis  wrote:
>>>>> This patch set introduces two freq_table helper macros which
>>>>> can be used for iteration over cpufreq_frequency_table and
>>>>> makes the necessary changes to cpufreq core and drivers that
>>>>> use such an iteration procedure.
>>>>>
>>>>> The motivation was a usage of common procedure to iterate over
>>>>> cpufreq_frequency_table across all drivers and cpufreq core.
>>>>>
>>>>> This was tested on a x86_64 platform.
>>>>> Most files compiled successfully but unfortunately I was not
>>>>> able to compile sh_sir.c pasemi_cpufreq.c and ppc_cbe_cpufreq.c
>>>>> due to lack of cross compiler.
>>>>>
>>>>> Changelog
>>>>>
>>>>> v4 -> v5
>>>>> - Fix warnings in printk format specifier for 32 bit
>>>>>   architectures in freq_table.c, longhaul, pasemi, ppc_cbe
>>>>
>>>> Doesn't look much has changed and so it stays as is:
>>>>
>>>> Acked-by: Viresh Kumar 
>>>>
>>>
>>> Thank you very much!
>>
>> I've applied the series to my bleeding-edge branch, will move it to 
>> linux-next
>> after build testing later this week.
> 
> This breaks if CONFIG_CPU_FREQ=n, e.g. ARM/shmobile/koelsch/
> non-multiplatform:
> 
> drivers/built-in.o: In function `clk_round_parent':
> clkdev.c:(.text+0xcf168): undefined reference to `cpufreq_next_valid'
> drivers/built-in.o: In function `clk_rate_table_find':
> clkdev.c:(.text+0xcf820): undefined reference to `cpufreq_next_valid'
> make[3]: *** [vmlinux] Error 1
> 
> drivers/sh/clk/core.c (pre-CCF shmobile clock core) calls
> cpufreq_for_each_valid_entry():
> 
> #define cpufreq_for_each_valid_entry(pos, table)\
> for (pos = table; cpufreq_next_valid(&pos); pos++)
> 
> but cpufreq_next_valid() in drivers/cpufreq/cpufreq.c is not
> compiled in.

My bad. I'm sorry for this. :(

Rafael,
A solution could be to make cpufreq_next_valid an inline function in cpufreq.h,
but as Viresh mentioned this would be very inefficient because of multiple 
copies.

So, maybe it's better to revert the 2 patches that don't depend on 
CONFIG_CPU_FREQ:

4229e1c61a4a ("sh: clk: Use cpufreq_for_each_valid_entry macro for iteration") 
and
04ae58645afa ("irda: sh_sir: Use cpufreq_for_each_valid_entry macro for 
iteration").

Do I need to send revert patches for the above?


Thanks,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-05 Thread Stratos Karafotis
Currently the driver calculates the next pstate proportional to
core_busy factor, scaled by the ratio max_pstate / current_pstate.

Using the scaled load (core_busy) to calculate the next pstate
is not always correct, because there are cases that the load is
independent from current pstate. For example, a tight 'for' loop
through many sampling intervals will cause a load of 100% in
every pstate.

So, change the above method and calculate the next pstate with
the assumption that the next pstate should not depend on the
current pstate. The next pstate should only be proportional
to measured load. Use the linear function to calculate the load:

Next P-state = A + B * load

where A = min_state and B = (max_pstate - min_pstate) / 100
If turbo is enabled the B = (turbo_pstate - min_pstate) / 100
The load is calculated using the kernel time functions.

Also remove the unused pid_calc function and pid structure and
related helper functions. 

Tested on Intel i7-3770 CPU @ 3.40GHz.
Phoronix benchmark of Linux Kernel Compilation 3.1 test (CPU busy 86%)
shows an increase ~1.35% in performance and a decrease by ~0.22% in
energy consumption. When turbo was disabled there was an increase by
~0.94% and a decrease by ~0.37% in energy consumption.

Phoronix Apache benchmark shows more interesting results.
With a CPU busy ~32% there was an increase in performance by ~46.84%
and a decrease in energy consumption by ~4.78%
When turbo was disabled, the performance boost was ~38.56 and
the decrease in energy consumption ~7.96%

Signed-off-by: Stratos Karafotis 
---

Detailed test results can be found in this link:
https://docs.google.com/spreadsheets/d/1xiw8FOswoNFA8seNMz0nYUdhjPPvJ8J2S54kG02dOP8/edit?usp=sharing

 drivers/cpufreq/intel_pstate.c | 208 +++--
 1 file changed, 35 insertions(+), 173 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0999673..124c675 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -32,8 +32,6 @@
 #include 
 #include 
 
-#define SAMPLE_COUNT   3
-
 #define BYT_RATIOS 0x66a
 #define BYT_VIDS   0x66b
 #define BYT_TURBO_RATIOS   0x66c
@@ -55,10 +53,11 @@ static inline int32_t div_fp(int32_t x, int32_t y)
 }
 
 struct sample {
-   int32_t core_pct_busy;
+   unsigned int core_pct_busy;
+   unsigned int duration_us;
+   unsigned int idletime_us;
u64 aperf;
u64 mperf;
-   unsigned long long tsc;
int freq;
 };
 
@@ -75,16 +74,6 @@ struct vid_data {
int32_t ratio;
 };
 
-struct _pid {
-   int setpoint;
-   int32_t integral;
-   int32_t p_gain;
-   int32_t i_gain;
-   int32_t d_gain;
-   int deadband;
-   int32_t last_err;
-};
-
 struct cpudata {
int cpu;
 
@@ -94,22 +83,17 @@ struct cpudata {
 
struct pstate_data pstate;
struct vid_data vid;
-   struct _pid pid;
 
+   ktime_t prev_sample;
+   u64 prev_idle_time_us;
u64 prev_aperf;
u64 prev_mperf;
-   unsigned long long prev_tsc;
struct sample sample;
 };
 
 static struct cpudata **all_cpu_data;
 struct pstate_adjust_policy {
int sample_rate_ms;
-   int deadband;
-   int setpoint;
-   int p_gain_pct;
-   int d_gain_pct;
-   int i_gain_pct;
 };
 
 struct pstate_funcs {
@@ -148,87 +132,10 @@ static struct perf_limits limits = {
.max_sysfs_pct = 100,
 };
 
-static inline void pid_reset(struct _pid *pid, int setpoint, int busy,
-   int deadband, int integral) {
-   pid->setpoint = setpoint;
-   pid->deadband  = deadband;
-   pid->integral  = int_tofp(integral);
-   pid->last_err  = int_tofp(setpoint) - int_tofp(busy);
-}
-
-static inline void pid_p_gain_set(struct _pid *pid, int percent)
-{
-   pid->p_gain = div_fp(int_tofp(percent), int_tofp(100));
-}
-
-static inline void pid_i_gain_set(struct _pid *pid, int percent)
-{
-   pid->i_gain = div_fp(int_tofp(percent), int_tofp(100));
-}
-
-static inline void pid_d_gain_set(struct _pid *pid, int percent)
-{
-
-   pid->d_gain = div_fp(int_tofp(percent), int_tofp(100));
-}
-
-static signed int pid_calc(struct _pid *pid, int32_t busy)
-{
-   signed int result;
-   int32_t pterm, dterm, fp_error;
-   int32_t integral_limit;
-
-   fp_error = int_tofp(pid->setpoint) - busy;
-
-   if (abs(fp_error) <= int_tofp(pid->deadband))
-   return 0;
-
-   pterm = mul_fp(pid->p_gain, fp_error);
-
-   pid->integral += fp_error;
-
-   /* limit the integral term */
-   integral_limit = int_tofp(30);
-   if (pid->integral > integral_limit)
-   pid->integral = integral_limit;
-   if (pid->integral < -integral_limit)
-   pid->integral = -integral_limit;
-
-   dterm = mul_fp(pid->d_gain, fp_error - pid->

Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-02 Thread Stratos Karafotis
On 02/05/2014 03:26 μμ, Rafael J. Wysocki wrote:
> On Thursday, May 01, 2014 06:48:08 PM Dirk Brandewie wrote:
>> On 05/01/2014 04:18 PM, Rafael J. Wysocki wrote:
>>> On Thursday, May 01, 2014 02:30:42 PM Dirk Brandewie wrote:
>>>> On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
>>>>> Currently the driver calculates the next pstate proportional to
>>>>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>>>>
>>>>> Using the scaled load (core_busy) to calculate the next pstate
>>>>> is not always correct, because there are cases that the load is
>>>>> independent from current pstate. For example, a tight 'for' loop
>>>>> through many sampling intervals will cause a load of 100% in
>>>>> every pstate.
>>>>>
>>>>> So, change the above method and calculate the next pstate with
>>>>> the assumption that the next pstate should not depend on the
>>>>> current pstate. The next pstate should only be directly
>>>>> proportional to measured load.
>>>>>
>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz.
>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>>>>> increase ~1.5% in performance. Below the test results using turbostat
>>>>> (5 iterations):
>>>>>
>>>>> Without patch:
>>>>>
>>>>> Ph. avg Time  Total time  PkgWatt Total Energy
>>>>>   79.63   266.416 57.74   15382.85984
>>>>>   79.63   265.609 57.87   15370.79283
>>>>>   79.57   266.994 57.54   15362.83476
>>>>>   79.53   265.304 57.83   15342.53032
>>>>>   79.71   265.977 57.76   15362.83152
>>>>> avg   79.61   266.06  57.74   15364.36985
>>>>>
>>>>> With patch:
>>>>>
>>>>> Ph. avg Time  Total time  PkgWatt Total Energy
>>>>>   78.23   258.826 59.14   15306.96964
>>>>>   78.41   259.110 59.15   15326.35650
>>>>>   78.40   258.530 59.26   15320.48780
>>>>>   78.46   258.673 59.20   15313.44160
>>>>>   78.19   259.075 59.16   15326.87700
>>>>> avg   78.34   258.842 59.18   15318.82650
>>>>>
>>>>> The total test time reduced by ~2.6%, while the total energy
>>>>> consumption during a test iteration reduced by ~0.35%
>>>>>
>>>>> Signed-off-by: Stratos Karafotis 
>>>>> ---
>>>>>
>>>>> Changes v1 -> v2
>>>>>   - Enhance change log as Rafael and Viresh suggested
>>>>>
>>>>>
>>>>>drivers/cpufreq/intel_pstate.c | 15 +++
>>>>>1 file changed, 7 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/cpufreq/intel_pstate.c 
>>>>> b/drivers/cpufreq/intel_pstate.c
>>>>> index 0999673..8e309db 100644
>>>>> --- a/drivers/cpufreq/intel_pstate.c
>>>>> +++ b/drivers/cpufreq/intel_pstate.c
>>>>> @@ -608,28 +608,27 @@ static inline void 
>>>>> intel_pstate_set_sample_time(struct cpudata *cpu)
>>>>>   mod_timer_pinned(&cpu->timer, jiffies + delay);
>>>>>}
>>>>>
>>>>> -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
>>>>> +static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
>>>>>{
>>>>> - int32_t core_busy, max_pstate, current_pstate;
>>>>> + int32_t core_busy, max_pstate;
>>>>>
>>>>>   core_busy = cpu->sample.core_pct_busy;
>>>>>   max_pstate = int_tofp(cpu->pstate.max_pstate);
>>>>> - current_pstate = int_tofp(cpu->pstate.current_pstate);
>>>>> - core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
>>>>> + core_busy = mul_fp(core_busy, max_pstate);
>>>>
>>>> NAK,  The goal of this code is to find out how busy the core is at the 
>>>> current
>>>> P state. This change will return a value WAY too high.
>>>>
>>>> Assume core_busy is 100 and the max non-turbo P state is 34 (3.4GHz) this 
>>>> code
&g

[PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate

2014-05-01 Thread Stratos Karafotis
Currently the driver calculates the next pstate proportional to
core_busy factor, scaled by the ratio max_pstate / current_pstate.

Using the scaled load (core_busy) to calculate the next pstate
is not always correct, because there are cases that the load is 
independent from current pstate. For example, a tight 'for' loop
through many sampling intervals will cause a load of 100% in
every pstate.

So, change the above method and calculate the next pstate with
the assumption that the next pstate should not depend on the
current pstate. The next pstate should only be directly
proportional to measured load. 

Tested on Intel i7-3770 CPU @ 3.40GHz.
Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
increase ~1.5% in performance. Below the test results using turbostat
(5 iterations):

Without patch:

Ph. avg TimeTotal time  PkgWatt Total Energy
79.63   266.416 57.74   15382.85984
79.63   265.609 57.87   15370.79283
79.57   266.994 57.54   15362.83476
79.53   265.304 57.83   15342.53032
79.71   265.977 57.76   15362.83152
avg 79.61   266.06  57.74   15364.36985

With patch:

Ph. avg TimeTotal time  PkgWatt Total Energy
78.23   258.826 59.14   15306.96964
78.41   259.110 59.15   15326.35650
78.40   258.530 59.26   15320.48780
78.46   258.673 59.20   15313.44160
78.19   259.075 59.16   15326.87700
avg 78.34   258.842 59.18   15318.82650

The total test time reduced by ~2.6%, while the total energy
consumption during a test iteration reduced by ~0.35%

Signed-off-by: Stratos Karafotis 
---

Changes v1 -> v2
- Enhance change log as Rafael and Viresh suggested
 

 drivers/cpufreq/intel_pstate.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0999673..8e309db 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct 
cpudata *cpu)
mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
 
-static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
+static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
 {
-   int32_t core_busy, max_pstate, current_pstate;
+   int32_t core_busy, max_pstate;
 
core_busy = cpu->sample.core_pct_busy;
max_pstate = int_tofp(cpu->pstate.max_pstate);
-   current_pstate = int_tofp(cpu->pstate.current_pstate);
-   core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+   core_busy = mul_fp(core_busy, max_pstate);
return FP_ROUNDUP(core_busy);
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
-   int32_t busy_scaled;
+   int32_t busy;
struct _pid *pid;
signed int ctl = 0;
int steps;
 
pid = &cpu->pid;
-   busy_scaled = intel_pstate_get_scaled_busy(cpu);
+   busy = intel_pstate_get_busy(cpu);
 
-   ctl = pid_calc(pid, busy_scaled);
+   ctl = pid_calc(pid, busy);
 
steps = abs(ctl);
 
@@ -651,7 +650,7 @@ static void intel_pstate_timer_func(unsigned long __data)
intel_pstate_adjust_busy_pstate(cpu);
 
trace_pstate_sample(fp_toint(sample->core_pct_busy),
-   fp_toint(intel_pstate_get_scaled_busy(cpu)),
+   fp_toint(intel_pstate_get_busy(cpu)),
cpu->pstate.current_pstate,
sample->mperf,
sample->aperf,
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 0/8] Introduce new cpufreq helper macros

2014-04-30 Thread Stratos Karafotis
On 30/04/2014 01:26 πμ, Rafael J. Wysocki wrote:
> On Tuesday, April 29, 2014 07:05:17 PM Stratos Karafotis wrote:
>> On 29/04/2014 07:17 πμ, Viresh Kumar wrote:
>>> On 26 April 2014 01:45, Stratos Karafotis  wrote:
>>>> This patch set introduces two freq_table helper macros which
>>>> can be used for iteration over cpufreq_frequency_table and
>>>> makes the necessary changes to cpufreq core and drivers that
>>>> use such an iteration procedure.
>>>>
>>>> The motivation was a usage of common procedure to iterate over
>>>> cpufreq_frequency_table across all drivers and cpufreq core.
>>>>
>>>> This was tested on a x86_64 platform.
>>>> Most files compiled successfully but unfortunately I was not
>>>> able to compile sh_sir.c pasemi_cpufreq.c and ppc_cbe_cpufreq.c
>>>> due to lack of cross compiler.
>>>>
>>>> Changelog
>>>>
>>>> v4 -> v5
>>>> - Fix warnings in printk format specifier for 32 bit
>>>>   architectures in freq_table.c, longhaul, pasemi, ppc_cbe
>>>
>>> Doesn't look much has changed and so it stays as is:
>>>
>>> Acked-by: Viresh Kumar 
>>>
>>
>> Thank you very much!
> 
> I've applied the series to my bleeding-edge branch, will move it to linux-next
> after build testing later this week.
> 
> Thanks!
> 

Thanks so much!


Stratos Karafotis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: Break out early when frequency equals target_freq

2014-04-29 Thread Stratos Karafotis
Many drivers keep frequencies in frequency table in ascending
or descending order. When governor tries to change to policy->min
or policy->max respectively then the cpufreq_frequency_table_target
could return on first iteration. This will save some iteration cycles.

So, break out early when a frequency in cpufreq_frequency_table
equals to target one.

Testing this during kernel compilation using ondemand governor
with a frequency table in ascending order, the
cpufreq_frequency_table_target returned early on the first
iteration at about 30% of times called.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/freq_table.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 8e518c6..1632981 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -137,9 +137,13 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
i = pos - table;
if ((freq < policy->min) || (freq > policy->max))
continue;
+   if (freq == target_freq) {
+   optimal.driver_data = i;
+   break;
+   }
switch (relation) {
case CPUFREQ_RELATION_H:
-   if (freq <= target_freq) {
+   if (freq < target_freq) {
if (freq >= optimal.frequency) {
optimal.frequency = freq;
optimal.driver_data = i;
@@ -152,7 +156,7 @@ int cpufreq_frequency_table_target(struct cpufreq_policy 
*policy,
}
break;
case CPUFREQ_RELATION_L:
-   if (freq >= target_freq) {
+   if (freq > target_freq) {
if (freq <= optimal.frequency) {
optimal.frequency = freq;
optimal.driver_data = i;
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: intel_pstate: Remove sample parameter in intel_pstate_calc_busy

2014-04-29 Thread Stratos Karafotis
Since commit d37e2b7644 ("intel_pstate: remove unneeded sample buffers")
we use only one sample. So, there is no need to pass the sample
pointer to intel_pstate_calc_busy. Instead, get the pointer from
cpudata. Also, remove the unused SAMPLE_COUNT macro.

While at it, reformat the first line in this function.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 95b3958..8192ff1 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -32,8 +32,6 @@
 #include 
 #include 
 
-#define SAMPLE_COUNT   3
-
 #define BYT_RATIOS 0x66a
 #define BYT_VIDS   0x66b
 #define BYT_TURBO_RATIOS   0x66c
@@ -553,14 +551,13 @@ static void intel_pstate_get_cpu_pstates(struct cpudata 
*cpu)
intel_pstate_set_pstate(cpu, cpu->pstate.max_pstate);
 }
 
-static inline void intel_pstate_calc_busy(struct cpudata *cpu,
-   struct sample *sample)
+static inline void intel_pstate_calc_busy(struct cpudata *cpu)
 {
+   struct sample *sample = &cpu->sample;
int32_t core_pct;
int32_t c0_pct;
 
-   core_pct = div_fp(int_tofp((sample->aperf)),
-   int_tofp((sample->mperf)));
+   core_pct = div_fp(int_tofp(sample->aperf), int_tofp(sample->mperf));
core_pct = mul_fp(core_pct, int_tofp(100));
FP_ROUNDUP(core_pct);
 
@@ -595,7 +592,7 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
cpu->sample.mperf -= cpu->prev_mperf;
cpu->sample.tsc -= cpu->prev_tsc;
 
-   intel_pstate_calc_busy(cpu, &cpu->sample);
+   intel_pstate_calc_busy(cpu);
 
cpu->prev_aperf = aperf;
cpu->prev_mperf = mperf;
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: intel_pstate: Disable interrupts during MSRs reading

2014-04-29 Thread Stratos Karafotis
According to Intel 64 and IA-32 Architectures SDM, Volume 3,
Chapter 14.2, "Software needs to exercise care to avoid delays
between the two RDMSRs (for example interrupts)".

So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF.
Since the TSC is also takes place in the calculation include it in
the protected section. This should increase the accuracy of the
calculations.

Tested on Intel i7-3770 CPU @ 3.40GHz.
Benchmarks (Phoronix Linux compilation) show a slightly increase
in performance by ~0.1% and a decrease in power consumption by
~0.13%.

Signed-off-by: Stratos Karafotis 
---

I hope it's not too inappropriate to post test results in
conjunction with my previous patch about the change of calculation
method of next p-state (https://lkml.org/lkml/2014/4/27/100).
I used Phoronix benchmark - Linux Kernel Compilation 3.1.
Below the test results using turbostat (5 iterations):

Without 2 patches:

Ph. avg TimeTotal time  PkgWatt Total Energy
79.63   266.416 57.74   15382.85984
79.63   265.609 57.87   15370.79283
79.57   266.994 57.54   15362.83476
79.53   265.304 57.83   15342.53032
79.71   265.977 57.76   15362.83152
avg 79.61   266.06  57.74   15364.36985

With change calculation patch:

Ph. avg TimeTotal time  PkgWatt Total Energy
78.23   258.826 59.14   15306.96964
78.41   259.110 59.15   15326.35650
78.40   258.530 59.26   15320.48780
78.46   258.673 59.20   15313.44160
78.19   259.075 59.16   15326.87700
avg 78.34   258.842 59.18   15318.82650

With change calculation patch + this one

Ph. avg TimeTotal time  PkgWatt Total Energy
78.33   259.320 59.06   15315.43920
78.28   258.245 59.20   15288.10400
78.08   257.523 59.32   15276.26436
78.38   259.084 59.13   15319.63692
78.31   258.232 59.22   15292.49904
avg 78.27   258.480 59.18   15298.38870


 drivers/cpufreq/intel_pstate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 8e309db..95b3958 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -576,10 +576,13 @@ static inline void intel_pstate_sample(struct cpudata 
*cpu)
 {
u64 aperf, mperf;
unsigned long long tsc;
+   unsigned long flags;
 
+   local_irq_save(flags);
rdmsrl(MSR_IA32_APERF, aperf);
rdmsrl(MSR_IA32_MPERF, mperf);
tsc = native_read_tsc();
+   local_irq_restore(flags);
 
aperf = aperf >> FRAC_BITS;
mperf = mperf >> FRAC_BITS;
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-04-29 Thread Stratos Karafotis
On 29/04/2014 07:58 πμ, Viresh Kumar wrote:
> Cc'd Dirk,
> 
> On 28 April 2014 03:42, Stratos Karafotis  wrote:
>> Currently the driver calculates the next pstate proportional to
>> core_busy factor and reverse proportional to current pstate.
>>
>> Change the above method and calculate the next pstate independently
>> of current pstate.
> 
> We must mention why the change is required.
> 

Hi Viresh,

Actually, I can't say that it's required. :)
I just believe that calculation of next p-state should be independent
from current one. In my opinion we can't scale the load across different
p-states, because it's not always equivalent.

For example suppose a load of 100% because of a tight for loop in the
current p-state. It will be also a 100% load in any other p-state.
It will be wrong if we scale the load in the calculation formula
according to the current p-state.

I included the test results in the change log to point out an improvement
because of this patch.

I will enrich more the change log as you suggested.

Thanks,
Stratos Karafotis

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 0/8] Introduce new cpufreq helper macros

2014-04-29 Thread Stratos Karafotis
On 29/04/2014 07:17 πμ, Viresh Kumar wrote:
> On 26 April 2014 01:45, Stratos Karafotis  wrote:
>> This patch set introduces two freq_table helper macros which
>> can be used for iteration over cpufreq_frequency_table and
>> makes the necessary changes to cpufreq core and drivers that
>> use such an iteration procedure.
>>
>> The motivation was a usage of common procedure to iterate over
>> cpufreq_frequency_table across all drivers and cpufreq core.
>>
>> This was tested on a x86_64 platform.
>> Most files compiled successfully but unfortunately I was not
>> able to compile sh_sir.c pasemi_cpufreq.c and ppc_cbe_cpufreq.c
>> due to lack of cross compiler.
>>
>> Changelog
>>
>> v4 -> v5
>> - Fix warnings in printk format specifier for 32 bit
>>   architectures in freq_table.c, longhaul, pasemi, ppc_cbe
> 
> Doesn't look much has changed and so it stays as is:
> 
> Acked-by: Viresh Kumar 
> 

Thank you very much!


Stratos Karafotis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: intel_pstate: Change the calculation of next pstate

2014-04-27 Thread Stratos Karafotis
Currently the driver calculates the next pstate proportional to
core_busy factor and reverse proportional to current pstate.

Change the above method and calculate the next pstate independently
of current pstate.

Tested on Intel i7-3770 CPU @ 3.40GHz.
Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
increase ~1.5% in performance. Below the test results using turbostat
(5 iterations):

Without patch:

Ph. avg TimeTotal time  PkgWatt Total Energy
79.63   266.416 57.74   15382.85984
79.63   265.609 57.87   15370.79283
79.57   266.994 57.54   15362.83476
79.53   265.304 57.83   15342.53032
79.71   265.977 57.76   15362.83152
avg 79.61   266.06  57.74   15364.36985

With patch:

Ph. avg TimeTotal time  PkgWatt Total Energy
78.23   258.826 59.14   15306.96964
78.41   259.110 59.15   15326.35650
78.40   258.530 59.26   15320.48780
78.46   258.673 59.20   15313.44160
78.19   259.075 59.16   15326.87700
avg 78.34   258.842 59.18   15318.82650

The total test time reduced by ~2.6%, while the total energy
consumption during a test iteration reduced by ~0.35%

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/intel_pstate.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0999673..8e309db 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct 
cpudata *cpu)
mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
 
-static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
+static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
 {
-   int32_t core_busy, max_pstate, current_pstate;
+   int32_t core_busy, max_pstate;
 
core_busy = cpu->sample.core_pct_busy;
max_pstate = int_tofp(cpu->pstate.max_pstate);
-   current_pstate = int_tofp(cpu->pstate.current_pstate);
-   core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+   core_busy = mul_fp(core_busy, max_pstate);
return FP_ROUNDUP(core_busy);
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
-   int32_t busy_scaled;
+   int32_t busy;
struct _pid *pid;
signed int ctl = 0;
int steps;
 
pid = &cpu->pid;
-   busy_scaled = intel_pstate_get_scaled_busy(cpu);
+   busy = intel_pstate_get_busy(cpu);
 
-   ctl = pid_calc(pid, busy_scaled);
+   ctl = pid_calc(pid, busy);
 
steps = abs(ctl);
 
@@ -651,7 +650,7 @@ static void intel_pstate_timer_func(unsigned long __data)
intel_pstate_adjust_busy_pstate(cpu);
 
trace_pstate_sample(fp_toint(sample->core_pct_busy),
-   fp_toint(intel_pstate_get_scaled_busy(cpu)),
+   fp_toint(intel_pstate_get_busy(cpu)),
cpu->pstate.current_pstate,
sample->mperf,
sample->aperf,
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 2/8] cpufreq: Use cpufreq_for_each_* macros for frequency table iteration

2014-04-26 Thread Stratos Karafotis
Hi Prabhakar,

On 26/04/2014 12:57 μμ, Prabhakar Lad wrote:
> Hi Stratos,
> 
> Thanks for the patch,
> 
> On Sat, Apr 26, 2014 at 1:45 AM, Stratos Karafotis
>  wrote:
>> The cpufreq core now supports the cpufreq_for_each_entry and
>> cpufreq_for_each_valid_entry macros helpers for iteration over the
>> cpufreq_frequency_table, so use them.
>>
>> It should have no functional changes.
>>
>> Signed-off-by: Stratos Karafotis 
> 
> For patches 1 & 2:  Acked-by: Lad, Prabhakar 
> 
> and for patch 3: Acked-and-tested-by: Lad, Prabhakar
> 
> 
> Thanks,
> --Prabhakar lad
> 

Thank you very much!


Stratos Karafotis


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 5/8] mfd: db8500-prcmu: Use cpufreq_for_each_entry macro for iteration

2014-04-25 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry macro helper
for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/mfd/db8500-prcmu.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/mfd/db8500-prcmu.c b/drivers/mfd/db8500-prcmu.c
index 7694e07..b11fdd6 100644
--- a/drivers/mfd/db8500-prcmu.c
+++ b/drivers/mfd/db8500-prcmu.c
@@ -1734,18 +1734,17 @@ static struct cpufreq_frequency_table 
db8500_cpufreq_table[] = {
 
 static long round_armss_rate(unsigned long rate)
 {
+   struct cpufreq_frequency_table *pos;
long freq = 0;
-   int i = 0;
 
/* cpufreq table frequencies is in KHz. */
rate = rate / 1000;
 
/* Find the corresponding arm opp from the cpufreq table. */
-   while (db8500_cpufreq_table[i].frequency != CPUFREQ_TABLE_END) {
-   freq = db8500_cpufreq_table[i].frequency;
+   cpufreq_for_each_entry(pos, db8500_cpufreq_table) {
+   freq = pos->frequency;
if (freq == rate)
break;
-   i++;
}
 
/* Return the last valid value, even if a match was not found. */
@@ -1886,23 +1885,21 @@ static void set_clock_rate(u8 clock, unsigned long rate)
 
 static int set_armss_rate(unsigned long rate)
 {
-   int i = 0;
+   struct cpufreq_frequency_table *pos;
 
/* cpufreq table frequencies is in KHz. */
rate = rate / 1000;
 
/* Find the corresponding arm opp from the cpufreq table. */
-   while (db8500_cpufreq_table[i].frequency != CPUFREQ_TABLE_END) {
-   if (db8500_cpufreq_table[i].frequency == rate)
+   cpufreq_for_each_entry(pos, db8500_cpufreq_table)
+   if (pos->frequency == rate)
break;
-   i++;
-   }
 
-   if (db8500_cpufreq_table[i].frequency != rate)
+   if (pos->frequency != rate)
return -EINVAL;
 
/* Set the new arm opp. */
-   return db8500_prcmu_set_arm_opp(db8500_cpufreq_table[i].driver_data);
+   return db8500_prcmu_set_arm_opp(pos->driver_data);
 }
 
 static int set_plldsi_rate(unsigned long rate)
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 7/8] irda: sh_sir: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-25 Thread Stratos Karafotis
The cpufreq core supports the cpufreq_for_each_valid_entry macro
helper for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---

Please note that I was no able to compile test this patch due to 
lack of cross compiler.

 drivers/net/irda/sh_sir.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/irda/sh_sir.c b/drivers/net/irda/sh_sir.c
index cadf52e..e3fe9a2 100644
--- a/drivers/net/irda/sh_sir.c
+++ b/drivers/net/irda/sh_sir.c
@@ -217,21 +217,17 @@ crc_init_out:
 static u32 sh_sir_find_sclk(struct clk *irda_clk)
 {
struct cpufreq_frequency_table *freq_table = irda_clk->freq_table;
+   struct cpufreq_frequency_table *pos;
struct clk *pclk = clk_get(NULL, "peripheral_clk");
u32 limit, min = 0x, tmp;
-   int i, index = 0;
+   int index = 0;
 
limit = clk_get_rate(pclk);
clk_put(pclk);
 
/* IrDA can not set over peripheral_clk */
-   for (i = 0;
-freq_table[i].frequency != CPUFREQ_TABLE_END;
-i++) {
-   u32 freq = freq_table[i].frequency;
-
-   if (freq == CPUFREQ_ENTRY_INVALID)
-   continue;
+   cpufreq_for_each_valid_entry(pos, freq_table) {
+   u32 freq = pos->frequency;
 
/* IrDA should not over peripheral_clk */
if (freq > limit)
@@ -240,7 +236,7 @@ static u32 sh_sir_find_sclk(struct clk *irda_clk)
tmp = freq % SCLK_BASE;
if (tmp < min) {
min = tmp;
-   index = i;
+   index = pos - freq_table;
}
}
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 6/8] thermal: cpu_cooling: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-25 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_valid_entry macro
helper for iteration over the cpufreq_frequency_table, so use it.

Also remove the redundant !! operator.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/thermal/cpu_cooling.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 4246262..84a75f8 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -144,11 +144,11 @@ static int get_property(unsigned int cpu, unsigned long 
input,
unsigned int *output,
enum cpufreq_cooling_property property)
 {
-   int i, j;
+   int i;
unsigned long max_level = 0, level = 0;
unsigned int freq = CPUFREQ_ENTRY_INVALID;
int descend = -1;
-   struct cpufreq_frequency_table *table =
+   struct cpufreq_frequency_table *pos, *table =
cpufreq_frequency_get_table(cpu);
 
if (!output)
@@ -157,20 +157,16 @@ static int get_property(unsigned int cpu, unsigned long 
input,
if (!table)
return -EINVAL;
 
-   for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   /* ignore invalid entries */
-   if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
-   continue;
-
+   cpufreq_for_each_valid_entry(pos, table) {
/* ignore duplicate entry */
-   if (freq == table[i].frequency)
+   if (freq == pos->frequency)
continue;
 
/* get the frequency order */
if (freq != CPUFREQ_ENTRY_INVALID && descend == -1)
-   descend = !!(freq > table[i].frequency);
+   descend = freq > pos->frequency;
 
-   freq = table[i].frequency;
+   freq = pos->frequency;
max_level++;
}
 
@@ -190,29 +186,26 @@ static int get_property(unsigned int cpu, unsigned long 
input,
if (property == GET_FREQ)
level = descend ? input : (max_level - input);
 
-   for (i = 0, j = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   /* ignore invalid entry */
-   if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
-   continue;
-
+   i = 0;
+   cpufreq_for_each_valid_entry(pos, table) {
/* ignore duplicate entry */
-   if (freq == table[i].frequency)
+   if (freq == pos->frequency)
continue;
 
/* now we have a valid frequency entry */
-   freq = table[i].frequency;
+   freq = pos->frequency;
 
if (property == GET_LEVEL && (unsigned int)input == freq) {
/* get level by frequency */
-   *output = descend ? j : (max_level - j);
+   *output = descend ? i : (max_level - i);
return 0;
}
-   if (property == GET_FREQ && level == j) {
+   if (property == GET_FREQ && level == i) {
/* get frequency by level */
*output = freq;
return 0;
}
-   j++;
+   i++;
}
 
return -EINVAL;
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 8/8] sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-25 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_valid_entry macro
helper for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/sh/clk/core.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/sh/clk/core.c b/drivers/sh/clk/core.c
index 7472785..be56b22 100644
--- a/drivers/sh/clk/core.c
+++ b/drivers/sh/clk/core.c
@@ -196,17 +196,11 @@ int clk_rate_table_find(struct clk *clk,
struct cpufreq_frequency_table *freq_table,
unsigned long rate)
 {
-   int i;
-
-   for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   unsigned long freq = freq_table[i].frequency;
+   struct cpufreq_frequency_table *pos;
 
-   if (freq == CPUFREQ_ENTRY_INVALID)
-   continue;
-
-   if (freq == rate)
-   return i;
-   }
+   cpufreq_for_each_valid_entry(pos, freq_table)
+   if (pos->frequency == rate)
+   return pos - freq_table;
 
return -ENOENT;
 }
@@ -575,11 +569,7 @@ long clk_round_parent(struct clk *clk, unsigned long 
target,
return abs(target - *best_freq);
}
 
-   for (freq = parent->freq_table; freq->frequency != CPUFREQ_TABLE_END;
-freq++) {
-   if (freq->frequency == CPUFREQ_ENTRY_INVALID)
-   continue;
-
+   cpufreq_for_each_valid_entry(freq, parent->freq_table) {
if (unlikely(freq->frequency / target <= div_min - 1)) {
unsigned long freq_max;
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 3/8] ARM: davinci: da850: Use cpufreq_for_each_entry macro for iteration

2014-04-25 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry macro helper
for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 arch/arm/mach-davinci/da850.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-davinci/da850.c b/arch/arm/mach-davinci/da850.c
index 85399c9..45ce065 100644
--- a/arch/arm/mach-davinci/da850.c
+++ b/arch/arm/mach-davinci/da850.c
@@ -1092,20 +1092,21 @@ int da850_register_cpufreq(char *async_clk)
 
 static int da850_round_armrate(struct clk *clk, unsigned long rate)
 {
-   int i, ret = 0, diff;
+   int ret = 0, diff;
unsigned int best = (unsigned int) -1;
struct cpufreq_frequency_table *table = cpufreq_info.freq_table;
+   struct cpufreq_frequency_table *pos;
 
rate /= 1000; /* convert to kHz */
 
-   for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   diff = table[i].frequency - rate;
+   cpufreq_for_each_entry(pos, table) {
+   diff = pos->frequency - rate;
if (diff < 0)
diff = -diff;
 
if (diff < best) {
best = diff;
-   ret = table[i].frequency;
+   ret = pos->frequency;
}
}
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 4/8] mips: lemote 2f: Use cpufreq_for_each_entry macro for iteration

2014-04-25 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry macro helper
for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 arch/mips/loongson/lemote-2f/clock.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/mips/loongson/lemote-2f/clock.c 
b/arch/mips/loongson/lemote-2f/clock.c
index e1f427f..1eed38e 100644
--- a/arch/mips/loongson/lemote-2f/clock.c
+++ b/arch/mips/loongson/lemote-2f/clock.c
@@ -91,9 +91,9 @@ EXPORT_SYMBOL(clk_put);
 
 int clk_set_rate(struct clk *clk, unsigned long rate)
 {
+   struct cpufreq_frequency_table *pos;
int ret = 0;
int regval;
-   int i;
 
if (likely(clk->ops && clk->ops->set_rate)) {
unsigned long flags;
@@ -106,22 +106,16 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
if (unlikely(clk->flags & CLK_RATE_PROPAGATES))
propagate_rate(clk);
 
-   for (i = 0; loongson2_clockmod_table[i].frequency != CPUFREQ_TABLE_END;
-i++) {
-   if (loongson2_clockmod_table[i].frequency ==
-   CPUFREQ_ENTRY_INVALID)
-   continue;
-   if (rate == loongson2_clockmod_table[i].frequency)
+   cpufreq_for_each_valid_entry(pos, loongson2_clockmod_table)
+   if (rate == pos->frequency)
break;
-   }
-   if (rate != loongson2_clockmod_table[i].frequency)
+   if (rate != pos->frequency)
return -ENOTSUPP;
 
clk->rate = rate;
 
regval = LOONGSON_CHIPCFG0;
-   regval = (regval & ~0x7) |
-   (loongson2_clockmod_table[i].driver_data - 1);
+   regval = (regval & ~0x7) | (pos->driver_data - 1);
LOONGSON_CHIPCFG0 = regval;
 
return ret;
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 2/8] cpufreq: Use cpufreq_for_each_* macros for frequency table iteration

2014-04-25 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry and
cpufreq_for_each_valid_entry macros helpers for iteration over the
cpufreq_frequency_table, so use them.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/acpi-cpufreq.c   |  9 +++---
 drivers/cpufreq/arm_big_little.c | 16 +--
 drivers/cpufreq/cpufreq_stats.c  | 24 ++--
 drivers/cpufreq/dbx500-cpufreq.c |  8 ++
 drivers/cpufreq/elanfreq.c   |  9 +++---
 drivers/cpufreq/exynos-cpufreq.c | 11 ---
 drivers/cpufreq/exynos5440-cpufreq.c | 30 +--
 drivers/cpufreq/freq_table.c | 56 
 drivers/cpufreq/longhaul.c   | 11 ---
 drivers/cpufreq/pasemi-cpufreq.c | 10 +++
 drivers/cpufreq/powernow-k6.c| 14 -
 drivers/cpufreq/ppc_cbe_cpufreq.c|  9 +++---
 drivers/cpufreq/s3c2416-cpufreq.c| 40 +++---
 drivers/cpufreq/s3c64xx-cpufreq.c| 15 --
 14 files changed, 116 insertions(+), 146 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 000e4e0..b0c18ed 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -213,7 +213,7 @@ static unsigned extract_io(u32 value, struct 
acpi_cpufreq_data *data)
 
 static unsigned extract_msr(u32 msr, struct acpi_cpufreq_data *data)
 {
-   int i;
+   struct cpufreq_frequency_table *pos;
struct acpi_processor_performance *perf;
 
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
@@ -223,10 +223,9 @@ static unsigned extract_msr(u32 msr, struct 
acpi_cpufreq_data *data)
 
perf = data->acpi_data;
 
-   for (i = 0; data->freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   if (msr == perf->states[data->freq_table[i].driver_data].status)
-   return data->freq_table[i].frequency;
-   }
+   cpufreq_for_each_entry(pos, data->freq_table)
+   if (msr == perf->states[pos->driver_data].status)
+   return pos->frequency;
return data->freq_table[0].frequency;
 }
 
diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index bad2ed3..1f4d4e3 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -226,22 +226,22 @@ static inline u32 get_table_count(struct 
cpufreq_frequency_table *table)
 /* get the minimum frequency in the cpufreq_frequency_table */
 static inline u32 get_table_min(struct cpufreq_frequency_table *table)
 {
-   int i;
+   struct cpufreq_frequency_table *pos;
uint32_t min_freq = ~0;
-   for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++)
-   if (table[i].frequency < min_freq)
-   min_freq = table[i].frequency;
+   cpufreq_for_each_entry(pos, table)
+   if (pos->frequency < min_freq)
+   min_freq = pos->frequency;
return min_freq;
 }
 
 /* get the maximum frequency in the cpufreq_frequency_table */
 static inline u32 get_table_max(struct cpufreq_frequency_table *table)
 {
-   int i;
+   struct cpufreq_frequency_table *pos;
uint32_t max_freq = 0;
-   for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++)
-   if (table[i].frequency > max_freq)
-   max_freq = table[i].frequency;
+   cpufreq_for_each_entry(pos, table)
+   if (pos->frequency > max_freq)
+   max_freq = pos->frequency;
return max_freq;
 }
 
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index ecaaebf..0cd9b4d 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -182,11 +182,11 @@ static void cpufreq_stats_free_table(unsigned int cpu)
 
 static int __cpufreq_stats_create_table(struct cpufreq_policy *policy)
 {
-   unsigned int i, j, count = 0, ret = 0;
+   unsigned int i, count = 0, ret = 0;
struct cpufreq_stats *stat;
unsigned int alloc_size;
unsigned int cpu = policy->cpu;
-   struct cpufreq_frequency_table *table;
+   struct cpufreq_frequency_table *pos, *table;
 
table = cpufreq_frequency_get_table(cpu);
if (unlikely(!table))
@@ -205,12 +205,8 @@ static int __cpufreq_stats_create_table(struct 
cpufreq_policy *policy)
stat->cpu = cpu;
per_cpu(cpufreq_stats_table, cpu) = stat;
 
-   for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   unsigned int freq = table[i].frequency;
-   if (freq == CPUFREQ_ENTRY_INVALID)
-   continue;
+   cpufreq_for_each_valid_entry(pos, table)
count++;
-   }
 
alloc_size = count * sizeof(int) + count * sizeof(u64);
 
@@ -228,15 +224,11 @@ static int __cpufreq_stats_create_table(struct 

[PATCH v5 1/8] cpufreq: Introduce macros for cpufreq_frequency_table iteration

2014-04-25 Thread Stratos Karafotis
Many cpufreq drivers need to iterate over the cpufreq_frequency_table
for various tasks.

This patch introduces two macros which can be used for iteration over
cpufreq_frequency_table keeping a common coding style across drivers:

- cpufreq_for_each_entry: iterate over each entry of the table
- cpufreq_for_each_valid_entry: iterate over each entry that contains
a valid frequency.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 Documentation/cpu-freq/cpu-drivers.txt | 19 +++
 drivers/cpufreq/cpufreq.c  | 11 +++
 include/linux/cpufreq.h| 21 +
 3 files changed, 51 insertions(+)

diff --git a/Documentation/cpu-freq/cpu-drivers.txt 
b/Documentation/cpu-freq/cpu-drivers.txt
index 48da5fd..b045fe5 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -228,3 +228,22 @@ is the corresponding frequency table helper for the 
->target
 stage. Just pass the values to this function, and the unsigned int
 index returns the number of the frequency table entry which contains
 the frequency the CPU shall be set to.
+
+The following macros can be used as iterators over cpufreq_frequency_table:
+
+cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency
+table.
+
+cpufreq-for_each_valid_entry(pos, table) - iterates over all entries,
+excluding CPUFREQ_ENTRY_INVALID frequencies.
+Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and
+"table" - the cpufreq_frequency_table * you want to iterate over.
+
+For example:
+
+   struct cpufreq_frequency_table *pos, *driver_freq_table;
+
+   cpufreq_for_each_entry(pos, driver_freq_table) {
+   /* Do something with pos */
+   pos->frequency = ...
+   }
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index abda660..a517da9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -237,6 +237,17 @@ void cpufreq_cpu_put(struct cpufreq_policy *policy)
 }
 EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
 
+bool cpufreq_next_valid(struct cpufreq_frequency_table **pos)
+{
+   while ((*pos)->frequency != CPUFREQ_TABLE_END)
+   if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID)
+   return true;
+   else
+   (*pos)++;
+   return false;
+}
+EXPORT_SYMBOL_GPL(cpufreq_next_valid);
+
 /*
  *EXTERNALLY AFFECTING FREQUENCY CHANGES *
  */
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5ae5100..77a5fa1 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -468,6 +468,27 @@ struct cpufreq_frequency_table {
* order */
 };
 
+bool cpufreq_next_valid(struct cpufreq_frequency_table **pos);
+
+/*
+ * cpufreq_for_each_entry -iterate over a cpufreq_frequency_table
+ * @pos:   the cpufreq_frequency_table * to use as a loop cursor.
+ * @table: the cpufreq_frequency_table * to iterate over.
+ */
+
+#define cpufreq_for_each_entry(pos, table) \
+   for (pos = table; pos->frequency != CPUFREQ_TABLE_END; pos++)
+
+/*
+ * cpufreq_for_each_valid_entry - iterate over a cpufreq_frequency_table
+ * excluding CPUFREQ_ENTRY_INVALID frequencies.
+ * @pos:the cpufreq_frequency_table * to use as a loop cursor.
+ * @table:  the cpufreq_frequency_table * to iterate over.
+ */
+
+#define cpufreq_for_each_valid_entry(pos, table)   \
+   for (pos = table; cpufreq_next_valid(&pos); pos++)
+
 int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
struct cpufreq_frequency_table *table);
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 0/8] Introduce new cpufreq helper macros

2014-04-25 Thread Stratos Karafotis
Hi all,

This patch set introduces two freq_table helper macros which
can be used for iteration over cpufreq_frequency_table and
makes the necessary changes to cpufreq core and drivers that
use such an iteration procedure.

The motivation was a usage of common procedure to iterate over
cpufreq_frequency_table across all drivers and cpufreq core.

This was tested on a x86_64 platform.
Most files compiled successfully but unfortunately I was not
able to compile sh_sir.c pasemi_cpufreq.c and ppc_cbe_cpufreq.c 
due to lack of cross compiler.

Changelog 

v4 -> v5
- Fix warnings in printk format specifier for 32 bit 
  architectures in freq_table.c, longhaul, pasemi, ppc_cbe
v3 -> v4
- No changes
v2 -> v3
- Better formatting in Documentation
- Fix spell error in comments in cpufreq.h
- Added 'ARM' prefix in patch 3/8 subject as Sekhar Nori
  suggested
v1 -> v2
- Rearrange patches
- Remove redundant braces
- Fix a newly introduced bug in exynos5440
- Use cpufreq_for_each_valid_entry instead of
cpufreq_for_each_entry in cpufreq_frequency_table_get_index()
- Drop redundant double ! operator in cpu_cooling
- Change the pos loop cursor variable to freq_pos in longhaul
- Declare pos variable on a separate line

Stratos Karafotis (8):
  cpufreq: Introduce macros for cpufreq_frequency_table iteration
  cpufreq: Use cpufreq_for_each_* macros for frequency table iteration
  ARM: davinci: da850: Use cpufreq_for_each_entry macro for iteration
  mips: lemote 2f: Use cpufreq_for_each_entry macro for iteration
  mfd: db8500-prcmu: Use cpufreq_for_each_entry macro for iteration
  thermal: cpu_cooling: Use cpufreq_for_each_valid_entry macro for
iteration
  irda: sh_sir: Use cpufreq_for_each_valid_entry macro for iteration
  sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

 Documentation/cpu-freq/cpu-drivers.txt | 19 
 arch/arm/mach-davinci/da850.c  |  9 +++---
 arch/mips/loongson/lemote-2f/clock.c   | 16 +++---
 drivers/cpufreq/acpi-cpufreq.c |  9 +++---
 drivers/cpufreq/arm_big_little.c   | 16 +-
 drivers/cpufreq/cpufreq.c  | 11 +++
 drivers/cpufreq/cpufreq_stats.c| 24 +--
 drivers/cpufreq/dbx500-cpufreq.c   |  8 ++---
 drivers/cpufreq/elanfreq.c |  9 +++---
 drivers/cpufreq/exynos-cpufreq.c   | 11 +++
 drivers/cpufreq/exynos5440-cpufreq.c   | 30 +-
 drivers/cpufreq/freq_table.c   | 56 +++---
 drivers/cpufreq/longhaul.c | 11 +++
 drivers/cpufreq/pasemi-cpufreq.c   | 10 +++---
 drivers/cpufreq/powernow-k6.c  | 14 -
 drivers/cpufreq/ppc_cbe_cpufreq.c  |  9 +++---
 drivers/cpufreq/s3c2416-cpufreq.c  | 40 +++-
 drivers/cpufreq/s3c64xx-cpufreq.c  | 15 +++--
 drivers/mfd/db8500-prcmu.c | 19 +---
 drivers/net/irda/sh_sir.c  | 14 +++--
 drivers/sh/clk/core.c  | 20 +++-
 drivers/thermal/cpu_cooling.c  | 33 
 include/linux/cpufreq.h| 21 +
 23 files changed, 208 insertions(+), 216 deletions(-)

-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 2/8] cpufreq: Use cpufreq_for_each_* macros for frequency table iteration

2014-04-25 Thread Stratos Karafotis
Hi Prabhakar,

On 25/04/2014 03:31 μμ, Prabhakar Lad wrote:
> Hi Stratos,
> 
> Thanks for the patch.
> 
> On Tue, Apr 22, 2014 at 4:30 AM, Stratos Karafotis
>  wrote:
>> The cpufreq core now supports the cpufreq_for_each_entry and
>> cpufreq_for_each_valid_entry macros helpers for iteration over the
>> cpufreq_frequency_table, so use them.
>>
>> It should have no functional changes.
>>
> This patch produces following build warning,
> 
> drivers/cpufreq/freq_table.c: In function 'cpufreq_frequency_table_cpuinfo':
> drivers/cpufreq/freq_table.c:36:3: warning: format '%lu' expects
> argument of type 'long unsigned int', but argument 2 has type 'int'
> [-Wformat=]
>pr_debug("table entry %lu: %u kHz\n", pos - table, freq);

Thanks for this finding.
I will fix it and resend the patch.


Stratos Karafotis


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: powernow-k8: Fix checkpatch warnings

2014-04-23 Thread Stratos Karafotis
On 23/04/2014 07:46 πμ, Viresh Kumar wrote:
> On 23 April 2014 02:43, Stratos Karafotis  wrote:
>> @@ -342,7 +333,7 @@ static int core_voltage_pre_transition(struct 
>> powernow_k8_data *data,
>> return 1;
>>
>> if (savefid != data->currfid) {
>> -   printk(KERN_ERR PFX "ph1 err, currfid changed 0x%x\n",
>> +   pr_err("ph1 err, currfid changed 0x%x\n",
>> data->currfid);
> 
> This will come in single line?
> 
>> @@ -529,43 +510,39 @@ static int check_pst_table(struct powernow_k8_data 
>> *data, struct pst_s *pst,
>>
>> for (j = 0; j < data->numps; j++) {
>> if (pst[j].vid > LEAST_VID) {
>> -   printk(KERN_ERR FW_BUG PFX "vid %d invalid : 0x%x\n",
>> -  j, pst[j].vid);
>> +   pr_err(FW_BUG "vid %d invalid : 0x%x\n", j,
>> +  pst[j].vid);
> 
> Same here.
> 
>>  static const char missing_pss_msg[] =
>> KERN_ERR
> 
> remove this and use pr_err_once instead of printk_once()
> 
>> -   FW_BUG PFX "No compatible ACPI _PSS objects found.\n"
>> -   FW_BUG PFX "First, make sure Cool'N'Quiet is enabled in the BIOS.\n"
>> -   FW_BUG PFX "If that doesn't help, try upgrading your BIOS.\n";
>> +   FW_BUG KBUILD_MODNAME
>> +   "No compatible ACPI _PSS objects found.\n"
> 
> Don't break these, even if they cross 80 columns.
> 

Thanks for your review!

Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq: powernow-k8: Fix checkpatch warnings

2014-04-23 Thread Stratos Karafotis
On 23/04/2014 01:37 μμ, Rafael J. Wysocki wrote:
> On Wednesday, April 23, 2014 12:13:54 AM Stratos Karafotis wrote:
>> Fix the following checkpatch warnings:
> 
> In addition to comments from Viresh, I have a general one.
> 
> Some of the checkpatch.pl warnings are not worth fixing at all ->
> 
>> - WARNING: Prefer pr_err(... to printk(KERN_ERR ...
>> - WARNING: Prefer pr_info(... to printk(KERN_INFO ...
>> - WARNING: Prefer pr_warn(... to printk(KERN_WARNING ...
>> - WARNING: quoted string split across lines
>> - WARNING: line over 80 characters
> 
> -> and the "line over 80 characters" ones are outright wrong in many cases,
> so please don't "fix" them.
> 

Hi Rafael,

Thanks for your comments!

Could you please clarify if you want me to drop the entire patch or
send it only with the changes about the last warning found ("no spaces
at the start of a line)?

Also, I would like to take the opportunity and ask a question. :)

Reading the code, sometimes, I find some minor formatting issues.
Like the checkpatch warnings or unnecessary parentheses and braces.

For example the line bellow:
if ((freq < policy->min) || (freq > policy->max))

I know that this is not actually an issue and a patch with such changes
is (somehow) a noise for the maintainers. But, should it be "fixed" or not?

Thanks for your time,

Stratos


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cpufreq: powernow-k8: Fix checkpatch warnings

2014-04-22 Thread Stratos Karafotis
Fix the following checkpatch warnings:

- WARNING: Prefer pr_err(... to printk(KERN_ERR ...
- WARNING: Prefer pr_info(... to printk(KERN_INFO ...
- WARNING: Prefer pr_warn(... to printk(KERN_WARNING ...
- WARNING: quoted string split across lines
- WARNING: line over 80 characters
- WARNING: please, no spaces at the start of a line

Also, define the pr_fmt macro instead of PFX for the module name.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/powernow-k8.c | 180 ++
 drivers/cpufreq/powernow-k8.h |  11 ++-
 2 files changed, 84 insertions(+), 107 deletions(-)

diff --git a/drivers/cpufreq/powernow-k8.c b/drivers/cpufreq/powernow-k8.c
index 1b6ae6b..fa0386e 100644
--- a/drivers/cpufreq/powernow-k8.c
+++ b/drivers/cpufreq/powernow-k8.c
@@ -27,6 +27,8 @@
  *  power and thermal data sheets, (e.g. 30417.pdf, 30430.pdf, 43375.pdf)
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -45,7 +47,6 @@
 #include 
 #include 
 
-#define PFX "powernow-k8: "
 #define VERSION "version 2.20.00"
 #include "powernow-k8.h"
 
@@ -161,7 +162,7 @@ static int write_new_fid(struct powernow_k8_data *data, u32 
fid)
u32 i = 0;
 
if ((fid & INVALID_FID_MASK) || (data->currvid & INVALID_VID_MASK)) {
-   printk(KERN_ERR PFX "internal error - overflow on fid write\n");
+   pr_err("internal error - overflow on fid write\n");
return 1;
}
 
@@ -175,9 +176,7 @@ static int write_new_fid(struct powernow_k8_data *data, u32 
fid)
do {
wrmsr(MSR_FIDVID_CTL, lo, data->plllock * PLL_LOCK_CONVERSION);
if (i++ > 100) {
-   printk(KERN_ERR PFX
-   "Hardware error - pending bit very stuck - "
-   "no further pstate changes possible\n");
+   pr_err("Hardware error - pending bit very stuck - no 
further pstate changes possible\n");
return 1;
}
} while (query_current_values_with_pending_wait(data));
@@ -185,16 +184,14 @@ static int write_new_fid(struct powernow_k8_data *data, 
u32 fid)
count_off_irt(data);
 
if (savevid != data->currvid) {
-   printk(KERN_ERR PFX
-   "vid change on fid trans, old 0x%x, new 0x%x\n",
-   savevid, data->currvid);
+   pr_err("vid change on fid trans, old 0x%x, new 0x%x\n",
+  savevid, data->currvid);
return 1;
}
 
if (fid != data->currfid) {
-   printk(KERN_ERR PFX
-   "fid trans failed, fid 0x%x, curr 0x%x\n", fid,
-   data->currfid);
+   pr_err("fid trans failed, fid 0x%x, curr 0x%x\n", fid,
+  data->currfid);
return 1;
}
 
@@ -209,7 +206,7 @@ static int write_new_vid(struct powernow_k8_data *data, u32 
vid)
int i = 0;
 
if ((data->currfid & INVALID_FID_MASK) || (vid & INVALID_VID_MASK)) {
-   printk(KERN_ERR PFX "internal error - overflow on vid write\n");
+   pr_err("internal error - overflow on vid write\n");
return 1;
}
 
@@ -223,23 +220,19 @@ static int write_new_vid(struct powernow_k8_data *data, 
u32 vid)
do {
wrmsr(MSR_FIDVID_CTL, lo, STOP_GRANT_5NS);
if (i++ > 100) {
-   printk(KERN_ERR PFX "internal error - pending bit "
-   "very stuck - no further pstate "
-   "changes possible\n");
+   pr_err("internal error - pending bit very stuck - no 
further pstate changes possible\n");
return 1;
}
} while (query_current_values_with_pending_wait(data));
 
if (savefid != data->currfid) {
-   printk(KERN_ERR PFX "fid changed on vid trans, old "
-   "0x%x new 0x%x\n",
-  savefid, data->currfid);
+   pr_err("fid changed on vid trans, old 0x%x new 0x%x\n",
+   savefid, data->currfid);
return 1;
}
 
if (vid != data->currvid) {
-   printk(KERN_ERR PFX "vid trans failed, vid 0x%x, "
-   "curr 0x%x\n",
+   pr_err("vid trans failed, vid 0x%x, curr 0x%x\n",
vid, data->currvid);
return 1;
}
@@ -283,8 +276,7 @@ static int transition_fid_vid(struct p

[PATCH] cpufreq: Kconfig: Fix spelling errors

2014-04-22 Thread Stratos Karafotis
Fix 4 spelling errors in help sections.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/Kconfig.arm | 4 ++--
 drivers/cpufreq/Kconfig.x86 | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index 5805035..6e05a1e 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -85,7 +85,7 @@ config ARM_EXYNOS_CPU_FREQ_BOOST_SW
  It allows usage of special frequencies for Samsung Exynos
  processors if thermal conditions are appropriate.
 
- It reguires, for safe operation, thermal framework with properly
+ It requires, for safe operation, thermal framework with properly
  defined trip points.
 
  If in doubt, say N.
@@ -186,7 +186,7 @@ config ARM_S3C2416_CPUFREQ
  S3C2450 SoC. The S3C2416 supports changing the rate of the
  armdiv clock source and also entering a so called dynamic
  voltage scaling mode in which it is possible to reduce the
- core voltage of the cpu.
+ core voltage of the CPU.
 
  If in doubt, say N.
 
diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index d369349..89ae88f 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -10,7 +10,7 @@ config X86_INTEL_PSTATE
  The driver implements an internal governor and will become
   the scaling driver and governor for Sandy bridge processors.
 
- When this driver is enabled it will become the perferred
+ When this driver is enabled it will become the preferred
   scaling driver for Sandy bridge processors.
 
  If in doubt, say N.
@@ -52,7 +52,7 @@ config X86_ACPI_CPUFREQ_CPB
help
  The powernow-k8 driver used to provide a sysfs knob called "cpb"
  to disable the Core Performance Boosting feature of AMD CPUs. This
- file has now been superseeded by the more generic "boost" entry.
+ file has now been superseded by the more generic "boost" entry.
 
  By enabling this option the acpi_cpufreq driver provides the old
  entry in addition to the new boost ones, for compatibility reasons.
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 8/8] sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_valid_entry macro
helper for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/sh/clk/core.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/sh/clk/core.c b/drivers/sh/clk/core.c
index 7472785..be56b22 100644
--- a/drivers/sh/clk/core.c
+++ b/drivers/sh/clk/core.c
@@ -196,17 +196,11 @@ int clk_rate_table_find(struct clk *clk,
struct cpufreq_frequency_table *freq_table,
unsigned long rate)
 {
-   int i;
-
-   for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   unsigned long freq = freq_table[i].frequency;
+   struct cpufreq_frequency_table *pos;
 
-   if (freq == CPUFREQ_ENTRY_INVALID)
-   continue;
-
-   if (freq == rate)
-   return i;
-   }
+   cpufreq_for_each_valid_entry(pos, freq_table)
+   if (pos->frequency == rate)
+   return pos - freq_table;
 
return -ENOENT;
 }
@@ -575,11 +569,7 @@ long clk_round_parent(struct clk *clk, unsigned long 
target,
return abs(target - *best_freq);
}
 
-   for (freq = parent->freq_table; freq->frequency != CPUFREQ_TABLE_END;
-freq++) {
-   if (freq->frequency == CPUFREQ_ENTRY_INVALID)
-   continue;
-
+   cpufreq_for_each_valid_entry(freq, parent->freq_table) {
if (unlikely(freq->frequency / target <= div_min - 1)) {
unsigned long freq_max;
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 4/8] mips: lemote 2f: Use cpufreq_for_each_entry macro for iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry macro helper
for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 arch/mips/loongson/lemote-2f/clock.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/mips/loongson/lemote-2f/clock.c 
b/arch/mips/loongson/lemote-2f/clock.c
index e1f427f..1eed38e 100644
--- a/arch/mips/loongson/lemote-2f/clock.c
+++ b/arch/mips/loongson/lemote-2f/clock.c
@@ -91,9 +91,9 @@ EXPORT_SYMBOL(clk_put);

 int clk_set_rate(struct clk *clk, unsigned long rate)
 {
+   struct cpufreq_frequency_table *pos;
int ret = 0;
int regval;
-   int i;

if (likely(clk->ops && clk->ops->set_rate)) {
unsigned long flags;
@@ -106,22 +106,16 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
if (unlikely(clk->flags & CLK_RATE_PROPAGATES))
propagate_rate(clk);

-   for (i = 0; loongson2_clockmod_table[i].frequency != CPUFREQ_TABLE_END;
-i++) {
-   if (loongson2_clockmod_table[i].frequency ==
-   CPUFREQ_ENTRY_INVALID)
-   continue;
-   if (rate == loongson2_clockmod_table[i].frequency)
+   cpufreq_for_each_valid_entry(pos, loongson2_clockmod_table)
+   if (rate == pos->frequency)
break;
-   }
-   if (rate != loongson2_clockmod_table[i].frequency)
+   if (rate != pos->frequency)
return -ENOTSUPP;

clk->rate = rate;

regval = LOONGSON_CHIPCFG0;
-   regval = (regval & ~0x7) |
-   (loongson2_clockmod_table[i].driver_data - 1);
+   regval = (regval & ~0x7) | (pos->driver_data - 1);
LOONGSON_CHIPCFG0 = regval;

return ret;
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 7/8] irda: sh_sir: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core supports the cpufreq_for_each_valid_entry macro
helper for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---

Please note that I was no able to compile test this patch due to 
lack of cross compiler.

 drivers/net/irda/sh_sir.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/irda/sh_sir.c b/drivers/net/irda/sh_sir.c
index cadf52e..e3fe9a2 100644
--- a/drivers/net/irda/sh_sir.c
+++ b/drivers/net/irda/sh_sir.c
@@ -217,21 +217,17 @@ crc_init_out:
 static u32 sh_sir_find_sclk(struct clk *irda_clk)
 {
struct cpufreq_frequency_table *freq_table = irda_clk->freq_table;
+   struct cpufreq_frequency_table *pos;
struct clk *pclk = clk_get(NULL, "peripheral_clk");
u32 limit, min = 0x, tmp;
-   int i, index = 0;
+   int index = 0;
 
limit = clk_get_rate(pclk);
clk_put(pclk);
 
/* IrDA can not set over peripheral_clk */
-   for (i = 0;
-freq_table[i].frequency != CPUFREQ_TABLE_END;
-i++) {
-   u32 freq = freq_table[i].frequency;
-
-   if (freq == CPUFREQ_ENTRY_INVALID)
-   continue;
+   cpufreq_for_each_valid_entry(pos, freq_table) {
+   u32 freq = pos->frequency;
 
/* IrDA should not over peripheral_clk */
if (freq > limit)
@@ -240,7 +236,7 @@ static u32 sh_sir_find_sclk(struct clk *irda_clk)
tmp = freq % SCLK_BASE;
if (tmp < min) {
min = tmp;
-   index = i;
+   index = pos - freq_table;
}
}
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 6/8] thermal: cpu_cooling: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_valid_entry macro
helper for iteration over the cpufreq_frequency_table, so use it.

Also remove the redundant !! operator.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/thermal/cpu_cooling.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 4246262..84a75f8 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -144,11 +144,11 @@ static int get_property(unsigned int cpu, unsigned long 
input,
unsigned int *output,
enum cpufreq_cooling_property property)
 {
-   int i, j;
+   int i;
unsigned long max_level = 0, level = 0;
unsigned int freq = CPUFREQ_ENTRY_INVALID;
int descend = -1;
-   struct cpufreq_frequency_table *table =
+   struct cpufreq_frequency_table *pos, *table =
cpufreq_frequency_get_table(cpu);
 
if (!output)
@@ -157,20 +157,16 @@ static int get_property(unsigned int cpu, unsigned long 
input,
if (!table)
return -EINVAL;
 
-   for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   /* ignore invalid entries */
-   if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
-   continue;
-
+   cpufreq_for_each_valid_entry(pos, table) {
/* ignore duplicate entry */
-   if (freq == table[i].frequency)
+   if (freq == pos->frequency)
continue;
 
/* get the frequency order */
if (freq != CPUFREQ_ENTRY_INVALID && descend == -1)
-   descend = !!(freq > table[i].frequency);
+   descend = freq > pos->frequency;
 
-   freq = table[i].frequency;
+   freq = pos->frequency;
max_level++;
}
 
@@ -190,29 +186,26 @@ static int get_property(unsigned int cpu, unsigned long 
input,
if (property == GET_FREQ)
level = descend ? input : (max_level - input);
 
-   for (i = 0, j = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   /* ignore invalid entry */
-   if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
-   continue;
-
+   i = 0;
+   cpufreq_for_each_valid_entry(pos, table) {
/* ignore duplicate entry */
-   if (freq == table[i].frequency)
+   if (freq == pos->frequency)
continue;
 
/* now we have a valid frequency entry */
-   freq = table[i].frequency;
+   freq = pos->frequency;
 
if (property == GET_LEVEL && (unsigned int)input == freq) {
/* get level by frequency */
-   *output = descend ? j : (max_level - j);
+   *output = descend ? i : (max_level - i);
return 0;
}
-   if (property == GET_FREQ && level == j) {
+   if (property == GET_FREQ && level == i) {
/* get frequency by level */
*output = freq;
return 0;
}
-   j++;
+   i++;
}
 
return -EINVAL;
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 5/8] mfd: db8500-prcmu: Use cpufreq_for_each_entry macro for iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry macro helper
for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/mfd/db8500-prcmu.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/mfd/db8500-prcmu.c b/drivers/mfd/db8500-prcmu.c
index 7694e07..b11fdd6 100644
--- a/drivers/mfd/db8500-prcmu.c
+++ b/drivers/mfd/db8500-prcmu.c
@@ -1734,18 +1734,17 @@ static struct cpufreq_frequency_table 
db8500_cpufreq_table[] = {
 
 static long round_armss_rate(unsigned long rate)
 {
+   struct cpufreq_frequency_table *pos;
long freq = 0;
-   int i = 0;
 
/* cpufreq table frequencies is in KHz. */
rate = rate / 1000;
 
/* Find the corresponding arm opp from the cpufreq table. */
-   while (db8500_cpufreq_table[i].frequency != CPUFREQ_TABLE_END) {
-   freq = db8500_cpufreq_table[i].frequency;
+   cpufreq_for_each_entry(pos, db8500_cpufreq_table) {
+   freq = pos->frequency;
if (freq == rate)
break;
-   i++;
}
 
/* Return the last valid value, even if a match was not found. */
@@ -1886,23 +1885,21 @@ static void set_clock_rate(u8 clock, unsigned long rate)
 
 static int set_armss_rate(unsigned long rate)
 {
-   int i = 0;
+   struct cpufreq_frequency_table *pos;
 
/* cpufreq table frequencies is in KHz. */
rate = rate / 1000;
 
/* Find the corresponding arm opp from the cpufreq table. */
-   while (db8500_cpufreq_table[i].frequency != CPUFREQ_TABLE_END) {
-   if (db8500_cpufreq_table[i].frequency == rate)
+   cpufreq_for_each_entry(pos, db8500_cpufreq_table)
+   if (pos->frequency == rate)
break;
-   i++;
-   }
 
-   if (db8500_cpufreq_table[i].frequency != rate)
+   if (pos->frequency != rate)
return -EINVAL;
 
/* Set the new arm opp. */
-   return db8500_prcmu_set_arm_opp(db8500_cpufreq_table[i].driver_data);
+   return db8500_prcmu_set_arm_opp(pos->driver_data);
 }
 
 static int set_plldsi_rate(unsigned long rate)
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 3/8] ARM: davinci: da850: Use cpufreq_for_each_entry macro for iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry macro helper
for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 arch/arm/mach-davinci/da850.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-davinci/da850.c b/arch/arm/mach-davinci/da850.c
index 85399c9..45ce065 100644
--- a/arch/arm/mach-davinci/da850.c
+++ b/arch/arm/mach-davinci/da850.c
@@ -1092,20 +1092,21 @@ int da850_register_cpufreq(char *async_clk)
 
 static int da850_round_armrate(struct clk *clk, unsigned long rate)
 {
-   int i, ret = 0, diff;
+   int ret = 0, diff;
unsigned int best = (unsigned int) -1;
struct cpufreq_frequency_table *table = cpufreq_info.freq_table;
+   struct cpufreq_frequency_table *pos;
 
rate /= 1000; /* convert to kHz */
 
-   for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   diff = table[i].frequency - rate;
+   cpufreq_for_each_entry(pos, table) {
+   diff = pos->frequency - rate;
if (diff < 0)
diff = -diff;
 
if (diff < best) {
best = diff;
-   ret = table[i].frequency;
+   ret = pos->frequency;
}
}
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 2/8] cpufreq: Use cpufreq_for_each_* macros for frequency table iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry and
cpufreq_for_each_valid_entry macros helpers for iteration over the
cpufreq_frequency_table, so use them.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 drivers/cpufreq/acpi-cpufreq.c   |  9 +++---
 drivers/cpufreq/arm_big_little.c | 16 +--
 drivers/cpufreq/cpufreq_stats.c  | 24 ++--
 drivers/cpufreq/dbx500-cpufreq.c |  8 ++
 drivers/cpufreq/elanfreq.c   |  9 +++---
 drivers/cpufreq/exynos-cpufreq.c | 11 ---
 drivers/cpufreq/exynos5440-cpufreq.c | 30 +--
 drivers/cpufreq/freq_table.c | 56 
 drivers/cpufreq/longhaul.c   | 13 -
 drivers/cpufreq/pasemi-cpufreq.c | 10 +++
 drivers/cpufreq/powernow-k6.c| 14 -
 drivers/cpufreq/ppc_cbe_cpufreq.c|  9 +++---
 drivers/cpufreq/s3c2416-cpufreq.c| 40 +++---
 drivers/cpufreq/s3c64xx-cpufreq.c| 15 --
 14 files changed, 117 insertions(+), 147 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 000e4e0..b0c18ed 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -213,7 +213,7 @@ static unsigned extract_io(u32 value, struct 
acpi_cpufreq_data *data)
 
 static unsigned extract_msr(u32 msr, struct acpi_cpufreq_data *data)
 {
-   int i;
+   struct cpufreq_frequency_table *pos;
struct acpi_processor_performance *perf;
 
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
@@ -223,10 +223,9 @@ static unsigned extract_msr(u32 msr, struct 
acpi_cpufreq_data *data)
 
perf = data->acpi_data;
 
-   for (i = 0; data->freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   if (msr == perf->states[data->freq_table[i].driver_data].status)
-   return data->freq_table[i].frequency;
-   }
+   cpufreq_for_each_entry(pos, data->freq_table)
+   if (msr == perf->states[pos->driver_data].status)
+   return pos->frequency;
return data->freq_table[0].frequency;
 }
 
diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index bad2ed3..1f4d4e3 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -226,22 +226,22 @@ static inline u32 get_table_count(struct 
cpufreq_frequency_table *table)
 /* get the minimum frequency in the cpufreq_frequency_table */
 static inline u32 get_table_min(struct cpufreq_frequency_table *table)
 {
-   int i;
+   struct cpufreq_frequency_table *pos;
uint32_t min_freq = ~0;
-   for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++)
-   if (table[i].frequency < min_freq)
-   min_freq = table[i].frequency;
+   cpufreq_for_each_entry(pos, table)
+   if (pos->frequency < min_freq)
+   min_freq = pos->frequency;
return min_freq;
 }
 
 /* get the maximum frequency in the cpufreq_frequency_table */
 static inline u32 get_table_max(struct cpufreq_frequency_table *table)
 {
-   int i;
+   struct cpufreq_frequency_table *pos;
uint32_t max_freq = 0;
-   for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++)
-   if (table[i].frequency > max_freq)
-   max_freq = table[i].frequency;
+   cpufreq_for_each_entry(pos, table)
+   if (pos->frequency > max_freq)
+   max_freq = pos->frequency;
return max_freq;
 }
 
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index ecaaebf..0cd9b4d 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -182,11 +182,11 @@ static void cpufreq_stats_free_table(unsigned int cpu)
 
 static int __cpufreq_stats_create_table(struct cpufreq_policy *policy)
 {
-   unsigned int i, j, count = 0, ret = 0;
+   unsigned int i, count = 0, ret = 0;
struct cpufreq_stats *stat;
unsigned int alloc_size;
unsigned int cpu = policy->cpu;
-   struct cpufreq_frequency_table *table;
+   struct cpufreq_frequency_table *pos, *table;
 
table = cpufreq_frequency_get_table(cpu);
if (unlikely(!table))
@@ -205,12 +205,8 @@ static int __cpufreq_stats_create_table(struct 
cpufreq_policy *policy)
stat->cpu = cpu;
per_cpu(cpufreq_stats_table, cpu) = stat;
 
-   for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   unsigned int freq = table[i].frequency;
-   if (freq == CPUFREQ_ENTRY_INVALID)
-   continue;
+   cpufreq_for_each_valid_entry(pos, table)
count++;
-   }
 
alloc_size = count * sizeof(int) + count * sizeof(u64);
 
@@ -228,15 +224,11 @@ static int __cpufre

[PATCH v4 1/8] cpufreq: Introduce macros for cpufreq_frequency_table iteration

2014-04-21 Thread Stratos Karafotis
Many cpufreq drivers need to iterate over the cpufreq_frequency_table
for various tasks.

This patch introduces two macros which can be used for iteration over
cpufreq_frequency_table keeping a common coding style across drivers:

- cpufreq_for_each_entry: iterate over each entry of the table
- cpufreq_for_each_valid_entry: iterate over each entry that contains
a valid frequency.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---
 Documentation/cpu-freq/cpu-drivers.txt | 19 +++
 drivers/cpufreq/cpufreq.c  | 11 +++
 include/linux/cpufreq.h| 21 +
 3 files changed, 51 insertions(+)

diff --git a/Documentation/cpu-freq/cpu-drivers.txt 
b/Documentation/cpu-freq/cpu-drivers.txt
index 48da5fd..b045fe5 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -228,3 +228,22 @@ is the corresponding frequency table helper for the 
->target
 stage. Just pass the values to this function, and the unsigned int
 index returns the number of the frequency table entry which contains
 the frequency the CPU shall be set to.
+
+The following macros can be used as iterators over cpufreq_frequency_table:
+
+cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency
+table.
+
+cpufreq-for_each_valid_entry(pos, table) - iterates over all entries,
+excluding CPUFREQ_ENTRY_INVALID frequencies.
+Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and
+"table" - the cpufreq_frequency_table * you want to iterate over.
+
+For example:
+
+   struct cpufreq_frequency_table *pos, *driver_freq_table;
+
+   cpufreq_for_each_entry(pos, driver_freq_table) {
+   /* Do something with pos */
+   pos->frequency = ...
+   }
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index abda660..a517da9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -237,6 +237,17 @@ void cpufreq_cpu_put(struct cpufreq_policy *policy)
 }
 EXPORT_SYMBOL_GPL(cpufreq_cpu_put);
 
+bool cpufreq_next_valid(struct cpufreq_frequency_table **pos)
+{
+   while ((*pos)->frequency != CPUFREQ_TABLE_END)
+   if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID)
+   return true;
+   else
+   (*pos)++;
+   return false;
+}
+EXPORT_SYMBOL_GPL(cpufreq_next_valid);
+
 /*
  *EXTERNALLY AFFECTING FREQUENCY CHANGES *
  */
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5ae5100..77a5fa1 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -468,6 +468,27 @@ struct cpufreq_frequency_table {
* order */
 };
 
+bool cpufreq_next_valid(struct cpufreq_frequency_table **pos);
+
+/*
+ * cpufreq_for_each_entry -iterate over a cpufreq_frequency_table
+ * @pos:   the cpufreq_frequency_table * to use as a loop cursor.
+ * @table: the cpufreq_frequency_table * to iterate over.
+ */
+
+#define cpufreq_for_each_entry(pos, table) \
+   for (pos = table; pos->frequency != CPUFREQ_TABLE_END; pos++)
+
+/*
+ * cpufreq_for_each_valid_entry - iterate over a cpufreq_frequency_table
+ * excluding CPUFREQ_ENTRY_INVALID frequencies.
+ * @pos:the cpufreq_frequency_table * to use as a loop cursor.
+ * @table:  the cpufreq_frequency_table * to iterate over.
+ */
+
+#define cpufreq_for_each_valid_entry(pos, table)   \
+   for (pos = table; cpufreq_next_valid(&pos); pos++)
+
 int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
struct cpufreq_frequency_table *table);
 
-- 
1.9.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 3/8] ARM: davinci: da850: Use cpufreq_for_each_entry macro for iteration

2014-04-21 Thread Stratos Karafotis
The cpufreq core now supports the cpufreq_for_each_entry macro helper
for iteration over the cpufreq_frequency_table, so use it.

It should have no functional changes.

Signed-off-by: Stratos Karafotis 
---

Changes v2 -> v3
- Added 'ARM' prefix in the subject as Sekhar Nori suggested.

 arch/arm/mach-davinci/da850.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-davinci/da850.c b/arch/arm/mach-davinci/da850.c
index 85399c9..45ce065 100644
--- a/arch/arm/mach-davinci/da850.c
+++ b/arch/arm/mach-davinci/da850.c
@@ -1092,20 +1092,21 @@ int da850_register_cpufreq(char *async_clk)
 
 static int da850_round_armrate(struct clk *clk, unsigned long rate)
 {
-   int i, ret = 0, diff;
+   int ret = 0, diff;
unsigned int best = (unsigned int) -1;
struct cpufreq_frequency_table *table = cpufreq_info.freq_table;
+   struct cpufreq_frequency_table *pos;
 
rate /= 1000; /* convert to kHz */
 
-   for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
-   diff = table[i].frequency - rate;
+   cpufreq_for_each_entry(pos, table) {
+   diff = pos->frequency - rate;
if (diff < 0)
diff = -diff;
 
if (diff < best) {
best = diff;
-   ret = table[i].frequency;
+   ret = pos->frequency;
}
}
 
-- 
1.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 8/8] sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-16 Thread Stratos Karafotis
On 16/04/2014 07:13 πμ, Simon Horman wrote:
> On Wed, Apr 16, 2014 at 01:27:04AM +0300, Stratos Karafotis wrote:
>> The cpufreq core now supports the cpufreq_for_each_valid_entry macro
>> helper for iteration over the cpufreq_frequency_table, so use it.
>>
>> It should have no functional changes.
>>
>> Signed-off-by: Stratos Karafotis 
> 
> Rafael, please feel free to take this one.
> 
> Acked-by: Simon Horman 
> 

Thanks!

Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 0/8] Introduce new cpufreq helper macros

2014-04-16 Thread Stratos Karafotis
On 16/04/2014 07:01 πμ, Viresh Kumar wrote:
> On 16 April 2014 03:55, Stratos Karafotis  wrote:
>> Hi all,
>>
>> This patch set introduces two freq_table helper macros which
>> can be used for iteration over cpufreq_frequency_table and
>> makes the necessary changes to cpufreq core and drivers that
>> use such an iteration procedure.
>>
>> The motivation was a usage of common procedure to iterate over
>> cpufreq_frequency_table across all drivers and cpufreq core.
>>
>> This was tested on a x86_64 platform.
>> Most files compiled successfully but unfortunately I was not
>> able to compile sh_sir.c pasemi_cpufreq.c and ppc_cbe_cpufreq.c
>> due to lack of cross compiler.
>>
>> Changes v1 -> v2
>> - Rearrange patches
>> - Remove redundant braces
>> - Fix a newly introduced bug in exynos5440
>> - Use cpufreq_for_each_valid_entry instead of
>> cpufreq_for_each_entry in cpufreq_frequency_table_get_index()
>> - Drop redundant double ! operator in longhaul and change
> 
> You dropped this !! in thermal stuff and not longhaul :)
> 
>> the pos loop cursor variable to freq_pos.
>> - Declare pos variable on a separate line
>>
>> Stratos Karafotis (8):
>>   cpufreq: Introduce macros for cpufreq_frequency_table iteration
>>   cpufreq: Use cpufreq_for_each_* macros for frequency table iteration
>>   davinci: da850: Use cpufreq_for_each_entry macro for iteration
>>   mips: lemote 2f: se cpufreq_for_each_entry macro for iteration
>>   mfd: db8500-prcmu: Use cpufreq_for_each_entry macro for iteration
>>   thermal: cpu_cooling: Use cpufreq_for_each_valid_entry macro for
>> iteration
>>   irda: sh_sir: Use cpufreq_for_each_valid_entry macro for iteration
>>   sh: clk: Use cpufreq_for_each_valid_entry macro for iteration
> 
> Acked-by: Viresh Kumar 
> 

Thanks!

Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >