Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate

Stratos Karafotis Fri, 02 May 2014 06:46:38 -0700

On 02/05/2014 03:26 μμ, Rafael J. Wysocki wrote:
> On Thursday, May 01, 2014 06:48:08 PM Dirk Brandewie wrote:
>> On 05/01/2014 04:18 PM, Rafael J. Wysocki wrote:
>>> On Thursday, May 01, 2014 02:30:42 PM Dirk Brandewie wrote:
>>>> On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
>>>>> Currently the driver calculates the next pstate proportional to
>>>>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>>>>
>>>>> Using the scaled load (core_busy) to calculate the next pstate
>>>>> is not always correct, because there are cases that the load is
>>>>> independent from current pstate. For example, a tight 'for' loop
>>>>> through many sampling intervals will cause a load of 100% in
>>>>> every pstate.
>>>>>
>>>>> So, change the above method and calculate the next pstate with
>>>>> the assumption that the next pstate should not depend on the
>>>>> current pstate. The next pstate should only be directly
>>>>> proportional to measured load.
>>>>>
>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz.
>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>>>>> increase ~1.5% in performance. Below the test results using turbostat
>>>>> (5 iterations):
>>>>>
>>>>> Without patch:
>>>>>
>>>>> Ph. avg Time      Total time      PkgWatt         Total Energy
>>>>>   79.63   266.416         57.74           15382.85984
>>>>>   79.63   265.609         57.87           15370.79283
>>>>>   79.57   266.994         57.54           15362.83476
>>>>>   79.53   265.304         57.83           15342.53032
>>>>>   79.71   265.977         57.76           15362.83152
>>>>> avg       79.61   266.06          57.74           15364.36985
>>>>>
>>>>> With patch:
>>>>>
>>>>> Ph. avg Time      Total time      PkgWatt         Total Energy
>>>>>   78.23   258.826         59.14           15306.96964
>>>>>   78.41   259.110         59.15           15326.35650
>>>>>   78.40   258.530         59.26           15320.48780
>>>>>   78.46   258.673         59.20           15313.44160
>>>>>   78.19   259.075         59.16           15326.87700
>>>>> avg       78.34   258.842         59.18           15318.82650
>>>>>
>>>>> The total test time reduced by ~2.6%, while the total energy
>>>>> consumption during a test iteration reduced by ~0.35%
>>>>>
>>>>> Signed-off-by: Stratos Karafotis <[email protected]>
>>>>> ---
>>>>>
>>>>> Changes v1 -> v2
>>>>>   - Enhance change log as Rafael and Viresh suggested
>>>>>
>>>>>
>>>>>    drivers/cpufreq/intel_pstate.c | 15 +++++++--------
>>>>>    1 file changed, 7 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/cpufreq/intel_pstate.c 
>>>>> b/drivers/cpufreq/intel_pstate.c
>>>>> index 0999673..8e309db 100644
>>>>> --- a/drivers/cpufreq/intel_pstate.c
>>>>> +++ b/drivers/cpufreq/intel_pstate.c
>>>>> @@ -608,28 +608,27 @@ static inline void 
>>>>> intel_pstate_set_sample_time(struct cpudata *cpu)
>>>>>           mod_timer_pinned(&cpu->timer, jiffies + delay);
>>>>>    }
>>>>>
>>>>> -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
>>>>> +static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
>>>>>    {
>>>>> - int32_t core_busy, max_pstate, current_pstate;
>>>>> + int32_t core_busy, max_pstate;
>>>>>
>>>>>           core_busy = cpu->sample.core_pct_busy;
>>>>>           max_pstate = int_tofp(cpu->pstate.max_pstate);
>>>>> - current_pstate = int_tofp(cpu->pstate.current_pstate);
>>>>> - core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
>>>>> + core_busy = mul_fp(core_busy, max_pstate);
>>>>
>>>> NAK,  The goal of this code is to find out how busy the core is at the 
>>>> current
>>>> P state. This change will return a value WAY too high.
>>>>
>>>> Assume core_busy is 100 and the max non-turbo P state is 34 (3.4GHz) this 
>>>> code
>>>> would return a busy value of 3400. The PID  is trying to keep the busy 
>>>> value
>>>> at the setpoint any value of ~3% will drive the P state to the highest 
>>>> turbo
>>>> P state in this example.
>>>
>>> Well, the problem is that the numbers above indicate an improvement in 
>>> energy
>>> efficiency as a result of this patch and we need to explain that result.
>>>
>> The performance governor is the best option for this workload.
>>
>> This change will give you the highest trubo for all but very idle work loads.
> 
> I see.
> 
>> Lets say you have a processor with max P state of 3.4GHz  The current P state
>> is 1.6 GHz so if the processor was 100% in C0 the core_busy values would be
>> 47% This number scaled would be 100%.  With the change above the PID would be
>> reacting to a load of 1598%.  APERF/MPERF give you the percent of entire
>> core scaling it lets you find out how busy your are within the cureent P 
>> state.
> 
> OK
> 
> Stratos seems to be arguing that we can achieve better results, in therms of
> both performance and energy efficiency, if we disregard the history and only
> take the current situation into account.  The patch itself may not be correct,
> but the idea is worth consideration in my opinion, especially in the face
> of the fact that we made a similar change in cpufreq and the results improved.
> 
>


First of all, thank you very much all of you for your time reviewing this patch 
and
for your help!

With the hope that I will not waste more your time, I will try to implement
correctly this assumption, and test it.

My first thought, after Dirk's explanation, is to try to fix the logic in the 
patch
using the c0_pct as the core_pct_busy in the intel_pstate_calc_busy function. 
Thus:

-        sample->core_pct_busy = mul_fp(core_pct, c0_pct);
+        sample->core_pct_busy = c0_pct;

But, I'm not sure if c0_pct is (always) equivalent to CPU load.
Otherwise, I will try to get the absolute load using the get_cpu_idle_time_us 
and
calculate the next pstate directly proportional to this load.


Thanks,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate

Reply via email to