On Wednesday, December 16, 2015 12:20:29 PM Chen Yu wrote:
> It is reported that, with CONFIG_HZ_PERIODIC=y cpu stays at the
> lowest frequency even if the usage goes to 100%, neither ondemand
> nor conservative governor works, however performance and
> userspace work as expected. If set with CONFIG_NO_HZ_FULL=y,
> everything goes well.
> 
> This problem is caused by improper calculation of the idle_time
> when the load is extremely high(near 100%). Firstly, cpufreq_governor
> uses get_cpu_idle_time to get the total idle time for specific cpu, then:
> 
> 1.If the system is configured with CONFIG_NO_HZ_FULL, the idle time is
>   returned by ktime_get, which is always increasing, it's OK.
> 2.However, if the system is configured with CONFIG_HZ_PERIODIC,
>   get_cpu_idle_time might not guarantee to be always increasing,
>   because it will leverage get_cpu_idle_time_jiffy to calculate the
>   idle_time, consider the following scenario:
> 
> At T1:
> idle_tick_1 = total_tick_1 - user_tick_1
> 
> sample period(80ms)...
> 
> At T2: ( T2 = T1 + 80ms):
> idle_tick_2 = total_tick_2 - user_tick_2
> 
> Currently the algorithm is using (idle_tick_2 - idle_tick_1) to
> get the delta idle_time during the past sample period, however
> it CAN NOT guarantee that idle_tick_2 >= idle_tick_1, especially
> when cpu load is high.
> (Yes, total_tick_2 >= total_tick_1, and user_tick_2 >= user_tick_1,
> but how about idle_tick_2 and idle_tick_1? No guarantee.)
> So governor might get a negative value of idle_time during the past
> sample period, which might mislead the system that the idle time is
> very big(converted to unsigned int), and the busy time is nearly zero,
> which causes the governor to always choose the lowest cpufreq,
> then cause this problem.
> 
> In theory there are two solutions:
> 
> 1.The logic should not rely on the idle tick during every sample period,
>   but be based on the busy tick directly, as this is how 'top' is
>   implemented.
> 
> 2.Or the logic must make sure that the idle_time is strictly increasing
>   during each sample period, then there would be no negative idle_time
>   anymore. This solution requires minimum modification to current code
>   and this patch uses method 2.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821
> Reported-by: Jan Fikar <j.fi...@gmail.com>
> Signed-off-by: Chen Yu <yu.c.c...@intel.com>

Applied, thanks!

> ---
>  drivers/cpufreq/cpufreq_governor.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c 
> b/drivers/cpufreq/cpufreq_governor.c
> index b260576..363445f 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -84,6 +84,9 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>                       (cur_wall_time - j_cdbs->prev_cpu_wall);
>               j_cdbs->prev_cpu_wall = cur_wall_time;
>  
> +             if (cur_idle_time < j_cdbs->prev_cpu_idle)
> +                     cur_idle_time = j_cdbs->prev_cpu_idle;
> +
>               idle_time = (unsigned int)
>                       (cur_idle_time - j_cdbs->prev_cpu_idle);
>               j_cdbs->prev_cpu_idle = cur_idle_time;
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to