On 10/29/2020 15:19, Viresh Kumar <viresh.ku...@linaro.org> wrote:
> Your mail client is screwing up the "In-reply-to" field of the
> message, which prevents it from appearing properly in the thread in
> other people's mailboxes. Please fix that.
>
I will try to fix that.

> On 29-10-20, 09:43, zhuguangqing83 wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 0c5c61a095f6..bf7800e853d3 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > >  	if (sg_policy->next_freq == next_freq)
> > >  		return false;
> > >
> > > -	sg_policy->next_freq = next_freq;
> > >  	sg_policy->last_freq_update_time = time;
> > >
> > >  	return true;
> >
> > It's a little strange that sg_policy->next_freq and
> > sg_policy->last_freq_update_time are not updated at the same time.
> >
> > > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > >  				   unsigned int next_freq)
> > >  {
> > >  	if (sugov_update_next_freq(sg_policy, time, next_freq))
> > > -		cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > +		sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > >  }
> > >
> >
> > Great, it also takes into account the issue that 0 is returned by the
> > driver's ->fast_switch() callback to indicate an error condition.
>
> Yes, but even my change wasn't good enough; more on that later.
>
> > Since policy->min/max may not be a real CPU frequency in the OPPs,
>
> No, that can't happen. If userspace tries to set a value too large or
> too small, we clamp that to policy->max/min too, so the problem below
> can never occur.
>
> > the next_freq we get from get_next_freq(), which is clamped between
> > policy->min and policy->max, may not be a real CPU frequency in the
> > OPPs either. In that case, if we compare the real OPP frequency
> > returned from cpufreq_driver_fast_switch() with next_freq,
> > "if (sg_policy->next_freq == next_freq)" will never be satisfied, so
> > we would change the CPU frequency every time the schedutil callback
> > gets called from the scheduler. Looking at the current code in
> > get_next_freq(), though, the issue mentioned above should not
> > happen. Maybe it's just one of my unnecessary worries.
>
> Coming back to my patch (and yours too), it only fixes the fast-switch
> case and not the slow path, which can also end up clamping the
> frequency. And to be honest, drivers can have their own clamping code
> in place as well; no one is stopping them.
>
> We also need to do something about cached_raw_freq, as it will not be
> in sync with next_freq anymore.
>
> Here is another attempt from me at tackling this problem. The idea is
> to check whether the previous frequency update went as expected; if it
> didn't, we can't rely on next_freq or cached_raw_freq anymore. For
> this to work properly, we need to make sure policy->cur isn't getting
> updated while get_next_freq() is running. For that I have given a
> different meaning to the work_in_progress flag, which now creates a
> (kind of) lockless critical section where we won't touch next_freq
> while the cpufreq core is updating the frequency.
>

I think your patch is OK for tackling this problem.
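One detail worth noting about the first patch: assigning the return
value of cpufreq_driver_fast_switch() directly to next_freq stores 0
there when the driver reports an error. A minimal sketch (mine, not
from either patch) that keeps the old next_freq on error:

	static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
				      unsigned int next_freq)
	{
		if (!sugov_update_next_freq(sg_policy, time, next_freq))
			return;

		/*
		 * The driver returns the frequency it actually set, which
		 * may differ from next_freq if it applied clamping of its
		 * own, or 0 if the switch failed.
		 */
		next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
		if (!next_freq)
			return;		/* keep the old next_freq on error */

		sg_policy->next_freq = next_freq;
	}

As noted above, this still leaves the slow path and cached_raw_freq
unhandled, which is what the second patch addresses.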
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 0c5c61a095f6..8991cc31b011 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -121,13 +121,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
>  static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
>  				  unsigned int next_freq)
>  {
> -	if (!sugov_update_next_freq(sg_policy, time, next_freq))
> -		return;
> -
> -	if (!sg_policy->work_in_progress) {
> -		sg_policy->work_in_progress = true;
> +	if (sugov_update_next_freq(sg_policy, time, next_freq))
>  		irq_work_queue(&sg_policy->irq_work);
> -	}
>  }
>
>  /**
> @@ -159,6 +154,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>  	unsigned int freq = arch_scale_freq_invariant() ?
>  				policy->cpuinfo.max_freq : policy->cur;
>
> +	/*
> +	 * The previous frequency update didn't go as we expected it to be. Lets
> +	 * start again to make sure we don't miss any updates.
> +	 */
> +	if (unlikely(policy->cur != sg_policy->next_freq)) {
> +		sg_policy->next_freq = 0;
> +		sg_policy->cached_raw_freq = 0;
> +	}
> +
>  	freq = map_util_freq(util, freq, max);
>
>  	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> @@ -337,8 +341,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
>  	ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> +	if (!sg_policy->policy->fast_switch_enabled) {
> +		raw_spin_lock(&sg_policy->update_lock);
> +		if (sg_policy->work_in_progress)
> +			goto unlock;
> +	}
> +

Maybe it's better to move the following check before the locking code
above (see the sketch after the patch):

	if (!sugov_should_update_freq(sg_policy, time))
		return;

>  	if (!sugov_should_update_freq(sg_policy, time))
> -		return;
> +		goto unlock;
>
>  	/* Limits may have changed, don't skip frequency update */
>  	busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> @@ -363,13 +373,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	 * concurrently on two different CPUs for the same target and it is not
>  	 * necessary to acquire the lock in the fast switch case.
>  	 */
> -	if (sg_policy->policy->fast_switch_enabled) {
> +	if (sg_policy->policy->fast_switch_enabled)
>  		sugov_fast_switch(sg_policy, time, next_f);
> -	} else {
> -		raw_spin_lock(&sg_policy->update_lock);
> +	else
>  		sugov_deferred_update(sg_policy, time, next_f);
> +
> +unlock:
> +	if (!sg_policy->policy->fast_switch_enabled)
>  		raw_spin_unlock(&sg_policy->update_lock);
> -	}
>  }
>
>  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> @@ -405,6 +416,9 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>
>  	raw_spin_lock(&sg_policy->update_lock);
>
> +	if (sg_policy->work_in_progress)
> +		goto unlock;
> +
>  	sugov_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>
> @@ -419,33 +433,30 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>  		sugov_deferred_update(sg_policy, time, next_f);
>  	}
>
> +unlock:
>  	raw_spin_unlock(&sg_policy->update_lock);
>  }
>
>  static void sugov_work(struct kthread_work *work)
>  {
>  	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> -	unsigned int freq;
>  	unsigned long flags;
>
>  	/*
> -	 * Hold sg_policy->update_lock shortly to handle the case where:
> -	 * incase sg_policy->next_freq is read here, and then updated by
> -	 * sugov_deferred_update() just before work_in_progress is set to false
> -	 * here, we may miss queueing the new update.
> -	 *
> -	 * Note: If a work was queued after the update_lock is released,
> -	 * sugov_work() will just be called again by kthread_work code; and the
> -	 * request will be proceed before the sugov thread sleeps.
> +	 * Prevent the schedutil hook to run in parallel while we are updating
> +	 * the frequency here and accessing next_freq.
>  	 */
>  	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> -	freq = sg_policy->next_freq;
> -	sg_policy->work_in_progress = false;
> +	sg_policy->work_in_progress = true;
>  	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
>  	mutex_lock(&sg_policy->work_lock);
> -	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> +	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L);
>  	mutex_unlock(&sg_policy->work_lock);
> +
> +	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> +	sg_policy->work_in_progress = false;
> +	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>  }
>
>  static void sugov_irq_work(struct irq_work *irq_work)
>
> --
> viresh
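To make the suggested reordering concrete, the entry of
sugov_update_single() would then look roughly like this (my sketch
against the patch above, unrelated lines elided):

	static void sugov_update_single(struct update_util_data *hook, u64 time,
					unsigned int flags)
	{
		struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
		struct sugov_policy *sg_policy = sg_cpu->sg_policy;

		/* ... */

		ignore_dl_rate_limit(sg_cpu, sg_policy);

		/*
		 * Do the cheap rate-limit check first, so a rate-limited
		 * update returns without taking update_lock at all.
		 */
		if (!sugov_should_update_freq(sg_policy, time))
			return;

		if (!sg_policy->policy->fast_switch_enabled) {
			raw_spin_lock(&sg_policy->update_lock);
			if (sg_policy->work_in_progress)
				goto unlock;
		}

		/* ... compute and apply next_f as in the patch ... */

	unlock:
		if (!sg_policy->policy->fast_switch_enabled)
			raw_spin_unlock(&sg_policy->update_lock);
	}

This assumes sugov_should_update_freq() doesn't need to be serialized
against sugov_work(); if it does, the ordering in your patch is the
safer one.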