On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
> 
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e13df951aca7..5c482ec38610 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >         !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> >             return false;
> >  
> > -   if (sg_policy->work_in_progress)
> > -           return false;
> > -
> >     if (unlikely(sg_policy->need_freq_update)) {
> >             sg_policy->need_freq_update = false;
> >             /*
> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> >  
> >             policy->cur = next_freq;
> >             trace_cpu_frequency(next_freq, smp_processor_id());
> > -   } else {
> > +   } else if (!sg_policy->work_in_progress) {
> >             sg_policy->work_in_progress = true;
> >             irq_work_queue(&sg_policy->irq_work);
> >     }
> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  
> >     ignore_dl_rate_limit(sg_cpu, sg_policy);
> >  
> > +   /*
> > +    * For slow-switch systems, single policy requests can't run at the
> > +    * moment if update is in progress, unless we acquire update_lock.
> > +    */
> > +   if (sg_policy->work_in_progress)
> > +           return;
> > +
> >     if (!sugov_should_update_freq(sg_policy, time))
> >             return;
> >  
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> >  static void sugov_work(struct kthread_work *work)
> >  {
> >     struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > +   unsigned int freq;
> > +   unsigned long flags;
> > +
> > +   /*
> > +    * Hold sg_policy->update_lock briefly to handle the case where
> > +    * sg_policy->next_freq is read here and then updated by
> > +    * sugov_update_shared just before work_in_progress is set to false
> > +    * here; otherwise we may miss queueing the new update.
> > +    *
> > +    * Note: If a work was queued after the update_lock is released,
> > +    * sugov_work will just be called again by kthread_work code; and the
> > +    * request will be processed before the sugov thread sleeps.
> > +    */
> > +   raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > +   freq = sg_policy->next_freq;
> > +   sg_policy->work_in_progress = false;
> > +   raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >  
> >     mutex_lock(&sg_policy->work_lock);
> > -   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > -                           CPUFREQ_RELATION_L);
> > +   __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> >     mutex_unlock(&sg_policy->work_lock);
> > -
> > -   sg_policy->work_in_progress = false;
> >  }
> 
> And I do see a race here for single policy systems doing slow switching.
> 
> Kthread                                                 Sched update
> 
> sugov_work()                                            sugov_update_single()
> 
>         lock();
>         // The CPU is free to rearrange below           
>         // two in any order, so it may clear
>         // the flag first and then read next
>         // freq. Let's assume it does.
>         work_in_progress = false
> 
>                                                         if (work_in_progress)
>                                                                 return;
> 
>                                                         sg_policy->next_freq = 0;
>         freq = sg_policy->next_freq;
>                                                         sg_policy->next_freq = real-next-freq;
>         unlock();
> 

I agree with the race you describe for single policy slow-switch. Good find :)

I think mainline's sugov_work is open to the same reordering. Even with the
mutex_unlock there, the CPU could reorder the work_in_progress write to happen
before the read of next_freq. AIUI, mutex_unlock is only expected to be a
release barrier.

To be safe, though, I could just put an smp_mb() there. I believe that with
it, no locking would be needed for this case.
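
As a rough illustration of the ordering an smp_mb() would give (a userspace
sketch only, not the actual patch): C11 atomics stand in for the kernel
primitives, and struct sg_policy_sketch, sketch_work() and sketch_update() are
hypothetical names made up for this example. The seq_cst fence plays the role
of smp_mb(), and the acquire load on the update side is an assumption about
how the flag check would pair with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two sugov_policy fields involved. */
struct sg_policy_sketch {
    atomic_uint next_freq;
    atomic_bool work_in_progress;
};

/* Kthread side, modelled on sugov_work(): read next_freq, then clear the flag. */
static unsigned int sketch_work(struct sg_policy_sketch *sg)
{
    unsigned int freq;

    assert(sg != NULL);
    freq = atomic_load_explicit(&sg->next_freq, memory_order_relaxed);
    /* Stand-in for smp_mb(): keeps the load above ahead of the store below. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&sg->work_in_progress, false, memory_order_relaxed);
    return freq;
}

/* Update side, modelled on the slow-switch path: skip while work is pending. */
static bool sketch_update(struct sg_policy_sketch *sg, unsigned int freq)
{
    assert(sg != NULL);
    /* Assumed pairing: acquire load against the fence in sketch_work(). */
    if (atomic_load_explicit(&sg->work_in_progress, memory_order_acquire))
        return false;
    atomic_store_explicit(&sg->next_freq, freq, memory_order_relaxed);
    atomic_store_explicit(&sg->work_in_progress, true, memory_order_release);
    return true;
}
```

With this ordering, an updater that sees work_in_progress == false can only
write next_freq after the kthread side has already read it, so the kthread
cannot pick up a transient value. A full fence is stronger than this
particular load-then-store ordering strictly needs.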

I'll send out a v3 with Acks for the original patch, and then send out the
smp_mb() as a separate patch if that's OK.

thanks,

 - Joel
