6] sched: idle: Select idle state before stopping the tick

Rafael J. Wysocki Sun, 11 Mar 2018 03:31:17 -0700

On Sunday, March 11, 2018 2:44:37 AM CET Frederic Weisbecker wrote:
> On Fri, Mar 09, 2018 at 10:46:55AM +0100, Rafael J. Wysocki wrote:
> > --- linux-pm.orig/kernel/time/tick-sched.h
> > +++ linux-pm/kernel/time/tick-sched.h
> > @@ -30,6 +30,7 @@ enum tick_nohz_mode {
> >   *                 when the CPU returns from nohz sleep.
> >   * @next_tick:             Next tick to be fired when in dynticks mode.
> >   * @tick_stopped:  Indicator that the idle tick has been stopped
> > + * @tick_may_stop: Indicator that the idle tick may be stopped shortly
> 
> Perhaps we can set timer_expires to 0 instead when we want to invalidate
> the last value?


timer_expires can be 0 for other reasons.

I can use timer_expires_basemono for that (I actually did that in one version
of my patches, but it looked odd and I decided to use a new field for clarity)
if you prefer to avoid adding an extra field.

> >   * @idle_jiffies:  jiffies at the entry to idle for idle time accounting
> >   * @idle_calls:            Total number of idle calls
> >   * @idle_sleeps:   Number of idle calls, where the sched tick was stopped
> > @@ -38,7 +39,6 @@ enum tick_nohz_mode {
> >   * @idle_exittime: Time when the idle state was left
> >   * @idle_sleeptime:        Sum of the time slept in idle with sched tick 
> > stopped
> >   * @iowait_sleeptime:      Sum of the time slept in idle with sched tick 
> > stopped, with IO outstanding
> > - * @sleep_length:  Duration of the current idle sleep
> >   * @do_timer_lst:  CPU was the last one doing do_timer before going idle
> >   */
> >  struct tick_sched {
> > @@ -49,6 +49,7 @@ struct tick_sched {
> >     ktime_t                         next_tick;
> >     int                             inidle;
> >     int                             tick_stopped;
> > +   int                             tick_may_stop;
> >     unsigned long                   idle_jiffies;
> >     unsigned long                   idle_calls;
> >     unsigned long                   idle_sleeps;
> > @@ -58,8 +59,9 @@ struct tick_sched {
> >     ktime_t                         idle_exittime;
> >     ktime_t                         idle_sleeptime;
> >     ktime_t                         iowait_sleeptime;
> > -   ktime_t                         sleep_length;
> >     unsigned long                   last_jiffies;
> > +   u64                             timer_expires;
> > +   u64                             timer_expires_basemono;
> 
> We may need documentation for the above fields too.

OK

> > Index: linux-pm/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux-pm.orig/kernel/time/tick-sched.c
> > +++ linux-pm/kernel/time/tick-sched.c
> > @@ -652,13 +652,10 @@ static inline bool local_timer_softirq_p
> >     return local_softirq_pending() & TIMER_SOFTIRQ;
> >  }
> >  
> > -static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > -                                    ktime_t now, int cpu)
> > +static ktime_t __tick_nohz_next_event(struct tick_sched *ts, int cpu)
> 
> Since we don't seem to have a lower level version, can we remove the 
> underscores?

Sure, will do.

> >  {
> > -   struct clock_event_device *dev = 
> > __this_cpu_read(tick_cpu_device.evtdev);
> >     u64 basemono, next_tick, next_tmr, next_rcu, delta, expires;
> >     unsigned long seq, basejiff;
> > -   ktime_t tick;
> >  
> >     /* Read jiffies and the time when jiffies were updated last */
> >     do {
> [...]
> 
> 
> > +static void __tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
> 
> (Same comment here about the underscores).

Yup.

> > +{
> > +   struct clock_event_device *dev = 
> > __this_cpu_read(tick_cpu_device.evtdev);
> > +   u64 basemono = ts->timer_expires_basemono;
> > +   u64 expires = ts->timer_expires;
> > +   ktime_t tick = expires;
> > +
> > +   /* Make sure we won't be trying to stop it twice in a row. */
> > +   ts->tick_may_stop = 0;
> > +
> > +   /*
> > +    * If this CPU is the one which updates jiffies, then give up
> > +    * the assignment and let it be taken by the CPU which runs
> > +    * the tick timer next, which might be this CPU as well. If we
> > +    * don't drop this here the jiffies might be stale and
> > +    * do_timer() never invoked. Keep track of the fact that it
> > +    * was the one which had the do_timer() duty last.
> > +    */
> > +   if (cpu == tick_do_timer_cpu) {
> > +           tick_do_timer_cpu = TICK_DO_TIMER_NONE;
> > +           ts->do_timer_last = 1;
> > +   }
> >  
> >     /* Skip reprogram of event if its not changed */
> >     if (ts->tick_stopped && (expires == ts->next_tick)) {
> >             /* Sanity check: make sure clockevent is actually programmed */
> >             if (tick == KTIME_MAX || ts->next_tick == 
> > hrtimer_get_expires(&ts->sched_timer))
> > -                   goto out;
> > +                   return;
> >  
> >             WARN_ON_ONCE(1);
> >             printk_once("basemono: %llu ts->next_tick: %llu 
> > dev->next_event: %llu timer->active: %d timer->expires: %llu\n",
> [...]
> > +void tick_nohz_idle_retain_tick(void)
> > +{
> > +   __this_cpu_write(tick_cpu_sched.tick_may_stop, 0);
> > +}
> > +
> 
> So, I've become overly-paranoid about cached expiration on nohz code, we've 
> run
> into bugs that took months to debug before. It seems that the cached version 
> shouldn't
> leak in any way there, still can we have checks such as this in 
> tick_nohz_idle_enter/exit()?
> 
>      WARN_ON_ONCE(__this_cpu_read(tick_cpu_sched.tick_may_stop));

OK

> Otherwise a leaking cached expiration may mislead further nohz tick stop and
> bypass calls to tick_nohz_next_event().
> 
> Also let's make sure we never call tick_nohz_get_sleep_length() outside idle:
> 
>      WARN_ON_ONCE(!__this_cpu_read(tick_cpu_sched.inidle));

OK

I'll respin the series with the above comments addressed early next week.

Thanks!

Re: [RFC/RFT][PATCH v3 5/6] sched: idle: Select idle state before stopping the tick

Reply via email to