Re: [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems

Rafael J. Wysocki Tue, 08 Oct 2019 03:49:55 -0700

On Tue, Oct 8, 2019 at 11:51 AM Rafael J. Wysocki <raf...@kernel.org> wrote:
>
> On Tue, Oct 8, 2019 at 8:20 AM Doug Smythies <dsmyth...@telus.net> wrote:
> >
> > On 2019.10.06 08:34 Rafael J. Wysocki wrote:
> > > On Sun, Oct 6, 2019 at 4:46 PM Doug Smythies <dsmyth...@telus.net> wrote:
> > >> On 2019.10.01 02:32 Rafael J. Wysocki wrote:
> > >>> On Sun, Sep 29, 2019 at 6:05 PM Doug Smythies <dsmyth...@telus.net> 
> > >>> wrote:
> > >>>> On 2019.09.26 09:32 Doug Smythies wrote:
> > >>>>
> > >>>>> If the deepest idle state is disabled, the system
> > >>>>> can become somewhat unstable, with anywhere between no problem
> > >>>>> at all, to the occasional temporary jump using a lot more
> > >>>>> power for a few seconds, to a permanent jump using a lot more
> > >>>>> power continuously. I have been unable to isolate the exact
> > >>>>> test load conditions under which this will occur. However,
> > >>>>> temporarily disabling and then enabling other idle states
> > >>>>> seems to make for a somewhat repeatable test. It is important
> > >>>>> to note that the issue occurs with only ever disabling the deepest
> > >>>>> idle state, just not reliably.
> > >>>>>
> > >>>>> I want to know how you want to proceed before I do a bunch of
> > >>>>> regression testing.
> > >>>>
> > >> I do not think I stated it clearly before: The problem here is that some 
> > >> CPUs
> > >> seem to get stuck in idle state 0, and when they do power consumption 
> > >> spikes,
> > >> often by several hundred % and often indefinitely.
> > >
> > > That indeed has not been clear to me, thanks for the clarification!
> >
> > >
> > >> I made a hack job automated test:
> > >> Kernel  tests                 fail rate
> > >> 5.4-rc1                6616           13.45%
> > >> 5.3              2376            4.50%
> > >> 5.3-teov7       12136            0.00%  <<< teo.c reverted and teov7 put 
> > >> in its place.
> > >> 5.4-rc1-ds      11168        0.00%  <<< [old] proposed patch (> 7 hours 
> > >> test time)
> >
> >
> >    5.4-rc1-ds12   4224          0.005 <<< new proposed patch
> >
> > >>
> > >> [old] Proposed patch (on top of kernel 5.4-rc1): [deleted]
> >
> > > This change may cause the deepest state to be selected even if its
> > > "hits" metric is less than the "misses" one AFAICS, in which case the
> > > max_early_index state should be selected instead.
> > >
> > > It looks like the max_early_index computation is broken when the
> > > deepest state is disabled.
> >
> > O.K. Thanks for your quick reply, and insight.
> >
> > I think long durations always need to be counted, but currently if
> > the deepest idle state is disabled, they are not.
> > How about this?:
> > (test results added above, more tests pending if this might be a path 
> > forward.)
> >
> > diff --git a/drivers/cpuidle/governors/teo.c 
> > b/drivers/cpuidle/governors/teo.c
> > index b5a0e49..a970d2c 100644
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -155,10 +155,12 @@ static void teo_update(struct cpuidle_driver *drv, 
> > struct cpuidle_device *dev)
> >
> >                 cpu_data->states[i].early_hits -= early_hits >> DECAY_SHIFT;
> >
> > -               if (drv->states[i].target_residency <= sleep_length_us) {
> > -                       idx_timer = i;
> > -                       if (drv->states[i].target_residency <= measured_us)
> > -                               idx_hit = i;
> > +               if (!(drv->states[i].disabled || 
> > dev->states_usage[i].disable)){
> > +                       if (drv->states[i].target_residency <= 
> > sleep_length_us) {
> > +                               idx_timer = i;
> > +                               if (drv->states[i].target_residency <= 
> > measured_us)
> > +                                       idx_hit = i;
> > +                       }
>
> What if the state is enabled again after some time?


Actually, the states are treated as "bins" here, so for the metrics it
doesn't matter whether or not they are enabled at the moment.

> >                 }
> >         }
> >
> > @@ -256,39 +258,25 @@ static int teo_select(struct cpuidle_driver *drv, 
> > struct cpuidle_device *dev,
> >                 struct cpuidle_state *s = &drv->states[i];
> >                 struct cpuidle_state_usage *su = &dev->states_usage[i];
> >
> > -               if (s->disabled || su->disable) {
> > -                       /*
> > -                        * If the "early hits" metric of a disabled state is
> > -                        * greater than the current maximum, it should be 
> > taken
> > -                        * into account, because it would be a mistake to 
> > select
> > -                        * a deeper state with lower "early hits" metric.  
> > The
> > -                        * index cannot be changed to point to it, however, 
> > so
> > -                        * just increase the max count alone and let the 
> > index
> > -                        * still point to a shallower idle state.
> > -                        */
> > -                       if (max_early_idx >= 0 &&
> > -                           count < cpu_data->states[i].early_hits)
> > -                               count = cpu_data->states[i].early_hits;
> > -
> > -                       continue;
>
> AFAICS, adding early_hits to count is not a mistake if there are still
> enabled states deeper than the current one.

And the mistake appears to be that the "hits" and "misses" metrics
aren't handled in analogy with the "early_hits" one when the current
state is disabled.

Let me try to cut a patch to address that.

Re: [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems

Reply via email to