Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-13 Thread Rafael J. Wysocki
On Sun, Aug 12, 2018 at 3:44 PM  wrote:
>
> On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
> > > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > > >  {
> > > > > >   struct menu_device *data = this_cpu_ptr(_devices);
> > > > > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > > - int i;
> > > > > > - int first_idx;
> > > > > > - int idx;
> > > > > > + int first_idx = 0;
> > > > > > + int idx, i;
> > > > > >   unsigned int interactivity_req;
> > > > > >   unsigned int expected_interval;
> > > > > >   unsigned long nr_iowaiters, cpu_load;
> > > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > > >   /* determine the expected residency time, round up */
> > > > > >   data->next_timer_us = 
> > > > > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > > > > >
> > > > > > + /*
> > > > > > +  * If the tick is already stopped, the cost of possible short 
> > > > > > idle
> > > > > > +  * duration misprediction is much higher, because the CPU may 
> > > > > > be stuck
> > > > > > +  * in a shallow idle state for a long time as a result of it. 
> > > > > >  In that
> > > > > > +  * case say we might mispredict and use the known time till 
> > > > > > the closest
> > > > > > +  * timer event for the idle state selection.
> > > > > > +  */
> > > > > > + if (tick_nohz_tick_stopped()) {
> > > > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > > > + goto select;
> > > > > > + }
> > > > > > +
> > > > >
> > > > > This introduce two potential issues:
> > > > >
> > > > > - This will totally ignore the typical pattern in idle loop; I
> > > > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > > > >   with consistent interval;
> > > >
> > > > I'm not sure what you mean by "ignore".
> > >
> > > You could see after move code from blow to this position, the typical
> > > pattern interval will not be accounted; so if in the middle of idles
> > > there have a bunch of interrupts with fix pattern, the upper code
> > > cannot detect this pattern anymore.
> >
> > I'm not really following you here.
> >
> > The part of the code skipped for tick_nohz_tick_stopped() doesn't
> > update the data at all AFAICS.  It only computes some values that
> > would be discarded later anyway, so I'm not sure what the point of
> > running that computation is.
>
> Sorry I don't explain clearly, so try to rephrase:
>
> With your patch for the tick stopped case, it directly uses tick delta
> value as prediction and goto 'select' tag.  So it skips below code
> pieces, these codes have minor improvement for typical pattern which
> can be applied in the middle of idles, for example, the mmc driver
> triggers 16 interrupts with ~1500us interval, these interrupts are all
> handled within the idle loop, so the typical pattern can detect the mmc
> interrupts pattern and it will help idle governor to select a shallower
> idle state so can avoid to break the residency.
>
> You mentioned these computed values would be discarded later, this is
> true for most cases, but it isn't always true actually.  Without your
> patch, the governor will discard the computed values only when
> 'data->predicted_us < TICK_USEC', otherwise the interval pattern is
> still be applied in the prediction.

OK, right.

I'll fix that up in v4, thanks!


Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-13 Thread Rafael J. Wysocki
On Sun, Aug 12, 2018 at 3:44 PM  wrote:
>
> On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
> > > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > > >  {
> > > > > >   struct menu_device *data = this_cpu_ptr(_devices);
> > > > > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > > - int i;
> > > > > > - int first_idx;
> > > > > > - int idx;
> > > > > > + int first_idx = 0;
> > > > > > + int idx, i;
> > > > > >   unsigned int interactivity_req;
> > > > > >   unsigned int expected_interval;
> > > > > >   unsigned long nr_iowaiters, cpu_load;
> > > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > > >   /* determine the expected residency time, round up */
> > > > > >   data->next_timer_us = 
> > > > > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > > > > >
> > > > > > + /*
> > > > > > +  * If the tick is already stopped, the cost of possible short 
> > > > > > idle
> > > > > > +  * duration misprediction is much higher, because the CPU may 
> > > > > > be stuck
> > > > > > +  * in a shallow idle state for a long time as a result of it. 
> > > > > >  In that
> > > > > > +  * case say we might mispredict and use the known time till 
> > > > > > the closest
> > > > > > +  * timer event for the idle state selection.
> > > > > > +  */
> > > > > > + if (tick_nohz_tick_stopped()) {
> > > > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > > > + goto select;
> > > > > > + }
> > > > > > +
> > > > >
> > > > > This introduce two potential issues:
> > > > >
> > > > > - This will totally ignore the typical pattern in idle loop; I
> > > > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > > > >   with consistent interval;
> > > >
> > > > I'm not sure what you mean by "ignore".
> > >
> > > You could see after move code from blow to this position, the typical
> > > pattern interval will not be accounted; so if in the middle of idles
> > > there have a bunch of interrupts with fix pattern, the upper code
> > > cannot detect this pattern anymore.
> >
> > I'm not really following you here.
> >
> > The part of the code skipped for tick_nohz_tick_stopped() doesn't
> > update the data at all AFAICS.  It only computes some values that
> > would be discarded later anyway, so I'm not sure what the point of
> > running that computation is.
>
> Sorry I don't explain clearly, so try to rephrase:
>
> With your patch for the tick stopped case, it directly uses tick delta
> value as prediction and goto 'select' tag.  So it skips below code
> pieces, these codes have minor improvement for typical pattern which
> can be applied in the middle of idles, for example, the mmc driver
> triggers 16 interrupts with ~1500us interval, these interrupts are all
> handled within the idle loop, so the typical pattern can detect the mmc
> interrupts pattern and it will help idle governor to select a shallower
> idle state so can avoid to break the residency.
>
> You mentioned these computed values would be discarded later, this is
> true for most cases, but it isn't always true actually.  Without your
> patch, the governor will discard the computed values only when
> 'data->predicted_us < TICK_USEC', otherwise the interval pattern is
> still be applied in the prediction.

OK, right.

I'll fix that up in v4, thanks!


Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-12 Thread leo . yan
On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:

[...]

> > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > >  {
> > > > >   struct menu_device *data = this_cpu_ptr(_devices);
> > > > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > - int i;
> > > > > - int first_idx;
> > > > > - int idx;
> > > > > + int first_idx = 0;
> > > > > + int idx, i;
> > > > >   unsigned int interactivity_req;
> > > > >   unsigned int expected_interval;
> > > > >   unsigned long nr_iowaiters, cpu_load;
> > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > >   /* determine the expected residency time, round up */
> > > > >   data->next_timer_us = 
> > > > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > > > >
> > > > > + /*
> > > > > +  * If the tick is already stopped, the cost of possible short 
> > > > > idle
> > > > > +  * duration misprediction is much higher, because the CPU may 
> > > > > be stuck
> > > > > +  * in a shallow idle state for a long time as a result of it.  
> > > > > In that
> > > > > +  * case say we might mispredict and use the known time till the 
> > > > > closest
> > > > > +  * timer event for the idle state selection.
> > > > > +  */
> > > > > + if (tick_nohz_tick_stopped()) {
> > > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > > + goto select;
> > > > > + }
> > > > > +
> > > >
> > > > This introduce two potential issues:
> > > >
> > > > - This will totally ignore the typical pattern in idle loop; I
> > > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > > >   with consistent interval;
> > >
> > > I'm not sure what you mean by "ignore".
> >
> > You could see after move code from blow to this position, the typical
> > pattern interval will not be accounted; so if in the middle of idles
> > there have a bunch of interrupts with fix pattern, the upper code
> > cannot detect this pattern anymore.
> 
> I'm not really following you here.
> 
> The part of the code skipped for tick_nohz_tick_stopped() doesn't
> update the data at all AFAICS.  It only computes some values that
> would be discarded later anyway, so I'm not sure what the point of
> running that computation is.

Sorry I don't explain clearly, so try to rephrase:

With your patch for the tick stopped case, it directly uses tick delta
value as prediction and goto 'select' tag.  So it skips below code
pieces, these codes have minor improvement for typical pattern which
can be applied in the middle of idles, for example, the mmc driver
triggers 16 interrupts with ~1500us interval, these interrupts are all
handled within the idle loop, so the typical pattern can detect the mmc
interrupts pattern and it will help idle governor to select a shallower
idle state so can avoid to break the residency.

You mentioned these computed values would be discarded later, this is
true for most cases, but it isn't always true actually.  Without your
patch, the governor will discard the computed values only when
'data->predicted_us < TICK_USEC', otherwise the interval pattern is
still be applied in the prediction.

expected_interval = get_typical_interval(data);
expected_interval = min(expected_interval, data->next_timer_us);

[...]

/*
 * Use the lowest expected idle interval to pick the idle state.
 */
data->predicted_us = min(data->predicted_us, expected_interval);

> The statistics are updated by menu_update() and that still runs and it
> will take the actual wakeup events into account, won't it?

Yes.

> > [...]
> >
> > > > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > - expected_interval < TICK_USEC) {
> > > > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > + expected_interval < TICK_USEC) && 
> > > > > !tick_nohz_tick_stopped()) {
> > > >
> > > > I am not sure this logic is right... Why not use below checking, so
> > > > for POLLING state we will never ask to stop the tick?
> > > >
> > > > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > > > (expected_interval < TICK_USEC && 
> > > > !tick_nohz_tick_stopped())) {
> > > >
> > >
> > > The only effect of it would be setting stop_tick to false, but why
> > > would that matter?
> >
> > Please consider below situation, not sure if this case is existed or
> > not:
> >
> >   step1: first time: enter one idle state with stopping tick;
> >   step2: second time: select POLLING state and tick_nohz_tick_stopped()
> >   is true;
> >
> > So in step2, it cannot set stop_tick to false with below sentence.
> >
> > > > >   unsigned int delta_next_us = ktime_to_us(delta_next);
> > > > >
> > > > >   

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-12 Thread leo . yan
On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:

[...]

> > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > >  {
> > > > >   struct menu_device *data = this_cpu_ptr(_devices);
> > > > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > - int i;
> > > > > - int first_idx;
> > > > > - int idx;
> > > > > + int first_idx = 0;
> > > > > + int idx, i;
> > > > >   unsigned int interactivity_req;
> > > > >   unsigned int expected_interval;
> > > > >   unsigned long nr_iowaiters, cpu_load;
> > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > >   /* determine the expected residency time, round up */
> > > > >   data->next_timer_us = 
> > > > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > > > >
> > > > > + /*
> > > > > +  * If the tick is already stopped, the cost of possible short 
> > > > > idle
> > > > > +  * duration misprediction is much higher, because the CPU may 
> > > > > be stuck
> > > > > +  * in a shallow idle state for a long time as a result of it.  
> > > > > In that
> > > > > +  * case say we might mispredict and use the known time till the 
> > > > > closest
> > > > > +  * timer event for the idle state selection.
> > > > > +  */
> > > > > + if (tick_nohz_tick_stopped()) {
> > > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > > + goto select;
> > > > > + }
> > > > > +
> > > >
> > > > This introduce two potential issues:
> > > >
> > > > - This will totally ignore the typical pattern in idle loop; I
> > > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > > >   with consistent interval;
> > >
> > > I'm not sure what you mean by "ignore".
> >
> > You could see after move code from blow to this position, the typical
> > pattern interval will not be accounted; so if in the middle of idles
> > there have a bunch of interrupts with fix pattern, the upper code
> > cannot detect this pattern anymore.
> 
> I'm not really following you here.
> 
> The part of the code skipped for tick_nohz_tick_stopped() doesn't
> update the data at all AFAICS.  It only computes some values that
> would be discarded later anyway, so I'm not sure what the point of
> running that computation is.

Sorry I don't explain clearly, so try to rephrase:

With your patch for the tick stopped case, it directly uses tick delta
value as prediction and goto 'select' tag.  So it skips below code
pieces, these codes have minor improvement for typical pattern which
can be applied in the middle of idles, for example, the mmc driver
triggers 16 interrupts with ~1500us interval, these interrupts are all
handled within the idle loop, so the typical pattern can detect the mmc
interrupts pattern and it will help idle governor to select a shallower
idle state so can avoid to break the residency.

You mentioned these computed values would be discarded later, this is
true for most cases, but it isn't always true actually.  Without your
patch, the governor will discard the computed values only when
'data->predicted_us < TICK_USEC', otherwise the interval pattern is
still be applied in the prediction.

expected_interval = get_typical_interval(data);
expected_interval = min(expected_interval, data->next_timer_us);

[...]

/*
 * Use the lowest expected idle interval to pick the idle state.
 */
data->predicted_us = min(data->predicted_us, expected_interval);

> The statistics are updated by menu_update() and that still runs and it
> will take the actual wakeup events into account, won't it?

Yes.

> > [...]
> >
> > > > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > - expected_interval < TICK_USEC) {
> > > > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > + expected_interval < TICK_USEC) && 
> > > > > !tick_nohz_tick_stopped()) {
> > > >
> > > > I am not sure this logic is right... Why not use below checking, so
> > > > for POLLING state we will never ask to stop the tick?
> > > >
> > > > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > > > (expected_interval < TICK_USEC && 
> > > > !tick_nohz_tick_stopped())) {
> > > >
> > >
> > > The only effect of it would be setting stop_tick to false, but why
> > > would that matter?
> >
> > Please consider below situation, not sure if this case is existed or
> > not:
> >
> >   step1: first time: enter one idle state with stopping tick;
> >   step2: second time: select POLLING state and tick_nohz_tick_stopped()
> >   is true;
> >
> > So in step2, it cannot set stop_tick to false with below sentence.
> >
> > > > >   unsigned int delta_next_us = ktime_to_us(delta_next);
> > > > >
> > > > >   

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-12 Thread Rafael J. Wysocki
On Fri, Aug 10, 2018 at 2:31 PM  wrote:
>
> On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 10, 2018 at 11:20 AM  wrote:
> > >
> > > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > > > From: Rafael J. Wysocki 
> > > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > > >
> > > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > > with stopped tick) missed the case when the target residencies of
> > > > deep idle states of CPUs are above the tick boundary which may cause
> > > > the CPU to get stuck in a shallow idle state for a long time.
> > > >
> > > > Say there are two CPU idle states available: one shallow, with the
> > > > target residency much below the tick boundary and one deep, with
> > > > the target residency significantly above the tick boundary.  In
> > > > that case, if the tick has been stopped already and the expected
> > > > next timer event is relatively far in the future, the governor will
> > > > assume the idle duration to be equal to TICK_USEC and it will select
> > > > the idle state for the CPU accordingly.  However, that will cause the
> > > > shallow state to be selected even though it would have been more
> > > > energy-efficient to select the deep one.
> > > >
> > > > To address this issue, modify the governor to always assume idle
> > > > duration to be equal to the time till the closest timer event if
> > > > the tick is not running which will cause the selected idle states
> > > > to always match the known CPU wakeup time.
> > > >
> > > > Also make it always indicate that the tick should be stopped in
> > > > that case for consistency.
> > > >
> > > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> > > > stopped tick)
> > > > Reported-by: Leo Yan 
> > > > Signed-off-by: Rafael J. Wysocki 
> > > > ---
> > > >
> > > > -> v2: Initialize first_idx properly in the stopped tick case.
> > > >
> > > > ---
> > > >  drivers/cpuidle/governors/menu.c |   55 
> > > > +--
> > > >  1 file changed, 25 insertions(+), 30 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > > ===
> > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > >  {
> > > >   struct menu_device *data = this_cpu_ptr(_devices);
> > > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > - int i;
> > > > - int first_idx;
> > > > - int idx;
> > > > + int first_idx = 0;
> > > > + int idx, i;
> > > >   unsigned int interactivity_req;
> > > >   unsigned int expected_interval;
> > > >   unsigned long nr_iowaiters, cpu_load;
> > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > >   /* determine the expected residency time, round up */
> > > >   data->next_timer_us = 
> > > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > > >
> > > > + /*
> > > > +  * If the tick is already stopped, the cost of possible short idle
> > > > +  * duration misprediction is much higher, because the CPU may be 
> > > > stuck
> > > > +  * in a shallow idle state for a long time as a result of it.  In 
> > > > that
> > > > +  * case say we might mispredict and use the known time till the 
> > > > closest
> > > > +  * timer event for the idle state selection.
> > > > +  */
> > > > + if (tick_nohz_tick_stopped()) {
> > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > + goto select;
> > > > + }
> > > > +
> > >
> > > This introduce two potential issues:
> > >
> > > - This will totally ignore the typical pattern in idle loop; I
> > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > >   with consistent interval;
> >
> > I'm not sure what you mean by "ignore".
>
> You could see after move code from blow to this position, the typical
> pattern interval will not be accounted; so if in the middle of idles
> there have a bunch of interrupts with fix pattern, the upper code
> cannot detect this pattern anymore.

I'm not really following you here.

The part of the code skipped for tick_nohz_tick_stopped() doesn't
update the data at all AFAICS.  It only computes some values that
would be discarded later anyway, so I'm not sure what the point of
running that computation is.

The statistics are updated by menu_update() and that still runs and it
will take the actual wakeup events into account, won't it?

> [...]
>
> > > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > - expected_interval < TICK_USEC) {
> > > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> > >
> > > I am 

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-12 Thread Rafael J. Wysocki
On Fri, Aug 10, 2018 at 2:31 PM  wrote:
>
> On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 10, 2018 at 11:20 AM  wrote:
> > >
> > > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > > > From: Rafael J. Wysocki 
> > > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > > >
> > > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > > with stopped tick) missed the case when the target residencies of
> > > > deep idle states of CPUs are above the tick boundary which may cause
> > > > the CPU to get stuck in a shallow idle state for a long time.
> > > >
> > > > Say there are two CPU idle states available: one shallow, with the
> > > > target residency much below the tick boundary and one deep, with
> > > > the target residency significantly above the tick boundary.  In
> > > > that case, if the tick has been stopped already and the expected
> > > > next timer event is relatively far in the future, the governor will
> > > > assume the idle duration to be equal to TICK_USEC and it will select
> > > > the idle state for the CPU accordingly.  However, that will cause the
> > > > shallow state to be selected even though it would have been more
> > > > energy-efficient to select the deep one.
> > > >
> > > > To address this issue, modify the governor to always assume idle
> > > > duration to be equal to the time till the closest timer event if
> > > > the tick is not running which will cause the selected idle states
> > > > to always match the known CPU wakeup time.
> > > >
> > > > Also make it always indicate that the tick should be stopped in
> > > > that case for consistency.
> > > >
> > > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> > > > stopped tick)
> > > > Reported-by: Leo Yan 
> > > > Signed-off-by: Rafael J. Wysocki 
> > > > ---
> > > >
> > > > -> v2: Initialize first_idx properly in the stopped tick case.
> > > >
> > > > ---
> > > >  drivers/cpuidle/governors/menu.c |   55 
> > > > +--
> > > >  1 file changed, 25 insertions(+), 30 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > > ===
> > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > >  {
> > > >   struct menu_device *data = this_cpu_ptr(_devices);
> > > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > - int i;
> > > > - int first_idx;
> > > > - int idx;
> > > > + int first_idx = 0;
> > > > + int idx, i;
> > > >   unsigned int interactivity_req;
> > > >   unsigned int expected_interval;
> > > >   unsigned long nr_iowaiters, cpu_load;
> > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > >   /* determine the expected residency time, round up */
> > > >   data->next_timer_us = 
> > > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > > >
> > > > + /*
> > > > +  * If the tick is already stopped, the cost of possible short idle
> > > > +  * duration misprediction is much higher, because the CPU may be 
> > > > stuck
> > > > +  * in a shallow idle state for a long time as a result of it.  In 
> > > > that
> > > > +  * case say we might mispredict and use the known time till the 
> > > > closest
> > > > +  * timer event for the idle state selection.
> > > > +  */
> > > > + if (tick_nohz_tick_stopped()) {
> > > > + data->predicted_us = ktime_to_us(delta_next);
> > > > + goto select;
> > > > + }
> > > > +
> > >
> > > This introduce two potential issues:
> > >
> > > - This will totally ignore the typical pattern in idle loop; I
> > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > >   with consistent interval;
> >
> > I'm not sure what you mean by "ignore".
>
> You could see after move code from blow to this position, the typical
> pattern interval will not be accounted; so if in the middle of idles
> there have a bunch of interrupts with fix pattern, the upper code
> cannot detect this pattern anymore.

I'm not really following you here.

The part of the code skipped for tick_nohz_tick_stopped() doesn't
update the data at all AFAICS.  It only computes some values that
would be discarded later anyway, so I'm not sure what the point of
running that computation is.

The statistics are updated by menu_update() and that still runs and it
will take the actual wakeup events into account, won't it?

> [...]
>
> > > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > - expected_interval < TICK_USEC) {
> > > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> > >
> > > I am 

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-10 Thread leo . yan
On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 10, 2018 at 11:20 AM  wrote:
> >
> > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > > From: Rafael J. Wysocki 
> > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > >
> > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > with stopped tick) missed the case when the target residencies of
> > > deep idle states of CPUs are above the tick boundary which may cause
> > > the CPU to get stuck in a shallow idle state for a long time.
> > >
> > > Say there are two CPU idle states available: one shallow, with the
> > > target residency much below the tick boundary and one deep, with
> > > the target residency significantly above the tick boundary.  In
> > > that case, if the tick has been stopped already and the expected
> > > next timer event is relatively far in the future, the governor will
> > > assume the idle duration to be equal to TICK_USEC and it will select
> > > the idle state for the CPU accordingly.  However, that will cause the
> > > shallow state to be selected even though it would have been more
> > > energy-efficient to select the deep one.
> > >
> > > To address this issue, modify the governor to always assume idle
> > > duration to be equal to the time till the closest timer event if
> > > the tick is not running which will cause the selected idle states
> > > to always match the known CPU wakeup time.
> > >
> > > Also make it always indicate that the tick should be stopped in
> > > that case for consistency.
> > >
> > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> > > stopped tick)
> > > Reported-by: Leo Yan 
> > > Signed-off-by: Rafael J. Wysocki 
> > > ---
> > >
> > > -> v2: Initialize first_idx properly in the stopped tick case.
> > >
> > > ---
> > >  drivers/cpuidle/governors/menu.c |   55 
> > > +--
> > >  1 file changed, 25 insertions(+), 30 deletions(-)
> > >
> > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > ===
> > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > >  {
> > >   struct menu_device *data = this_cpu_ptr(_devices);
> > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > - int i;
> > > - int first_idx;
> > > - int idx;
> > > + int first_idx = 0;
> > > + int idx, i;
> > >   unsigned int interactivity_req;
> > >   unsigned int expected_interval;
> > >   unsigned long nr_iowaiters, cpu_load;
> > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > >   /* determine the expected residency time, round up */
> > >   data->next_timer_us = 
> > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > >
> > > + /*
> > > +  * If the tick is already stopped, the cost of possible short idle
> > > +  * duration misprediction is much higher, because the CPU may be 
> > > stuck
> > > +  * in a shallow idle state for a long time as a result of it.  In 
> > > that
> > > +  * case say we might mispredict and use the known time till the 
> > > closest
> > > +  * timer event for the idle state selection.
> > > +  */
> > > + if (tick_nohz_tick_stopped()) {
> > > + data->predicted_us = ktime_to_us(delta_next);
> > > + goto select;
> > > + }
> > > +
> >
> > This introduce two potential issues:
> >
> > - This will totally ignore the typical pattern in idle loop; I
> >   observed on the mmc driver can trigger multiple times (> 10 times)
> >   with consistent interval;
> 
> I'm not sure what you mean by "ignore".

You could see after move code from blow to this position, the typical
pattern interval will not be accounted; so if in the middle of idles
there have a bunch of interrupts with fix pattern, the upper code
cannot detect this pattern anymore.

[...]

> > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > - expected_interval < TICK_USEC) {
> > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> >
> > I am not sure this logic is right... Why not use below checking, so
> > for POLLING state we will never ask to stop the tick?
> >
> > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> >
> 
> The only effect of it would be setting stop_tick to false, but why
> would that matter?

Please consider below situation, not sure if this case is existed or
not:

  step1: first time: enter one idle state with stopping tick;
  step2: second time: select POLLING state and tick_nohz_tick_stopped()
  is true;

So in step2, it cannot set stop_tick to false 

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-10 Thread leo . yan
On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 10, 2018 at 11:20 AM  wrote:
> >
> > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > > From: Rafael J. Wysocki 
> > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > >
> > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > with stopped tick) missed the case when the target residencies of
> > > deep idle states of CPUs are above the tick boundary which may cause
> > > the CPU to get stuck in a shallow idle state for a long time.
> > >
> > > Say there are two CPU idle states available: one shallow, with the
> > > target residency much below the tick boundary and one deep, with
> > > the target residency significantly above the tick boundary.  In
> > > that case, if the tick has been stopped already and the expected
> > > next timer event is relatively far in the future, the governor will
> > > assume the idle duration to be equal to TICK_USEC and it will select
> > > the idle state for the CPU accordingly.  However, that will cause the
> > > shallow state to be selected even though it would have been more
> > > energy-efficient to select the deep one.
> > >
> > > To address this issue, modify the governor to always assume idle
> > > duration to be equal to the time till the closest timer event if
> > > the tick is not running which will cause the selected idle states
> > > to always match the known CPU wakeup time.
> > >
> > > Also make it always indicate that the tick should be stopped in
> > > that case for consistency.
> > >
> > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> > > stopped tick)
> > > Reported-by: Leo Yan 
> > > Signed-off-by: Rafael J. Wysocki 
> > > ---
> > >
> > > -> v2: Initialize first_idx properly in the stopped tick case.
> > >
> > > ---
> > >  drivers/cpuidle/governors/menu.c |   55 
> > > +--
> > >  1 file changed, 25 insertions(+), 30 deletions(-)
> > >
> > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > ===
> > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > >  {
> > >   struct menu_device *data = this_cpu_ptr(_devices);
> > >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > - int i;
> > > - int first_idx;
> > > - int idx;
> > > + int first_idx = 0;
> > > + int idx, i;
> > >   unsigned int interactivity_req;
> > >   unsigned int expected_interval;
> > >   unsigned long nr_iowaiters, cpu_load;
> > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > >   /* determine the expected residency time, round up */
> > >   data->next_timer_us = 
> > > ktime_to_us(tick_nohz_get_sleep_length(_next));
> > >
> > > + /*
> > > +  * If the tick is already stopped, the cost of possible short idle
> > > +  * duration misprediction is much higher, because the CPU may be 
> > > stuck
> > > +  * in a shallow idle state for a long time as a result of it.  In 
> > > that
> > > +  * case say we might mispredict and use the known time till the 
> > > closest
> > > +  * timer event for the idle state selection.
> > > +  */
> > > + if (tick_nohz_tick_stopped()) {
> > > + data->predicted_us = ktime_to_us(delta_next);
> > > + goto select;
> > > + }
> > > +
> >
> > This introduce two potential issues:
> >
> > - This will totally ignore the typical pattern in idle loop; I
> >   observed on the mmc driver can trigger multiple times (> 10 times)
> >   with consistent interval;
> 
> I'm not sure what you mean by "ignore".

You could see after move code from blow to this position, the typical
pattern interval will not be accounted; so if in the middle of idles
there have a bunch of interrupts with fix pattern, the upper code
cannot detect this pattern anymore.

[...]

> > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > - expected_interval < TICK_USEC) {
> > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> >
> > I am not sure this logic is right... Why not use below checking, so
> > for POLLING state we will never ask to stop the tick?
> >
> > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> >
> 
> The only effect of it would be setting stop_tick to false, but why
> would that matter?

Please consider below situation, not sure if this case is existed or
not:

  step1: first time: enter one idle state with stopping tick;
  step2: second time: select POLLING state and tick_nohz_tick_stopped()
  is true;

So in step2, it cannot set stop_tick to false 

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-10 Thread Rafael J. Wysocki
On Fri, Aug 10, 2018 at 11:20 AM  wrote:
>
> On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > From: Rafael J. Wysocki 
> > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> >
> > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > with stopped tick) missed the case when the target residencies of
> > deep idle states of CPUs are above the tick boundary which may cause
> > the CPU to get stuck in a shallow idle state for a long time.
> >
> > Say there are two CPU idle states available: one shallow, with the
> > target residency much below the tick boundary and one deep, with
> > the target residency significantly above the tick boundary.  In
> > that case, if the tick has been stopped already and the expected
> > next timer event is relatively far in the future, the governor will
> > assume the idle duration to be equal to TICK_USEC and it will select
> > the idle state for the CPU accordingly.  However, that will cause the
> > shallow state to be selected even though it would have been more
> > energy-efficient to select the deep one.
> >
> > To address this issue, modify the governor to always assume idle
> > duration to be equal to the time till the closest timer event if
> > the tick is not running which will cause the selected idle states
> > to always match the known CPU wakeup time.
> >
> > Also make it always indicate that the tick should be stopped in
> > that case for consistency.
> >
> > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> > stopped tick)
> > Reported-by: Leo Yan 
> > Signed-off-by: Rafael J. Wysocki 
> > ---
> >
> > -> v2: Initialize first_idx properly in the stopped tick case.
> >
> > ---
> >  drivers/cpuidle/governors/menu.c |   55 
> > +--
> >  1 file changed, 25 insertions(+), 30 deletions(-)
> >
> > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > ===
> > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> >  {
> >   struct menu_device *data = this_cpu_ptr(_devices);
> >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > - int i;
> > - int first_idx;
> > - int idx;
> > + int first_idx = 0;
> > + int idx, i;
> >   unsigned int interactivity_req;
> >   unsigned int expected_interval;
> >   unsigned long nr_iowaiters, cpu_load;
> > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> >   /* determine the expected residency time, round up */
> >   data->next_timer_us = 
> > ktime_to_us(tick_nohz_get_sleep_length(_next));
> >
> > + /*
> > +  * If the tick is already stopped, the cost of possible short idle
> > +  * duration misprediction is much higher, because the CPU may be stuck
> > +  * in a shallow idle state for a long time as a result of it.  In that
> > +  * case say we might mispredict and use the known time till the 
> > closest
> > +  * timer event for the idle state selection.
> > +  */
> > + if (tick_nohz_tick_stopped()) {
> > + data->predicted_us = ktime_to_us(delta_next);
> > + goto select;
> > + }
> > +
>
> This introduce two potential issues:
>
> - This will totally ignore the typical pattern in idle loop; I
>   observed on the mmc driver can trigger multiple times (> 10 times)
>   with consistent interval;

I'm not sure what you mean by "ignore".

>  but I have no strong opinion to not  use next timer event for this case.

OK

> - Will this break correction factors when the CPU exit from idle?
>   data->bucket is stale value 

Good point.

I'll send a v3 with this addressed.

>
> >   get_iowait_load(_iowaiters, _load);
> >   data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
> >
> > @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
> >   expected_interval = get_typical_interval(data);
> >   expected_interval = min(expected_interval, data->next_timer_us);
> >
> > - first_idx = 0;
> >   if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
> >   struct cpuidle_state *s = >states[1];
> >   unsigned int polling_threshold;
> > @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
> >*/
> >   data->predicted_us = min(data->predicted_us, expected_interval);
> >
> > - if (tick_nohz_tick_stopped()) {
> > - /*
> > -  * If the tick is already stopped, the cost of possible short
> > -  * idle duration misprediction is much higher, because the CPU
> > -  * may be stuck in a shallow idle state for a long time as a
> > -  * result of it.  In that case say we might mispredict and try
> > -  * to force the CPU into a state for which we would have 
> > stopped
> > -  * 

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-10 Thread Rafael J. Wysocki
On Fri, Aug 10, 2018 at 11:20 AM  wrote:
>
> On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > From: Rafael J. Wysocki 
> > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> >
> > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > with stopped tick) missed the case when the target residencies of
> > deep idle states of CPUs are above the tick boundary which may cause
> > the CPU to get stuck in a shallow idle state for a long time.
> >
> > Say there are two CPU idle states available: one shallow, with the
> > target residency much below the tick boundary and one deep, with
> > the target residency significantly above the tick boundary.  In
> > that case, if the tick has been stopped already and the expected
> > next timer event is relatively far in the future, the governor will
> > assume the idle duration to be equal to TICK_USEC and it will select
> > the idle state for the CPU accordingly.  However, that will cause the
> > shallow state to be selected even though it would have been more
> > energy-efficient to select the deep one.
> >
> > To address this issue, modify the governor to always assume idle
> > duration to be equal to the time till the closest timer event if
> > the tick is not running which will cause the selected idle states
> > to always match the known CPU wakeup time.
> >
> > Also make it always indicate that the tick should be stopped in
> > that case for consistency.
> >
> > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> > stopped tick)
> > Reported-by: Leo Yan 
> > Signed-off-by: Rafael J. Wysocki 
> > ---
> >
> > -> v2: Initialize first_idx properly in the stopped tick case.
> >
> > ---
> >  drivers/cpuidle/governors/menu.c |   55 
> > +--
> >  1 file changed, 25 insertions(+), 30 deletions(-)
> >
> > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > ===
> > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> >  {
> >   struct menu_device *data = this_cpu_ptr(_devices);
> >   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > - int i;
> > - int first_idx;
> > - int idx;
> > + int first_idx = 0;
> > + int idx, i;
> >   unsigned int interactivity_req;
> >   unsigned int expected_interval;
> >   unsigned long nr_iowaiters, cpu_load;
> > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> >   /* determine the expected residency time, round up */
> >   data->next_timer_us = 
> > ktime_to_us(tick_nohz_get_sleep_length(_next));
> >
> > + /*
> > +  * If the tick is already stopped, the cost of possible short idle
> > +  * duration misprediction is much higher, because the CPU may be stuck
> > +  * in a shallow idle state for a long time as a result of it.  In that
> > +  * case say we might mispredict and use the known time till the 
> > closest
> > +  * timer event for the idle state selection.
> > +  */
> > + if (tick_nohz_tick_stopped()) {
> > + data->predicted_us = ktime_to_us(delta_next);
> > + goto select;
> > + }
> > +
>
> This introduce two potential issues:
>
> - This will totally ignore the typical pattern in idle loop; I
>   observed on the mmc driver can trigger multiple times (> 10 times)
>   with consistent interval;

I'm not sure what you mean by "ignore".

>  but I have no strong opinion to not  use next timer event for this case.

OK

> - Will this break correction factors when the CPU exit from idle?
>   data->bucket is stale value 

Good point.

I'll send a v3 with this addressed.

>
> >   get_iowait_load(_iowaiters, _load);
> >   data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
> >
> > @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
> >   expected_interval = get_typical_interval(data);
> >   expected_interval = min(expected_interval, data->next_timer_us);
> >
> > - first_idx = 0;
> >   if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
> >   struct cpuidle_state *s = >states[1];
> >   unsigned int polling_threshold;
> > @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
> >*/
> >   data->predicted_us = min(data->predicted_us, expected_interval);
> >
> > - if (tick_nohz_tick_stopped()) {
> > - /*
> > -  * If the tick is already stopped, the cost of possible short
> > -  * idle duration misprediction is much higher, because the CPU
> > -  * may be stuck in a shallow idle state for a long time as a
> > -  * result of it.  In that case say we might mispredict and try
> > -  * to force the CPU into a state for which we would have 
> > stopped
> > -  * 

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-10 Thread leo . yan
On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> From: Rafael J. Wysocki 
> Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> 
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
> 
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary.  In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly.  However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
> 
> To address this issue, modify the governor to always assume idle
> duration to be equal to the time till the closest timer event if
> the tick is not running which will cause the selected idle states
> to always match the known CPU wakeup time.
> 
> Also make it always indicate that the tick should be stopped in
> that case for consistency.
> 
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> stopped tick)
> Reported-by: Leo Yan 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> -> v2: Initialize first_idx properly in the stopped tick case.
> 
> ---
>  drivers/cpuidle/governors/menu.c |   55 
> +--
>  1 file changed, 25 insertions(+), 30 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
>  {
>   struct menu_device *data = this_cpu_ptr(_devices);
>   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> - int i;
> - int first_idx;
> - int idx;
> + int first_idx = 0;
> + int idx, i;
>   unsigned int interactivity_req;
>   unsigned int expected_interval;
>   unsigned long nr_iowaiters, cpu_load;
> @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
>   /* determine the expected residency time, round up */
>   data->next_timer_us = 
> ktime_to_us(tick_nohz_get_sleep_length(_next));
>  
> + /*
> +  * If the tick is already stopped, the cost of possible short idle
> +  * duration misprediction is much higher, because the CPU may be stuck
> +  * in a shallow idle state for a long time as a result of it.  In that
> +  * case say we might mispredict and use the known time till the closest
> +  * timer event for the idle state selection.
> +  */
> + if (tick_nohz_tick_stopped()) {
> + data->predicted_us = ktime_to_us(delta_next);
> + goto select;
> + }
> +

This introduce two potential issues:

- This will totally ignore the typical pattern in idle loop; I
  observed on the mmc driver can trigger multiple times (> 10 times)
  with consistent interval;  but I have no strong opinion to not
  use next timer event for this case.

- Will this break correction factors when the CPU exit from idle?
  data->bucket is stale value 

>   get_iowait_load(_iowaiters, _load);
>   data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
>  
> @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
>   expected_interval = get_typical_interval(data);
>   expected_interval = min(expected_interval, data->next_timer_us);
>  
> - first_idx = 0;
>   if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
>   struct cpuidle_state *s = >states[1];
>   unsigned int polling_threshold;
> @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
>*/
>   data->predicted_us = min(data->predicted_us, expected_interval);
>  
> - if (tick_nohz_tick_stopped()) {
> - /*
> -  * If the tick is already stopped, the cost of possible short
> -  * idle duration misprediction is much higher, because the CPU
> -  * may be stuck in a shallow idle state for a long time as a
> -  * result of it.  In that case say we might mispredict and try
> -  * to force the CPU into a state for which we would have stopped
> -  * the tick, unless a timer is going to expire really soon
> -  * anyway.
> -  */
> - if (data->predicted_us < TICK_USEC)
> - data->predicted_us = min_t(unsigned int, TICK_USEC,
> -ktime_to_us(delta_next));
> - } else {
> - /*

Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively

2018-08-10 Thread leo . yan
On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> From: Rafael J. Wysocki 
> Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> 
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
> 
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary.  In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly.  However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
> 
> To address this issue, modify the governor to always assume idle
> duration to be equal to the time till the closest timer event if
> the tick is not running which will cause the selected idle states
> to always match the known CPU wakeup time.
> 
> Also make it always indicate that the tick should be stopped in
> that case for consistency.
> 
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with 
> stopped tick)
> Reported-by: Leo Yan 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> -> v2: Initialize first_idx properly in the stopped tick case.
> 
> ---
>  drivers/cpuidle/governors/menu.c |   55 
> +--
>  1 file changed, 25 insertions(+), 30 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
>  {
>   struct menu_device *data = this_cpu_ptr(_devices);
>   int latency_req = cpuidle_governor_latency_req(dev->cpu);
> - int i;
> - int first_idx;
> - int idx;
> + int first_idx = 0;
> + int idx, i;
>   unsigned int interactivity_req;
>   unsigned int expected_interval;
>   unsigned long nr_iowaiters, cpu_load;
> @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
>   /* determine the expected residency time, round up */
>   data->next_timer_us = 
> ktime_to_us(tick_nohz_get_sleep_length(_next));
>  
> + /*
> +  * If the tick is already stopped, the cost of possible short idle
> +  * duration misprediction is much higher, because the CPU may be stuck
> +  * in a shallow idle state for a long time as a result of it.  In that
> +  * case say we might mispredict and use the known time till the closest
> +  * timer event for the idle state selection.
> +  */
> + if (tick_nohz_tick_stopped()) {
> + data->predicted_us = ktime_to_us(delta_next);
> + goto select;
> + }
> +

This introduce two potential issues:

- This will totally ignore the typical pattern in idle loop; I
  observed on the mmc driver can trigger multiple times (> 10 times)
  with consistent interval;  but I have no strong opinion to not
  use next timer event for this case.

- Will this break correction factors when the CPU exit from idle?
  data->bucket is stale value 

>   get_iowait_load(_iowaiters, _load);
>   data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
>  
> @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
>   expected_interval = get_typical_interval(data);
>   expected_interval = min(expected_interval, data->next_timer_us);
>  
> - first_idx = 0;
>   if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
>   struct cpuidle_state *s = >states[1];
>   unsigned int polling_threshold;
> @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
>*/
>   data->predicted_us = min(data->predicted_us, expected_interval);
>  
> - if (tick_nohz_tick_stopped()) {
> - /*
> -  * If the tick is already stopped, the cost of possible short
> -  * idle duration misprediction is much higher, because the CPU
> -  * may be stuck in a shallow idle state for a long time as a
> -  * result of it.  In that case say we might mispredict and try
> -  * to force the CPU into a state for which we would have stopped
> -  * the tick, unless a timer is going to expire really soon
> -  * anyway.
> -  */
> - if (data->predicted_us < TICK_USEC)
> - data->predicted_us = min_t(unsigned int, TICK_USEC,
> -ktime_to_us(delta_next));
> - } else {
> - /*