On Fri, Aug 10, 2018 at 11:20 AM <leo....@linaro.org> wrote: > > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wyso...@intel.com> > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively > > > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states > > with stopped tick) missed the case when the target residencies of > > deep idle states of CPUs are above the tick boundary which may cause > > the CPU to get stuck in a shallow idle state for a long time. > > > > Say there are two CPU idle states available: one shallow, with the > > target residency much below the tick boundary and one deep, with > > the target residency significantly above the tick boundary. In > > that case, if the tick has been stopped already and the expected > > next timer event is relatively far in the future, the governor will > > assume the idle duration to be equal to TICK_USEC and it will select > > the idle state for the CPU accordingly. However, that will cause the > > shallow state to be selected even though it would have been more > > energy-efficient to select the deep one. > > > > To address this issue, modify the governor to always assume idle > > duration to be equal to the time till the closest timer event if > > the tick is not running which will cause the selected idle states > > to always match the known CPU wakeup time. > > > > Also make it always indicate that the tick should be stopped in > > that case for consistency. > > > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with > > stopped tick) > > Reported-by: Leo Yan <leo....@linaro.org> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com> > > --- > > > > -> v2: Initialize first_idx properly in the stopped tick case. > > > > --- > > drivers/cpuidle/governors/menu.c | 55 > > +++++++++++++++++---------------------- > > 1 file changed, 25 insertions(+), 30 deletions(-) > > > > Index: linux-pm/drivers/cpuidle/governors/menu.c > > =================================================================== > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c > > +++ linux-pm/drivers/cpuidle/governors/menu.c > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr > > { > > struct menu_device *data = this_cpu_ptr(&menu_devices); > > int latency_req = cpuidle_governor_latency_req(dev->cpu); > > - int i; > > - int first_idx; > > - int idx; > > + int first_idx = 0; > > + int idx, i; > > unsigned int interactivity_req; > > unsigned int expected_interval; > > unsigned long nr_iowaiters, cpu_load; > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr > > /* determine the expected residency time, round up */ > > data->next_timer_us = > > ktime_to_us(tick_nohz_get_sleep_length(&delta_next)); > > > > + /* > > + * If the tick is already stopped, the cost of possible short idle > > + * duration misprediction is much higher, because the CPU may be stuck > > + * in a shallow idle state for a long time as a result of it. In that > > + * case say we might mispredict and use the known time till the > > closest > > + * timer event for the idle state selection. > > + */ > > + if (tick_nohz_tick_stopped()) { > > + data->predicted_us = ktime_to_us(delta_next); > > + goto select; > > + } > > + > > This introduce two potential issues: > > - This will totally ignore the typical pattern in idle loop; I > observed on the mmc driver can trigger multiple times (> 10 times) > with consistent interval;
I'm not sure what you mean by "ignore". > but I have no strong opinion to not use next timer event for this case. OK > - Will this break correction factors when the CPU exit from idle? > data->bucket is stale value .... Good point. I'll send a v3 with this addressed. > > > get_iowait_load(&nr_iowaiters, &cpu_load); > > data->bucket = which_bucket(data->next_timer_us, nr_iowaiters); > > > > @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr > > expected_interval = get_typical_interval(data); > > expected_interval = min(expected_interval, data->next_timer_us); > > > > - first_idx = 0; > > if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) { > > struct cpuidle_state *s = &drv->states[1]; > > unsigned int polling_threshold; > > @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr > > */ > > data->predicted_us = min(data->predicted_us, expected_interval); > > > > - if (tick_nohz_tick_stopped()) { > > - /* > > - * If the tick is already stopped, the cost of possible short > > - * idle duration misprediction is much higher, because the CPU > > - * may be stuck in a shallow idle state for a long time as a > > - * result of it. In that case say we might mispredict and try > > - * to force the CPU into a state for which we would have > > stopped > > - * the tick, unless a timer is going to expire really soon > > - * anyway. > > - */ > > - if (data->predicted_us < TICK_USEC) > > - data->predicted_us = min_t(unsigned int, TICK_USEC, > > - ktime_to_us(delta_next)); > > - } else { > > - /* > > - * Use the performance multiplier and the user-configurable > > - * latency_req to determine the maximum exit latency. > > - */ > > - interactivity_req = data->predicted_us / > > performance_multiplier(nr_iowaiters, cpu_load); > > - if (latency_req > interactivity_req) > > - latency_req = interactivity_req; > > - } > > + /* > > + * Use the performance multiplier and the user-configurable > > latency_req > > + * to determine the maximum exit latency. > > + */ > > + interactivity_req = data->predicted_us / > > performance_multiplier(nr_iowaiters, cpu_load); > > + if (latency_req > interactivity_req) > > + latency_req = interactivity_req; > > > > +select: > > expected_interval = data->predicted_us; > > /* > > * Find the idle state with the lowest power while satisfying > > @@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr > > * Don't stop the tick if the selected state is a polling one or if > > the > > * expected idle duration is shorter than the tick period length. > > */ > > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || > > - expected_interval < TICK_USEC) { > > + if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || > > + expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) { > > I am not sure this logic is right... Why not use below checking, so > for POLLING state we will never ask to stop the tick? > > if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING || > (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) { > The only effect of it would be setting stop_tick to false, but why would that matter? > > unsigned int delta_next_us = ktime_to_us(delta_next); > > > > *stop_tick = false; > > > > - if (!tick_nohz_tick_stopped() && idx > 0 && > > - drv->states[idx].target_residency > delta_next_us) { > > + if (idx > 0 && drv->states[idx].target_residency > > > delta_next_us) { > > /* > > * The tick is not going to be stopped and the target > > * residency of the state to be returned is not within > >