On 2019.10.06 08:34 Rafael J. Wysocki wrote: > On Sun, Oct 6, 2019 at 4:46 PM Doug Smythies <dsmyth...@telus.net> wrote: >> On 2019.10.01 02:32 Rafael J. Wysocki wrote: >>> On Sun, Sep 29, 2019 at 6:05 PM Doug Smythies <dsmyth...@telus.net> wrote: >>>> On 2019.09.26 09:32 Doug Smythies wrote: >>>> >>>>> If the deepest idle state is disabled, the system >>>>> can become somewhat unstable, with anywhere between no problem >>>>> at all, to the occasional temporary jump using a lot more >>>>> power for a few seconds, to a permanent jump using a lot more >>>>> power continuously. I have been unable to isolate the exact >>>>> test load conditions under which this will occur. However, >>>>> temporarily disabling and then enabling other idle states >>>>> seems to make for a somewhat repeatable test. It is important >>>>> to note that the issue occurs with only ever disabling the deepest >>>>> idle state, just not reliably. >>>>> >>>>> I want to know how you want to proceed before I do a bunch of >>>>> regression testing. >>>> >> I do not think I stated it clearly before: The problem here is that some CPUs >> seem to get stuck in idle state 0, and when they do power consumption spikes, >> often by several hundred % and often indefinitely. > > That indeed has not been clear to me, thanks for the clarification!
> >> I made a hack job automated test: >> Kernel tests fail rate >> 5.4-rc1 6616 13.45% >> 5.3 2376 4.50% >> 5.3-teov7 12136 0.00% <<< teo.c reverted and teov7 put in >> its place. >> 5.4-rc1-ds 11168 0.00% <<< [old] proposed patch (> 7 hours test >> time) 5.4-rc1-ds12 4224 0.005 <<< new proposed patch >> >> [old] Proposed patch (on top of kernel 5.4-rc1): [deleted] > This change may cause the deepest state to be selected even if its > "hits" metric is less than the "misses" one AFAICS, in which case the > max_early_index state should be selected instead. > > It looks like the max_early_index computation is broken when the > deepest state is disabled. O.K. Thanks for your quick reply, and insight. I think long durations always need to be counted, but currently if the deepest idle state is disabled, they are not. How about this?: (test results added above, more tests pending if this might be a path forward.) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index b5a0e49..a970d2c 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -155,10 +155,12 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) cpu_data->states[i].early_hits -= early_hits >> DECAY_SHIFT; - if (drv->states[i].target_residency <= sleep_length_us) { - idx_timer = i; - if (drv->states[i].target_residency <= measured_us) - idx_hit = i; + if (!(drv->states[i].disabled || dev->states_usage[i].disable)){ + if (drv->states[i].target_residency <= sleep_length_us) { + idx_timer = i; + if (drv->states[i].target_residency <= measured_us) + idx_hit = i; + } } } @@ -256,39 +258,25 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, struct cpuidle_state *s = &drv->states[i]; struct cpuidle_state_usage *su = &dev->states_usage[i]; - if (s->disabled || su->disable) { - /* - * If the "early hits" metric of a disabled state is - * greater than the current maximum, it should be taken - * into account, because it would be a mistake to select - * a deeper state with lower "early hits" metric. The - * index cannot be changed to point to it, however, so - * just increase the max count alone and let the index - * still point to a shallower idle state. - */ - if (max_early_idx >= 0 && - count < cpu_data->states[i].early_hits) - count = cpu_data->states[i].early_hits; - - continue; - } - if (idx < 0) - idx = i; /* first enabled state */ + if (!(s->disabled || su->disable)) { + if (idx < 0) + idx = i; /* first enabled state */ - if (s->target_residency > duration_us) - break; + if (s->target_residency > duration_us) + break; - if (s->exit_latency > latency_req && constraint_idx > i) - constraint_idx = i; + if (s->exit_latency > latency_req && constraint_idx > i) + constraint_idx = i; - idx = i; + idx = i; - if (count < cpu_data->states[i].early_hits && - !(tick_nohz_tick_stopped() && - drv->states[i].target_residency < TICK_USEC)) { - count = cpu_data->states[i].early_hits; - max_early_idx = i; + if (count < cpu_data->states[i].early_hits && + !(tick_nohz_tick_stopped() && + drv->states[i].target_residency < TICK_USEC)) { + count = cpu_data->states[i].early_hits; + max_early_idx = i; + } } }