All, Recall the extensive tests and work done between 2018.10.11 and 2019.01.16 on Rafael's teo governor, versions 1 through 11, noting that versions 9, 10, and 11 did not contain functional changes.
Hi Rafael, If the deepest idle state is disabled, the system can become somewhat unstable, with anywhere between no problem at all, to the occasional temporary jump using a lot more power for a few seconds, to a permanent jump using a lot more power continuously. I have been unable to isolate the exact test load conditions under which this will occur. However, temporarily disabling and then enabling other idle states seems to make for a somewhat repeatable test. It is important to note that the issue occurs with only ever disabling the deepest idle state, just not reliably. I want to know how you want to proceed before I do a bunch of regression testing. On 2018.12.11 03:50 Rafael J. Wysocki wrote: > v7 -> v8: > * Apply the selection rules to the idle deepest state as well as to > the shallower ones (the deepest idle state was treated differently > before by mistake). > * Subtract 1/2 of the exit latency from the measured idle duration > in teo_update() (instead of subtracting the entire exit latency). > This makes the idle state selection be slightly more performance- > oriented. I have isolated the issue to a subset of the v7 to v8 changes, however it was not the exit latency changes. The partial revert to V7 changes I made were (on top of 5.3): diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 7d05efd..d2892bb 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -283,9 +283,28 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (idx < 0) idx = i; /* first enabled state */ - if (s->target_residency > duration_us) - break; + if (s->target_residency > duration_us) { + /* + * If the "hits" metric of the state matching the sleep + * length is greater than its "misses" metric, that is + * the one to use. + */ + if (cpu_data->states[idx].hits > cpu_data->states[idx].misses) + break; + /* + * It is more likely that one of the shallower states + * will match the idle duration measured after wakeup, + * so take the one with the maximum "early hits" metric, + * but if that cannot be determined, just use the state + * selected so far. + */ + if (max_early_idx >= 0) { + idx = max_early_idx; + duration_us = drv->states[idx].target_residency; + } + break; + } if (s->exit_latency > latency_req) { /* * If we break out of the loop for latency reasons, use @@ -294,7 +313,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * as long as that target residency is low enough. */ duration_us = drv->states[idx].target_residency; - goto refine; + break; } idx = i; @@ -307,21 +326,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, } } - /* - * If the "hits" metric of the idle state matching the sleep length is - * greater than its "misses" metric, that is the one to use. Otherwise, - * it is more likely that one of the shallower states will match the - * idle duration observed after wakeup, so take the one with the maximum - * "early hits" metric, but if that cannot be determined, just use the - * state selected so far. - */ - if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses && - max_early_idx >= 0) { - idx = max_early_idx; - duration_us = drv->states[idx].target_residency; - } - -refine: if (idx < 0) { idx = 0; /* No states enabled. Must use 0. */ } else if (idx > 0) { Test results (note: I use my own monitoring utility, because it feeds my graphing/html utility, but have used turbostat here in case others want to repeat the test): Processor: i7-2600K Deepest idle state: 4 (C6) The system is mostly idle for these tests. turbostat command used: sudo turbostat --Summary --quiet --hide IRQ,Avg_MHz,SMI,GFXMHz,TSC_MHz,GFXWatt,CorWatt,CPU%c1,CPU%c7,CoreTmp,GFX%rc6,Pkg%pc2,Pkg%pc3,Pkg%pc6,C1,C1E,,C1%,C1E%,C3%,C6% --interval 3 Only because it works so reliably, the test involves disabling all idle states deeper than 0, then enabling all idle states shallower than the deepest. I have attempted to minimize the number of display columns, hopefully without hiding useful information. Kernel 5.3: Busy% Bzy_MHz POLL C3 C6 POLL% CPU%c3 CPU%c6 PkgTmp PkgWatt 0.04 1600 0 0 232 0.00 0.00 99.85 28 3.73 0.05 1600 0 3 254 0.00 0.00 99.82 28 3.73 0.04 1600 0 1 264 0.00 0.00 99.82 30 3.73 61.08 3493 14735 16 258 60.84 0.04 38.57 45 31.00 100.00 3500 24057 0 0 99.88 0.00 0.00 46 48.73 100.00 3500 24064 0 0 99.86 0.00 0.00 48 48.87 100.00 3500 24057 0 0 99.88 0.00 0.00 49 49.07 100.00 3500 24070 0 0 99.87 0.00 0.00 50 49.21 100.00 3500 24068 0 0 99.87 0.00 0.00 50 49.29 100.00 3500 24063 0 0 99.88 0.00 0.00 52 49.42 100.00 3500 24060 0 0 99.87 0.00 0.00 53 49.58 100.00 3500 24076 0 0 99.87 0.00 0.00 53 49.70 100.00 3500 24067 0 0 99.87 0.00 0.00 55 49.77 28.89 3575 15138 261 0 28.71 59.97 0.00 48 27.93 10.52 3666 11884 379 0 10.43 75.24 0.00 47 20.72 10.80 3680 12141 558 0 10.65 75.13 0.00 46 21.11 11.51 3742 13016 217 0 11.43 74.93 0.00 47 22.48 10.84 3696 12254 294 0 10.76 74.89 0.00 48 21.26 10.34 3652 11683 243 0 10.25 74.93 0.00 47 20.36 10.67 3681 12065 240 0 10.59 74.94 0.00 48 20.96 10.68 3671 12067 278 0 10.59 74.92 0.00 47 20.85 10.84 3690 12259 236 0 10.76 74.94 0.00 47 21.24 11.51 3735 13023 265 0 11.43 74.93 0.00 48 22.43 Notice how once idle state 3 is enabled again (as were states 1 and 2) and we should be at about 5 watts, there is still a ton of time spent in idle state 0 (POLL). Kernel 5.3 + above patch: Busy% Bzy_MHz POLL C3 C6 POLL% CPU%c3 CPU%c6 PkgTmp PkgWatt 0.04 1600 0 3 235 0.00 0.00 99.82 26 3.71 0.04 1600 0 1 252 0.00 0.00 99.79 26 3.71 0.05 1600 0 3 273 0.00 0.00 99.79 26 3.72 0.04 1600 0 3 224 0.00 0.00 99.83 26 3.71 0.05 1600 0 1 252 0.00 0.00 99.81 25 3.71 0.04 1600 0 1 231 0.00 0.00 99.84 26 3.71 0.05 1600 0 3 294 0.00 0.00 99.77 28 3.71 0.04 1600 0 2 259 0.00 0.00 99.81 25 3.70 69.61 3490 16773 15 326 69.33 0.14 29.98 42 34.41 100.00 3500 24079 0 0 99.88 0.00 0.00 45 48.27 100.00 3500 24062 0 0 99.89 0.00 0.00 45 48.41 100.00 3500 24056 0 0 99.88 0.00 0.00 46 48.52 100.00 3500 24064 0 0 99.89 0.00 0.00 47 48.63 27.31 3498 6588 340 0 27.16 72.40 0.00 32 17.13 0.03 1600 1 230 0 0.00 99.91 0.00 30 5.08 0.03 1600 2 234 0 0.00 99.90 0.00 31 5.07 0.03 1600 5 238 0 0.00 99.91 0.00 30 5.07 0.03 1600 3 230 0 0.00 99.91 0.00 30 5.05 0.03 1600 3 246 0 0.00 99.91 0.00 29 5.05 0.03 1600 3 235 0 0.00 99.90 0.00 30 5.04 Notice how once idle state 3 is enabled again (as were states 1 and 2) almost all time is spent there, as expected, and the power is the expected 5 watts. ... Doug