On 2017/7/18 3:23, Peter Zijlstra wrote: > On Fri, Jul 14, 2017 at 09:26:19AM -0700, Andi Kleen wrote: >>> And as said; Daniel has been working on a better predictor -- now he's >>> probably not used it on the network workload you're looking at, so that >>> might be something to consider. >> >> Deriving a better idle predictor is a bit orthogonal to fast idle. > > No. If you want a different C state selected we need to fix the current > C state selector. We're not going to tinker. > > And the predictor is probably the most fundamental part of the whole C > state selection logic. > > Now I think the problem is that the current predictor goes for an > average idle duration. This means that we, on average, get it wrong 50% > of the time. For performance that's bad. > > If you want to improve the worst case, we need to consider a cumulative > distribution function, and staying with the Gaussian assumption already > present, that would mean using: > > 1 x - mu > CDF(x) = - [ 1 + erf(-------------) ] > 2 sigma sqrt(2) > > Where, per the normal convention mu is the average and sigma^2 the > variance. See also: > > https://en.wikipedia.org/wiki/Normal_distribution > > We then solve CDF(x) = n% to find the x for which we get it wrong n% of > the time (IIRC something like: 'mu - 2sigma' ends up being 5% or so). > > This conceptually gets us better exit latency for the cases where we got > it wrong before, and practically pushes down the estimate which gets us > C1 longer. > > Of course, this all assumes a Gaussian distribution to begin with, if we > get bimodal (or worse) distributions we can still get it wrong. To fix > that, we'd need to do something better than what we currently have. > Maybe you are talking about applying some machine learning algorithm online to fit a multivariate normal distribution, :)
Well, back to the problem, when the scheduler picks up idle thread, it does not look at the history, nor make the prediction. So it's possible it has to switch back a task ASAP when it's going into idle(very common under some workloads). That is, (idle_entry + idle_exit) > idle. If the system has multiple hardware idle states, then: (idle_entry + idle_exit + HW_entry + HW_exit) > HW_sleep So we eventually want the idle path lighter than what we currently have. A complex predictor may have high accuracy, but the cost could be high as well. We need a tradeoff here IMHO. I'll check Daniel's work to understand how/if it's better than menu governor. Thanks, -Aubrey