On Mon, Feb 01 2016, Andi Kleen <a...@linux.intel.com> wrote:

> On Mon, Feb 01, 2016 at 10:25:17PM +0100, Rasmus Villemoes wrote:
>> On Thu, Jan 28 2016, Andi Kleen <a...@firstfloor.org> wrote:
>>
>> > From: Andi Kleen <a...@linux.intel.com>
>> >
>> > The menu cpuidle governor does at least two int_sqrt() each time
>> > we go into idle in get_typical_interval to compute stddev
>> >
>> > int_sqrts take 100-120 cycles each. Short idle latency is important
>> > for many workloads.
>> >
>>
>> If you want to optimize get_typical_interval(), why not just take the
>> square root out of the equation (literally)?
>>
>> Something like
>
> Looks good. Yes that's a better fix.
>
Thanks. (Is there a good way to tell gcc that avg*avg is actually a
32x32->64 multiplication?)

While there and doing the math, I noticed that the variance computation
may _theoretically_ overflow: if half the observations are 0 and half
are C, the variance before the division is around INTERVALS*C^2/4,
which is around 2^65 for C=UINT_MAX and INTERVALS=8. I have no idea
whether this matters in practice, but it can be fixed by lowering the
initial threshold from UINT_MAX to sqrt(4*U64_MAX/INTERVALS) ~~ 3e9.
However, that would make it possible for all observations to be larger
than the initial threshold, so we'd then have to protect against a
division by zero...

Rasmus
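[Editorial note: a minimal sketch of the sqrt-free test being discussed.
The function name `stddev_small_enough` and the standalone form are
illustrative, not the actual kernel code in get_typical_interval();
the cast of one u32 operand to u64 is the usual way to let gcc emit a
single widening 32x32->64 multiply instead of a full 64x64 one.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/*
 * Illustrative sketch (not the actual kernel function): instead of
 * testing
 *
 *	int_sqrt(variance) * 6 < avg
 *
 * square both sides and test
 *
 *	variance * 36 < avg * avg
 *
 * which needs no square root.  Casting one u32 operand to u64 makes
 * the multiply a 32x32->64 widening multiply, which gcc can lower to
 * a single instruction on targets that have one.
 *
 * Note: as discussed above, the caller must ensure variance cannot
 * exceed U64_MAX/36, e.g. by capping the individual observations.
 */
static int stddev_small_enough(u64 variance, u32 avg)
{
	return variance * 36 < (u64)avg * avg;
}
```

For example, with avg = 600 the test passes for variance = 2500
(stddev 50 < 100) and fails for variance = 40000 (stddev 200 > 100),
matching the original stddev-based comparison.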