On 2017/7/13 22:53, Peter Zijlstra wrote:
> On Thu, Jul 13, 2017 at 10:48:55PM +0800, Li, Aubrey wrote:
>
>> - totally from arch_cpu_idle_enter entry to arch_cpu_idle_exit return costs
>>   9122ns - 15318ns.
>> ---- In this period(arch idle), rcu_idle_enter costs 1985ns - 2262ns,
>>      rcu_idle_exit costs 1813ns - 3507ns
>>
>> Besides RCU,
>
> So Paul wants more details on where RCU hurts so we can try to fix.
>

If we can call RCU idle enter/exit only after the tick is really stopped,
instead of calling it on every idle entry, I think it's fine. Then we can
skip stopping the tick when we need fast idle.

>> the period includes c-state selection on X86, a few timestamp updates
>> and a few computations in menu governor. Also, deep HW-cstate latency can be
>> up to 100+ microseconds, even if the system is very busy, CPU still has
>> chance to enter deep cstate, which I guess some outburst workloads are not
>> happy with it.
>>
>> That's my major concern without a fast idle path.
>
> Fixing C-state selection by creating an alternative idle path sounds so
> very wrong.

This only happens on architectures that have multiple hardware idle
C-states, such as Intel processors. As long as we want to support multiple
C-states, we have to make a selection (at the cost of timestamp updates and
computation). That's fine in the normal idle path, but if we want a fast
idle switch, we can make a tradeoff and use a low-latency C-state directly.
That's why I proposed a fast idle path: so that we don't need to mix
fast-idle condition checks into both the idle entry and idle exit paths.

Thanks,
-Aubrey