On Wed, Aug 13, 2014 at 08:55:29PM +0200, Peter Zijlstra wrote: > On Wed, Aug 13, 2014 at 11:20:30AM -0700, Paul E. McKenney wrote: > > cpuidle_idle_call() does a fastpath irq-enable/exit if need-resched, > > then does stop_critical_timings() and rcu_idle_enter(). Then we > > have the buried complexity with cpuidle_select(), but a negative > > return says to check need-resched and enable interrupts or to > > invoke arch_cpu_idle(), which executes various sleep instructions > > on various architectures. Some notable variants: > > > And various other architectures seem to work similarly, but lots of > > hair here. So Steven, you OK with the underlying arch_cpu_idle() > > functions being off-limits to tracing? > > I didn't find anything particularly hairy in the arch_cpu_idle() > implementations, lots of simple 'go sleep' or 'spin' like things.
"Hairy" in terms of lots of assembly, in many cases apparently tightly coupled to a given SoC. > > Now, if cpuidle_select() returns non-negative, we are dealing with > > the CPU-idle governor, which is invoked at the later cpuidle_enter(). > > > > Hmmm... On the CPU-idle drivers... > > > > o apm_idle_driver puts the idle loop into the ->enter() function, > > apm_cpu_idle(). > > Yes, this one is creative. The best I came up with is > adding CPUIDLE_FLAG_RCU_IDLE which indicates that the driver will do the > rcu_idle calls and place them in apm_do_idle() around the > apm_bios_call_simple() thing. > > Now, that apm_bios_call_simple() thing uses on_cpu0(), which schedules > work on cpu0, which to me seems to guarantee this won't be used on any > SMP system, because that simply _cannot_ work for idle. > > And on UP its a few more function calls, we could sprinkle some > __always_inline()s around if we really care I suppose. Heh, I missed the UP-only pieces. > > o ACPI puts the idle loop in acpi_idle_do_entry(), and does call > > stop_critical_timings(), but not rcu_idle_enter(). > > So presumably stop_critical_timings() can nest? Not clear > > from the code. > > Yeah, so I'm not sure I see that they nest properly.. > > Still ACPI does a lot of weird crap in the busmaster idle function, > again I'd suggest that CPUIDLE_FLAG_RCU_IDLE which would let the driver > do rcu_idle itself, and place it in appropriate sites. > > Not too hard I think in this case. My main concern is avoiding situations where the driver manages to loop without passing through the rcu_idle_enter() and rcu_idle_exit(). In case someone manages to misconfigure things so that the driver just endlessly shuttles among states without actually ever really going idle. > > o The CPS driver is even stranger... Is cps_gen_entry_code() > > really depositing assembly instructions into a buffer that is > > passed back as a function? > > I had not yet looked at this one; its got that cpu_pm_{enter,exit}() > thing going.. we could do the same and place the manual RCU_IDLE around > cps_pm_enter_state() Again, as long as it avoids loops that don't include code under rcu_idle_enter(). > > o The intel_idle driver is the one with mwait_idle_with_hints(), > > so you covered it below. > > Yeah, fairly straight fwd driver that, _lots_ saner than the ACPI one. Now -that- is damning with faint praise! ;-) > > Your patch covers the cpuidle_enter() transition, which means > > that functions like cpuidle_enter(), acpi_idle_enter_c1(), and > > acpi_idle_do_entry() would be off-limits to trampolining. In the case > > of CPS, quite a bit of code. > > So I think we can do this; sure lots of code, but typically 'simpler' > than RCU stuff. For some definition of "simpler". ;-) > > > We should push the rcu_idle_{enter,exit}() down to around > > > mwait_idle_with_hints(), so we don't call half the word with RCU > > > disabled. > > > > That would be for the intel_idle.c CPU-idle driver. The other drivers > > also need rcu_idle_{enter,exit}(). > > Right, so simple drivers can use the generic rcu_idle bits from > kernel/sched/idle.c and difficult drivers can use CPUIDLE_FLAG_RCU_IDLE > and do some manual cleverness. OK, but in this case the relevant definition of "simple" is "never will need a trampoline". Steven, thoughts? > > > > I have already said that I will be happy to rip out the wakeup code > > > > when it is no longer needed, and I agree that it would be way better if > > > > not needed. > > > > > > I'd prefer to dtrt now and not needing to fix it later. > > > > Once it works, I might consider it "right" and adjust accordingly. > > At the moment, speculation. > > I think its simpler than doing RCU, maybe a little more work, but hey, > I'm the idiot that does full arch/ sweeps on a semi regular basis. Hmmm... Exactly what are you thinking could be enabled by this proposed change to the idle code? From what I can see at the moment, it would allow me to drop the schedule-on-holdout-idle code and allow synchronize_sched() to be substituted on !CONFIG_PREEMPT kernels. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/