On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> >>
> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
> >>
> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> >> just fine, which means we'll fall out of the cpuidle_enter(), which
> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
> >>
> >> It will indeed not leave the cpu_idle_loop() function and go right back
> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> >> should pick a new C state.
> >>
> >> So the interrupt _should_ work. If it doesn't you need to explain why.
> >
> > I think the issue is related to the poll_idle state, in
> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
> > It is a bit confusing because this state is not listed in the acpi /
> > intel idle driver but inserted implicitly at the beginning of the idle
> > table by the cpuidle framework when the driver is registered.
> >
> > static int poll_idle(struct cpuidle_device *dev,
> >                 struct cpuidle_driver *drv, int index)
> > {
> >         local_irq_enable();
> >         if (!current_set_polling_and_test()) {
> >                 while (!need_resched())
> >                         cpu_relax();
> >         }
> >         current_clr_polling();
> >
> >         return index;
> > }
>
> As the most recent person to have modified this function, and as an
> avowed hater of pointless IPIs, let me ask a rather different question:
> why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
> supports the polling idle interface :)
>
> Can't you just do:
>
> if (set_nr_if_polling(rq->idle)) {
>         trace_sched_wake_idle_without_ipi(cpu);
> } else {
>         spin_lock_irqsave(&rq->lock, flags);
>         if (rq->curr == rq->idle)
>                 smp_send_reschedule(cpu);
>         // else the CPU wasn't idle; nothing to do
>         raw_spin_unlock_irqrestore(&rq->lock, flags);
> }
>
> In the common case (wake from C0, i.e. polling idle), this will skip the
> IPI entirely unless you race with idle entry/exit, saving a few more
> precious electrons and all of the latency involved in poking the APIC
> registers.

They could and they probably should, but that logic should _not_ live in
the cpuidle driver. And as stated elsewhere in the thread, they also need
to fix their kick_all_cpus_sync() usage, because that's similarly wrecked.
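
For readers following along, the "set need-resched only if the target is
polling" handshake discussed above can be modelled in userspace roughly as
follows. This is a minimal sketch with C11 atomics, loosely modelled on the
idea behind set_nr_if_polling(); it is not the kernel's implementation. The
flag names, struct idle_task, the *_sketch functions and the ipis_sent
counter are all hypothetical stand-ins invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the kernel's per-thread flags. */
#define FLAG_POLLING      (1u << 0)  /* idle loop is spinning on need-resched */
#define FLAG_NEED_RESCHED (1u << 1)  /* idle loop should exit and reschedule  */

/* Hypothetical stand-in for the idle task's flag word. */
struct idle_task {
	atomic_uint flags;
};

/*
 * Atomically set NEED_RESCHED, but only if the target is currently polling.
 * Returns true if the polling loop will notice the flag on its own (no IPI
 * needed), false if the target is not polling and must be interrupted.
 */
static bool set_nr_if_polling_sketch(struct idle_task *t)
{
	unsigned int old = atomic_load(&t->flags);

	for (;;) {
		if (!(old & FLAG_POLLING))
			return false;   /* not polling: caller must send an IPI */
		if (old & FLAG_NEED_RESCHED)
			return true;    /* someone already requested a resched */
		/* On failure the CAS reloads 'old' and we re-check the flags. */
		if (atomic_compare_exchange_weak(&t->flags, &old,
						 old | FLAG_NEED_RESCHED))
			return true;
	}
}

/* Counts the interrupts this model would have sent. */
static int ipis_sent;

/* Wake path: skip the IPI whenever the flag write alone is sufficient. */
static void wake_idle_cpu_sketch(struct idle_task *t)
{
	if (!set_nr_if_polling_sketch(t))
		ipis_sent++;            /* stand-in for smp_send_reschedule() */
}
```

With flags initialised to FLAG_POLLING, a call to wake_idle_cpu_sketch()
sets FLAG_NEED_RESCHED and leaves ipis_sent at 0; with flags at 0 it leaves
the flags alone and bumps ipis_sent instead. The compare-and-swap loop is
what closes the race with idle entry/exit mentioned in the quoted mail: if
the target clears its polling bit between the load and the update, the CAS
fails and the re-check falls back to the IPI path.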