> -----Original Message----- > From: Andy Lutomirski [mailto:l...@amacapital.net] > Sent: Friday, August 15, 2014 5:23 AM > To: Peter Zijlstra > Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki; > linux...@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming; > Chakravarty, Souvik K > Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS > back to DEFAULT > > On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <pet...@infradead.org> > wrote: > > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote: > >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote: > >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote: > >> >> > >> >> So seeing how you're from @intel.com I'm assuming you're using x86 > here. > >> >> > >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs > >> >> just fine, which means we'll fall out of the cpuidle_enter(), which > >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). > >> >> > >> >> It will indeed not leave the cpu_idle_loop() function and go right back > >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which > >> >> should pick a new C state. > >> >> > >> >> So the interrupt _should_ work. If it doesn't you need to explain why. > >> > > >> > I think the issue is related to the poll_idle state, in > >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the > >> > cpuidle table as the state 0 (POLL). There is no mwait for this state. > >> > It is a bit confusing because this state is not listed in the acpi / > >> > intel idle driver but inserted implicitly at the beginning of the idle > >> > table by the cpuidle framework when the driver is registered. > >> > > >> > static int poll_idle(struct cpuidle_device *dev, > >> > struct cpuidle_driver *drv, int index) > >> > { > >> > local_irq_enable(); > >> > if (!current_set_polling_and_test()) { > >> > while (!need_resched()) > >> > cpu_relax(); > >> > } > >> > current_clr_polling(); > >> > > >> > return index; > >> > } > >> > >> As the most recent person to have modified this function, and as an > >> avowed hater of pointless IPIs, let me ask a rather different question: > >> why are you sending IPIs at all? As of Linux 3.16, poll_idle actually > >> supports the polling idle interface :) > >> > >> Can't you just do: > >> > >> if (set_nr_if_polling(rq->idle)) { > >> trace_sched_wake_idle_without_ipi(cpu); > >> } else { > >> spin_lock_irqsave(&rq->lock, flags); > >> if (rq->curr == rq->idle) > >> smp_send_reschedule(cpu); > >> // else the CPU wasn't idle; nothing to do > >> raw_spin_unlock_irqrestore(&rq->lock, flags); > >> } > >> > >> In the common case (wake from C0, i.e. polling idle), this will skip the > >> IPI entirely unless you race with idle entry/exit, saving a few more > >> precious electrons and all of the latency involved in poking the APIC > >> registers. > > > > They could and they probably should, but that logic should _not_ live in > > the cpuidle driver. > > Sure. My point is that fixing the IPI handler is, I think, totally > bogus, because the IPI API isn't the right way to do this at all. > > It would be straightforward to add a new function wake_if_idle(int > cpu) to sched/core.c. > Thanks Andy and Peter's suggestion, it will save some IPI things in case the cores are not in idle.
There is one similar API in sched/core.c wake_up_idle_cpu(), then just need add one new common smp API: smp_wake_up_cpus() { for_each_online_cpu() wake_up_idle_cpu(); } Will try one patch for it. N�����r��y����b�X��ǧv�^�){.n�+����{����zX����ܨ}���Ơz�&j:+v�������zZ+��+zf���h���~����i���z��w���?�����&�)ߢf��^jǫy�m��@A�a��� 0��h���i