On Thu, Aug 14, 2014 at 6:21 PM, Liu, Chuansheng <chuansheng....@intel.com> wrote: > > >> -----Original Message----- >> From: Andy Lutomirski [mailto:l...@amacapital.net] >> Sent: Friday, August 15, 2014 5:23 AM >> To: Peter Zijlstra >> Cc: Daniel Lezcano; Liu, Chuansheng; Rafael J. Wysocki; >> linux...@vger.kernel.org; LKML; Liu, Changcheng; Wang, Xiaoming; >> Chakravarty, Souvik K >> Subject: Re: [PATCH] cpuidle: Fix the CPU stuck at C0 for 2-3s after PM_QOS >> back to DEFAULT >> >> On Thu, Aug 14, 2014 at 2:16 PM, Peter Zijlstra <pet...@infradead.org> >> wrote: >> > On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote: >> >> On 08/14/2014 04:14 AM, Daniel Lezcano wrote: >> >> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote: >> >> >> >> >> >> So seeing how you're from @intel.com I'm assuming you're using x86 >> here. >> >> >> >> >> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs >> >> >> just fine, which means we'll fall out of the cpuidle_enter(), which >> >> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call(). >> >> >> >> >> >> It will indeed not leave the cpu_idle_loop() function and go right back >> >> >> into cpuidle_idle_call(), but that will then call cpuidle_select() >> >> >> which >> >> >> should pick a new C state. >> >> >> >> >> >> So the interrupt _should_ work. If it doesn't you need to explain why. >> >> > >> >> > I think the issue is related to the poll_idle state, in >> >> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the >> >> > cpuidle table as the state 0 (POLL). There is no mwait for this state. >> >> > It is a bit confusing because this state is not listed in the acpi / >> >> > intel idle driver but inserted implicitly at the beginning of the idle >> >> > table by the cpuidle framework when the driver is registered. >> >> > >> >> > static int poll_idle(struct cpuidle_device *dev, >> >> > struct cpuidle_driver *drv, int index) >> >> > { >> >> > local_irq_enable(); >> >> > if (!current_set_polling_and_test()) { >> >> > while (!need_resched()) >> >> > cpu_relax(); >> >> > } >> >> > current_clr_polling(); >> >> > >> >> > return index; >> >> > } >> >> >> >> As the most recent person to have modified this function, and as an >> >> avowed hater of pointless IPIs, let me ask a rather different question: >> >> why are you sending IPIs at all? As of Linux 3.16, poll_idle actually >> >> supports the polling idle interface :) >> >> >> >> Can't you just do: >> >> >> >> if (set_nr_if_polling(rq->idle)) { >> >> trace_sched_wake_idle_without_ipi(cpu); >> >> } else { >> >> spin_lock_irqsave(&rq->lock, flags); >> >> if (rq->curr == rq->idle) >> >> smp_send_reschedule(cpu); >> >> // else the CPU wasn't idle; nothing to do >> >> raw_spin_unlock_irqrestore(&rq->lock, flags); >> >> } >> >> >> >> In the common case (wake from C0, i.e. polling idle), this will skip the >> >> IPI entirely unless you race with idle entry/exit, saving a few more >> >> precious electrons and all of the latency involved in poking the APIC >> >> registers. >> > >> > They could and they probably should, but that logic should _not_ live in >> > the cpuidle driver. >> >> Sure. My point is that fixing the IPI handler is, I think, totally >> bogus, because the IPI API isn't the right way to do this at all. >> >> It would be straightforward to add a new function wake_if_idle(int >> cpu) to sched/core.c. >> > Thanks Andy and Peter's suggestion, it will save some IPI things in case the > cores are not > in idle.
This isn't quite right. Using the polling interface correctly will save IPIs in case the core *is* idle. But, given that you are trying to upgrade the chosen idle state, I don't think you need to kick non-idle CPUs at all, and my example contains that optimization. Presumably the function should be named something like wake_up_if_idle. > > There is one similar API in sched/core.c wake_up_idle_cpu(), > then just need add one new common smp API: > > smp_wake_up_cpus() { > for_each_online_cpu() > wake_up_idle_cpu(); > } > > Will try one patch for it. This will have lots of extra overhead if the cpu is *not* idle. I think my example will be a lot more efficient. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/