On Thu, Aug 14, 2014 at 02:12:27PM -0700, Andy Lutomirski wrote:
> On 08/14/2014 04:14 AM, Daniel Lezcano wrote:
> > On 08/14/2014 01:00 PM, Peter Zijlstra wrote:
> >>
> >> So seeing how you're from @intel.com I'm assuming you're using x86 here.
> >>
> >> I'm not seeing how this can be possible, MWAIT is interrupted by IPIs
> >> just fine, which means we'll fall out of the cpuidle_enter(), which
> >> means we'll cpuidle_reflect(), and then leave cpuidle_idle_call().
> >>
> >> It will indeed not leave the cpu_idle_loop() function and go right back
> >> into cpuidle_idle_call(), but that will then call cpuidle_select() which
> >> should pick a new C state.
> >>
> >> So the interrupt _should_ work. If it doesn't you need to explain why.
> > 
> > I think the issue is related to the poll_idle state, in
> > drivers/cpuidle/driver.c. This state is x86 specific and inserted in the
> > cpuidle table as the state 0 (POLL). There is no mwait for this state.
> > It is a bit confusing because this state is not listed in the acpi /
> > intel idle driver but inserted implicitly at the beginning of the idle
> > table by the cpuidle framework when the driver is registered.
> > 
> > static int poll_idle(struct cpuidle_device *dev,
> >                 struct cpuidle_driver *drv, int index)
> > {
> >         local_irq_enable();
> >         if (!current_set_polling_and_test()) {
> >                 while (!need_resched())
> >                         cpu_relax();
> >         }
> >         current_clr_polling();
> > 
> >         return index;
> > }
> 
> As the most recent person to have modified this function, and as an
> avowed hater of pointless IPIs, let me ask a rather different question:
> why are you sending IPIs at all?  As of Linux 3.16, poll_idle actually
> supports the polling idle interface :)
> 
> Can't you just do:
> 
> if (set_nr_if_polling(rq->idle)) {
>       trace_sched_wake_idle_without_ipi(cpu);
> } else {
>       raw_spin_lock_irqsave(&rq->lock, flags);
>       if (rq->curr == rq->idle)
>               smp_send_reschedule(cpu);
>       // else the CPU wasn't idle; nothing to do
>       raw_spin_unlock_irqrestore(&rq->lock, flags);
> }
> 
> In the common case (wake from C0, i.e. polling idle), this will skip the
> IPI entirely unless you race with idle entry/exit, saving a few more
> precious electrons and all of the latency involved in poking the APIC
> registers.

They could and they probably should, but that logic should _not_ live in
the cpuidle driver.

And as stated elsewhere in the thread, they also need to fix their
kick_all_cpus_sync() usage, because that's similarly wrecked.
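
For readers following along, the polling handshake discussed above can be
modelled in userspace roughly as follows. This is only a sketch under stated
assumptions, not the kernel code: struct cpu_model, the FLAG_* bits, the
ipis_sent counter and every function name here are hypothetical stand-ins
(for the idle task's thread flags, TIF_POLLING_NRFLAG / TIF_NEED_RESCHED,
and smp_send_reschedule()), and C11 atomics replace the kernel's cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for TIF_POLLING_NRFLAG and
 * TIF_NEED_RESCHED in the idle task's flag word. */
#define FLAG_POLLING      (1u << 0)
#define FLAG_NEED_RESCHED (1u << 1)

/* Hypothetical model of one CPU; not a kernel structure. */
struct cpu_model {
	atomic_uint flags;   /* idle task's flag word */
	unsigned ipis_sent;  /* counts fallback "IPIs" (single-threaded demo) */
};

/* Idle side: advertise that we poll the flag word.
 * Returns true if a wakeup was already pending. */
static bool idle_set_polling_and_test(struct cpu_model *c)
{
	unsigned old = atomic_fetch_or(&c->flags, FLAG_POLLING);

	return (old & FLAG_NEED_RESCHED) != 0;
}

/* Idle side: stop advertising polling before leaving the idle state. */
static void idle_clr_polling(struct cpu_model *c)
{
	atomic_fetch_and(&c->flags, ~FLAG_POLLING);
}

/* Waker side: set NEED_RESCHED only while the target is polling.
 * Returns true if the flag write alone is enough to wake the target. */
static bool set_need_resched_if_polling(struct cpu_model *c)
{
	unsigned old = atomic_load(&c->flags);

	for (;;) {
		if (!(old & FLAG_POLLING))
			return false;   /* not polling: caller must interrupt */
		if (old & FLAG_NEED_RESCHED)
			return true;    /* someone else already set it */
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&c->flags, &old,
						 old | FLAG_NEED_RESCHED))
			return true;
	}
}

/* Waker side: skip the IPI when the flag write suffices. */
static void wake_cpu(struct cpu_model *c)
{
	if (!set_need_resched_if_polling(c))
		c->ipis_sent++;     /* stand-in for smp_send_reschedule() */
}

/* Usage: one wake before the CPU polls, one wake while it polls.
 * Returns the number of fallback "IPIs" that were needed. */
static unsigned demo_wakeups(void)
{
	struct cpu_model c;

	atomic_init(&c.flags, 0);
	c.ipis_sent = 0;

	wake_cpu(&c);                   /* not polling yet: needs an IPI */
	idle_set_polling_and_test(&c);  /* CPU enters the polling state */
	wake_cpu(&c);                   /* polling: flag write is enough */
	idle_clr_polling(&c);

	return c.ipis_sent;
}
```

The model leaves out the rq->lock-protected fallback path and the race with
idle entry/exit mentioned above; it only illustrates why a polling CPU can be
woken by a flag write alone.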
