On Mon, Mar 07, 2016 at 12:38:16PM -0500, Chris Metcalf wrote:
> On 03/07/2016 04:48 AM, Peter Zijlstra wrote:
> I'm a little skeptical that a single percpu write is going to add much
> measurable overhead to this path. 

So that write is almost guaranteed to be a cacheline miss; those things
hurt and do show up in profiles.

> However, we can certainly adapt
> alternate approaches that stay away from the actual idle code.
> 
> One approach (diff appended) is to just test to see if the PC is
> actually in the architecture-specific halt code.  There are two downsides:
> 
> 1. It requires a small amount of per-architecture support.  I've provided
>    the tile support as an example, since that's what I tested.  I expect
>    x86 is a little more complicated since there are more idle paths and
>    they don't currently run the idle instruction(s) at a fixed address, but
>    it's unlikely to be too complicated on any platform.
>    Still, adding anything per-architecture is certainly a downside.
> 
> 2. As proposed, my new alternate solution only handles the non-polling
>    case, so if you are in the polling loop, we won't benefit from having
>    the NMI backtrace code skip over you.  However my guess is that 99% of
>    the time folks do choose to run the default non-polling mode, so this
>    probably still achieves a pretty reasonable outcome.
> 
> A different approach that would handle downside #2 and probably make it
> easier to implement the architecture-specific code for more complicated
> platforms like x86 would be to use the SCHED_TEXT model and tag all the
> low-level idling functions as CPUIDLE_TEXT.  Then the "are we idling"
> test is just a range compare on the PC against __cpuidle_text_{start,end}.
> 
> We'd have to decide whether to make cpu_idle_poll() non-inline and just
> test for being in that function, or whether we could tag all of
> cpu_idle_loop() as being CPUIDLE_TEXT and just omit any backtrace
> whenever the PC is anywhere in that function.  Obviously if we have
> called out to more complicated code (e.g. Daniel's concern about calling
> out to power management code) the PC would no longer be in the CPUIDLE_TEXT
> at that point, so that might be OK too.

But the CPU would also not be idle if it's running PM code.

So I like the CPUIDLE_TEXT approach, since it has no impact on the
generated code.
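
For illustration, the "are we idling" test then reduces to something like
the sketch below. This is a userspace mock-up, not kernel code: the two
bounds are plain variables with made-up values standing in for the
linker-provided __cpuidle_text_{start,end} symbols, and
pc_in_cpuidle_text() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical section bounds. In a kernel these would come from the
 * linker script around the cpuidle text section; here they are plain
 * variables with arbitrary values so the sketch can run in userspace.
 */
static uintptr_t cpuidle_text_start = 0x1000;
static uintptr_t cpuidle_text_end   = 0x2000;

/* Return true if pc lies in [start, end), i.e. inside the idle text. */
static bool pc_in_cpuidle_text(uintptr_t pc)
{
	return pc >= cpuidle_text_start && pc < cpuidle_text_end;
}
```

The check only reads the interrupted PC, so nothing is added to the idle
path itself.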

An alternative option could be to inspect the stack. We already take a
stack dump, so you could say that every CPU that has cpuidle_enter() in
its callchain is an 'idle' CPU.
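
Roughly, and only as a sketch: assuming the stack dump has already been
symbolized into an array of function names (the array layout and the
helper name callchain_is_idle() are both hypothetical), the check is a
linear scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: 'frames' holds the already-symbolized function
 * names from a captured stack dump, 'nr' is the number of entries.
 * Returns true if any frame is cpuidle_enter.
 */
static bool callchain_is_idle(const char *const *frames, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(frames[i], "cpuidle_enter") == 0)
			return true;
	}
	return false;
}
```

A real implementation would more likely compare return addresses against
the function's address range than compare strings.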

Yet another option would be to look at rq->idle_state or any other state
cpuidle already tracks. The 'obvious' downside is relying on cpuidle,
which I understand isn't supported by everyone.
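
As a rough sketch of that last option, using mock structures rather than
the real ones: the types below are hypothetical stand-ins, and the sketch
assumes idle_state is non-NULL only while the CPU sits in an idle state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the kernel structures; only the fields used here. */
struct idle_state_sketch {
	const char *name;
};

struct rq_sketch {
	/* Assumed to be NULL whenever the CPU is not in an idle state. */
	const struct idle_state_sketch *idle_state;
};

/* Treat a runqueue with a recorded idle state as belonging to an idle CPU. */
static bool rq_looks_idle(const struct rq_sketch *rq)
{
	return rq->idle_state != NULL;
}
```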
