On 7/29/2013 7:14 AM, Lorenzo Pieralisi wrote:


btw this is largely a misunderstanding;
tasks are not the issue; tasks use timers and those are perfectly predictable.
It's interrupts that are not and the heuristics are for that.

Now, if your hardware does the really-bad-for-power wake-all on any interrupt,
then the menu governor logic is not good for you; rather than looking at the
next timer on the current cpu you need to look at the earliest timer on the set
of bundled cpus as the upper bound of the next wake event.
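
A toy sketch of that rule, for illustration only. This is not kernel code: the
function name, the dict of per-cpu timers and the "bundle" set are all
hypothetical, and times are simply "microseconds until the next timer fires".

```python
# Toy model, not kernel code: every name here is hypothetical.

def next_wake_upper_bound(cpu, next_timer_us, bundle):
    """Upper bound (us from now) on when `cpu` will next be woken.

    next_timer_us: dict mapping cpu id -> time until that cpu's next timer
    bundle: set of cpu ids the hardware wakes together (must contain `cpu`)
    """
    if cpu not in bundle:
        raise ValueError("cpu must be a member of its own bundle")
    # On a "one wakes all" system, any timer in the bundle wakes this cpu,
    # so the earliest one bounds the sleep length.
    return min(next_timer_us[c] for c in bundle)


next_timer_us = {0: 5000, 1: 300, 2: 12000, 3: 800}

# Independent wakeups: cpu 0 only has to consider its own timer.
per_cpu = next_wake_upper_bound(0, next_timer_us, {0})

# Wake-all hardware: cpu 0 is bounded by the earliest timer in the bundle.
bundled = next_wake_upper_bound(0, next_timer_us, {0, 1, 2, 3})
```

With these made-up numbers, cpu 0 would plan for 5000 us of sleep on
independent hardware, but only 300 us when cpus 0-3 are woken together.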

Yes, that's true and we have to look into this properly, but certainly
a wake-up for a CPU in a package C-state is not beneficial to x86 CPUs either,
or I am missing something ?

a CPU core isn't in a package C state, the system is.
(in a core C state the whole core is already powered down completely; a package
C state just also turns off the memory controller/etc)

package C states are global on x86 (not just per package); there's nothing one
can do there in terms of grouping/etc.


Even if the wake-up interrupts just power up one of the CPUs in a package
and leave other(s) alone, all HW state shared (ie caches) by those CPUs must
be turned on. What I am asking is: this bundled next event is a concept
that should apply to x86 CPUs too, or it is entirely managed in FW/HW
and the kernel just should not care ?

on Intel x86 cpus, there's not really a bundled concept. or rather, there is
only 1 bundle (which amounts to the same thing).
Yes in a multi-package setup there are some cache power effects... but there's
not a lot one can do there.
The other cores don't wake up, so they still make their own correct decisions.

I still do not understand how this "bundled" next event is managed on
x86 with the menu governor, or better why it is not managed at all, given
the importance of package C-states.

package C states on x86 are basically OS invisible. The OS manages core level C
states, the hardware manages the rest.
The bundle part hurts you on a "one wakes all" system,
not because of package level power effects, but because others wake up
prematurely (compared to what they expected) which causes them to think future
wakeups will also be earlier. All because they get the "what is the next known
event" wrong, and start correcting for known events instead of only for
'unpredictable' interrupts.
Things will go very wonky if you do that for sure.
(I've seen various simulation data on that, and the menu governor indeed acts
quite poorly for that)
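
To make the effect concrete, here is a deliberately simplified model. It is an
assumption-laden sketch, not the menu governor's actual code: the "correction
factor" is just an exponentially weighted average of actual/expected sleep
length, and the function name, the decay value and the timer numbers are made
up.

```python
# Simplified model of a learned correction factor; NOT the real menu governor.

def learn_correction(expected_us, actual_us, decay=8, start=1.0):
    """Exponentially weighted average of actual/expected sleep length."""
    factor = start
    for expected, actual in zip(expected_us, actual_us):
        factor += (actual / expected - factor) / decay
    return factor


own_timer = [10000] * 50      # this cpu's own next timer: always 10 ms away
sibling_timer = [2000] * 50   # a sibling's timer fires after 2 ms and wakes all

# Wrong "next known event": the cpu expects 10 ms but is woken at 2 ms by a
# perfectly predictable sibling timer, and blames it on interrupts.
skewed = learn_correction(own_timer, sibling_timer)

# Right "next known event": expect the earliest timer in the bundle.
bundle_bound = [min(a, b) for a, b in zip(own_timer, sibling_timer)]
correct = learn_correction(bundle_bound, sibling_timer)
```

In this toy setup the skewed factor drifts towards 0.2, i.e. the cpu learns to
distrust its timers by 5x even though no unpredictable interrupt ever occurred,
while the factor computed against the bundle-wide bound stays at 1.0.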

And maybe even more special casing is needed... but I doubt it.

I lost you here, can you elaborate pls ?

well.. just looking at the earliest timer might not be enough; that timer might
be on a different core that's still active, and may change after the current
cpu has gone into an idle state.
Fun.
Coupled C states on this level are a PAIN in many ways, and tend to totally
suck for power due to this and the general "too much is active" reasons.
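
A small illustration of the stale-bound problem described above, again as a
toy model with made-up names and numbers rather than anything from the kernel.
Times are absolute microseconds on a shared clock.

```python
# Toy model with hypothetical names; times are absolute microseconds.

timer_expiry_us = {0: 5000, 1: 3000}   # next timer per cpu in the bundle

# t=0: cpu 0 goes idle and computes its bound from what is known right now.
bound_at_idle_entry = min(timer_expiry_us.values())

# t=100: cpu 1 is still running and arms a new, earlier timer.
timer_expiry_us[1] = 300

# The wake-all hardware will now wake cpu 0 at the new earliest expiry.
actual_wake = min(timer_expiry_us.values())

# How far off the bound computed at idle entry turned out to be.
stale_by_us = bound_at_idle_entry - actual_wake
```

Here cpu 0 picked its idle state assuming 3000 us of sleep but is woken at
300 us, so even the bundle-wide earliest timer is only as good as the moment
it was sampled.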
