Re: [RFC][PATCH 0/7] Power-aware scheduling v2

2013-10-15 Thread Morten Rasmussen
On Mon, Oct 14, 2013 at 06:31:13PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 14, 2013 at 06:15:41PM +0100, Morten Rasmussen wrote:
> > > In fact, I don't see anything except a random bunch of hooks without an
> > > over-all picture of how to get less power used.
> > 
> > I will follow up with a better description of the overall picture. The
> > slides I linked to are not really self-explaining.
> 
> I hadn't even noticed there were slides linked. In general I tend to
> ignore external links -- patches should be descriptive enough to stand
> on their own.

Elaborating a bit more on the big picture and where we can go with this
proposal, here are the main requirements:

1. A unified, scheduler-driven power policy, i.e. the scheduler drives
DVFS/idle (as suggested by Ingo and hence this first set of patches).

2. Small task packing. Avoid spreading tasks under light workloads.

In addition for big.LITTLE we need:

3. Task placement based on cpu suitability. Associate task load ranges
with each cpu to guide task placement: heavy tasks on big, small tasks
on little.
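To make point 3 more concrete, here is a minimal sketch of load-range based placement. It assumes tracked task load is normalised to 0..1024 and that each cpu type carries a [min, max] load range. `struct cpu_load_range`, `select_cpu_type()` and the 511/512 split are hypothetical illustrations, not code from this patch set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: tracked task load is normalised to 0..1024. */
#define LOAD_SCALE 1024u

/* Hypothetical per-cpu-type suitability range (not a kernel structure). */
struct cpu_load_range {
	const char *name;	/* e.g. "little" or "big" */
	unsigned int min;	/* lowest task load this cpu type should take */
	unsigned int max;	/* highest task load this cpu type should take */
};

/* True if a task with tracked load @load falls inside range @r. */
static bool task_fits_cpu(const struct cpu_load_range *r, unsigned int load)
{
	return load >= r->min && load <= r->max;
}

/*
 * Return the index of the first cpu type whose range covers @load,
 * or -1 if none does. Ranges are checked in array order.
 */
static int select_cpu_type(const struct cpu_load_range *ranges, size_t n,
			   unsigned int load)
{
	for (size_t i = 0; i < n; i++)
		if (task_fits_cpu(&ranges[i], load))
			return (int)i;
	return -1;
}

/* Example split (hypothetical values): small tasks on little, heavy on big. */
static const struct cpu_load_range example_ranges[] = {
	{ "little", 0, 511 },
	{ "big", 512, LOAD_SCALE },
};
```

With these ranges a task of load 100 maps to index 0 ("little") and one of load 900 to index 1 ("big"). Where the boundaries sit would, under this proposal, be a platform decision rather than something hard-coded in the scheduler.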

This patch set addresses part of point 1, while point 3 will follow soon.
Point 2 is being worked on by Vincent in collaboration with us.

The power driver introduced in this set has a role in the solution to
all three points. It serves as a unified platform power driver, and its
interface gives the scheduler high-level feedback that cpufreq and
cpuidle do not currently provide.

Decisions about the power/performance trade-off will be made in the
power driver, guided by hints from the scheduler. That allows platforms
the freedom to do what they want with the hints, including ignoring
them completely (taking Arjan's previous comments into account). It will
make the power driver much more powerful than the current
cpufreq/cpuidle drivers.
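A minimal sketch of this split, where the scheduler only sends hints and the driver decides. It assumes hypothetical `go_faster()`/`go_slower()` callbacks that return whether the hint was acted upon; the struct layout, the frequency levels and both example drivers are illustrative assumptions, not the interface in these patches.

```c
#include <stdbool.h>

#define NR_EXAMPLE_CPUS 4
#define MAX_FREQ_LEVEL 3	/* hypothetical: levels 0..3 per cpu */

/* Hypothetical driver ops; each returns true if the hint was acted upon. */
struct power_driver {
	const char *name;
	bool (*go_faster)(int cpu);
	bool (*go_slower)(int cpu);
};

/* Hypothetical per-cpu frequency level, 0 = lowest. */
static int freq_level[NR_EXAMPLE_CPUS];

/* A driver that follows hints, within its frequency limits. */
static bool step_up(int cpu)
{
	if (freq_level[cpu] >= MAX_FREQ_LEVEL)
		return false;
	freq_level[cpu]++;
	return true;
}

static bool step_down(int cpu)
{
	if (freq_level[cpu] <= 0)
		return false;
	freq_level[cpu]--;
	return true;
}

/* A driver that uses its freedom to ignore every hint. */
static bool ignore_hint(int cpu)
{
	(void)cpu;
	return false;
}

static const struct power_driver stepping_driver = {
	"stepping", step_up, step_down
};
static const struct power_driver ignoring_driver = {
	"ignoring", ignore_hint, ignore_hint
};

/* Scheduler side: forward the hint and learn whether it took effect. */
static bool sched_capacity_hint(const struct power_driver *drv, int cpu,
				bool more)
{
	return more ? drv->go_faster(cpu) : drv->go_slower(cpu);
}
```

Because the driver reports back, a scheduler built on this shape can tell an ignored hint from an honoured one instead of assuming the frequency changed.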

Morten 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 0/7] Power-aware scheduling v2

2013-10-15 Thread Preeti U Murthy
Hi,

On 10/14/2013 07:02 PM, Peter Zijlstra wrote:
> On Fri, Oct 11, 2013 at 06:19:10PM +0100, Morten Rasmussen wrote:
>> Hi,
>>
>> I have revised the previous power scheduler proposal[1] trying to address as
>> many of the comments as possible. The overall idea was discussed at LPC[2,3].
>> The revised design has removed the power scheduler and replaced it with a high
>> level power driver interface. An interface that allows the scheduler to query
>> the power driver for information and provide hints to guide power management
>> decisions in the power driver.
>>
>> The power driver is going to be a unified platform power driver that can
>> replace cpufreq and cpuidle drivers. Generic power policies will be optional
>> helper functions called from the power driver. Platforms may choose to
>> implement their own policies as part of their power driver.
>>
>> This RFC series prototypes a part of the power driver interface (cpu capacity
>> hints) and shows how they can be used from the scheduler. More extensive use of
>> the power driver hints and queries is left for later. The focus for now is the
>> power driver interface. The patch series includes a power driver/cpufreq
>> governor that can use existing cpufreq drivers as backend. It has been tested
>> (not thoroughly) on ARM TC2. The cpufreq governor power driver implementation
>> is rather horrible, but it illustrates how the power driver interface can be
>> used. Native power drivers are on the todo list.
>>
>> The power driver interface is still missing quite a few calls to handle: Idle,
>> adding extra information to the sched_domain hierarchy to guide scheduling
>> decisions (packing), and possibly scaling of tracked load to compensate for
>> frequency changes and asymmetric systems (big.LITTLE).
>>
>> This set is based on 3.11. I have done ARM TC2 testing based on linux-linaro
>> 2013.08[4] to get cpufreq support for TC2.
> 
> What I'm missing is a general overview of why, what and how.

I agree that the "why" needs to be stated very clearly, since the
patchset revolves around it. As far as I understand, we need a single
controller that decides on the power efficiency of the kernel and that,
to begin with, is exposed to all the user policies and to the frequency
and idle-state statistics of the CPU. These statistics are supplied by
the power driver.

Having these details and the decision making spread across multiple
places, as we do today in cpuidle, cpufreq and the scheduler, will
probably cause problems. For example, when the power efficiency of the
kernel goes wrong, we have trouble pointing out the reason behind it:
which of these three power policy decision makers did the problem arise
from? This is a maintainability concern.

Another reason is that the power saving decisions made by, say, cpuidle
may not complement those made by cpufreq. This can lead to inconsistent
results across different workloads.

Thus, by having a single policy maker for power savings, we hope to
address the primary concerns: consistent power-efficiency behaviour
from the kernel and improved maintainability.

> 
> In particular; how does this proposal lead to power savings. Is there a
> mathematical model that supports this framework? Something where if you
> give it a task set with global utilisation < 1 (i.e. there's idle time),
> it results in less power used.

AFAIK, this patchset is an attempt to achieve consistency in the power
efficiency of the kernel across workloads with the existing algorithms,
in addition to a cleanup that integrates the power policy making in one
place, as explained above. In doing so, *maybe* better power numbers can
be obtained, or at least the default power efficiency of the kernel
will become evident.

However, new patchsets such as packing small tasks, heterogeneous
scheduling, power aware scheduling, etc. *should* then yield good and
consistent power savings, since they will stand on top of an integrated,
stable power driver.

Regards
Preeti U Murthy
> 
> Also, how does this proposal deal with cpufreq's fundamentally broken
> approach to SMP? Afaict nothing considers the effect of one cpu upon
> another -- something which isn't true at all.
> 
> In fact, I don't see anything except a random bunch of hooks without an
> over-all picture of how to get less power used.
> 



Re: [RFC][PATCH 0/7] Power-aware scheduling v2

2013-10-14 Thread Peter Zijlstra
On Mon, Oct 14, 2013 at 06:15:41PM +0100, Morten Rasmussen wrote:
> > In fact, I don't see anything except a random bunch of hooks without an
> > over-all picture of how to get less power used.
> 
> I will follow up with a better description of the overall picture. The
> slides I linked to are not really self-explaining.

I hadn't even noticed there were slides linked. In general I tend to
ignore external links -- patches should be descriptive enough to stand
on their own.


Re: [RFC][PATCH 0/7] Power-aware scheduling v2

2013-10-14 Thread Morten Rasmussen
On Mon, Oct 14, 2013 at 02:32:34PM +0100, Peter Zijlstra wrote:
> On Fri, Oct 11, 2013 at 06:19:10PM +0100, Morten Rasmussen wrote:
> > Hi,
> > 
> > I have revised the previous power scheduler proposal[1] trying to address as
> > many of the comments as possible. The overall idea was discussed at LPC[2,3].
> > The revised design has removed the power scheduler and replaced it with a high
> > level power driver interface. An interface that allows the scheduler to query
> > the power driver for information and provide hints to guide power management
> > decisions in the power driver.
> > 
> > The power driver is going to be a unified platform power driver that can
> > replace cpufreq and cpuidle drivers. Generic power policies will be optional
> > helper functions called from the power driver. Platforms may choose to
> > implement their own policies as part of their power driver.
> > 
> > This RFC series prototypes a part of the power driver interface (cpu capacity
> > hints) and shows how they can be used from the scheduler. More extensive use of
> > the power driver hints and queries is left for later. The focus for now is the
> > power driver interface. The patch series includes a power driver/cpufreq
> > governor that can use existing cpufreq drivers as backend. It has been tested
> > (not thoroughly) on ARM TC2. The cpufreq governor power driver implementation
> > is rather horrible, but it illustrates how the power driver interface can be
> > used. Native power drivers are on the todo list.
> > 
> > The power driver interface is still missing quite a few calls to handle: Idle,
> > adding extra information to the sched_domain hierarchy to guide scheduling
> > decisions (packing), and possibly scaling of tracked load to compensate for
> > frequency changes and asymmetric systems (big.LITTLE).
> > 
> > This set is based on 3.11. I have done ARM TC2 testing based on linux-linaro
> > 2013.08[4] to get cpufreq support for TC2.
> 
> What I'm missing is a general overview of why, what and how.
> 
> In particular; how does this proposal lead to power savings. Is there a
> mathematical model that supports this framework? Something where if you
> give it a task set with global utilisation < 1 (i.e. there's idle time),
> it results in less power used.

It is not there yet. What I'm proposing here is just the interface, with
some very simple examples of how it can be used. It is not the
complete set of hooks, but a starting point. In the first round of
discussions it was clear that it is quite important to find an interface
that can work for everyone. To find a good power optimization model we
first need to know what information the platform (represented by the
power driver) can provide.

With guidance from the power driver about what level of performance we can
expect from a cpu, and possibly also the power cost, we can make better
load-balancing decisions. We can add task packing support and let the
platform decide the degree of packing that meets the power/performance
target.
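One way packing could be left to the platform is sketched below. It assumes the power driver reports a `pack_threshold` in the same units as cpu load, and that the scheduler fills cpus up to that threshold before spreading. `pick_cpu_packed()` and the threshold are hypothetical names used only for illustration; the sketch also assumes `nr_cpus > 0`.

```c
#include <stddef.h>

/*
 * Hypothetical: pick a cpu for a task of load @task_load. @pack_threshold is
 * assumed to come from the platform power driver, in the same units as
 * @cpu_load. Cpus are filled in index order up to the threshold; if none can
 * take the task without exceeding it, the least loaded cpu is used.
 */
static int pick_cpu_packed(const unsigned int *cpu_load, size_t nr_cpus,
			   unsigned int task_load, unsigned int pack_threshold)
{
	size_t least = 0;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		/* Pack: first cpu that stays within the platform's threshold. */
		if (cpu_load[cpu] + task_load <= pack_threshold)
			return (int)cpu;
		/* Remember the least loaded cpu as the spreading fallback. */
		if (cpu_load[cpu] < cpu_load[least])
			least = cpu;
	}
	return (int)least;
}
```

A high threshold packs tasks tightly onto the first cpus; a low one makes nothing fit, so placement falls back to the least loaded cpu, i.e. spreading. The scheduler logic stays the same and only the platform-supplied number changes the degree of packing.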

> 
> Also, how does this proposal deal with cpufreq's fundamentally broken
> approach to SMP? Afaict nothing considers the effect of one cpu upon
> another -- something which isn't true at all.

If you are referring to not doing anything clever with the affected cpus
in the power driver, then yes. It doesn't do anything clever at the
moment. However, the go_faster/slower() interface would allow the power
driver to refuse to increase the frequency if the power cost can't be
justified for some reason. Through the power driver interface the
scheduler will know this and can try a different balance, for example
by spreading load to other, less loaded cpus in the same frequency
domain, if possible.
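The refuse-and-rebalance flow might look roughly like the sketch below. The `go_faster_fn` callback type, the `domain[]` frequency-domain map and `balance_after_hint()` are hypothetical; the sketch only shows the scheduler reacting to a refused hint by choosing a less loaded cpu in the same frequency domain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver callback: true if it agrees to raise capacity. */
typedef bool (*go_faster_fn)(int cpu);

/*
 * Hypothetical scheduler reaction to a refused hint. Ask the driver for more
 * capacity on @busy_cpu; if it refuses, return the least loaded other cpu in
 * the same frequency domain (domain[i] is cpu i's domain id). Returns
 * @busy_cpu if the driver agreed or no sibling is less loaded.
 */
static int balance_after_hint(go_faster_fn go_faster, const int *domain,
			      const unsigned int *load, size_t nr_cpus,
			      int busy_cpu)
{
	int target = busy_cpu;

	if (go_faster(busy_cpu))
		return busy_cpu;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		/* Only siblings sharing busy_cpu's frequency domain qualify. */
		if ((int)cpu == busy_cpu || domain[cpu] != domain[busy_cpu])
			continue;
		if (load[cpu] < load[target])
			target = (int)cpu;
	}
	return target;
}

/* Two trivial example drivers for trying the flow out. */
static bool driver_agrees(int cpu) { (void)cpu; return true; }
static bool driver_refuses(int cpu) { (void)cpu; return false; }
```

Restricting the search to the same frequency domain reflects that those cpus share a clock: moving work there uses capacity the domain already runs at instead of forcing the frequency up.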

> 
> In fact, I don't see anything except a random bunch of hooks without an
> over-all picture of how to get less power used.

I will follow up with a better description of the overall picture. The
slides I linked to are not really self-explaining.

Thanks,
Morten



Re: [RFC][PATCH 0/7] Power-aware scheduling v2

2013-10-14 Thread Peter Zijlstra
On Fri, Oct 11, 2013 at 06:19:10PM +0100, Morten Rasmussen wrote:
> Hi,
> 
> I have revised the previous power scheduler proposal[1] trying to address as
> many of the comments as possible. The overall idea was discussed at LPC[2,3].
> The revised design has removed the power scheduler and replaced it with a high
> level power driver interface. An interface that allows the scheduler to query
> the power driver for information and provide hints to guide power management
> decisions in the power driver.
> 
> The power driver is going to be a unified platform power driver that can
> replace cpufreq and cpuidle drivers. Generic power policies will be optional
> helper functions called from the power driver. Platforms may choose to
> implement their own policies as part of their power driver.
> 
> This RFC series prototypes a part of the power driver interface (cpu capacity
> hints) and shows how they can be used from the scheduler. More extensive use of
> the power driver hints and queries is left for later. The focus for now is the
> power driver interface. The patch series includes a power driver/cpufreq
> governor that can use existing cpufreq drivers as backend. It has been tested
> (not thoroughly) on ARM TC2. The cpufreq governor power driver implementation
> is rather horrible, but it illustrates how the power driver interface can be
> used. Native power drivers are on the todo list.
> 
> The power driver interface is still missing quite a few calls to handle: Idle,
> adding extra information to the sched_domain hierarchy to guide scheduling
> decisions (packing), and possibly scaling of tracked load to compensate for
> frequency changes and asymmetric systems (big.LITTLE).
> 
> This set is based on 3.11. I have done ARM TC2 testing based on linux-linaro
> 2013.08[4] to get cpufreq support for TC2.

What I'm missing is a general overview of why, what and how.

In particular; how does this proposal lead to power savings. Is there a
mathematical model that supports this framework? Something where if you
give it a task set with global utilisation < 1 (i.e. there's idle time),
it results in less power used.

Also, how does this proposal deal with cpufreq's fundamentally broken
approach to SMP? Afaict nothing considers the effect of one cpu upon
another -- something which isn't true at all.

In fact, I don't see anything except a random bunch of hooks without an
over-all picture of how to get less power used.