Re: [patch v7 0/21] sched: power aware scheduling

2013-05-19 Thread Preeti U Murthy
Hi Alex,

On 05/20/2013 06:31 AM, Alex Shi wrote:
> 
>> Which are the workloads where 'powersaving' mode hurts workload 
>> performance measurably?
>>
>> I ran ebizzy on a 2 socket, 16 core, SMT 4 Power machine.
> 
> Is this a 2 * 16 * 4 LCPUs PowerPC machine?

This is a 2 * 8 * 4 LCPUs PowerPC machine.

>> The power efficiency drops significantly with the powersaving policy of
>> this patch, compared to the power efficiency of the scheduler without this patch.
>>
>> The below parameters are measured relative to the default scheduler
>> behaviour.
>>
>> A: Drop in power efficiency with the patch+powersaving policy
>> B: Drop in performance with the patch+powersaving policy
>> C: Decrease in power consumption with the patch+powersaving policy
>>
>> NumThreads     A      B      C
>> --------------------------------
>>  2            33%    36%    4%
>>  4            31%    33%    3%
>>  8            28%    30%    3%
>> 16            31%    33%    4%
>>
>> Each of the above runs is for 30s.
>>
>> On investigating socket utilization, I found that only 1 socket was being
>> used during all the above threaded runs. As can be guessed this is due
>> to the group_weight being considered for the threshold metric.
>> This stacks up tasks on a core and further on a socket, thus throttling
>> them, as observed by Mike below.
>>
>> I therefore think we must switch to group_capacity as the metric for the
>> threshold and use only (rq->utils * nr_running) for the group_utils
>> calculation during non-bursty wakeup scenarios.
>> This way we are comparing the right quantities: the utilization of the
>> runqueue by the fair tasks and the cpu capacity available to them after
>> being consumed by the rt tasks.
>>
>> After I made the above modification, all three of the above parameters came
>> out nearly zero. However, I am observing the load balancing of the
>> scheduler with the patch and powersaving policy enabled. It is behaving
>> very close to the default scheduler (spreading tasks across sockets).
>> That also explains why there is no performance drop or gain with the
>> patch+powersaving policy enabled. I will look into this observation and
>> report back.
> 
> Thanks a lot for the great testing!
> It seems that packing tasks per SMT cpu isn't power efficient.
> I got a similar result last week. I tested fspin (it does endless
> calculation; it is in the linux-next tree). When I bound one task per SMT
> cpu, power efficiency really dropped at almost every thread count, but
> when I bound one task per core, power efficiency was better at all thread counts.
> Besides moving tasks based on group_capacity, another choice is to balance
> tasks according to cpu_power. I have made that change in code, but it needs
> to go through an internal open source process before I can publish it.

What do you mean by *another* choice being to balance tasks according to
cpu_power? group_capacity is itself based on cpu_power.

Also, your balance policy in v6 was doing the same, right? It was correctly
comparing rq->utils * nr_running against cpu_power. Why not simply
switch to that code for power-policy load balancing?
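
To make the comparison concrete, here is a small standalone sketch (not the
actual patch code; the struct, field names and numbers below are made-up
placeholders) of the difference between the two thresholds being discussed:
packing while the summed (rq->utils * nr_running) stays within group_weight,
versus packing only while it stays within the capacity derived from cpu_power.

    /* sketch only: hypothetical types and values, not kernel code */
    #include <stdio.h>

    #define SCHED_POWER_SCALE 1024  /* one full cpu worth of power */

    struct rq_stat {
            unsigned int util;        /* runqueue utilization, 0..100 (%) */
            unsigned int nr_running;  /* runnable fair tasks */
            unsigned int cpu_power;   /* capacity left for fair tasks,
                                       * already reduced by rt time */
    };

    /* utilization the fair tasks would put on this group when packed */
    static unsigned int group_utils(const struct rq_stat *rqs, int nr_cpus)
    {
            unsigned int sum = 0;
            int i;

            for (i = 0; i < nr_cpus; i++)
                    sum += rqs[i].util * rqs[i].nr_running;
            return sum;
    }

    /* group_weight based threshold: "is there still a free LCPU?" */
    static int fits_by_weight(const struct rq_stat *rqs, int nr_cpus)
    {
            return group_utils(rqs, nr_cpus) <= (unsigned int)nr_cpus * 100;
    }

    /* group_capacity based threshold: "is there still free capacity?" */
    static int fits_by_capacity(const struct rq_stat *rqs, int nr_cpus)
    {
            unsigned int capacity = 0;
            int i;

            for (i = 0; i < nr_cpus; i++)
                    capacity += rqs[i].cpu_power;
            /* scale the percent utilization to the cpu_power unit */
            return group_utils(rqs, nr_cpus) * SCHED_POWER_SCALE / 100 <= capacity;
    }

    int main(void)
    {
            /* 4 SMT threads of one core; rt work has eaten part of cpu 0 */
            struct rq_stat core[4] = {
                    { 90, 1, 512 }, { 80, 1, 768 }, { 75, 1, 768 }, { 70, 1, 768 },
            };

            printf("fits by weight:   %d\n", fits_by_weight(core, 4));
            printf("fits by capacity: %d\n", fits_by_capacity(core, 4));
            return 0;
    }

With these made-up numbers the weight-based check still reports room on the
group (315 <= 400) while the capacity-based check does not (3225 > 2816),
which is exactly the over-packing that showed up above as throttled tasks.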

> Well, it'll lose throughput any time there's parallel execution
> potential but it's serialized instead.. using average will inevitably
> stack tasks sometimes, but that's its goal.  Hackbench shows it.

 (but that consolidation can be a winner too, and I bet a nickle it would
 be for a socket sized pgbench run)
>>>
>>> (belay that, was thinking of keeping all tasks on a single node, but
>>> it'll likely stack the whole thing on a CPU or two, if so, it'll hurt)
>>
>> At this point, I would like to raise one issue.
>> *Is the goal of the power-aware scheduler to improve the power efficiency of
>> the scheduler, or to accept a compromise on power efficiency as long as
>> power consumption definitely decreases, since it is the user who has decided
>> to prioritise lower power consumption over performance*?
>>
> 
> It could be one reason for this feature, but I would like to make it more
> power efficient, e.g. by packing tasks according to cpu_power rather than
> the current group_weight.

Yes, we could try the patch using group_capacity and observe the results
for power efficiency, before we decide to compromise on power efficiency
for a decrease in power consumption.

Regards
Preeti U Murthy



Re: [patch v7 0/21] sched: power aware scheduling

2013-05-19 Thread Alex Shi

> Which are the workloads where 'powersaving' mode hurts workload 
> performance measurably?
> 
> I ran ebizzy on a 2 socket, 16 core, SMT 4 Power machine.

Is this a 2 * 16 * 4 LCPUs PowerPC machine?
> The power efficiency drops significantly with the powersaving policy of
> this patch, compared to the power efficiency of the scheduler without this patch.
> 
> The below parameters are measured relative to the default scheduler
> behaviour.
> 
> A: Drop in power efficiency with the patch+powersaving policy
> B: Drop in performance with the patch+powersaving policy
> C: Decrease in power consumption with the patch+powersaving policy
> 
> NumThreads     A      B      C
> --------------------------------
>  2            33%    36%    4%
>  4            31%    33%    3%
>  8            28%    30%    3%
> 16            31%    33%    4%
> 
> Each of the above runs is for 30s.
> 
> On investigating socket utilization, I found that only 1 socket was being
> used during all the above threaded runs. As can be guessed this is due
> to the group_weight being considered for the threshold metric.
> This stacks up tasks on a core and further on a socket, thus throttling
> them, as observed by Mike below.
> 
> I therefore think we must switch to group_capacity as the metric for the
> threshold and use only (rq->utils * nr_running) for the group_utils
> calculation during non-bursty wakeup scenarios.
> This way we are comparing the right quantities: the utilization of the
> runqueue by the fair tasks and the cpu capacity available to them after
> being consumed by the rt tasks.
> 
> After I made the above modification, all three of the above parameters came
> out nearly zero. However, I am observing the load balancing of the
> scheduler with the patch and powersaving policy enabled. It is behaving
> very close to the default scheduler (spreading tasks across sockets).
> That also explains why there is no performance drop or gain with the
> patch+powersaving policy enabled. I will look into this observation and
> report back.

Thanks a lot for the great testing!
It seems that packing tasks per SMT cpu isn't power efficient.
I got a similar result last week. I tested fspin (it does endless
calculation; it is in the linux-next tree). When I bound one task per SMT
cpu, power efficiency really dropped at almost every thread count, but
when I bound one task per core, power efficiency was better at all thread counts.
Besides moving tasks based on group_capacity, another choice is to balance
tasks according to cpu_power. I have made that change in code, but it needs
to go through an internal open source process before I can publish it.
> 

 Well, it'll lose throughput any time there's parallel execution
 potential but it's serialized instead.. using average will inevitably
 stack tasks sometimes, but that's its goal.  Hackbench shows it.
>>>
>>> (but that consolidation can be a winner too, and I bet a nickle it would
>>> be for a socket sized pgbench run)
>>
>> (belay that, was thinking of keeping all tasks on a single node, but
>> it'll likely stack the whole thing on a CPU or two, if so, it'll hurt)
> 
> At this point, I would like to raise one issue.
> *Is the goal of the power-aware scheduler to improve the power efficiency of
> the scheduler, or to accept a compromise on power efficiency as long as
> power consumption definitely decreases, since it is the user who has decided
> to prioritise lower power consumption over performance*?
> 

It could be one reason for this feature, but I would like to make it more
power efficient, e.g. by packing tasks according to cpu_power rather than
the current group_weight.
>>
> 
> Regards
> Preeti U Murthy
> 


-- 
Thanks
Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-05-17 Thread Preeti U Murthy
On 04/30/2013 03:26 PM, Mike Galbraith wrote:
> On Tue, 2013-04-30 at 11:49 +0200, Mike Galbraith wrote: 
>> On Tue, 2013-04-30 at 11:35 +0200, Mike Galbraith wrote: 
>>> On Tue, 2013-04-30 at 10:41 +0200, Ingo Molnar wrote: 
>>
 Which are the workloads where 'powersaving' mode hurts workload 
 performance measurably?

I ran ebizzy on a 2 socket, 16 core, SMT 4 Power machine.
The power efficiency drops significantly with the powersaving policy of
this patch, compared to the power efficiency of the scheduler without this patch.

The below parameters are measured relative to the default scheduler
behaviour.

A: Drop in power efficiency with the patch+powersaving policy
B: Drop in performance with the patch+powersaving policy
C: Decrease in power consumption with the patch+powersaving policy

NumThreads     A      B      C
--------------------------------
 2            33%    36%    4%
 4            31%    33%    3%
 8            28%    30%    3%
16            31%    33%    4%

Each of the above runs is for 30s.
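
(Assuming "power efficiency" here means performance per watt, the three
columns are mutually consistent: A ~ 1 - (1 - B) / (1 - C). For the 2-thread
row, 1 - (1 - 0.36) / (1 - 0.04) ~ 0.33, i.e. the 33% drop in column A.)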

On investigating socket utilization, I found that only 1 socket was being
used during all the above threaded runs. As can be guessed this is due
to the group_weight being considered for the threshold metric.
This stacks up tasks on a core and further on a socket, thus throttling
them, as observed by Mike below.

I therefore think we must switch to group_capacity as the metric for the
threshold and use only (rq->utils * nr_running) for the group_utils
calculation during non-bursty wakeup scenarios.
This way we are comparing the right quantities: the utilization of the
runqueue by the fair tasks and the cpu capacity available to them after
being consumed by the rt tasks.

After I made the above modification, all three of the above parameters came
out nearly zero. However, I am observing the load balancing of the
scheduler with the patch and powersaving policy enabled. It is behaving
very close to the default scheduler (spreading tasks across sockets).
That also explains why there is no performance drop or gain with the
patch+powersaving policy enabled. I will look into this observation and
report back.

>>>
>>> Well, it'll lose throughput any time there's parallel execution
>>> potential but it's serialized instead.. using average will inevitably
>>> stack tasks sometimes, but that's its goal.  Hackbench shows it.
>>
>> (but that consolidation can be a winner too, and I bet a nickle it would
>> be for a socket sized pgbench run)
> 
> (belay that, was thinking of keeping all tasks on a single node, but
> it'll likely stack the whole thing on a CPU or two, if so, it'll hurt)

At this point, I would like to raise one issue.
*Is the goal of the power-aware scheduler to improve the power efficiency of
the scheduler, or to accept a compromise on power efficiency as long as
power consumption definitely decreases, since it is the user who has decided
to prioritise lower power consumption over performance*?

> 

Regards
Preeti U Murthy



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 11:49 +0200, Mike Galbraith wrote: 
> On Tue, 2013-04-30 at 11:35 +0200, Mike Galbraith wrote: 
> > On Tue, 2013-04-30 at 10:41 +0200, Ingo Molnar wrote: 
> 
> > > Which are the workloads where 'powersaving' mode hurts workload 
> > > performance measurably?
> > 
> > Well, it'll lose throughput any time there's parallel execution
> > potential but it's serialized instead.. using average will inevitably
> > stack tasks sometimes, but that's its goal.  Hackbench shows it.
> 
> (but that consolidation can be a winner too, and I bet a nickle it would
> be for a socket sized pgbench run)

(belay that, was thinking of keeping all tasks on a single node, but
it'll likely stack the whole thing on a CPU or two, if so, it'll hurt)



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 11:35 +0200, Mike Galbraith wrote: 
> On Tue, 2013-04-30 at 10:41 +0200, Ingo Molnar wrote: 

> > Which are the workloads where 'powersaving' mode hurts workload 
> > performance measurably?
> 
> Well, it'll lose throughput any time there's parallel execution
> potential but it's serialized instead.. using average will inevitably
> stack tasks sometimes, but that's its goal.  Hackbench shows it.

(but that consolidation can be a winner too, and I bet a nickle it would
be for a socket sized pgbench run)





Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 10:41 +0200, Ingo Molnar wrote: 
> * Mike Galbraith  wrote:
> 
> > On Tue, 2013-04-30 at 07:16 +0200, Mike Galbraith wrote:
> > 
> > > Well now, that's not exactly what I expected to see for AIM7 compute.
> > > Filesystem is munching cycles otherwise used for compute when load is
> > > spread across the whole box vs consolidated.
> > 
> > So AIM7 compute performance delta boils down to: powersaving stacks
> > tasks, so they pat single bit of spinning rust sequentially/gently.
> 
> So AIM7 with real block IO improved, due to sequentiality. Does it improve 
> if AIM7 works on an SSD, or into ramdisk?

Seriously doubt it, but I suppose I can try tmpfs.

performance
Tasks  jobs/min  jti  jobs/min/task  real   cpu
   20  11170.51   99       558.5253  10.85  15.19   Tue Apr 30 11:21:46 2013
   20  11078.61   99       553.9305  10.94  15.59   Tue Apr 30 11:21:57 2013
   20  11191.14   99       559.5568  10.83  15.29   Tue Apr 30 11:22:08 2013

powersaving
Tasks  jobs/min  jti  jobs/min/task  real   cpu
   20  10978.26   99       548.9130  11.04  19.25   Tue Apr 30 11:22:38 2013
   20  10988.21   99       549.4107  11.03  18.71   Tue Apr 30 11:22:49 2013
   20  11008.17   99       550.4087  11.01  18.85   Tue Apr 30 11:23:00 2013

Nope.

> Which are the workloads where 'powersaving' mode hurts workload 
> performance measurably?

Well, it'll lose throughput any time there's parallel execution
potential but it's serialized instead.. using average will inevitably
stack tasks sometimes, but that's its goal.  Hackbench shows it.

performance 
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.487
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.487
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.497

powersaving
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.702
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.679
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 1.137

-Mike



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Ingo Molnar

* Mike Galbraith  wrote:

> On Tue, 2013-04-30 at 07:16 +0200, Mike Galbraith wrote:
> 
> > Well now, that's not exactly what I expected to see for AIM7 compute.
> > Filesystem is munching cycles otherwise used for compute when load is
> > spread across the whole box vs consolidated.
> 
> So AIM7 compute performance delta boils down to: powersaving stacks
> tasks, so they pat single bit of spinning rust sequentially/gently.

So AIM7 with real block IO improved, due to sequentiality. Does it improve 
if AIM7 works on an SSD, or into ramdisk?

Which are the workloads where 'powersaving' mode hurts workload 
performance measurably?

Thanks,

Ingo


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 07:16 +0200, Mike Galbraith wrote:

> Well now, that's not exactly what I expected to see for AIM7 compute.
> Filesystem is munching cycles otherwise used for compute when load is
> spread across the whole box vs consolidated.

So AIM7 compute performance delta boils down to: powersaving stacks
tasks, so they pat single bit of spinning rust sequentially/gently.

-Mike



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 07:16 +0200, Mike Galbraith wrote:

 Well now, that's not exactly what I expected to see for AIM7 compute.
 Filesystem is munching cycles otherwise used for compute when load is
 spread across the whole box vs consolidated.

So AIM7 compute performance delta boils down to: powersaving stacks
tasks, so they pat single bit of spinning rust sequentially/gently.

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Ingo Molnar

* Mike Galbraith bitbuc...@online.de wrote:

 On Tue, 2013-04-30 at 07:16 +0200, Mike Galbraith wrote:
 
  Well now, that's not exactly what I expected to see for AIM7 compute.
  Filesystem is munching cycles otherwise used for compute when load is
  spread across the whole box vs consolidated.
 
 So AIM7 compute performance delta boils down to: powersaving stacks
 tasks, so they pat single bit of spinning rust sequentially/gently.

So AIM7 with real block IO improved, due to sequentiality. Does it improve 
if AIM7 works on an SSD, or into ramdisk?

Which are the workloads where 'powersaving' mode hurts workload 
performance measurably?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 10:41 +0200, Ingo Molnar wrote: 
 * Mike Galbraith bitbuc...@online.de wrote:
 
  On Tue, 2013-04-30 at 07:16 +0200, Mike Galbraith wrote:
  
   Well now, that's not exactly what I expected to see for AIM7 compute.
   Filesystem is munching cycles otherwise used for compute when load is
   spread across the whole box vs consolidated.
  
  So AIM7 compute performance delta boils down to: powersaving stacks
  tasks, so they pat single bit of spinning rust sequentially/gently.
 
 So AIM7 with real block IO improved, due to sequentiality. Does it improve 
 if AIM7 works on an SSD, or into ramdisk?

Seriously doubt it, but I suppose I can try tmpfs.

performance 
Tasksjobs/min  jti  jobs/min/task  real   cpu
   2011170.51   99   558.5253 10.85 15.19   Tue Apr 30 11:21:46 
2013
   2011078.61   99   553.9305 10.94 15.59   Tue Apr 30 11:21:57 
2013
   2011191.14   99   559.5568 10.83 15.29   Tue Apr 30 11:22:08 
2013

powersaving
Tasksjobs/min  jti  jobs/min/task  real   cpu
   2010978.26   99   548.9130 11.04 19.25   Tue Apr 30 11:22:38 
2013
   2010988.21   99   549.4107 11.03 18.71   Tue Apr 30 11:22:49 
2013
   2011008.17   99   550.4087 11.01 18.85   Tue Apr 30 11:23:00 
2013

Nope.

 Which are the workloads where 'powersaving' mode hurts workload 
 performance measurably?

Well, it'll lose throughput any time there's parallel execution
potential but it's serialized instead.. using average will inevitably
stack tasks sometimes, but that's its goal.  Hackbench shows it.

performance 
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 
tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.487
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 
tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.487
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 
tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.497

powersaving
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 
tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.702
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 
tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 0.679
monteverdi:/abuild/mike/aim7/:[0]# hackbench -l 1000
Running in process mode with 10 groups using 40 file descriptors each (== 400 
tasks)
Each sender will pass 1000 messages of 100 bytes
Time: 1.137

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 11:35 +0200, Mike Galbraith wrote: 
 On Tue, 2013-04-30 at 10:41 +0200, Ingo Molnar wrote: 

  Which are the workloads where 'powersaving' mode hurts workload 
  performance measurably?
 
 Well, it'll lose throughput any time there's parallel execution
 potential but it's serialized instead.. using average will inevitably
 stack tasks sometimes, but that's its goal.  Hackbench shows it.

(but that consolidation can be a winner too, and I bet a nickle it would
be for a socket sized pgbench run)



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-30 Thread Mike Galbraith
On Tue, 2013-04-30 at 11:49 +0200, Mike Galbraith wrote: 
 On Tue, 2013-04-30 at 11:35 +0200, Mike Galbraith wrote: 
  On Tue, 2013-04-30 at 10:41 +0200, Ingo Molnar wrote: 
 
   Which are the workloads where 'powersaving' mode hurts workload 
   performance measurably?
  
  Well, it'll lose throughput any time there's parallel execution
  potential but it's serialized instead.. using average will inevitably
  stack tasks sometimes, but that's its goal.  Hackbench shows it.
 
 (but that consolidation can be a winner too, and I bet a nickle it would
 be for a socket sized pgbench run)

(belay that, was thinking of keeping all tasks on a single node, but
it'll likely stack the whole thing on a CPU or two, if so, it'll hurt)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-29 Thread Mike Galbraith
On Fri, 2013-04-26 at 17:11 +0200, Mike Galbraith wrote: 
> On Wed, 2013-04-17 at 17:53 -0400, Len Brown wrote: 
> > On 04/12/2013 12:48 PM, Mike Galbraith wrote:
> > > On Fri, 2013-04-12 at 18:23 +0200, Borislav Petkov wrote: 
> > >> On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
> > >>> Thanks a lot for comments, Len!
> > >>
> > >> AFAICT, you kinda forgot to answer his most important question:
> > >>
> > >>> These numbers suggest that this patch series simultaneously
> > >>> has a negative impact on performance and energy required
> > >>> to retire the workload.  Why do it?
> > > 
> > > Hm.  When I tested AIM7 compute on a NUMA box, there was a marked
> > > throughput increase at the low to moderate load end of the test spectrum
> > > IIRC.  Fully repeatable.  There were also other benefits unrelated to
> > > power, ie mitigation of the evil face of select_idle_sibling().  I
> > > rather liked what I saw during ~big box test-drive.
> > > 
> > > (just saying there are other aspects besides joules in there)
> > 
> > Mike,
> > 
> > Can you re-run your AIM7 measurement with turbo-mode and HT-mode disabled,
> > and then independently re-enable them?
> > 
> > If you still see the performance benefit, then that proves
> > that the scheduler hacks are not about tricking into
> > turbo mode, but something else.
> 
> I did that today, neither turbo nor HT affected the performance gain.  I
> used the same box and patch set as tested before (v4), but plugged into
> linus HEAD.  "powersaving" AIM7 numbers are ~identical to those I posted
> before, "performance" is lower at the low end of AIM7 test spectrum, but
> as before, delta goes away once the load becomes hefty.

Well now, that's not exactly what I expected to see for AIM7 compute.
Filesystem is munching cycles otherwise used for compute when load is
spread across the whole box vs consolidated.

performance

   PerfTop:  35 irqs/sec  kernel:94.3%  exact:  0.0% [1000Hz cycles],  (all, 80 CPUs)
--------------------------------------------------------------------------------------


 samples  pcnt  function                        DSO
 _______  ____  ______________________________  ________________________________________

 9367.00  15.5% jbd2_journal_put_journal_head   /lib/modules/3.9.0-default/build/vmlinux
 7658.00  12.7% jbd2_journal_add_journal_head   /lib/modules/3.9.0-default/build/vmlinux
 7042.00  11.7% jbd2_journal_grab_journal_head  /lib/modules/3.9.0-default/build/vmlinux
 4433.00   7.4% sieve                           /abuild/mike/aim7/multitask
 3248.00   5.4% jbd_lock_bh_state               /lib/modules/3.9.0-default/build/vmlinux
 3034.00   5.0% do_get_write_access             /lib/modules/3.9.0-default/build/vmlinux
 2058.00   3.4% mul_double                      /abuild/mike/aim7/multitask
 2038.00   3.4% add_double                      /abuild/mike/aim7/multitask
 1365.00   2.3% native_write_msr_safe           /lib/modules/3.9.0-default/build/vmlinux
 1333.00   2.2% __find_get_block                /lib/modules/3.9.0-default/build/vmlinux
 1213.00   2.0% add_long                        /abuild/mike/aim7/multitask
 1208.00   2.0% add_int                         /abuild/mike/aim7/multitask
 1084.00   1.8% __wait_on_bit_lock              /lib/modules/3.9.0-default/build/vmlinux
 1065.00   1.8% div_double                      /abuild/mike/aim7/multitask
  901.00   1.5% intel_idle                      /lib/modules/3.9.0-default/build/vmlinux
  812.00   1.3% _raw_spin_lock_irqsave          /lib/modules/3.9.0-default/build/vmlinux
  559.00   0.9% jbd2_journal_dirty_metadata     /lib/modules/3.9.0-default/build/vmlinux
  464.00   0.8% copy_user_generic_string        /lib/modules/3.9.0-default/build/vmlinux
  455.00   0.8% div_int                         /abuild/mike/aim7/multitask
  430.00   0.7% string_rtns_1                   /abuild/mike/aim7/multitask
  419.00   0.7% strncat                         /lib64/libc-2.11.3.so
  412.00   0.7% wake_bit_function               /lib/modules/3.9.0-default/build/vmlinux
  347.00   0.6% jbd2_journal_cancel_revoke      /lib/modules/3.9.0-default/build/vmlinux
  346.00   0.6% ext4_mark_iloc_dirty            /lib/modules/3.9.0-default/build/vmlinux
  306.00   0.5% __brelse                        /lib/modules/3.9.0-default/build/vmlinux

powersaving

   PerfTop:  59 irqs/sec  kernel:78.0%  exact:  0.0% [1000Hz cycles],  (all, 80 CPUs)

Re: [patch v7 0/21] sched: power aware scheduling

2013-04-26 Thread Mike Galbraith
On Wed, 2013-04-17 at 17:53 -0400, Len Brown wrote: 
> On 04/12/2013 12:48 PM, Mike Galbraith wrote:
> > On Fri, 2013-04-12 at 18:23 +0200, Borislav Petkov wrote: 
> >> On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
> >>> Thanks a lot for comments, Len!
> >>
> >> AFAICT, you kinda forgot to answer his most important question:
> >>
> >>> These numbers suggest that this patch series simultaneously
> >>> has a negative impact on performance and energy required
> >>> to retire the workload.  Why do it?
> > 
> > Hm.  When I tested AIM7 compute on a NUMA box, there was a marked
> > throughput increase at the low to moderate load end of the test spectrum
> > IIRC.  Fully repeatable.  There were also other benefits unrelated to
> > power, ie mitigation of the evil face of select_idle_sibling().  I
> > rather liked what I saw during ~big box test-drive.
> > 
> > (just saying there are other aspects besides joules in there)
> 
> Mike,
> 
> Can you re-run your AIM7 measurement with turbo-mode and HT-mode disabled,
> and then independently re-enable them?
> 
> If you still see the performance benefit, then that proves
> that the scheduler hacks are not about tricking into
> turbo mode, but something else.

I did that today, neither turbo nor HT affected the performance gain.  I
used the same box and patch set as tested before (v4), but plugged into
linus HEAD.  "powersaving" AIM7 numbers are ~identical to those I posted
before, "performance" is lower at the low end of AIM7 test spectrum, but
as before, delta goes away once the load becomes hefty.

-Mike



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-17 Thread Mike Galbraith
On Wed, 2013-04-17 at 17:53 -0400, Len Brown wrote: 
> On 04/12/2013 12:48 PM, Mike Galbraith wrote:
> > On Fri, 2013-04-12 at 18:23 +0200, Borislav Petkov wrote: 
> >> On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
> >>> Thanks a lot for comments, Len!
> >>
> >> AFAICT, you kinda forgot to answer his most important question:
> >>
> >>> These numbers suggest that this patch series simultaneously
> >>> has a negative impact on performance and energy required
> >>> to retire the workload.  Why do it?
> > 
> > Hm.  When I tested AIM7 compute on a NUMA box, there was a marked
> > throughput increase at the low to moderate load end of the test spectrum
> > IIRC.  Fully repeatable.  There were also other benefits unrelated to
> > power, ie mitigation of the evil face of select_idle_sibling().  I
> > rather liked what I saw during ~big box test-drive.
> > 
> > (just saying there are other aspects besides joules in there)
> 
> Mike,
> 
> Can you re-run your AIM7 measurement with turbo-mode and HT-mode disabled,
> and then independently re-enable them?

Unfortunately no, because I don't have remote access to buttons.

> If you still see the performance benefit, then that proves
> that the scheduler hacks are not about tricking into
> turbo mode, but something else.

Yeah, turbo playing a role in that makes lots of sense.  Someone else
will have to test that though.  It was 100% repeatable, so should be
easy to verify.

-Mike



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-17 Thread Len Brown
On 04/12/2013 12:48 PM, Mike Galbraith wrote:
> On Fri, 2013-04-12 at 18:23 +0200, Borislav Petkov wrote: 
>> On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
>>> Thanks a lot for comments, Len!
>>
>> AFAICT, you kinda forgot to answer his most important question:
>>
>>> These numbers suggest that this patch series simultaneously
>>> has a negative impact on performance and energy required
>>> to retire the workload.  Why do it?
> 
> Hm.  When I tested AIM7 compute on a NUMA box, there was a marked
> throughput increase at the low to moderate load end of the test spectrum
> IIRC.  Fully repeatable.  There were also other benefits unrelated to
> power, ie mitigation of the evil face of select_idle_sibling().  I
> rather liked what I saw during ~big box test-drive.
> 
> (just saying there are other aspects besides joules in there)

Mike,

Can you re-run your AIM7 measurement with turbo-mode and HT-mode disabled,
and then independently re-enable them?

If you still see the performance benefit, then that proves
that the scheduler hacks are not about tricking into
turbo mode, but something else.

If the performance gains *are* about interactions with turbo-mode,
then perhaps what we should really be doing here is making
the scheduler explicitly turbo-aware?  Of course, that begs the question
of how the scheduler should be aware of cpufreq in general...

thanks,
Len Brown, Intel Open Source Technology Center



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-17 Thread Borislav Petkov
On Wed, Apr 17, 2013 at 09:18:28AM +0800, Alex Shi wrote:
> Sure. Currently, even if the whole socket goes to sleep, if the memory on
> the node is still being accessed, the cpu socket still spends some power on
> the 'uncore' part. So the further step is to reduce remote memory accesses
> to save more power, and that is also what NUMA balancing wants to do.

Yeah, if you also mean, you need to further migrate the memory of the
threads away from the node so that it doesn't need to serve memory
accesses from other sockets, then that should probably help save even
more power. You probably would still need to serve probes from the L3
but your DRAM links will be powered down and such.

> And then the next step is to detect whether this socket is cache intensive,
> i.e. whether there is much cache thrashing on the node.

Yeah, that would probably be harder to determine - is cache thrashing
(and I think you mean L3 here) worse than migrating tasks to other nodes
and having them powered on just because my current node is not supposed
to thrash L3? Hmm.

> In theory, there is still lots of tuning space. :)

Yep. :)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-17 Thread Borislav Petkov
On Wed, Apr 17, 2013 at 09:18:28AM +0800, Alex Shi wrote:
 Sure. Currently if the whole socket get into sleep, but the memory on
 the node is still accessed. the cpu socket still spend some power on
 'uncore' part. So the further step is reduce the remote memory access
 to save more power, and that is also numa balance want to do.

Yeah, if you also mean, you need to further migrate the memory of the
threads away from the node so that it doesn't need to serve memory
accesses from other sockets, then that should probably help save even
more power. You probably would still need to serve probes from the L3
but your DRAM links will be powered down and such.

 And then the next step is to detect if this socket is cache intensive,
 if there is much cache thresh on the node.

Yeah, that would be probably harder to determine - is cache thrashing
(and I think you mean L3 here) worse than migrating tasks to other nodes
and having them powered on just because my current node is not supposed
to thrash L3. Hmm.

 In theory, there is still has lots of tuning space. :)

Yep. :)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-17 Thread Len Brown
On 04/12/2013 12:48 PM, Mike Galbraith wrote:
 On Fri, 2013-04-12 at 18:23 +0200, Borislav Petkov wrote: 
 On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
 Thanks a lot for comments, Len!

 AFAICT, you kinda forgot to answer his most important question:

 These numbers suggest that this patch series simultaneously
 has a negative impact on performance and energy required
 to retire the workload.  Why do it?
 
 Hm.  When I tested AIM7 compute on a NUMA box, there was a marked
 throughput increase at the low to moderate load end of the test spectrum
 IIRC.  Fully repeatable.  There were also other benefits unrelated to
 power, ie mitigation of the evil face of select_idle_sibling().  I
 rather liked what I saw during ~big box test-drive.
 
 (just saying there are other aspects besides joules in there)

Mike,

Can you re-run your AIM7 measurement with turbo-mode and HT-mode disabled,
and then independently re-enable them?

If you still see the performance benefit, then that proves
that the scheduler hacks are not about tricking into
turbo mode, but something else.

If the performance gains *are* about interactions with turbo-mode,
then perhaps what we should really be doing here is making
the scheduler explicitly turbo-aware?  Of course, that begs the question
of how the scheduler should be aware of cpufreq in general...

thanks,
Len Brown, Intel Open Source Technology Center

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-17 Thread Mike Galbraith
On Wed, 2013-04-17 at 17:53 -0400, Len Brown wrote: 
> On 04/12/2013 12:48 PM, Mike Galbraith wrote:
>> On Fri, 2013-04-12 at 18:23 +0200, Borislav Petkov wrote:
>>> On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
>>>> Thanks a lot for comments, Len!
>>>
>>> AFAICT, you kinda forgot to answer his most important question:
>>>
>>>> These numbers suggest that this patch series simultaneously
>>>> has a negative impact on performance and energy required
>>>> to retire the workload.  Why do it?
>>
>> Hm.  When I tested AIM7 compute on a NUMA box, there was a marked
>> throughput increase at the low to moderate load end of the test spectrum
>> IIRC.  Fully repeatable.  There were also other benefits unrelated to
>> power, ie mitigation of the evil face of select_idle_sibling().  I
>> rather liked what I saw during ~big box test-drive.
>>
>> (just saying there are other aspects besides joules in there)
>
> Mike,
>
> Can you re-run your AIM7 measurement with turbo-mode and HT-mode disabled,
> and then independently re-enable them?

Unfortunately no, because I don't have remote access to buttons.

> If you still see the performance benefit, then that proves
> that the scheduler hacks are not about tricking into
> turbo mode, but something else.

Yeah, turbo playing a role in that makes lots of sense.  Someone else
will have to test that though.  It was 100% repeatable, so should be
easy to verify.

-Mike



Re: [patch v7 0/21] sched: power aware scheduling

2013-04-16 Thread Alex Shi
On 04/16/2013 06:24 PM, Borislav Petkov wrote:
> On Tue, Apr 16, 2013 at 08:22:19AM +0800, Alex Shi wrote:
>> testing has a little variation, but the power data is quite accurate.
>> I may change to packing tasks per cpu capacity than current cpu
>> weight. that should has better power efficient value.
> 
> Yeah, this probably needs careful measuring - and by "this" I mean how
> to place N tasks where N is less than number of cores in the system.
> 
> I can imagine trying to migrate them all together on a single physical
> socket (maybe even overcommitting it) so that you can flush the caches
> of the cores on the other sockets and so that you can power down the
> other sockets and avoid coherent traffic from waking them up, to be one
> strategy. My supposition here is that maybe putting the whole unused
> sockets in a deep sleep state could save a lot of power.

Sure. Currently, even if the whole socket gets into sleep, the memory on
the node is still accessed, so the cpu socket still spends some power on
the 'uncore' part. So the further step is to reduce remote memory access
to save more power, and that is also what numa balancing wants to do.
And then the next step is to detect if this socket is cache intensive,
i.e. whether there is much cache thrashing on the node.
In theory, there is still a lot of tuning space. :)
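One way such a detection could look, purely as an arithmetic sketch (the
counter names and the threshold are made up for illustration, not taken
from the patch set):

/*
 * Classify a node as cache intensive if its last-level-cache misses per
 * kilo-instruction exceed some tunable threshold.
 */
static int node_is_cache_intensive(unsigned long long llc_misses,
                                   unsigned long long instructions,
                                   unsigned int mpki_threshold)
{
        if (!instructions)
                return 0;

        return (llc_misses * 1000 / instructions) >= mpki_threshold;
}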
> 
> Or not, who knows. Only empirical measurements should show us what
> actually happens.

Sure. :)
> 
> Thanks.
> 


-- 
Thanks Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-16 Thread Borislav Petkov
On Tue, Apr 16, 2013 at 08:22:19AM +0800, Alex Shi wrote:
> testing has a little variation, but the power data is quite accurate.
> I may change to packing tasks per cpu capacity than current cpu
> weight. that should has better power efficient value.

Yeah, this probably needs careful measuring - and by "this" I mean how
to place N tasks where N is less than number of cores in the system.

I can imagine trying to migrate them all together on a single physical
socket (maybe even overcommitting it) so that you can flush the caches
of the cores on the other sockets and so that you can power down the
other sockets and avoid coherent traffic from waking them up, to be one
strategy. My supposition here is that maybe putting the whole unused
sockets in a deep sleep state could save a lot of power.

Or not, who knows. Only empirical measurements should show us what
actually happens.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-15 Thread Alex Shi
On 04/16/2013 07:12 AM, Borislav Petkov wrote:
> On Mon, Apr 15, 2013 at 09:50:22PM +0800, Alex Shi wrote:
>> For fairness and total threads consideration, powersaving cost quit
>> similar energy on kbuild benchmark, and even better.
>>
>>  17348.850   27400.458  15973.776
>>  13737.493   18487.248  12167.816
> 
> Yeah, but those lines don't look good - powersaving needs more energy
> than performance.
> 
> And what is even crazier is that fixed 1.2 GHz case. I'd guess in
> the normal case those cores are at triple the freq. - i.e. somewhere
> around 3-4 GHz. And yet, 1.2 GHz eats almost *double* the power than
> performance and powersaving.

Yes, the max freq is 2.7 GHz, plus boost.
> 
> So for the x=8 and maybe even the x=16 case we're basically better off
> with performance.
> 
> Or could it be that the power measurements are not really that accurate
> and those numbers above are not really correct?

The testing has a little variation, but the power data is quite accurate. I
may change to packing tasks per cpu capacity rather than the current cpu
weight; that should give a better power-efficiency value.
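To illustrate the difference between the two packing thresholds, a rough
sketch (illustrative names only, not the actual code in the patch set):

/*
 * Weight counts logical CPUs, so two SMT siblings look like two full
 * CPUs.  Capacity is derived from cpu_power, so an SMT sibling (or a
 * core busy with RT work) only contributes a fraction.
 */
#define FULL_UTIL       1024    /* utilisation of one fully busy CPU */

/* weight-based: pack until every LCPU in the group looks full */
static int fits_by_weight(unsigned long group_util, unsigned int group_weight)
{
        return group_util + FULL_UTIL <= (unsigned long)group_weight * FULL_UTIL;
}

/* capacity-based: pack until the group's cpu_power budget is used up */
static int fits_by_capacity(unsigned long group_util, unsigned long group_power)
{
        /* group_power is roughly FULL_UTIL per core, less per SMT sibling */
        return group_util + FULL_UTIL <= group_power;
}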

> 
> Hmm.
> 


-- 
Thanks Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-15 Thread Borislav Petkov
On Mon, Apr 15, 2013 at 09:50:22PM +0800, Alex Shi wrote:
> For fairness and total threads consideration, powersaving cost quit
> similar energy on kbuild benchmark, and even better.
> 
>   17348.850   27400.458  15973.776
>   13737.493   18487.248  12167.816

Yeah, but those lines don't look good - powersaving needs more energy
than performance.

And what is even crazier is that fixed 1.2 GHz case. I'd guess in
the normal case those cores are at triple the freq. - i.e. somewhere
around 3-4 GHz. And yet, 1.2 GHz eats almost *double* the power than
performance and powersaving.

So for the x=8 and maybe even the x=16 case we're basically better off
with performance.

Or could it be that the power measurements are not really that accurate
and those numbers above are not really correct?

Hmm.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-15 Thread Alex Shi
On 04/15/2013 05:52 PM, Borislav Petkov wrote:
> On Mon, Apr 15, 2013 at 02:16:55PM +0800, Alex Shi wrote:
>> And I need to say again. the powersaving policy just effect on system
>> under utilisation. when system goes busy, it won't has effect.
>> performance oriented policy will take over balance behaviour.
> 
> And AFACU your patches, you do this automatically, right?

Yes
> In which case,
> an underutilized system will have switched to powersaving balancing and
> will use *more* energy to retire the workload. Correct?
> 

Considering fairness and the total thread count, powersaving costs quite
similar energy on the kbuild benchmark, and in some cases even less.

(Energy = avg Watts * seconds; columns are powersaving + ondemand,
userspace + fixed 1.2GHz, performance + ondemand)

17348.850   27400.458  15973.776
13737.493   18487.248  12167.816
11057.004   16080.750  11623.661

17288.102   27637.176  16560.375
10356.52    18482.584  12504.702
10905.772   16190.447  11125.625
10785.621   16113.330  11542.140

-- 
Thanks
Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-15 Thread Borislav Petkov
On Mon, Apr 15, 2013 at 02:16:55PM +0800, Alex Shi wrote:
> And I need to say again. the powersaving policy just effect on system
> under utilisation. when system goes busy, it won't has effect.
> performance oriented policy will take over balance behaviour.

And AFACU your patches, you do this automatically, right? In which case,
an underutilized system will have switched to powersaving balancing and
will use *more* energy to retire the workload. Correct?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-15 Thread Alex Shi
On 04/15/2013 02:04 PM, Alex Shi wrote:
> On 04/14/2013 11:59 PM, Borislav Petkov wrote:
>> > On Sun, Apr 14, 2013 at 09:28:50AM +0800, Alex Shi wrote:
>>> >> Even some scenario the total energy cost more, at least the avg watts
>>> >> dropped in that scenarios.
>> > 
>> > Ok, what's wrong with x = 32 then? So basically if you're looking at
>> > avg watts, you don't want to have more than 16 threads, otherwise
>> > powersaving sucks on that particular uarch and platform. Can you say
>> > that for all platforms out there?
> The cpu freq boost make the avg watts higher with x = 32, and also make
> higher power efficiency. We can disable cpu freq boost for this if we
> want lower power consumption all time.
> But for my understanding, the power efficient is better way to save power.

BTW, the lowest p-state, no freq boost, plus this powersaving policy will
give the lowest power consumption.

And I need to say again: the powersaving policy only takes effect when the
system is under-utilised. When the system gets busy, it has no effect; the
performance oriented policy takes over the balance behaviour.

-- 
Thanks Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-15 Thread Alex Shi
On 04/14/2013 11:59 PM, Borislav Petkov wrote:
> On Sun, Apr 14, 2013 at 09:28:50AM +0800, Alex Shi wrote:
>> Even some scenario the total energy cost more, at least the avg watts
>> dropped in that scenarios.
> 
> Ok, what's wrong with x = 32 then? So basically if you're looking at
> avg watts, you don't want to have more than 16 threads, otherwise
> powersaving sucks on that particular uarch and platform. Can you say
> that for all platforms out there?

The cpu freq boost makes the avg watts higher with x = 32, but it also gives
higher power efficiency. We can disable cpu freq boost if we want the lower
power consumption all the time.
But to my understanding, power efficiency is the better way to save power.
As for other platforms, I'm glad to see any testing; please try it and send
me the results...
> 
> Also, I've added in the columns below the Energy = Power * Time thing.

Thanks. BTW, the third figure in each column is 'performance/watt'; it
shows a similar meaning from the other side. :)
> 
> And the funny thing is, exactly there where avg watts is better in
> powersaving, energy for workload retire is worse. And the other way
> around. Basically, avg watts vs retire energy is reciprocal. Great :-\.
> 
>> Len said he has low p-state which can work there. but that's is
>> different. I had sent some data in another email list to show the
>> difference:
>>
>> The following is 2 times kbuild testing result for 3 kinds condiation on
>> SNB EP box, the middle column is the lowest p-state testing result, we
>> can see, it has the lowest power consumption, also has the lowest
>> performance/watts value.
>> At least for kbuild benchmark, powersaving policy has the best
>> compromise on powersaving and power efficient. Further more, due to cpu
>> boost feature, it has better performance in some scenarios.
>>
>>          powersaving + ondemand   userspace + fixed 1.2GHz   performance + ondemand
>> x = 8    231.318 /75 57           165.063 /166 36            253.552 /63 62
>> x = 16   280.357 /49 72           174.408 /106 54            296.776 /41 82
>> x = 32   325.206 /34 90           178.675 /90 62             314.153 /37 86
>>
>> x = 8    233.623 /74 57           164.507 /168 36            254.775 /65 60
>> x = 16   272.54  /38 96           174.364 /106 54            297.731 /42 79
>> x = 32   320.758 /34 91           177.917 /91 61             317.875 /35 89
>> x = 64   326.837 /33 92           179.037 /90 62             320.615 /36 86
> 
>   17348.850   27400.458  15973.776
>   13737.493   18487.248  12167.816
>   11057.004   16080.750  11623.661
> 
>   17288.102   27637.176  16560.375
>   10356.52    18482.584  12504.702
>   10905.772   16190.447  11125.625
>   10785.621   16113.330  11542.140
> 


-- 
Thanks Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-14 Thread Borislav Petkov
On Sun, Apr 14, 2013 at 09:28:50AM +0800, Alex Shi wrote:
> Even some scenario the total energy cost more, at least the avg watts
> dropped in that scenarios.

Ok, what's wrong with x = 32 then? So basically if you're looking at
avg watts, you don't want to have more than 16 threads, otherwise
powersaving sucks on that particular uarch and platform. Can you say
that for all platforms out there?

Also, I've added in the columns below the Energy = Power * Time thing.

And the funny thing is, exactly there where avg watts is better in
powersaving, energy for workload retire is worse. And the other way
around. Basically, avg watts vs retire energy is reciprocal. Great :-\.

> Len said he has low p-state which can work there. but that's is
> different. I had sent some data in another email list to show the
> difference:
> 
> The following is 2 times kbuild testing result for 3 kinds condiation on
> SNB EP box, the middle column is the lowest p-state testing result, we
> can see, it has the lowest power consumption, also has the lowest
> performance/watts value.
> At least for kbuild benchmark, powersaving policy has the best
> compromise on powersaving and power efficient. Further more, due to cpu
> boost feature, it has better performance in some scenarios.
> 
>          powersaving + ondemand   userspace + fixed 1.2GHz   performance + ondemand
> x = 8    231.318 /75 57           165.063 /166 36            253.552 /63 62
> x = 16   280.357 /49 72           174.408 /106 54            296.776 /41 82
> x = 32   325.206 /34 90           178.675 /90 62             314.153 /37 86
>
> x = 8    233.623 /74 57           164.507 /168 36            254.775 /65 60
> x = 16   272.54  /38 96           174.364 /106 54            297.731 /42 79
> x = 32   320.758 /34 91           177.917 /91 61             317.875 /35 89
> x = 64   326.837 /33 92           179.037 /90 62             320.615 /36 86

17348.850   27400.458  15973.776
13737.493   18487.248  12167.816
11057.004   16080.750  11623.661

17288.102   27637.176  16560.375
10356.52    18482.584  12504.702
10905.772   16190.447  11125.625
10785.621   16113.330  11542.140

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-13 Thread Alex Shi
On 04/14/2013 09:28 AM, Alex Shi wrote:
 >> > These numbers suggest that this patch series simultaneously
 >> > has a negative impact on performance and energy required
 >> > to retire the workload.  Why do it?
> Even some scenario the total energy cost more, at least the avg watts
> dropped in that scenarios. Len said he has low p-state which can work
> there. but that's is different. I had sent some data in another email
> list to show the difference:
> 
> The following is 2 times kbuild testing result for 3 kinds condiation on
> SNB EP box, the middle column is the lowest p-state testing result, we
> can see, it has the lowest power consumption, also has the lowest
> performance/watts value.
> At least for kbuild benchmark, powersaving policy has the best
> compromise on powersaving and power efficient. Further more, due to cpu
> boost feature, it has better performance in some scenarios.

BTW, another benefit of powersaving is that the powersaving policy is very
flexible with respect to system load: when the task number in a sched domain
goes beyond the LCPU number, it switches to performance oriented balance.
That gives similar performance when the system is busy, roughly as sketched
below.
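A minimal sketch of that switch (illustrative only; the names and the check
are stand-ins, not the code in the patch set):

enum balance_policy { POLICY_PERFORMANCE, POLICY_POWERSAVING };

/*
 * Pack tasks while the sched domain is under-utilised; once the number of
 * runnable tasks goes beyond the number of logical CPUs, fall back to the
 * default performance oriented balance.
 */
static enum balance_policy pick_balance_policy(unsigned int nr_tasks,
                                               unsigned int nr_lcpus,
                                               int powersaving_enabled)
{
        if (!powersaving_enabled)
                return POLICY_PERFORMANCE;

        if (nr_tasks <= nr_lcpus)
                return POLICY_POWERSAVING;      /* consolidate into fewer groups */

        return POLICY_PERFORMANCE;              /* busy: behave like the default */
}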

-- 
Thanks
Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-13 Thread Alex Shi
On 04/13/2013 01:12 AM, Borislav Petkov wrote:
> On Fri, Apr 12, 2013 at 06:48:31PM +0200, Mike Galbraith wrote:
>> (just saying there are other aspects besides joules in there)
> 
> Yeah, but we don't allow any regressions in sched*, do we? Can we pick
> only the good cherries? :-)
> 

Thanks for all of the discussion on this thread. :)
I think we can bear a little power-efficiency loss when we ask for
powersaving.

For the second question, the performance increase comes from the cpu boost
feature, as the hardware defines it: if some cores in a cpu socket are idle,
the other cores have more chance to boost to a higher frequency. The task
packing tries to pack tasks so that more idle cores are left.

The difficulty in merging this feature into the current performance policy
is that the current balance policy tries to give each task as much cpu
resource as possible, and that just conflicts with the cpu boost condition.
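As a toy illustration of that boost argument (not kernel code; an 8-core,
2-way SMT socket is assumed just for the example):

#include <stdio.h>

/*
 * Packing two tasks onto the SMT siblings of one core leaves whole cores
 * idle, and idle cores are what give the remaining busy cores turbo
 * headroom.  Whether that wins overall is exactly what the benchmark
 * numbers in this thread are arguing about.
 */
static unsigned int busy_cores(unsigned int tasks, unsigned int smt, int packed)
{
        if (packed)
                return (tasks + smt - 1) / smt; /* fill SMT siblings first */
        return tasks;                           /* one task per core */
}

int main(void)
{
        unsigned int cores = 8, smt = 2, tasks = 8;

        printf("spread: %u of %u cores busy, %u idle\n",
               busy_cores(tasks, smt, 0), cores, cores - busy_cores(tasks, smt, 0));
        printf("packed: %u of %u cores busy, %u idle\n",
               busy_cores(tasks, smt, 1), cores, cores - busy_cores(tasks, smt, 1));
        return 0;
}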

-- 
Thanks
Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-13 Thread Alex Shi
On 04/13/2013 12:23 AM, Borislav Petkov wrote:
> On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
>> > Thanks a lot for comments, Len!
> AFAICT, you kinda forgot to answer his most important question:
> 
>> > These numbers suggest that this patch series simultaneously
>> > has a negative impact on performance and energy required
>> > to retire the workload.  Why do it?

Even if in some scenarios the total energy costs more, at least the avg
watts dropped in those scenarios. Len said he has a low p-state which can
work there, but that is different. I had sent some data to another email
list to show the difference:

The following is the kbuild testing result, run twice, for 3 kinds of
conditions on a SNB EP box; the middle column is the lowest p-state testing
result. We can see it has the lowest power consumption, but also the lowest
performance/watts value.
At least for the kbuild benchmark, the powersaving policy has the best
compromise between power saving and power efficiency. Furthermore, due to
the cpu boost feature, it has better performance in some scenarios.

         powersaving + ondemand   userspace + fixed 1.2GHz   performance + ondemand
x = 8    231.318 /75 57           165.063 /166 36            253.552 /63 62
x = 16   280.357 /49 72           174.408 /106 54            296.776 /41 82
x = 32   325.206 /34 90           178.675 /90 62             314.153 /37 86

x = 8    233.623 /74 57           164.507 /168 36            254.775 /65 60
x = 16   272.54  /38 96           174.364 /106 54            297.731 /42 79
x = 32   320.758 /34 91           177.917 /91 61             317.875 /35 89
x = 64   326.837 /33 92           179.037 /90 62             320.615 /36 86

-- 
Thanks
Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-12 Thread Borislav Petkov
On Fri, Apr 12, 2013 at 06:48:31PM +0200, Mike Galbraith wrote:
> (just saying there are other aspects besides joules in there)

Yeah, but we don't allow any regressions in sched*, do we? Can we pick
only the good cherries? :-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-12 Thread Mike Galbraith
On Fri, 2013-04-12 at 18:23 +0200, Borislav Petkov wrote: 
> On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
> > Thanks a lot for comments, Len!
> 
> AFAICT, you kinda forgot to answer his most important question:
> 
> > These numbers suggest that this patch series simultaneously
> > has a negative impact on performance and energy required
> > to retire the workload.  Why do it?

Hm.  When I tested AIM7 compute on a NUMA box, there was a marked
throughput increase at the low to moderate load end of the test spectrum
IIRC.  Fully repeatable.  There were also other benefits unrelated to
power, ie mitigation of the evil face of select_idle_sibling().  I
rather liked what I saw during ~big box test-drive.

(just saying there are other aspects besides joules in there)

-Mike




Re: [patch v7 0/21] sched: power aware scheduling

2013-04-12 Thread Borislav Petkov
On Fri, Apr 12, 2013 at 04:46:50PM +0800, Alex Shi wrote:
> Thanks a lot for comments, Len!

AFAICT, you kinda forgot to answer his most important question:

> These numbers suggest that this patch series simultaneously
> has a negative impact on performance and energy required
> to retire the workload.  Why do it?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-12 Thread Alex Shi
On 04/12/2013 05:02 AM, Len Brown wrote:
>> > x = 16   299.915 /43 77   259.127 /58 66
> Are you sure that powersave mode ran in 43 seconds
> when performance mode ran in 58 seconds?

Thanks a lot for comments, Len!
Will do more testing by your tool fspin. :)

Powersaving uses less time when threads = 16 or 32. The main contribution
comes from CPU freq boost. I disabled the boost in cpufreq and then found
the compile time becomes similar between powersaving and performance at 32
threads, while powersaving is slower at 16 threads.
And fewer context switches from the lazy power balance should also help a
bit.
> 
> If that is true, than somewhere in this patch series
> you have a _significant_ performance benefit
> on this workload under these conditions!
> 
> Interestingly, powersave mode also ran at
> 15% higher power than performance mode.
> maybe "powersave" isn't quite the right name for it:-)

What other name would you suggest? :)
> 
>> > x = 32   341.221 /35 83   323.418 /38 81
> Why does this patch series have a performance impact (8%)
> at x=32.  All the processors are always busy, no?

No, not all processors are always busy in 'make -j x vmlinux'.
So the compile time also benefits from boost and fewer context switches.
The performance policy doesn't introduce any impact; nothing is added in
the performance policy.
> 
>> > data explains: 189.416 /228 23
>> >189.416: average Watts during compilation
>> >228: seconds(compile time)
>> >23:  scaled performance/watts = 100 / seconds / watts
>> > The performance value of kbuild is better on threads 16/32, that's due
>> > to lazy power balance reduced the context switch and CPU has more boost 
>> > chance on powersaving balance.
> 25% is a huge difference in performance.
> Can you get a performance benefit in that scenario
> without having a negative performance impact
> in the other scenarios?  In particular,

I will try packing tasks based on cpu capacity, not cpu weight.
> an 8% hit to the fully utilized case is a deal killer.

That is an 8% gain for powersaving, not an 8% loss for the performance policy. :)
> 
> The x=16 performance change here suggest there is value
> someplace in this patch series to increase performance.
> However, the case that these scheduling changes are
> a benefit from an energy efficiency point of view
> is yet to be made.


-- 
Thanks Alex


Re: [patch v7 0/21] sched: power aware scheduling

2013-04-11 Thread Len Brown
On 04/03/2013 10:00 PM, Alex Shi wrote:

> As mentioned in the power aware scheduling proposal, Power aware
> scheduling has 2 assumptions:
> 1, race to idle is helpful for power saving
> 2, less active sched groups will reduce cpu power consumption

linux...@vger.kernel.org should be cc:
on Linux proposals that affect power.

> Since the patch can perfect pack tasks into fewer groups, I just show
> some performance/power testing data here:
> =
> $for ((i = 0; i < x; i++)) ; do while true; do :; done  &   done
> 
> On my SNB laptop with 4 core* HT: the data is avg Watts
>  powersaving performance
> x = 8  72.9482 72.6702
> x = 4  61.2737 66.7649
> x = 2  44.8491 59.0679
> x = 1  43.225  43.0638

> on SNB EP machine with 2 sockets * 8 cores * HT:
>  powersaving performance
> x = 32 393.062 395.134
> x = 16 277.438 376.152
> x = 8  209.33  272.398
> x = 4  199 238.309
> x = 2  175.245 210.739
> x = 1  174.264 173.603

The numbers above say nothing about performance,
and thus don't tell us much.

In particular, they don't tell us if reducing power
by hacking the scheduler is more or less efficient
than using the existing techniques that are already shipping,
such as controlling P-states.

> tasks number keep waving benchmark, 'make -j  vmlinux'
> on my SNB EP 2 sockets machine with 8 cores * HT:
>  powersaving  performance
> x = 2    189.416 /228 23   193.355 /209 24

Energy = Power * Time

189.416*228 = 43186.848 Joules for powersaving to retire the workload
193.355*209 = 40411.195 Joules for performance to retire the workload.

So the net effect of the 'powersaving' mode here is:
1. 228/209 = 9% performance degradation
2. 43186.848/40411.195 = 6.9 % more energy to retire the workload.
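For reference, the same arithmetic as a tiny C program (values taken from
the quoted table above):

#include <stdio.h>

int main(void)
{
        double ps_watts = 189.416, ps_secs = 228.0;     /* powersaving, x = 2 */
        double pf_watts = 193.355, pf_secs = 209.0;     /* performance, x = 2 */

        double ps_joules = ps_watts * ps_secs;          /* 43186.848 J */
        double pf_joules = pf_watts * pf_secs;          /* 40411.195 J */

        printf("time:   +%.1f%%\n", (ps_secs / pf_secs - 1.0) * 100.0);
        printf("energy: +%.1f%%\n", (ps_joules / pf_joules - 1.0) * 100.0);
        return 0;
}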

These numbers suggest that this patch series simultaneously
has a negative impact on performance and energy required
to retire the workload.  Why do it?

> x = 4    215.728 /132 35   219.69 /122 37

ditto here.
8% increase in time.
6% increase in energy.

> x = 8    244.31 /75 54    252.709 /68 58

ditto here
10% increase in time.
6% increase in energy.

> x = 16   299.915 /43 77   259.127 /58 66

Are you sure that powersave mode ran in 43 seconds
when performance mode ran in 58 seconds?

If that is true, then somewhere in this patch series
you have a _significant_ performance benefit
on this workload under these conditions!

Interestingly, powersave mode also ran at
15% higher power than performance mode.
maybe "powersave" isn't quite the right name for it:-)

> x = 32   341.221 /35 83   323.418 /38 81

Why does this patch series have a performance impact (8%)
at x=32.  All the processors are always busy, no?

> data explains: 189.416 /228 23
>   189.416: average Watts during compilation
>   228: seconds(compile time)
>   23:  scaled performance/watts = 100 / seconds / watts
> The performance value of kbuild is better on threads 16/32, that's due
> to lazy power balance reduced the context switch and CPU has more boost 
> chance on powersaving balance.

25% is a huge difference in performance.
Can you get a performance benefit in that scenario
without having a negative performance impact
in the other scenarios?  In particular,
an 8% hit to the fully utilized case is a deal killer.

The x=16 performance change here suggest there is value
someplace in this patch series to increase performance.
However, the case that these scheduling changes are
a benefit from an energy efficiency point of view
is yet to be made.

thanks,
-Len Brown
Intel Open Source Technology Center



[patch v7 0/21] sched: power aware scheduling

2013-04-03 Thread Alex Shi
Many thanks to Namhyung, PJT, Vicent and Preeti for the comments and
suggestions!
This version includes the following changes:
a, removed the 3rd patch, to recover the runnable load avg recording on rt
b, check avg_idle for each cpu in a wakeup burst, not only the waking CPU
c, fixed the select_task_rq_fair return -1 bug reported by Preeti

--

This patch set implements the rough power aware scheduling
proposal: https://lkml.org/lkml/2012/8/13/139.

The code is also available on this git tree:
https://github.com/alexshi/power-scheduling.git power-scheduling

The patch set defines a new policy, 'powersaving', that tries to pack tasks
at each sched group level. This can save much power when the number of tasks
in the system is no more than the number of logical CPUs (LCPUs).

As mentioned in the power aware scheduling proposal, power aware
scheduling has 2 assumptions:
1, race to idle is helpful for power saving
2, fewer active sched groups reduce cpu power consumption

The first assumption makes the performance policy take over scheduling when
any group is busy.
The second assumption makes power aware scheduling try to pack dispersed
tasks into fewer groups.
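
To make the second assumption concrete, here is a minimal sketch of the
packing decision (an illustration only, not the patch's actual code;
group_utils, group_capacity and task_util are simplified stand-ins for the
scheduler's internal statistics):

def find_packing_group(groups, task_util):
    """groups: list of (group_utils, group_capacity) for the active sched groups.
    Return the index of the first active group that can still absorb the task,
    or None when every active group is busy, so the caller falls back to the
    normal performance-style spreading."""
    for i, (group_utils, group_capacity) in enumerate(groups):
        if group_utils + task_util <= group_capacity:
            return i            # pack: this group is not yet full
    return None                 # all groups busy -> performance policy takes over

# Example: a group with capacity 8 (LCPUs) and 6 tasks' worth of utilization
# still takes one more task; a fully loaded group does not.
print(find_packing_group([(6, 8)], 1))   # -> 0 (pack onto this group)
print(find_packing_group([(8, 8)], 1))   # -> None (spread instead)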

This feature leaves more cpu cores idle, which gives the active cores more
chances to get a cpu frequency boost. The frequency boost brings better
performance and better power efficiency. The following kbuild test results
show this.

Compared to the previously removed power balance code, this power balance has
the following advantages:
1, simpler sysfs interface
   only 2 sysfs files VS 2 files for each LCPU (a usage sketch follows this list)
2, covers all cpu topologies
   effective at every domain level VS only working on the SMT/MC domains
3, less task migration
   mutually exclusive perf/power load balancing VS balancing power on top of
   balanced performance
4, system load threshold considered
   yes VS no
5, transitory tasks considered
   yes VS no
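
A usage sketch for the sysfs interface mentioned in point 1. The exact path is
an assumption inferred from the V5 changelog's rename of sched_policy to
sched_balance_policy, not something taken verbatim from the patches:

# Assumed path, based on the sched_balance_policy name from the V5 changelog;
# adjust if the patches expose it elsewhere.
POLICY_FILE = "/sys/devices/system/cpu/sched_balance_policy"

def set_balance_policy(policy):
    # policy is expected to be "performance" or "powersaving"
    with open(POLICY_FILE, "w") as f:
        f.write(policy)

set_balance_policy("powersaving")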

BTW, like sched numa, Power aware scheduling is also a kind of cpu
locality oriented scheduling.

Thanks for the comments/suggestions from PeterZ, Linus Torvalds, Andrew Morton,
Ingo, Len Brown, Arjan, Borislav Petkov, PJT, Namhyung Kim, Mike
Galbraith, Greg, Preeti, Morten Rasmussen, Rafael and others.

Since the patch set can perfectly pack tasks into fewer groups, I just show
some performance/power testing data here:
=
$for ((i = 0; i < x; i++)) ; do while true; do :; done  &   done

On my SNB laptop with 4 cores * HT; the data is avg Watts:
         powersaving     performance
x = 8    72.9482         72.6702
x = 4    61.2737         66.7649
x = 2    44.8491         59.0679
x = 1    43.225          43.0638

on SNB EP machine with 2 sockets * 8 cores * HT:
         powersaving     performance
x = 32   393.062         395.134
x = 16   277.438         376.152
x = 8    209.33          272.398
x = 4    199             238.309
x = 2    175.245         210.739
x = 1    174.264         173.603


A benchmark whose task count keeps fluctuating, 'make -j x vmlinux',
on my SNB EP 2 sockets machine with 8 cores * HT:
         powersaving           performance
x = 2    189.416 /228 23       193.355 /209 24
x = 4    215.728 /132 35       219.69 /122 37
x = 8    244.31 /75 54         252.709 /68 58
x = 16   299.915 /43 77        259.127 /58 66
x = 32   341.221 /35 83        323.418 /38 81

data explains: 189.416 /228 23
    189.416: average Watts during compilation
    228: seconds (compile time)
    23:  scaled performance/watt; on these numbers it works out to
         10^6 / (seconds * watts)
The kbuild performance value is better at 16/32 threads; that is because the
lazy power balance reduces context switches and gives the cpu more chances to
boost frequency under the powersaving balance.
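
The scaled performance/watt column can be reproduced with the sketch below;
the 10^6 scale factor is inferred from the published values rather than taken
from the actual script, so treat it as an approximation of the metric:

def scaled_perf_per_watt(avg_watts, seconds, scale=1e6):
    # Higher is better: work per Joule for one kernel build, scaled so the
    # values land in the 20-85 range shown in the table above.
    return scale / (seconds * avg_watts)

print(int(scaled_perf_per_watt(189.416, 228)))   # -> 23 (powersaving, x = 2)
print(int(scaled_perf_per_watt(259.127, 58)))    # -> 66 (performance, x = 16)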

Some performance testing results:
-

Tested benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
hackbench, fileio-cfq of sysbench, dbench, aiostress, multithreaded
loopback netperf, on my core2, nhm, wsm and snb platforms.

results:
A, no clear performance change found with the 'performance' policy.
B, specjbb2005 drops 5~7% with the powersaving policy, with both openjdk
   and jrockit.
C, hackbench drops 40% with the powersaving policy on snb 4-socket platforms.
Other benchmarks show no clear change.

===
Changelog:
V7 change:
a, remove the 3rd patch, to recover the runnable load avg recording on rt
b, check avg_idle for wakeup burst on every cpu, not only the waking CPU.
c, fix the select_task_rq_fair return -1 bug reported by Preeti.

Changelog:
V6 change:
a, remove 'balance' policy.
b, consider RT task effect in balancing
c, use avg_idle as burst wakeup indicator
d, balance on task utilization in fork/exec/wakeup.
e, no power balancing on SMT domain.

V5 change:
a, change sched_policy to sched_balance_policy
b, split fork/exec/wake power balancing into 3 patches and refresh
commit logs
c, other minor cleanups

V4 change:
a, fix a few bugs and clean up code according to comments from Morten Rasmussen,
Mike Galbraith and Namhyung Kim. Thanks!
b, take Morten Rasmussen's suggestion to use different criteria for
different policy in transitory task packing.
c, shorter latency 
