Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-17 Thread Rafael J. Wysocki
On Saturday, August 15, 2015 02:48:17 PM Peter Zijlstra wrote:
> 
> So this OPP thing, I think that got mentioned once earlier in this patch
> set, wth is that?

OPP stands for Operating Performance Points.  It is a library for representing
a device's working clock-voltage combinations.

Described in Documentation/power/opp.txt (which may be outdated, as there is
ongoing work on that code).

CC Viresh who's working on it right now.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-17 Thread Vincent Guittot
On 14 August 2015 at 13:39, Juri Lelli  wrote:
> Hi vincent,
>
> On 13/08/15 13:08, Vincent Guittot wrote:
>> On 12 August 2015 at 17:15, Juri Lelli  wrote:
>>> On 11/08/15 17:37, Vincent Guittot wrote:
 On 11 August 2015 at 17:07, Juri Lelli  wrote:
> Hi Vincent,
>
> On 11/08/15 12:41, Vincent Guittot wrote:
>> On 11 August 2015 at 11:08, Juri Lelli  wrote:
>>> On 10/08/15 16:07, Vincent Guittot wrote:
 On 10 August 2015 at 15:43, Juri Lelli  wrote:
>
> Hi Vincent,
>
> On 04/08/15 14:41, Vincent Guittot wrote:
>> Hi Juri,
>>
>> On 7 July 2015 at 20:24, Morten Rasmussen  
>> wrote:
>>> From: Juri Lelli 
>>
>>  [snip]
>>
>>>  }
>>> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, 
>>> struct task_struct *p, int flags)
>>> if (!se) {
>>> sub_nr_running(rq, 1);
>>> update_rq_runnable_avg(rq, 1);
>>> +   /*
>>> +* We want to trigger a freq switch request only 
>>> for tasks that
>>> +* are going to sleep; this is because we get here 
>>> also during
>>> +* load balancing, but in these cases it seems wise 
>>> to trigger
>>> +* as single request after load balancing is done.
>>> +*
>>> +* Also, we add a margin (same ~20% used for the 
>>> tipping point)
>>> +* to our request to provide some head room if p's 
>>> utilization
>>> +* further increases.
>>> +*/
>>> +   if (sched_energy_freq() && task_sleep) {
>>> +   unsigned long req_cap = 
>>> get_cpu_usage(cpu_of(rq));
>>> +
>>> +   req_cap = req_cap * capacity_margin
>>> +   >> SCHED_CAPACITY_SHIFT;
>>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>>
>> Could you clarify why you want to trigger a freq switch for tasks that
>> are going to sleep?
>> The cpu_usage should not change that much, as the se utilization of
>> the entity moves from utilization_load_avg to utilization_blocked_avg
>> of the rq, and the usage and the freq are updated periodically.
>
> I think we still need to cover multiple back-to-back dequeues. Suppose
> that you have, let's say, 3 tasks that get enqueued at the same time.
> After some time the first one goes to sleep and its utilization, as you
> say, gets moved to utilization_blocked_avg. So, nothing changes, and
> the trigger is superfluous (even if no freq change, I guess, will be
> issued, as we are already servicing enough capacity). However, after a
> while, the second task goes to sleep. Now we still use get_cpu_usage()
> and the first task's contribution in utilization_blocked_avg should have
> been decayed by this time. The same thing may then happen for the third
> task as well. So, if we don't check whether we need to scale down in
> dequeue_task_fair(), it seems to me that we might miss some
> opportunities, as the blocked contribution of other tasks could have
> been successively decayed.
>
> What do you think?

 The tick is used to monitor such variation of the usage (in both ways:
 decay of the usage of sleeping tasks and increase of the usage of
 running tasks). So in your example, if the duration between the sleeps
 of the 2 tasks is significant enough, the tick will handle this
 variation.

>>>
>>> The tick is used to decide if we need to scale up (to max OPP for the
>>> time being), but we don't scale down. It makes more logical sense to
>>
>> why don't you want to check if you need to scale down ?
>>
>
> Well, because if I'm still executing something, the cpu usage is only
> subject to rise.

 This is only true for systems with NO_HZ idle

>>>
>>> Well, even with !NO_HZ_IDLE usage only decreases when cpu is idle. But,
>>
>> Well, thanks for this obvious statement that usage only decreases when
>> the cpu is idle, but my question has never been about the usage variation
>> of an idle/running cpu; it was about the tick.
>>
>
> I'm sorry if I sounded haughty to you; of course I didn't want to, and
> I apologize for that. I just wanted to state the obvious to confirm
> to myself that I understood your point, as I say below. :)
>
>>> I think I got your point; for !NO_HZ_IDLE configurations we might end
>>> up not scaling down frequency even if we have the tick running and
>>> the cpu is idle. I might need some more time to think this through, but
>>> it seems to me that we are still fine without an explicit trigger in
>>> task_tick_fair().

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-15 Thread Michael Turquette
Quoting Peter Zijlstra (2015-08-15 05:48:17)
> 
> 
> So this OPP thing, I think that got mentioned once earlier in this patch
> set, wth is that?

OPP == OPerating Performance Point == P-state

In System-on-Chip Land, OPP is a very common term, roughly defined as a
frequency & voltage pair that makes up a performance state.

In other words, OPP is the P-state of the non-ACPI world.

Similarly, DVFS is sometimes mistaken for a brand new file system, but it
is also a very standardized acronym amongst SoC vendors, meaning dynamic
voltage and frequency scaling.

Regards,
Mike



Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-15 Thread Peter Zijlstra


So this OPP thing, I think that got mentioned once earlier in this patch
set, wth is that?


Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-14 Thread Juri Lelli
Hi vincent,

On 13/08/15 13:08, Vincent Guittot wrote:
> On 12 August 2015 at 17:15, Juri Lelli  wrote:
>> On 11/08/15 17:37, Vincent Guittot wrote:
>>> On 11 August 2015 at 17:07, Juri Lelli  wrote:
 Hi Vincent,

 On 11/08/15 12:41, Vincent Guittot wrote:
> On 11 August 2015 at 11:08, Juri Lelli  wrote:
>> On 10/08/15 16:07, Vincent Guittot wrote:
>>> On 10 August 2015 at 15:43, Juri Lelli  wrote:

 Hi Vincent,

 On 04/08/15 14:41, Vincent Guittot wrote:
> Hi Juri,
>
> On 7 July 2015 at 20:24, Morten Rasmussen  
> wrote:
>> From: Juri Lelli 
> 
>  [snip]
> 
>>  }
>> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, 
>> struct task_struct *p, int flags)
>> if (!se) {
>> sub_nr_running(rq, 1);
>> update_rq_runnable_avg(rq, 1);
>> +   /*
>> +* We want to trigger a freq switch request only for 
>> tasks that
>> +* are going to sleep; this is because we get here 
>> also during
>> +* load balancing, but in these cases it seems wise 
>> to trigger
>> +* as single request after load balancing is done.
>> +*
>> +* Also, we add a margin (same ~20% used for the 
>> tipping point)
>> +* to our request to provide some head room if p's 
>> utilization
>> +* further increases.
>> +*/
>> +   if (sched_energy_freq() && task_sleep) {
>> +   unsigned long req_cap = 
>> get_cpu_usage(cpu_of(rq));
>> +
>> +   req_cap = req_cap * capacity_margin
>> +   >> SCHED_CAPACITY_SHIFT;
>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>
> Could you clarify why you want to trigger a freq switch for tasks that
> are going to sleep?
> The cpu_usage should not change that much, as the se utilization of
> the entity moves from utilization_load_avg to utilization_blocked_avg
> of the rq, and the usage and the freq are updated periodically.

 I think we still need to cover multiple back-to-back dequeues. Suppose
 that you have, let's say, 3 tasks that get enqueued at the same time.
 After some time the first one goes to sleep and its utilization, as you
 say, gets moved to utilization_blocked_avg. So, nothing changes, and
 the trigger is superfluous (even if no freq change, I guess, will be
 issued, as we are already servicing enough capacity). However, after a
 while, the second task goes to sleep. Now we still use get_cpu_usage()
 and the first task's contribution in utilization_blocked_avg should have
 been decayed by this time. The same thing may then happen for the third
 task as well. So, if we don't check whether we need to scale down in
 dequeue_task_fair(), it seems to me that we might miss some
 opportunities, as the blocked contribution of other tasks could have
 been successively decayed.

 What do you think?
>>>
>>> The tick is used to monitor such variation of the usage (in both ways:
>>> decay of the usage of sleeping tasks and increase of the usage of
>>> running tasks). So in your example, if the duration between the sleeps
>>> of the 2 tasks is significant enough, the tick will handle this
>>> variation.
>>>
>>
>> The tick is used to decide if we need to scale up (to max OPP for the
>> time being), but we don't scale down. It makes more logical sense to
>
> why don't you want to check if you need to scale down ?
>

 Well, because if I'm still executing something, the cpu usage is only
 subject to rise.
>>>
>>> This is only true for systems with NO_HZ idle
>>>
>>
>> Well, even with !NO_HZ_IDLE usage only decreases when cpu is idle. But,
> 
> Well, thanks for this obvious statement that usage only decreases when
> the cpu is idle, but my question has never been about the usage variation
> of an idle/running cpu; it was about the tick.
> 

I'm sorry if I sounded haughty to you; of course I didn't want to, and
I apologize for that. I just wanted to state the obvious to confirm
to myself that I understood your point, as I say below. :)

>> I think I got your point; for !NO_HZ_IDLE configurations we might end
>> up not scaling down frequency even if we have the tick running and
>> the cpu is idle. I might need some more time to think this through, but
>> it seems to me that we are still fine without an explicit trigger in
>> task_tick_fair(); if we are running a !NO_HZ_IDLE system we are probably
>> not so much concerned about power savings, and we still react to tasks
>> waking up, sleeping, leaving or moving around (which seem the really
>> important events to me).

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-13 Thread Vincent Guittot
On 12 August 2015 at 17:15, Juri Lelli  wrote:
> On 11/08/15 17:37, Vincent Guittot wrote:
>> On 11 August 2015 at 17:07, Juri Lelli  wrote:
>>> Hi Vincent,
>>>
>>> On 11/08/15 12:41, Vincent Guittot wrote:
 On 11 August 2015 at 11:08, Juri Lelli  wrote:
> On 10/08/15 16:07, Vincent Guittot wrote:
>> On 10 August 2015 at 15:43, Juri Lelli  wrote:
>>>
>>> Hi Vincent,
>>>
>>> On 04/08/15 14:41, Vincent Guittot wrote:
 Hi Juri,

 On 7 July 2015 at 20:24, Morten Rasmussen  
 wrote:
> From: Juri Lelli 

 [snip]

>  }
> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, 
> struct task_struct *p, int flags)
> if (!se) {
> sub_nr_running(rq, 1);
> update_rq_runnable_avg(rq, 1);
> +   /*
> +* We want to trigger a freq switch request only for 
> tasks that
> +* are going to sleep; this is because we get here 
> also during
> +* load balancing, but in these cases it seems wise 
> to trigger
> +* as single request after load balancing is done.
> +*
> +* Also, we add a margin (same ~20% used for the 
> tipping point)
> +* to our request to provide some head room if p's 
> utilization
> +* further increases.
> +*/
> +   if (sched_energy_freq() && task_sleep) {
> +   unsigned long req_cap = 
> get_cpu_usage(cpu_of(rq));
> +
> +   req_cap = req_cap * capacity_margin
> +   >> SCHED_CAPACITY_SHIFT;
> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);

 Could you clarify why you want to trigger a freq switch for tasks that
 are going to sleep?
 The cpu_usage should not change that much, as the se utilization of
 the entity moves from utilization_load_avg to utilization_blocked_avg
 of the rq, and the usage and the freq are updated periodically.
>>>
>>> I think we still need to cover multiple back-to-back dequeues. Suppose
>>> that you have, let's say, 3 tasks that get enqueued at the same time.
>>> After some time the first one goes to sleep and its utilization, as you
>>> say, gets moved to utilization_blocked_avg. So, nothing changes, and
>>> the trigger is superfluous (even if no freq change, I guess, will be
>>> issued, as we are already servicing enough capacity). However, after a
>>> while, the second task goes to sleep. Now we still use get_cpu_usage()
>>> and the first task's contribution in utilization_blocked_avg should have
>>> been decayed by this time. The same thing may then happen for the third
>>> task as well. So, if we don't check whether we need to scale down in
>>> dequeue_task_fair(), it seems to me that we might miss some
>>> opportunities, as the blocked contribution of other tasks could have
>>> been successively decayed.
>>>
>>> What do you think?
>>
>> The tick is used to monitor such variation of the usage (in both ways:
>> decay of the usage of sleeping tasks and increase of the usage of
>> running tasks). So in your example, if the duration between the sleeps
>> of the 2 tasks is significant enough, the tick will handle this
>> variation.
>>
>
> The tick is used to decide if we need to scale up (to max OPP for the
> time being), but we don't scale down. It makes more logical sense to

 why don't you want to check if you need to scale down ?

>>>
>>> Well, because if I'm still executing something, the cpu usage is only
>>> subject to rise.
>>
>> This is only true for systems with NO_HZ idle
>>
>
> Well, even with !NO_HZ_IDLE usage only decreases when cpu is idle. But,

Well, thanks for this obvious statement that usage only decreases when
the cpu is idle, but my question has never been about the usage variation
of an idle/running cpu; it was about the tick.

> I think I got your point; for !NO_HZ_IDLE configurations we might end
> up not scaling down frequency even if we have the tick running and
> the cpu is idle. I might need some more time to think this through, but
> it seems to me that we are still fine without an explicit trigger in
> task_tick_fair(); if we are running a !NO_HZ_IDLE system we are probably
> not so much concerned about power savings, and we still react
> to tasks waking up, sleeping, leaving or moving around (which seem the
> really important events to me); OTOH, we might add that trigger, but this
> will generate unconditional checks at tick time for NO_HZ_IDLE systems.

That will be far less critical than unconditionally checking during every
task wake up or sleep. A task t

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-12 Thread Juri Lelli
On 11/08/15 17:37, Vincent Guittot wrote:
> On 11 August 2015 at 17:07, Juri Lelli  wrote:
>> Hi Vincent,
>>
>> On 11/08/15 12:41, Vincent Guittot wrote:
>>> On 11 August 2015 at 11:08, Juri Lelli  wrote:
 On 10/08/15 16:07, Vincent Guittot wrote:
> On 10 August 2015 at 15:43, Juri Lelli  wrote:
>>
>> Hi Vincent,
>>
>> On 04/08/15 14:41, Vincent Guittot wrote:
>>> Hi Juri,
>>>
>>> On 7 July 2015 at 20:24, Morten Rasmussen  
>>> wrote:
 From: Juri Lelli 

 Each time a task is {en,de}queued we might need to adapt the current
 frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
 this purpose.  Only trigger a freq request if we are effectively waking up
 or going to sleep.  Filter out load balancing related calls to reduce the
 number of triggers.

 cc: Ingo Molnar 
 cc: Peter Zijlstra 

 Signed-off-by: Juri Lelli 
 ---
  kernel/sched/fair.c | 42 --
  1 file changed, 40 insertions(+), 2 deletions(-)

 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index f74e9d2..b8627c6 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
  }
  #endif

 +static unsigned int capacity_margin = 1280; /* ~20% margin */
 +
  static bool cpu_overutilized(int cpu);
 +static unsigned long get_cpu_usage(int cpu);
  struct static_key __sched_energy_freq __read_mostly = 
 STATIC_KEY_INIT_FALSE;

  /*
 @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct 
 task_struct *p, int flags)
 if (!task_new && !rq->rd->overutilized &&
 cpu_overutilized(rq->cpu))
 rq->rd->overutilized = true;
 +   /*
 +* We want to trigger a freq switch request only for 
 tasks that
 +* are waking up; this is because we get here also 
 during
 +* load balancing, but in these cases it seems wise to 
 trigger
 +* as single request after load balancing is done.
 +*
 +* XXX: how about fork()? Do we need a special 
 flag/something
 +*  to tell if we are here after a fork() 
 (wakeup_task_new)?
 +*
 +* Also, we add a margin (same ~20% used for the 
 tipping point)
 +* to our request to provide some head room if p's 
 utilization
 +* further increases.
 +*/
 +   if (sched_energy_freq() && !task_new) {
 +   unsigned long req_cap = 
 get_cpu_usage(cpu_of(rq));
 +
 +   req_cap = req_cap * capacity_margin
 +   >> SCHED_CAPACITY_SHIFT;
 +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
 +   }
 }
 hrtick_update(rq);
  }
 @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, 
 struct task_struct *p, int flags)
 if (!se) {
 sub_nr_running(rq, 1);
 update_rq_runnable_avg(rq, 1);
 +   /*
 +* We want to trigger a freq switch request only for 
 tasks that
 +* are going to sleep; this is because we get here 
 also during
 +* load balancing, but in these cases it seems wise to 
 trigger
 +* as single request after load balancing is done.
 +*
 +* Also, we add a margin (same ~20% used for the 
 tipping point)
 +* to our request to provide some head room if p's 
 utilization
 +* further increases.
 +*/
 +   if (sched_energy_freq() && task_sleep) {
 +   unsigned long req_cap = 
 get_cpu_usage(cpu_of(rq));
 +
 +   req_cap = req_cap * capacity_margin
 +   >> SCHED_CAPACITY_SHIFT;
 +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>>>
>>> Could you clarify why you want to trigger a freq switch for tasks that
>>> are going to sleep?
>>> The cpu_usage should not change that much as the se_utilization of
>>

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-11 Thread Vincent Guittot
On 11 August 2015 at 17:07, Juri Lelli  wrote:
> Hi Vincent,
>
> On 11/08/15 12:41, Vincent Guittot wrote:
>> On 11 August 2015 at 11:08, Juri Lelli  wrote:
>>> On 10/08/15 16:07, Vincent Guittot wrote:
 On 10 August 2015 at 15:43, Juri Lelli  wrote:
>
> Hi Vincent,
>
> On 04/08/15 14:41, Vincent Guittot wrote:
>> Hi Juri,
>>
>> On 7 July 2015 at 20:24, Morten Rasmussen  
>> wrote:
>>> From: Juri Lelli 
>>>
>>> Each time a task is {en,de}queued we might need to adapt the current
>>> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
>>> this purpose.  Only trigger a freq request if we are effectively waking up
>>> or going to sleep.  Filter out load balancing related calls to reduce the
>>> number of triggers.
>>>
>>> cc: Ingo Molnar 
>>> cc: Peter Zijlstra 
>>>
>>> Signed-off-by: Juri Lelli 
>>> ---
>>>  kernel/sched/fair.c | 42 --
>>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index f74e9d2..b8627c6 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>>>  }
>>>  #endif
>>>
>>> +static unsigned int capacity_margin = 1280; /* ~20% margin */
>>> +
>>>  static bool cpu_overutilized(int cpu);
>>> +static unsigned long get_cpu_usage(int cpu);
>>>  struct static_key __sched_energy_freq __read_mostly = 
>>> STATIC_KEY_INIT_FALSE;
>>>
>>>  /*
>>> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct 
>>> task_struct *p, int flags)
>>> if (!task_new && !rq->rd->overutilized &&
>>> cpu_overutilized(rq->cpu))
>>> rq->rd->overutilized = true;
>>> +   /*
>>> +* We want to trigger a freq switch request only for 
>>> tasks that
>>> +* are waking up; this is because we get here also 
>>> during
>>> +* load balancing, but in these cases it seems wise to 
>>> trigger
>>> +* as single request after load balancing is done.
>>> +*
>>> +* XXX: how about fork()? Do we need a special 
>>> flag/something
>>> +*  to tell if we are here after a fork() 
>>> (wakeup_task_new)?
>>> +*
>>> +* Also, we add a margin (same ~20% used for the 
>>> tipping point)
>>> +* to our request to provide some head room if p's 
>>> utilization
>>> +* further increases.
>>> +*/
>>> +   if (sched_energy_freq() && !task_new) {
>>> +   unsigned long req_cap = 
>>> get_cpu_usage(cpu_of(rq));
>>> +
>>> +   req_cap = req_cap * capacity_margin
>>> +   >> SCHED_CAPACITY_SHIFT;
>>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>>> +   }
>>> }
>>> hrtick_update(rq);
>>>  }
>>> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, 
>>> struct task_struct *p, int flags)
>>> if (!se) {
>>> sub_nr_running(rq, 1);
>>> update_rq_runnable_avg(rq, 1);
>>> +   /*
>>> +* We want to trigger a freq switch request only for 
>>> tasks that
>>> +* are going to sleep; this is because we get here also 
>>> during
>>> +* load balancing, but in these cases it seems wise to 
>>> trigger
>>> +* as single request after load balancing is done.
>>> +*
>>> +* Also, we add a margin (same ~20% used for the 
>>> tipping point)
>>> +* to our request to provide some head room if p's 
>>> utilization
>>> +* further increases.
>>> +*/
>>> +   if (sched_energy_freq() && task_sleep) {
>>> +   unsigned long req_cap = 
>>> get_cpu_usage(cpu_of(rq));
>>> +
>>> +   req_cap = req_cap * capacity_margin
>>> +   >> SCHED_CAPACITY_SHIFT;
>>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>>
>> Could you clarify why you want to trigger a freq switch for tasks that
>> are going to sleep?
>> The cpu_usage should not change that much, as the se utilization of
>> the entity moves from utilization_load_avg to utilization_blocked_avg
>> of the rq, and the usage and the freq are updated periodically.
>
> I think we still need to cover multiple back-to-back dequeues.

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-11 Thread Juri Lelli
Hi Vincent,

On 11/08/15 12:41, Vincent Guittot wrote:
> On 11 August 2015 at 11:08, Juri Lelli  wrote:
>> On 10/08/15 16:07, Vincent Guittot wrote:
>>> On 10 August 2015 at 15:43, Juri Lelli  wrote:

 Hi Vincent,

 On 04/08/15 14:41, Vincent Guittot wrote:
> Hi Juri,
>
> On 7 July 2015 at 20:24, Morten Rasmussen  
> wrote:
>> From: Juri Lelli 
>>
>> Each time a task is {en,de}queued we might need to adapt the current
>> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
>> this purpose.  Only trigger a freq request if we are effectively waking up
>> or going to sleep.  Filter out load balancing related calls to reduce the
>> number of triggers.
>>
>> cc: Ingo Molnar 
>> cc: Peter Zijlstra 
>>
>> Signed-off-by: Juri Lelli 
>> ---
>>  kernel/sched/fair.c | 42 --
>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index f74e9d2..b8627c6 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>>  }
>>  #endif
>>
>> +static unsigned int capacity_margin = 1280; /* ~20% margin */
>> +
>>  static bool cpu_overutilized(int cpu);
>> +static unsigned long get_cpu_usage(int cpu);
>>  struct static_key __sched_energy_freq __read_mostly = 
>> STATIC_KEY_INIT_FALSE;
>>
>>  /*
>> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct 
>> task_struct *p, int flags)
>> if (!task_new && !rq->rd->overutilized &&
>> cpu_overutilized(rq->cpu))
>> rq->rd->overutilized = true;
>> +   /*
>> +* We want to trigger a freq switch request only for 
>> tasks that
>> +* are waking up; this is because we get here also during
>> +* load balancing, but in these cases it seems wise to 
>> trigger
>> +* as single request after load balancing is done.
>> +*
>> +* XXX: how about fork()? Do we need a special 
>> flag/something
>> +*  to tell if we are here after a fork() 
>> (wakeup_task_new)?
>> +*
>> +* Also, we add a margin (same ~20% used for the tipping 
>> point)
>> +* to our request to provide some head room if p's 
>> utilization
>> +* further increases.
>> +*/
>> +   if (sched_energy_freq() && !task_new) {
>> +   unsigned long req_cap = 
>> get_cpu_usage(cpu_of(rq));
>> +
>> +   req_cap = req_cap * capacity_margin
>> +   >> SCHED_CAPACITY_SHIFT;
>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>> +   }
>> }
>> hrtick_update(rq);
>>  }
>> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, 
>> struct task_struct *p, int flags)
>> if (!se) {
>> sub_nr_running(rq, 1);
>> update_rq_runnable_avg(rq, 1);
>> +   /*
>> +* We want to trigger a freq switch request only for 
>> tasks that
>> +* are going to sleep; this is because we get here also 
>> during
>> +* load balancing, but in these cases it seems wise to 
>> trigger
>> +* as single request after load balancing is done.
>> +*
>> +* Also, we add a margin (same ~20% used for the tipping 
>> point)
>> +* to our request to provide some head room if p's 
>> utilization
>> +* further increases.
>> +*/
>> +   if (sched_energy_freq() && task_sleep) {
>> +   unsigned long req_cap = 
>> get_cpu_usage(cpu_of(rq));
>> +
>> +   req_cap = req_cap * capacity_margin
>> +   >> SCHED_CAPACITY_SHIFT;
>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>
> Could you clarify why you want to trigger a freq switch for tasks that
> are going to sleep?
> The cpu_usage should not change that much, as the se utilization of
> the entity moves from utilization_load_avg to utilization_blocked_avg
> of the rq, and the usage and the freq are updated periodically.

 I think we still need to cover multiple back-to-back dequeues. Suppose
 that you have, let's say, 3 tasks that get enqueued at the same time.
 After some time the first one goes to sleep and its utilization, as you
 say, gets moved to utilization_blocked_avg.

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-11 Thread Vincent Guittot
On 11 August 2015 at 11:08, Juri Lelli  wrote:
> On 10/08/15 16:07, Vincent Guittot wrote:
>> On 10 August 2015 at 15:43, Juri Lelli  wrote:
>>>
>>> Hi Vincent,
>>>
>>> On 04/08/15 14:41, Vincent Guittot wrote:
 Hi Juri,

 On 7 July 2015 at 20:24, Morten Rasmussen  wrote:
> From: Juri Lelli 
>
> Each time a task is {en,de}queued we might need to adapt the current
> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
> this purpose.  Only trigger a freq request if we are effectively waking up
> or going to sleep.  Filter out load balancing related calls to reduce the
> number of triggers.
>
> cc: Ingo Molnar 
> cc: Peter Zijlstra 
>
> Signed-off-by: Juri Lelli 
> ---
>  kernel/sched/fair.c | 42 --
>  1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f74e9d2..b8627c6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>  }
>  #endif
>
> +static unsigned int capacity_margin = 1280; /* ~20% margin */
> +
>  static bool cpu_overutilized(int cpu);
> +static unsigned long get_cpu_usage(int cpu);
>  struct static_key __sched_energy_freq __read_mostly = 
> STATIC_KEY_INIT_FALSE;
>
>  /*
> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct 
> task_struct *p, int flags)
> if (!task_new && !rq->rd->overutilized &&
> cpu_overutilized(rq->cpu))
> rq->rd->overutilized = true;
> +   /*
> +* We want to trigger a freq switch request only for 
> tasks that
> +* are waking up; this is because we get here also during
> +* load balancing, but in these cases it seems wise to 
> trigger
> +* as single request after load balancing is done.
> +*
> +* XXX: how about fork()? Do we need a special 
> flag/something
> +*  to tell if we are here after a fork() 
> (wakeup_task_new)?
> +*
> +* Also, we add a margin (same ~20% used for the tipping 
> point)
> +* to our request to provide some head room if p's 
> utilization
> +* further increases.
> +*/
> +   if (sched_energy_freq() && !task_new) {
> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> +
> +   req_cap = req_cap * capacity_margin
> +   >> SCHED_CAPACITY_SHIFT;
> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
> +   }
> }
> hrtick_update(rq);
>  }
> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, 
> struct task_struct *p, int flags)
> if (!se) {
> sub_nr_running(rq, 1);
> update_rq_runnable_avg(rq, 1);
> +   /*
> +* We want to trigger a freq switch request only for 
> tasks that
> +* are going to sleep; this is because we get here also 
> during
> +* load balancing, but in these cases it seems wise to 
> trigger
> +* as single request after load balancing is done.
> +*
> +* Also, we add a margin (same ~20% used for the tipping 
> point)
> +* to our request to provide some head room if p's 
> utilization
> +* further increases.
> +*/
> +   if (sched_energy_freq() && task_sleep) {
> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> +
> +   req_cap = req_cap * capacity_margin
> +   >> SCHED_CAPACITY_SHIFT;
> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);

 Could you clarify why you want to trigger a freq switch for tasks that
 are going to sleep?
 The cpu_usage should not change that much, as the se utilization of
 the entity moves from utilization_load_avg to utilization_blocked_avg
 of the rq, and the usage and the freq are updated periodically.
>>>
>>> I think we still need to cover multiple back-to-back dequeues. Suppose
>>> that you have, let's say, 3 tasks that get enqueued at the same time.
>>> After some time the first one goes to sleep and its utilization, as you
>>> say, gets moved to utilization_blocked_avg. So, nothing changes, and
>>> the trigger is superfluous (even if no freq change, I guess, will be
>>> issued, as we are already servicing enough capacity).

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-11 Thread Juri Lelli
On 10/08/15 16:07, Vincent Guittot wrote:
> On 10 August 2015 at 15:43, Juri Lelli  wrote:
>>
>> Hi Vincent,
>>
>> On 04/08/15 14:41, Vincent Guittot wrote:
>>> Hi Juri,
>>>
>>> On 7 July 2015 at 20:24, Morten Rasmussen  wrote:
 From: Juri Lelli 

 Each time a task is {en,de}queued we might need to adapt the current
 frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
 this purpose.  Only trigger a freq request if we are effectively waking up
 or going to sleep.  Filter out load balancing related calls to reduce the
 number of triggers.

 cc: Ingo Molnar 
 cc: Peter Zijlstra 

 Signed-off-by: Juri Lelli 
 ---
  kernel/sched/fair.c | 42 --
  1 file changed, 40 insertions(+), 2 deletions(-)

 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index f74e9d2..b8627c6 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
  }
  #endif

 +static unsigned int capacity_margin = 1280; /* ~20% margin */
 +
  static bool cpu_overutilized(int cpu);
 +static unsigned long get_cpu_usage(int cpu);
  struct static_key __sched_energy_freq __read_mostly = 
 STATIC_KEY_INIT_FALSE;

  /*
 @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct 
 *p, int flags)
 if (!task_new && !rq->rd->overutilized &&
 cpu_overutilized(rq->cpu))
 rq->rd->overutilized = true;
 +   /*
 +* We want to trigger a freq switch request only for tasks 
 that
 +* are waking up; this is because we get here also during
 +* load balancing, but in these cases it seems wise to 
 trigger
 +* as single request after load balancing is done.
 +*
 +* XXX: how about fork()? Do we need a special 
 flag/something
 +*  to tell if we are here after a fork() 
 (wakeup_task_new)?
 +*
 +* Also, we add a margin (same ~20% used for the tipping 
 point)
 +* to our request to provide some head room if p's 
 utilization
 +* further increases.
 +*/
 +   if (sched_energy_freq() && !task_new) {
 +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
 +
 +   req_cap = req_cap * capacity_margin
 +   >> SCHED_CAPACITY_SHIFT;
 +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
 +   }
 }
 hrtick_update(rq);
  }
 @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct 
 task_struct *p, int flags)
 if (!se) {
 sub_nr_running(rq, 1);
 update_rq_runnable_avg(rq, 1);
 +   /*
 +* We want to trigger a freq switch request only for tasks 
 that
 +* are going to sleep; this is because we get here also 
 during
 +* load balancing, but in these cases it seems wise to 
 trigger
 +* as single request after load balancing is done.
 +*
 +* Also, we add a margin (same ~20% used for the tipping 
 point)
 +* to our request to provide some head room if p's 
 utilization
 +* further increases.
 +*/
 +   if (sched_energy_freq() && task_sleep) {
 +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
 +
 +   req_cap = req_cap * capacity_margin
 +   >> SCHED_CAPACITY_SHIFT;
 +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>>>
>>> Could you clarify why you want to trigger a freq switch for tasks that
>>> are going to sleep?
>>> The cpu_usage should not change that much as the se_utilization of
>>> the entity moves from utilization_load_avg to utilization_blocked_avg
>>> of the rq and the usage and the freq are updated periodically.
>>
>> I think we still need to cover multiple back-to-back dequeues. Suppose
>> that you have, let's say, 3 tasks that get enqueued at the same time.
>> After some time the first one goes to sleep and its utilization, as you
>> say, gets moved to utilization_blocked_avg. So, nothing changes, and
>> the trigger is superfluous (even if no freq change I guess will be
>> issued as we are already servicing enough capacity). However, after a
>> while, the second task goes to sleep. Now we still use get_cpu_usage()
and the first task contribution in utilization_blocked_avg should have
been decayed by this time.

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-10 Thread Vincent Guittot
On 10 August 2015 at 15:43, Juri Lelli  wrote:
>
> Hi Vincent,
>
> On 04/08/15 14:41, Vincent Guittot wrote:
> > Hi Juri,
> >
> > On 7 July 2015 at 20:24, Morten Rasmussen  wrote:
> >> From: Juri Lelli 
> >>
> >> Each time a task is {en,de}queued we might need to adapt the current
> >> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
> >> this purpose.  Only trigger a freq request if we are effectively waking up
> >> or going to sleep.  Filter out load balancing related calls to reduce the
> >> number of triggers.
> >>
> >> cc: Ingo Molnar 
> >> cc: Peter Zijlstra 
> >>
> >> Signed-off-by: Juri Lelli 
> >> ---
> >>  kernel/sched/fair.c | 42 --
> >>  1 file changed, 40 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index f74e9d2..b8627c6 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
> >>  }
> >>  #endif
> >>
> >> +static unsigned int capacity_margin = 1280; /* ~20% margin */
> >> +
> >>  static bool cpu_overutilized(int cpu);
> >> +static unsigned long get_cpu_usage(int cpu);
> >>  struct static_key __sched_energy_freq __read_mostly = 
> >> STATIC_KEY_INIT_FALSE;
> >>
> >>  /*
> >> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct 
> >> *p, int flags)
> >> if (!task_new && !rq->rd->overutilized &&
> >> cpu_overutilized(rq->cpu))
> >> rq->rd->overutilized = true;
> >> +   /*
> >> +* We want to trigger a freq switch request only for tasks 
> >> that
> >> +* are waking up; this is because we get here also during
> >> +* load balancing, but in these cases it seems wise to 
> >> trigger
> >> +* as single request after load balancing is done.
> >> +*
> >> +* XXX: how about fork()? Do we need a special 
> >> flag/something
> >> +*  to tell if we are here after a fork() 
> >> (wakeup_task_new)?
> >> +*
> >> +* Also, we add a margin (same ~20% used for the tipping 
> >> point)
> >> +* to our request to provide some head room if p's 
> >> utilization
> >> +* further increases.
> >> +*/
> >> +   if (sched_energy_freq() && !task_new) {
> >> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> >> +
> >> +   req_cap = req_cap * capacity_margin
> >> +   >> SCHED_CAPACITY_SHIFT;
> >> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
> >> +   }
> >> }
> >> hrtick_update(rq);
> >>  }
> >> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct 
> >> task_struct *p, int flags)
> >> if (!se) {
> >> sub_nr_running(rq, 1);
> >> update_rq_runnable_avg(rq, 1);
> >> +   /*
> >> +* We want to trigger a freq switch request only for tasks 
> >> that
> >> +* are going to sleep; this is because we get here also 
> >> during
> >> +* load balancing, but in these cases it seems wise to 
> >> trigger
> >> +* as single request after load balancing is done.
> >> +*
> >> +* Also, we add a margin (same ~20% used for the tipping 
> >> point)
> >> +* to our request to provide some head room if p's 
> >> utilization
> >> +* further increases.
> >> +*/
> >> +   if (sched_energy_freq() && task_sleep) {
> >> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> >> +
> >> +   req_cap = req_cap * capacity_margin
> >> +   >> SCHED_CAPACITY_SHIFT;
> >> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
> >
> > Could you clarify why you want to trigger a freq switch for tasks that
> > are going to sleep?
> > The cpu_usage should not change that much as the se_utilization of
> > the entity moves from utilization_load_avg to utilization_blocked_avg
> > of the rq and the usage and the freq are updated periodically.
>
> I think we still need to cover multiple back-to-back dequeues. Suppose
> that you have, let's say, 3 tasks that get enqueued at the same time.
> After some time the first one goes to sleep and its utilization, as you
> say, gets moved to utilization_blocked_avg. So, nothing changes, and
> the trigger is superfluous (even if no freq change I guess will be
> issued as we are already servicing enough capacity). However, after a
> while, the second task goes to sleep. Now we still use get_cpu_usage()
> and the first task contribution in utilization_blocked_avg should have
> been decayed by this time.

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-10 Thread Juri Lelli
Hi Vincent,

On 04/08/15 14:41, Vincent Guittot wrote:
> Hi Juri,
> 
> On 7 July 2015 at 20:24, Morten Rasmussen  wrote:
>> From: Juri Lelli 
>>
>> Each time a task is {en,de}queued we might need to adapt the current
>> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
>> this purpose.  Only trigger a freq request if we are effectively waking up
>> or going to sleep.  Filter out load balancing related calls to reduce the
>> number of triggers.
>>
>> cc: Ingo Molnar 
>> cc: Peter Zijlstra 
>>
>> Signed-off-by: Juri Lelli 
>> ---
>>  kernel/sched/fair.c | 42 --
>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index f74e9d2..b8627c6 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>>  }
>>  #endif
>>
>> +static unsigned int capacity_margin = 1280; /* ~20% margin */
>> +
>>  static bool cpu_overutilized(int cpu);
>> +static unsigned long get_cpu_usage(int cpu);
>>  struct static_key __sched_energy_freq __read_mostly = STATIC_KEY_INIT_FALSE;
>>
>>  /*
>> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct 
>> *p, int flags)
>> if (!task_new && !rq->rd->overutilized &&
>> cpu_overutilized(rq->cpu))
>> rq->rd->overutilized = true;
>> +   /*
>> +* We want to trigger a freq switch request only for tasks 
>> that
>> +* are waking up; this is because we get here also during
>> +* load balancing, but in these cases it seems wise to 
>> trigger
>> +* as single request after load balancing is done.
>> +*
>> +* XXX: how about fork()? Do we need a special flag/something
>> +*  to tell if we are here after a fork() 
>> (wakeup_task_new)?
>> +*
>> +* Also, we add a margin (same ~20% used for the tipping 
>> point)
>> +* to our request to provide some head room if p's 
>> utilization
>> +* further increases.
>> +*/
>> +   if (sched_energy_freq() && !task_new) {
>> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
>> +
>> +   req_cap = req_cap * capacity_margin
>> +   >> SCHED_CAPACITY_SHIFT;
>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>> +   }
>> }
>> hrtick_update(rq);
>>  }
>> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct 
>> task_struct *p, int flags)
>> if (!se) {
>> sub_nr_running(rq, 1);
>> update_rq_runnable_avg(rq, 1);
>> +   /*
>> +* We want to trigger a freq switch request only for tasks 
>> that
>> +* are going to sleep; this is because we get here also 
>> during
>> +* load balancing, but in these cases it seems wise to 
>> trigger
>> +* as single request after load balancing is done.
>> +*
>> +* Also, we add a margin (same ~20% used for the tipping 
>> point)
>> +* to our request to provide some head room if p's 
>> utilization
>> +* further increases.
>> +*/
>> +   if (sched_energy_freq() && task_sleep) {
>> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
>> +
>> +   req_cap = req_cap * capacity_margin
>> +   >> SCHED_CAPACITY_SHIFT;
>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
> 
> Could you clarify why you want to trigger a freq switch for tasks that
> are going to sleep?
> The cpu_usage should not change that much as the se_utilization of
> the entity moves from utilization_load_avg to utilization_blocked_avg
> of the rq and the usage and the freq are updated periodically.

I think we still need to cover multiple back-to-back dequeues. Suppose
that you have, let's say, 3 tasks that get enqueued at the same time.
After some time the first one goes to sleep and its utilization, as you
say, gets moved to utilization_blocked_avg. So, nothing changes, and
the trigger is superfluous (even if no freq change I guess will be
issued as we are already servicing enough capacity). However, after a
while, the second task goes to sleep. Now we still use get_cpu_usage()
and the first task contribution in utilization_blocked_avg should have
been decayed by this time. Same thing may then happen for the third task
as well. So, if we don't check if we need to scale down in
dequeue_task_fair, it seems to me that we might miss some opportunities,
as blocked contribution of other tasks could have been successively
decayed.

What do you think?

Thanks,

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-08-04 Thread Vincent Guittot
Hi Juri,

On 7 July 2015 at 20:24, Morten Rasmussen  wrote:
> From: Juri Lelli 
>
> Each time a task is {en,de}queued we might need to adapt the current
> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
> this purpose.  Only trigger a freq request if we are effectively waking up
> or going to sleep.  Filter out load balancing related calls to reduce the
> number of triggers.
>
> cc: Ingo Molnar 
> cc: Peter Zijlstra 
>
> Signed-off-by: Juri Lelli 
> ---
>  kernel/sched/fair.c | 42 --
>  1 file changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f74e9d2..b8627c6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>  }
>  #endif
>
> +static unsigned int capacity_margin = 1280; /* ~20% margin */
> +
>  static bool cpu_overutilized(int cpu);
> +static unsigned long get_cpu_usage(int cpu);
>  struct static_key __sched_energy_freq __read_mostly = STATIC_KEY_INIT_FALSE;
>
>  /*
> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct 
> *p, int flags)
> if (!task_new && !rq->rd->overutilized &&
> cpu_overutilized(rq->cpu))
> rq->rd->overutilized = true;
> +   /*
> +* We want to trigger a freq switch request only for tasks 
> that
> +* are waking up; this is because we get here also during
> +* load balancing, but in these cases it seems wise to trigger
> +* as single request after load balancing is done.
> +*
> +* XXX: how about fork()? Do we need a special flag/something
> +*  to tell if we are here after a fork() 
> (wakeup_task_new)?
> +*
> +* Also, we add a margin (same ~20% used for the tipping 
> point)
> +* to our request to provide some head room if p's utilization
> +* further increases.
> +*/
> +   if (sched_energy_freq() && !task_new) {
> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> +
> +   req_cap = req_cap * capacity_margin
> +   >> SCHED_CAPACITY_SHIFT;
> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
> +   }
> }
> hrtick_update(rq);
>  }
> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct 
> task_struct *p, int flags)
> if (!se) {
> sub_nr_running(rq, 1);
> update_rq_runnable_avg(rq, 1);
> +   /*
> +* We want to trigger a freq switch request only for tasks 
> that
> +* are going to sleep; this is because we get here also during
> +* load balancing, but in these cases it seems wise to trigger
> +* as single request after load balancing is done.
> +*
> +* Also, we add a margin (same ~20% used for the tipping 
> point)
> +* to our request to provide some head room if p's utilization
> +* further increases.
> +*/
> +   if (sched_energy_freq() && task_sleep) {
> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> +
> +   req_cap = req_cap * capacity_margin
> +   >> SCHED_CAPACITY_SHIFT;
> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);

Could you clarify why you want to trigger a freq switch for tasks that
are going to sleep?
The cpu_usage should not change that much as the se_utilization of
the entity moves from utilization_load_avg to utilization_blocked_avg
of the rq and the usage and the freq are updated periodically.
It should be the same for the wake up of a task in enqueue_task_fair
above, even if it's less obvious for this latter use case because the
cpu might wake up from a long idle phase during which its
utilization_blocked_avg has not been updated. Nevertheless, a trigger of
the freq switch at wake-up of the cpu, once its usage has been updated,
should do the job.

So tick, migration of tasks, new tasks, and entering/leaving idle state of
a cpu should be enough to trigger freq switches.

Regards,
Vincent


> +   }
> }
> hrtick_update(rq);
>  }
> @@ -4959,8 +4999,6 @@ static int find_new_capacity(struct energy_env *eenv,
> return idx;
>  }
>
> -static unsigned int capacity_margin = 1280; /* ~20% margin */
> -
>  static bool cpu_overutilized(int cpu)
>  {
> return (capacity_of(cpu) * 1024) <
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-07-09 Thread Juri Lelli
Hi Mike,

On 08/07/15 16:42, Michael Turquette wrote:
> Hi Juri,
> 
> Quoting Morten Rasmussen (2015-07-07 11:24:24)
>> From: Juri Lelli 
>>
>> Each time a task is {en,de}queued we might need to adapt the current
>> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
>> this purpose.  Only trigger a freq request if we are effectively waking up
>> or going to sleep.  Filter out load balancing related calls to reduce the
>> number of triggers.
>>
>> cc: Ingo Molnar 
>> cc: Peter Zijlstra 
>>
>> Signed-off-by: Juri Lelli 
>> ---
>>  kernel/sched/fair.c | 42 --
>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index f74e9d2..b8627c6 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>>  }
>>  #endif
>>  
>> +static unsigned int capacity_margin = 1280; /* ~20% margin */
> 
> This is a 25% margin. Calling it ~20% is a bit misleading :)
> 

Well, 1024 is what you get if you remove 20% from 1280. But, I
confess it wasn't clear to me either at first sight ;). Anyway,
you are right that, the way I use it below, you end up adding
25% to req_cap. It is just because I didn't want to add another
margin, I guess. :)

> Should margin be scaled for cpus that do not have max capacity == 1024?
> In other words, should margin be dynamically calculated to be 20% of
> *this* cpu's max capacity?
> 
> I'm imagining a corner case where a heterogeneous cpu system is set up
> in such a way that adding margin that is hard-coded to 25% of 1024
> almost always puts req_cap to the highest frequency, skipping some
> reasonable capacity states in between.
> 

But, what's below should actually ask for 25% more relative to the
current cpu usage. So, if you have, let's say, a usage of 300 (this
is both cpu and freq scaled), when you do what's below you get:

  300 * 1280 / 1024 = 375

and 375 is 300 + 25%. It is the ratio between capacity_margin and
SCHED_CAPACITY_SCALE that gives you a percentage relative to cpu usage.
Or did I get it wrong?

>> +
>>  static bool cpu_overutilized(int cpu);
>> +static unsigned long get_cpu_usage(int cpu);
>>  struct static_key __sched_energy_freq __read_mostly = STATIC_KEY_INIT_FALSE;
>>  
>>  /*
>> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct 
>> *p, int flags)
>> if (!task_new && !rq->rd->overutilized &&
>> cpu_overutilized(rq->cpu))
>> rq->rd->overutilized = true;
>> +   /*
>> +* We want to trigger a freq switch request only for tasks 
>> that
>> +* are waking up; this is because we get here also during
>> +* load balancing, but in these cases it seems wise to 
>> trigger
>> +* as single request after load balancing is done.
>> +*
>> +* XXX: how about fork()? Do we need a special flag/something
>> +*  to tell if we are here after a fork() 
>> (wakeup_task_new)?
>> +*
>> +* Also, we add a margin (same ~20% used for the tipping 
>> point)
>> +* to our request to provide some head room if p's 
>> utilization
>> +* further increases.
>> +*/
>> +   if (sched_energy_freq() && !task_new) {
>> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
>> +
>> +   req_cap = req_cap * capacity_margin
>> +   >> SCHED_CAPACITY_SHIFT;
> 
> Probably a dumb question:
> 
> Can we "cheat" here and just assume that capacity and load use the same
> units? That would avoid the multiplication and change your code to the
> following:
> 
>   #define capacity_margin (SCHED_CAPACITY_SCALE >> 2) /* 25% */
>   req_cap += capacity_margin;
> 

I'd rather stick with an increase relative to the current usage
as opposed to adding 256 to every request. I fear that the latter
would end up cutting out some OPPs entirely, as you were saying above.

>> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>> +   }
>> }
>> hrtick_update(rq);
>>  }
>> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct 
>> task_struct *p, int flags)
>> if (!se) {
>> sub_nr_running(rq, 1);
>> update_rq_runnable_avg(rq, 1);
>> +   /*
>> +* We want to trigger a freq switch request only for tasks 
>> that
>> +* are going to sleep; this is because we get here also 
>> during
>> +* load balancing, but in these cases it seems wise to 
>> trigger
>> +* as single request after load balancing is done.
>> +*
>> +* Also, we add a margin (same ~20% used for the tipping 
>> point)

Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-07-08 Thread Michael Turquette
Hi Juri,

Quoting Morten Rasmussen (2015-07-07 11:24:24)
> From: Juri Lelli 
> 
> Each time a task is {en,de}queued we might need to adapt the current
> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
> this purpose.  Only trigger a freq request if we are effectively waking up
> or going to sleep.  Filter out load balancing related calls to reduce the
> number of triggers.
> 
> cc: Ingo Molnar 
> cc: Peter Zijlstra 
> 
> Signed-off-by: Juri Lelli 
> ---
>  kernel/sched/fair.c | 42 --
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f74e9d2..b8627c6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>  }
>  #endif
>  
> +static unsigned int capacity_margin = 1280; /* ~20% margin */

This is a 25% margin. Calling it ~20% is a bit misleading :)

Should margin be scaled for cpus that do not have max capacity == 1024?
In other words, should margin be dynamically calculated to be 20% of
*this* cpu's max capacity?

I'm imagining a corner case where a heterogeneous cpu system is set up
in such a way that adding margin that is hard-coded to 25% of 1024
almost always puts req_cap to the highest frequency, skipping some
reasonable capacity states in between.

> +
>  static bool cpu_overutilized(int cpu);
> +static unsigned long get_cpu_usage(int cpu);
>  struct static_key __sched_energy_freq __read_mostly = STATIC_KEY_INIT_FALSE;
>  
>  /*
> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct 
> *p, int flags)
> if (!task_new && !rq->rd->overutilized &&
> cpu_overutilized(rq->cpu))
> rq->rd->overutilized = true;
> +   /*
> +* We want to trigger a freq switch request only for tasks 
> that
> +* are waking up; this is because we get here also during
> +* load balancing, but in these cases it seems wise to trigger
> +* as single request after load balancing is done.
> +*
> +* XXX: how about fork()? Do we need a special flag/something
> +*  to tell if we are here after a fork() 
> (wakeup_task_new)?
> +*
> +* Also, we add a margin (same ~20% used for the tipping 
> point)
> +* to our request to provide some head room if p's utilization
> +* further increases.
> +*/
> +   if (sched_energy_freq() && !task_new) {
> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> +
> +   req_cap = req_cap * capacity_margin
> +   >> SCHED_CAPACITY_SHIFT;

Probably a dumb question:

Can we "cheat" here and just assume that capacity and load use the same
units? That would avoid the multiplication and change your code to the
following:

#define capacity_margin (SCHED_CAPACITY_SCALE >> 2) /* 25% */
req_cap += capacity_margin;

> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
> +   }
> }
> hrtick_update(rq);
>  }
> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct 
> task_struct *p, int flags)
> if (!se) {
> sub_nr_running(rq, 1);
> update_rq_runnable_avg(rq, 1);
> +   /*
> +* We want to trigger a freq switch request only for tasks 
> that
> +* are going to sleep; this is because we get here also during
> +* load balancing, but in these cases it seems wise to trigger
> +* as single request after load balancing is done.
> +*
> +* Also, we add a margin (same ~20% used for the tipping 
> point)
> +* to our request to provide some head room if p's utilization
> +* further increases.
> +*/
> +   if (sched_energy_freq() && task_sleep) {
> +   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
> +
> +   req_cap = req_cap * capacity_margin
> +   >> SCHED_CAPACITY_SHIFT;
> +   cpufreq_sched_set_cap(cpu_of(rq), req_cap);

Filtering out the load_balance bits is neat.

Regards,
Mike

> +   }
> }
> hrtick_update(rq);
>  }
> @@ -4959,8 +4999,6 @@ static int find_new_capacity(struct energy_env *eenv,
> return idx;
>  }
>  
> -static unsigned int capacity_margin = 1280; /* ~20% margin */
> -
>  static bool cpu_overutilized(int cpu)
>  {
> return (capacity_of(cpu) * 1024) <
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More maj

[RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

2015-07-07 Thread Morten Rasmussen
From: Juri Lelli 

Each time a task is {en,de}queued we might need to adapt the current
frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
this purpose.  Only trigger a freq request if we are effectively waking up
or going to sleep.  Filter out load balancing related calls to reduce the
number of triggers.

cc: Ingo Molnar 
cc: Peter Zijlstra 

Signed-off-by: Juri Lelli 
---
 kernel/sched/fair.c | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f74e9d2..b8627c6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif
 
+static unsigned int capacity_margin = 1280; /* ~20% margin */
+
 static bool cpu_overutilized(int cpu);
+static unsigned long get_cpu_usage(int cpu);
 struct static_key __sched_energy_freq __read_mostly = STATIC_KEY_INIT_FALSE;
 
 /*
@@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, 
int flags)
if (!task_new && !rq->rd->overutilized &&
cpu_overutilized(rq->cpu))
rq->rd->overutilized = true;
+   /*
+* We want to trigger a freq switch request only for tasks that
+* are waking up; this is because we get here also during
+* load balancing, but in these cases it seems wise to trigger
+* as single request after load balancing is done.
+*
+* XXX: how about fork()? Do we need a special flag/something
+*  to tell if we are here after a fork() (wakeup_task_new)?
+*
+* Also, we add a margin (same ~20% used for the tipping point)
+* to our request to provide some head room if p's utilization
+* further increases.
+*/
+   if (sched_energy_freq() && !task_new) {
+   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
+
+   req_cap = req_cap * capacity_margin
+   >> SCHED_CAPACITY_SHIFT;
+   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
+   }
}
hrtick_update(rq);
 }
@@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct 
task_struct *p, int flags)
if (!se) {
sub_nr_running(rq, 1);
update_rq_runnable_avg(rq, 1);
+   /*
+* We want to trigger a freq switch request only for tasks that
+* are going to sleep; this is because we get here also during
+* load balancing, but in these cases it seems wise to trigger
+* as single request after load balancing is done.
+*
+* Also, we add a margin (same ~20% used for the tipping point)
+* to our request to provide some head room if p's utilization
+* further increases.
+*/
+   if (sched_energy_freq() && task_sleep) {
+   unsigned long req_cap = get_cpu_usage(cpu_of(rq));
+
+   req_cap = req_cap * capacity_margin
+   >> SCHED_CAPACITY_SHIFT;
+   cpufreq_sched_set_cap(cpu_of(rq), req_cap);
+   }
}
hrtick_update(rq);
 }
@@ -4959,8 +4999,6 @@ static int find_new_capacity(struct energy_env *eenv,
return idx;
 }
 
-static unsigned int capacity_margin = 1280; /* ~20% margin */
-
 static bool cpu_overutilized(int cpu)
 {
return (capacity_of(cpu) * 1024) <
-- 
1.9.1
