Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-15 Thread T. Zhou
On Sat, Aug 15, 2015 at 01:24:12PM +0900, Byungchul Park wrote:
> On Fri, Aug 14, 2015 at 08:59:02PM +0800, T. Zhou wrote:
> > Hi,
> > 
> > On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> > > +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> > > sched_entity *se)
> > > +{
> > > + se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > > + cfs_rq->avg.load_avg += se->avg.load_avg;
> > > + cfs_rq->avg.load_sum += se->avg.load_sum;
> > > + cfs_rq->avg.util_avg += se->avg.util_avg;
> > > + cfs_rq->avg.util_sum += se->avg.util_sum;
> > > +}
> > 
> > I see this function is used in enqueue_entity_load_avg() also.
> > In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
> > as se->last_update_time in migration condition.
> > Here, you use cfs_rq->avg.last_update_time as se->last_update_time.
> > If se->last_update_time is different, next decay may be different too.
> > Just from inspecting code, maybe some reasonable there?
> 
> hello zhou,
> 
> update_cfs_rq_load_avg() would update cfs_rq->avg.last_update_time to now,
> which is returned from cfs_rq_clock_task(). :)
> 
> thanks,
> byungchul
> 

Hi Byungchul,

yes, you are right. se and cfs_rq update load avg at same time. :)

thanks
-- 
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-15 Thread Byungchul Park
On Sat, Aug 15, 2015 at 03:52:48PM +0900, Byungchul Park wrote:
> On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> > > On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > > > 
> > > > yuyang said that switched_to don't need to consider task's load because 
> > > > it
> > > > can have meaningless value. but i think considering task's load is 
> > > > better
> > > > than leaving it unattended at all. and we can also use switched_to if 
> > > > we 
> > > > consider task's load in switched_to.
> > > 
> > > when did I say "don't need to consider..."?
> > > 
> > > Doing more does not mean better, or just trivial. BTW, the task 
> > > switched_to
> > > does not have to be switched_from before. 
> > 
> > Correct, there's a few corner cases we need to consider.
> > 
> > However, I think we unconditionally call init_entity_runnable_average()
> > on all tasks, regardless of their 'initial' sched class, so it should
> > have a valid state.
> > 
> > Another thing to consider is the state being very stale, suppose it
> > started live as FAIR, ran for a bit, got switched to !FAIR by means of
> > sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> > time and for some reason gets switched back to FAIR, we need to age and
> > or re-init things.
> 
> hello,
> 
> what do you think about this approch for solving this problem ?
> it makes se's loads decay for detached periods for that rq. and i used
> rq instead of cfs_rq because it does not have dependency to cfs_rq
> any more.

to be honest with you, i am not sure what kind of clock do i have to
use for the detached se's decaying in this approach..
do you think it's better to use sched_clock() instead of rq task clock?

after checking if this appoach is feasible or not, and then i need to
choose a appropriate clock..

thanks,
byungchul

> 
> ---
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5b50082..8f5e2de 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1191,6 +1191,8 @@ struct load_weight {
>   */
>  struct sched_avg {
>   u64 last_update_time, load_sum;
> + u64 last_detached_time;
> + int last_detached_cpu;
>   u32 util_sum, period_contrib;
>   unsigned long load_avg, util_avg;
>  };
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 72d13af..b2d22c8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
>   struct sched_avg *sa = >avg;
>  
>   sa->last_update_time = 0;
> + sa->last_detached_time = 0;
> + sa->last_detached_cpu = -1;
>   /*
>* sched_avg's period_contrib should be strictly less then 1024, so
>* we give it 1023 to make sure it is almost a period (1024us), and
> @@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct 
> sched_entity *se, int update_tg)
>  
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
>  {
> - se->avg.last_update_time = cfs_rq->avg.last_update_time;
> - cfs_rq->avg.load_avg += se->avg.load_avg;
> - cfs_rq->avg.load_sum += se->avg.load_sum;
> - cfs_rq->avg.util_avg += se->avg.util_avg;
> - cfs_rq->avg.util_sum += se->avg.util_sum;
> + struct sched_avg *sa = >avg;
> + int cpu = sa->last_detached_cpu;
> + u64 delta;
> +
> + if (cpu != -1) {
> + delta = rq_clock_task(cpu_rq(cpu)) - sa->last_detached_time;
> + /*
> +  * compute the number of period passed, where a period is 1 
> msec,
> +  * since the entity had detached from the rq, and ignore 
> decaying
> +  * delta which is less than a period for fast calculation.
> +  */
> + delta >>= 20;
> + if (!delta)
> + goto do_attach;
> +
> + sa->load_sum = decay_load(sa->load_sum, delta);
> + sa->util_sum = decay_load((u64)(sa->util_sum), delta);
> + sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> + sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / 
> LOAD_AVG_MAX;
> + }
> +
> +do_attach:
> + sa->last_detached_cpu = -1;
> + sa->last_detached_time = 0;
> + sa->period_contrib = 0;
> +
> + sa->last_update_time = cfs_rq->avg.last_update_time;
> + cfs_rq->avg.load_avg += sa->load_avg;
> + cfs_rq->avg.load_sum += sa->load_sum;
> + cfs_rq->avg.util_avg += sa->util_avg;
> + cfs_rq->avg.util_sum += sa->util_sum;
>  }
>  
>  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
>  {
> - __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> + int cpu = cpu_of(rq_of(cfs_rq));
> +
> + se->avg.last_detached_cpu = cpu;
> + se->avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
> +
> + 

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-15 Thread Byungchul Park
On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> > On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > > 
> > > yuyang said that switched_to don't need to consider task's load because it
> > > can have meaningless value. but i think considering task's load is better
> > > than leaving it unattended at all. and we can also use switched_to if we 
> > > consider task's load in switched_to.
> > 
> > when did I say "don't need to consider..."?
> > 
> > Doing more does not mean better, or just trivial. BTW, the task switched_to
> > does not have to be switched_from before. 
> 
> Correct, there's a few corner cases we need to consider.
> 
> However, I think we unconditionally call init_entity_runnable_average()
> on all tasks, regardless of their 'initial' sched class, so it should
> have a valid state.
> 
> Another thing to consider is the state being very stale, suppose it
> started live as FAIR, ran for a bit, got switched to !FAIR by means of
> sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> time and for some reason gets switched back to FAIR, we need to age and
> or re-init things.

hello,

what do you think about this approch for solving this problem ?
it makes se's loads decay for detached periods for that rq. and i used
rq instead of cfs_rq because it does not have dependency to cfs_rq
any more.

---

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..8f5e2de 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1191,6 +1191,8 @@ struct load_weight {
  */
 struct sched_avg {
u64 last_update_time, load_sum;
+   u64 last_detached_time;
+   int last_detached_cpu;
u32 util_sum, period_contrib;
unsigned long load_avg, util_avg;
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 72d13af..b2d22c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
struct sched_avg *sa = >avg;
 
sa->last_update_time = 0;
+   sa->last_detached_time = 0;
+   sa->last_detached_cpu = -1;
/*
 * sched_avg's period_contrib should be strictly less then 1024, so
 * we give it 1023 to make sure it is almost a period (1024us), and
@@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct sched_entity 
*se, int update_tg)
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se)
 {
-   se->avg.last_update_time = cfs_rq->avg.last_update_time;
-   cfs_rq->avg.load_avg += se->avg.load_avg;
-   cfs_rq->avg.load_sum += se->avg.load_sum;
-   cfs_rq->avg.util_avg += se->avg.util_avg;
-   cfs_rq->avg.util_sum += se->avg.util_sum;
+   struct sched_avg *sa = >avg;
+   int cpu = sa->last_detached_cpu;
+   u64 delta;
+
+   if (cpu != -1) {
+   delta = rq_clock_task(cpu_rq(cpu)) - sa->last_detached_time;
+   /*
+* compute the number of period passed, where a period is 1 
msec,
+* since the entity had detached from the rq, and ignore 
decaying
+* delta which is less than a period for fast calculation.
+*/
+   delta >>= 20;
+   if (!delta)
+   goto do_attach;
+
+   sa->load_sum = decay_load(sa->load_sum, delta);
+   sa->util_sum = decay_load((u64)(sa->util_sum), delta);
+   sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / 
LOAD_AVG_MAX;
+   }
+
+do_attach:
+   sa->last_detached_cpu = -1;
+   sa->last_detached_time = 0;
+   sa->period_contrib = 0;
+
+   sa->last_update_time = cfs_rq->avg.last_update_time;
+   cfs_rq->avg.load_avg += sa->load_avg;
+   cfs_rq->avg.load_sum += sa->load_sum;
+   cfs_rq->avg.util_avg += sa->util_avg;
+   cfs_rq->avg.util_sum += sa->util_sum;
 }
 
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se)
 {
-   __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+   int cpu = cpu_of(rq_of(cfs_rq));
+
+   se->avg.last_detached_cpu = cpu;
+   se->avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
+
+   __update_load_avg(cfs_rq->avg.last_update_time, cpu,
>avg, se->on_rq * scale_load_down(se->load.weight),
cfs_rq->curr == se, NULL);

> 
> I _think_ we can use last_update_time for that, but I've not looked too
> hard.
> 
> That is, age based on last_update_time, if all 0, reinit, or somesuch.
> 
> 
> The most common case of switched_from()/switched_to() is Priority
> Inheritance, and that typically results in very short lived stints as
> !FAIR and the avg data should be still accurate by the time we return.
> --
> To unsubscribe from this list: 

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-15 Thread Byungchul Park
On Sat, Aug 15, 2015 at 03:52:48PM +0900, Byungchul Park wrote:
 On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
  On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
   On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:

yuyang said that switched_to don't need to consider task's load because 
it
can have meaningless value. but i think considering task's load is 
better
than leaving it unattended at all. and we can also use switched_to if 
we 
consider task's load in switched_to.
   
   when did I say don't need to consider...?
   
   Doing more does not mean better, or just trivial. BTW, the task 
   switched_to
   does not have to be switched_from before. 
  
  Correct, there's a few corner cases we need to consider.
  
  However, I think we unconditionally call init_entity_runnable_average()
  on all tasks, regardless of their 'initial' sched class, so it should
  have a valid state.
  
  Another thing to consider is the state being very stale, suppose it
  started live as FAIR, ran for a bit, got switched to !FAIR by means of
  sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
  time and for some reason gets switched back to FAIR, we need to age and
  or re-init things.
 
 hello,
 
 what do you think about this approch for solving this problem ?
 it makes se's loads decay for detached periods for that rq. and i used
 rq instead of cfs_rq because it does not have dependency to cfs_rq
 any more.

to be honest with you, i am not sure what kind of clock do i have to
use for the detached se's decaying in this approach..
do you think it's better to use sched_clock() instead of rq task clock?

after checking if this appoach is feasible or not, and then i need to
choose a appropriate clock..

thanks,
byungchul

 
 ---
 
 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index 5b50082..8f5e2de 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -1191,6 +1191,8 @@ struct load_weight {
   */
  struct sched_avg {
   u64 last_update_time, load_sum;
 + u64 last_detached_time;
 + int last_detached_cpu;
   u32 util_sum, period_contrib;
   unsigned long load_avg, util_avg;
  };
 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index 72d13af..b2d22c8 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
   struct sched_avg *sa = se-avg;
  
   sa-last_update_time = 0;
 + sa-last_detached_time = 0;
 + sa-last_detached_cpu = -1;
   /*
* sched_avg's period_contrib should be strictly less then 1024, so
* we give it 1023 to make sure it is almost a period (1024us), and
 @@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct 
 sched_entity *se, int update_tg)
  
  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
 sched_entity *se)
  {
 - se-avg.last_update_time = cfs_rq-avg.last_update_time;
 - cfs_rq-avg.load_avg += se-avg.load_avg;
 - cfs_rq-avg.load_sum += se-avg.load_sum;
 - cfs_rq-avg.util_avg += se-avg.util_avg;
 - cfs_rq-avg.util_sum += se-avg.util_sum;
 + struct sched_avg *sa = se-avg;
 + int cpu = sa-last_detached_cpu;
 + u64 delta;
 +
 + if (cpu != -1) {
 + delta = rq_clock_task(cpu_rq(cpu)) - sa-last_detached_time;
 + /*
 +  * compute the number of period passed, where a period is 1 
 msec,
 +  * since the entity had detached from the rq, and ignore 
 decaying
 +  * delta which is less than a period for fast calculation.
 +  */
 + delta = 20;
 + if (!delta)
 + goto do_attach;
 +
 + sa-load_sum = decay_load(sa-load_sum, delta);
 + sa-util_sum = decay_load((u64)(sa-util_sum), delta);
 + sa-load_avg = div_u64(sa-load_sum, LOAD_AVG_MAX);
 + sa-util_avg = (sa-util_sum  SCHED_LOAD_SHIFT) / 
 LOAD_AVG_MAX;
 + }
 +
 +do_attach:
 + sa-last_detached_cpu = -1;
 + sa-last_detached_time = 0;
 + sa-period_contrib = 0;
 +
 + sa-last_update_time = cfs_rq-avg.last_update_time;
 + cfs_rq-avg.load_avg += sa-load_avg;
 + cfs_rq-avg.load_sum += sa-load_sum;
 + cfs_rq-avg.util_avg += sa-util_avg;
 + cfs_rq-avg.util_sum += sa-util_sum;
  }
  
  static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
 sched_entity *se)
  {
 - __update_load_avg(cfs_rq-avg.last_update_time, cpu_of(rq_of(cfs_rq)),
 + int cpu = cpu_of(rq_of(cfs_rq));
 +
 + se-avg.last_detached_cpu = cpu;
 + se-avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
 +
 + __update_load_avg(cfs_rq-avg.last_update_time, cpu,
   se-avg, se-on_rq * scale_load_down(se-load.weight),
   cfs_rq-curr == se, NULL);
 
  
  I _think_ we can use last_update_time for that, but I've not looked too
  

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-15 Thread Byungchul Park
On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
 On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
  On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
   
   yuyang said that switched_to don't need to consider task's load because it
   can have meaningless value. but i think considering task's load is better
   than leaving it unattended at all. and we can also use switched_to if we 
   consider task's load in switched_to.
  
  when did I say don't need to consider...?
  
  Doing more does not mean better, or just trivial. BTW, the task switched_to
  does not have to be switched_from before. 
 
 Correct, there's a few corner cases we need to consider.
 
 However, I think we unconditionally call init_entity_runnable_average()
 on all tasks, regardless of their 'initial' sched class, so it should
 have a valid state.
 
 Another thing to consider is the state being very stale, suppose it
 started live as FAIR, ran for a bit, got switched to !FAIR by means of
 sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
 time and for some reason gets switched back to FAIR, we need to age and
 or re-init things.

hello,

what do you think about this approch for solving this problem ?
it makes se's loads decay for detached periods for that rq. and i used
rq instead of cfs_rq because it does not have dependency to cfs_rq
any more.

---

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..8f5e2de 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1191,6 +1191,8 @@ struct load_weight {
  */
 struct sched_avg {
u64 last_update_time, load_sum;
+   u64 last_detached_time;
+   int last_detached_cpu;
u32 util_sum, period_contrib;
unsigned long load_avg, util_avg;
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 72d13af..b2d22c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -673,6 +673,8 @@ void init_entity_runnable_average(struct sched_entity *se)
struct sched_avg *sa = se-avg;
 
sa-last_update_time = 0;
+   sa-last_detached_time = 0;
+   sa-last_detached_cpu = -1;
/*
 * sched_avg's period_contrib should be strictly less then 1024, so
 * we give it 1023 to make sure it is almost a period (1024us), and
@@ -2711,16 +2713,47 @@ static inline void update_load_avg(struct sched_entity 
*se, int update_tg)
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se)
 {
-   se-avg.last_update_time = cfs_rq-avg.last_update_time;
-   cfs_rq-avg.load_avg += se-avg.load_avg;
-   cfs_rq-avg.load_sum += se-avg.load_sum;
-   cfs_rq-avg.util_avg += se-avg.util_avg;
-   cfs_rq-avg.util_sum += se-avg.util_sum;
+   struct sched_avg *sa = se-avg;
+   int cpu = sa-last_detached_cpu;
+   u64 delta;
+
+   if (cpu != -1) {
+   delta = rq_clock_task(cpu_rq(cpu)) - sa-last_detached_time;
+   /*
+* compute the number of period passed, where a period is 1 
msec,
+* since the entity had detached from the rq, and ignore 
decaying
+* delta which is less than a period for fast calculation.
+*/
+   delta = 20;
+   if (!delta)
+   goto do_attach;
+
+   sa-load_sum = decay_load(sa-load_sum, delta);
+   sa-util_sum = decay_load((u64)(sa-util_sum), delta);
+   sa-load_avg = div_u64(sa-load_sum, LOAD_AVG_MAX);
+   sa-util_avg = (sa-util_sum  SCHED_LOAD_SHIFT) / 
LOAD_AVG_MAX;
+   }
+
+do_attach:
+   sa-last_detached_cpu = -1;
+   sa-last_detached_time = 0;
+   sa-period_contrib = 0;
+
+   sa-last_update_time = cfs_rq-avg.last_update_time;
+   cfs_rq-avg.load_avg += sa-load_avg;
+   cfs_rq-avg.load_sum += sa-load_sum;
+   cfs_rq-avg.util_avg += sa-util_avg;
+   cfs_rq-avg.util_sum += sa-util_sum;
 }
 
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se)
 {
-   __update_load_avg(cfs_rq-avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+   int cpu = cpu_of(rq_of(cfs_rq));
+
+   se-avg.last_detached_cpu = cpu;
+   se-avg.last_detached_time = rq_clock_task(rq_of(cfs_rq));
+
+   __update_load_avg(cfs_rq-avg.last_update_time, cpu,
se-avg, se-on_rq * scale_load_down(se-load.weight),
cfs_rq-curr == se, NULL);

 
 I _think_ we can use last_update_time for that, but I've not looked too
 hard.
 
 That is, age based on last_update_time, if all 0, reinit, or somesuch.
 
 
 The most common case of switched_from()/switched_to() is Priority
 Inheritance, and that typically results in very short lived stints as
 !FAIR and the avg data should be still accurate by the time we return.
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-15 Thread T. Zhou
On Sat, Aug 15, 2015 at 01:24:12PM +0900, Byungchul Park wrote:
 On Fri, Aug 14, 2015 at 08:59:02PM +0800, T. Zhou wrote:
  Hi,
  
  On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
   +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
   sched_entity *se)
   +{
   + se-avg.last_update_time = cfs_rq-avg.last_update_time;
   + cfs_rq-avg.load_avg += se-avg.load_avg;
   + cfs_rq-avg.load_sum += se-avg.load_sum;
   + cfs_rq-avg.util_avg += se-avg.util_avg;
   + cfs_rq-avg.util_sum += se-avg.util_sum;
   +}
  
  I see this function is used in enqueue_entity_load_avg() also.
  In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
  as se-last_update_time in migration condition.
  Here, you use cfs_rq-avg.last_update_time as se-last_update_time.
  If se-last_update_time is different, next decay may be different too.
  Just from inspecting code, maybe some reasonable there?
 
 hello zhou,
 
 update_cfs_rq_load_avg() would update cfs_rq-avg.last_update_time to now,
 which is returned from cfs_rq_clock_task(). :)
 
 thanks,
 byungchul
 

Hi Byungchul,

yes, you are right. se and cfs_rq update load avg at same time. :)

thanks
-- 
Tao
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread Byungchul Park
On Fri, Aug 14, 2015 at 08:59:02PM +0800, T. Zhou wrote:
> Hi,
> 
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> > +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> > sched_entity *se)
> > +{
> > +   se->avg.last_update_time = cfs_rq->avg.last_update_time;
> > +   cfs_rq->avg.load_avg += se->avg.load_avg;
> > +   cfs_rq->avg.load_sum += se->avg.load_sum;
> > +   cfs_rq->avg.util_avg += se->avg.util_avg;
> > +   cfs_rq->avg.util_sum += se->avg.util_sum;
> > +}
> 
> I see this function is used in enqueue_entity_load_avg() also.
> In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
> as se->last_update_time in migration condition.
> Here, you use cfs_rq->avg.last_update_time as se->last_update_time.
> If se->last_update_time is different, next decay may be different too.
> Just from inspecting code, maybe some reasonable there?

hello zhou,

update_cfs_rq_load_avg() would update cfs_rq->avg.last_update_time to now,
which is returned from cfs_rq_clock_task(). :)

thanks,
byungchul

> 
> Thanks
> > +static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> > sched_entity *se)
> > +{
> > +   __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> > +   >avg, se->on_rq * scale_load_down(se->load.weight),
> > +   cfs_rq->curr == se, NULL);
> > +
> > +   cfs_rq->avg.load_avg =
> > +   max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > +   cfs_rq->avg.load_sum =
> > +   max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > +   cfs_rq->avg.util_avg =
> > +   max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > +   cfs_rq->avg.util_sum =
> > +   max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +}
> > +
> >  /* Add the load generated by se into cfs_rq's load average */
> >  static inline void
> >  enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > @@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, 
> > struct sched_entity *se)
> > u64 now = cfs_rq_clock_task(cfs_rq);
> > int migrated = 0, decayed;
> >  
> > -   if (sa->last_update_time == 0) {
> > -   sa->last_update_time = now;
> > +   if (sa->last_update_time == 0)
> > migrated = 1;
> > -   }
> > -   else {
> > +   else
> > __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> > -   se->on_rq * scale_load_down(se->load.weight),
> > -   cfs_rq->curr == se, NULL);
> > -   }
> > +   se->on_rq * scale_load_down(se->load.weight),
> > +   cfs_rq->curr == se, NULL);
> >  
> > decayed = update_cfs_rq_load_avg(now, cfs_rq);
> >  
> > cfs_rq->runnable_load_avg += sa->load_avg;
> > cfs_rq->runnable_load_sum += sa->load_sum;
> >  
> > -   if (migrated) {
> > -   cfs_rq->avg.load_avg += sa->load_avg;
> > -   cfs_rq->avg.load_sum += sa->load_sum;
> > -   cfs_rq->avg.util_avg += sa->util_avg;
> > -   cfs_rq->avg.util_sum += sa->util_sum;
> > -   }
> > +   if (migrated)
> > +   attach_entity_load_avg(cfs_rq, se);
> >  
> > if (decayed || migrated)
> > update_tg_load_avg(cfs_rq, 0);
> > @@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct 
> > task_struct *p)
> >  
> >  #ifdef CONFIG_SMP
> > /* Catch up with the cfs_rq and remove our load when we leave */
> > -   __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), >avg,
> > -   se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == 
> > se, NULL);
> > -
> > -   cfs_rq->avg.load_avg =
> > -   max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> > -   cfs_rq->avg.load_sum =
> > -   max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> > -   cfs_rq->avg.util_avg =
> > -   max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> > -   cfs_rq->avg.util_sum =
> > -   max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> > +   detach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  }
> >  
> > @@ -7938,6 +7946,11 @@ static void switched_to_fair(struct rq *rq, struct 
> > task_struct *p)
> >  */
> > se->depth = se->parent ? se->parent->depth + 1 : 0;
> >  #endif
> > +
> > +#ifdef CONFIG_SMP
> > +   /* synchronize task with its cfs_rq */
> > +   attach_entity_load_avg(cfs_rq_of(>se), >se);
> > +#endif
> > if (!task_on_rq_queued(p))
> > return;
> >  
> > @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
> > *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> > /* synchronize task with its prev cfs_rq */
> > -   if (!queued)
> > -   __update_load_avg(cfs_rq->avg.last_update_time, 
> > cpu_of(rq_of(cfs_rq)),
> > -   >avg, se->on_rq * 
> > scale_load_down(se->load.weight),
> > -   cfs_rq->curr == se, NULL);
> > -
> > -   /* remove our 

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread T. Zhou
Hi,

On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
> +{
> + se->avg.last_update_time = cfs_rq->avg.last_update_time;
> + cfs_rq->avg.load_avg += se->avg.load_avg;
> + cfs_rq->avg.load_sum += se->avg.load_sum;
> + cfs_rq->avg.util_avg += se->avg.util_avg;
> + cfs_rq->avg.util_sum += se->avg.util_sum;
> +}

I see this function is used in enqueue_entity_load_avg() also.
In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
as se->last_update_time in migration condition.
Here, you use cfs_rq->avg.last_update_time as se->last_update_time.
If se->last_update_time is different, next decay may be different too.
Just from inspecting code, maybe some reasonable there?

Thanks
> +static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
> +{
> + __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> + >avg, se->on_rq * scale_load_down(se->load.weight),
> + cfs_rq->curr == se, NULL);
> +
> + cfs_rq->avg.load_avg =
> + max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> + cfs_rq->avg.load_sum =
> + max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> + cfs_rq->avg.util_avg =
> + max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> + cfs_rq->avg.util_sum =
> + max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> +}
> +
>  /* Add the load generated by se into cfs_rq's load average */
>  static inline void
>  enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> @@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
>   u64 now = cfs_rq_clock_task(cfs_rq);
>   int migrated = 0, decayed;
>  
> - if (sa->last_update_time == 0) {
> - sa->last_update_time = now;
> + if (sa->last_update_time == 0)
>   migrated = 1;
> - }
> - else {
> + else
>   __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> - se->on_rq * scale_load_down(se->load.weight),
> - cfs_rq->curr == se, NULL);
> - }
> + se->on_rq * scale_load_down(se->load.weight),
> + cfs_rq->curr == se, NULL);
>  
>   decayed = update_cfs_rq_load_avg(now, cfs_rq);
>  
>   cfs_rq->runnable_load_avg += sa->load_avg;
>   cfs_rq->runnable_load_sum += sa->load_sum;
>  
> - if (migrated) {
> - cfs_rq->avg.load_avg += sa->load_avg;
> - cfs_rq->avg.load_sum += sa->load_sum;
> - cfs_rq->avg.util_avg += sa->util_avg;
> - cfs_rq->avg.util_sum += sa->util_sum;
> - }
> + if (migrated)
> + attach_entity_load_avg(cfs_rq, se);
>  
>   if (decayed || migrated)
>   update_tg_load_avg(cfs_rq, 0);
> @@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct 
> task_struct *p)
>  
>  #ifdef CONFIG_SMP
>   /* Catch up with the cfs_rq and remove our load when we leave */
> - __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq), >avg,
> - se->on_rq * scale_load_down(se->load.weight), cfs_rq->curr == 
> se, NULL);
> -
> - cfs_rq->avg.load_avg =
> - max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
> - cfs_rq->avg.load_sum =
> - max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
> - cfs_rq->avg.util_avg =
> - max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
> - cfs_rq->avg.util_sum =
> - max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
> + detach_entity_load_avg(cfs_rq, se);
>  #endif
>  }
>  
> @@ -7938,6 +7946,11 @@ static void switched_to_fair(struct rq *rq, struct 
> task_struct *p)
>*/
>   se->depth = se->parent ? se->parent->depth + 1 : 0;
>  #endif
> +
> +#ifdef CONFIG_SMP
> + /* synchronize task with its cfs_rq */
> + attach_entity_load_avg(cfs_rq_of(>se), >se);
> +#endif
>   if (!task_on_rq_queued(p))
>   return;
>  
> @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
> *p, int queued)
>  
>  #ifdef CONFIG_SMP
>   /* synchronize task with its prev cfs_rq */
> - if (!queued)
> - __update_load_avg(cfs_rq->avg.last_update_time, 
> cpu_of(rq_of(cfs_rq)),
> - >avg, se->on_rq * 
> scale_load_down(se->load.weight),
> - cfs_rq->curr == se, NULL);
> -
> - /* remove our load when we leave */
> - cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
> se->avg.load_avg, 0);
> - cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - 
> se->avg.load_sum, 0);
> - cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - 
> se->avg.util_avg, 0);
> - cfs_rq->avg.util_sum 

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread Peter Zijlstra
On Fri, Aug 14, 2015 at 07:20:20AM +0800, Yuyang Du wrote:
> sched: Provide sched class and priority change statistics to task
> 
> The sched class and priority changes make substantial impact for
> a task, but we really have not a quantitative understanding of how
> frequent they are, which makes it diffcult to work on many cases
> especially some corner cases. We privide it at task level.
> 
> Signed-off-by: Yuyang Du 

Sure we can do that. I would think its fairly rare on regular kernels,
whilst fairly common on -RT kernels where PI is much more active.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread Yuyang Du
On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
> > when did I say "don't need to consider..."?
> > 
> > Doing more does not mean better, or just trivial. BTW, the task switched_to
> > does not have to be switched_from before. 
> 
> Correct, there's a few corner cases we need to consider.
> 
> However, I think we unconditionally call init_entity_runnable_average()
> on all tasks, regardless of their 'initial' sched class, so it should
> have a valid state.
> 
> Another thing to consider is the state being very stale, suppose it
> started live as FAIR, ran for a bit, got switched to !FAIR by means of
> sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
> time and for some reason gets switched back to FAIR, we need to age and
> or re-init things.
> 
> I _think_ we can use last_update_time for that, but I've not looked too
> hard.
> 
> That is, age based on last_update_time, if all 0, reinit, or somesuch.
> 
> 
> The most common case of switched_from()/switched_to() is Priority
> Inheritance, and that typically results in very short lived stints as
> !FAIR and the avg data should be still accurate by the time we return.

Right, we have the following cases to consider:
queued or not * PI or sched_setscheduler * switched_from_fair or not

After enumerating them, it simply boils down to this basic question:
with either short or long lost record, what we predict?

It can have to do with perspective, preference, etc, and it differs
in how frequent it happens, which we lack. Otherwise, we don't have
the context to see what difference it makes.

To address this, I think the following patch might be helpful first.

--
sched: Provide sched class and priority change statistics to task

The sched class and priority changes make substantial impact for
a task, but we really have not a quantitative understanding of how
frequent they are, which makes it diffcult to work on many cases
especially some corner cases. We privide it at task level.

Signed-off-by: Yuyang Du 
---
 include/linux/sched.h |3 +++
 kernel/sched/core.c   |8 +++-
 kernel/sched/debug.c  |2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..9111696 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1228,6 +1228,9 @@ struct sched_statistics {
u64 nr_wakeups_affine_attempts;
u64 nr_wakeups_passive;
u64 nr_wakeups_idle;
+
+   u64 class_change;
+   u64 priority_change;
 };
 #endif
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67d..b65af01 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1009,12 +1009,18 @@ static inline void check_class_changed(struct rq *rq, 
struct task_struct *p,
   int oldprio)
 {
if (prev_class != p->sched_class) {
+   schedstat_inc(p, se.statistics.class_change);
+   schedstat_inc(p, se.statistics.priority_change);
+
if (prev_class->switched_from)
prev_class->switched_from(rq, p);
 
p->sched_class->switched_to(rq, p);
-   } else if (oldprio != p->prio || dl_task(p))
+   } else if (oldprio != p->prio || dl_task(p)) {
+   schedstat_inc(p, se.statistics.priority_change);
+
p->sched_class->prio_changed(rq, p, oldprio);
+   }
 }
 
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 6415117..fac8b89 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -597,6 +597,8 @@ void proc_sched_show_task(struct task_struct *p, struct 
seq_file *m)
P(se.statistics.nr_wakeups_affine_attempts);
P(se.statistics.nr_wakeups_passive);
P(se.statistics.nr_wakeups_idle);
+   P(se.statistics.class_change);
+   P(se.statistics.priority_change);
 
{
u64 avg_atom, avg_per_cpu;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread Yuyang Du
On Thu, Aug 13, 2015 at 05:22:12PM +0200, Peter Zijlstra wrote:
  when did I say don't need to consider...?
  
  Doing more does not mean better, or just trivial. BTW, the task switched_to
  does not have to be switched_from before. 
 
 Correct, there's a few corner cases we need to consider.
 
 However, I think we unconditionally call init_entity_runnable_average()
 on all tasks, regardless of their 'initial' sched class, so it should
 have a valid state.
 
 Another thing to consider is the state being very stale, suppose it
 started live as FAIR, ran for a bit, got switched to !FAIR by means of
 sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
 time and for some reason gets switched back to FAIR, we need to age and
 or re-init things.
 
 I _think_ we can use last_update_time for that, but I've not looked too
 hard.
 
 That is, age based on last_update_time, if all 0, reinit, or somesuch.
 
 
 The most common case of switched_from()/switched_to() is Priority
 Inheritance, and that typically results in very short lived stints as
 !FAIR and the avg data should be still accurate by the time we return.

Right, we have the following cases to consider:
queued or not * PI or sched_setscheduler * switched_from_fair or not

After enumerating them, it simply boils down to this basic question:
with either short or long lost record, what we predict?

It can have to do with perspective, preference, etc, and it differs
in how frequent it happens, which we lack. Otherwise, we don't have
the context to see what difference it makes.

To address this, I think the following patch might be helpful first.

--
sched: Provide sched class and priority change statistics to task

The sched class and priority changes make substantial impact for
a task, but we really have not a quantitative understanding of how
frequent they are, which makes it diffcult to work on many cases
especially some corner cases. We privide it at task level.

Signed-off-by: Yuyang Du yuyang...@intel.com
---
 include/linux/sched.h |3 +++
 kernel/sched/core.c   |8 +++-
 kernel/sched/debug.c  |2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..9111696 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1228,6 +1228,9 @@ struct sched_statistics {
u64 nr_wakeups_affine_attempts;
u64 nr_wakeups_passive;
u64 nr_wakeups_idle;
+
+   u64 class_change;
+   u64 priority_change;
 };
 #endif
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67d..b65af01 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1009,12 +1009,18 @@ static inline void check_class_changed(struct rq *rq, 
struct task_struct *p,
   int oldprio)
 {
if (prev_class != p-sched_class) {
+   schedstat_inc(p, se.statistics.class_change);
+   schedstat_inc(p, se.statistics.priority_change);
+
if (prev_class-switched_from)
prev_class-switched_from(rq, p);
 
p-sched_class-switched_to(rq, p);
-   } else if (oldprio != p-prio || dl_task(p))
+   } else if (oldprio != p-prio || dl_task(p)) {
+   schedstat_inc(p, se.statistics.priority_change);
+
p-sched_class-prio_changed(rq, p, oldprio);
+   }
 }
 
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 6415117..fac8b89 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -597,6 +597,8 @@ void proc_sched_show_task(struct task_struct *p, struct 
seq_file *m)
P(se.statistics.nr_wakeups_affine_attempts);
P(se.statistics.nr_wakeups_passive);
P(se.statistics.nr_wakeups_idle);
+   P(se.statistics.class_change);
+   P(se.statistics.priority_change);
 
{
u64 avg_atom, avg_per_cpu;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread Peter Zijlstra
On Fri, Aug 14, 2015 at 07:20:20AM +0800, Yuyang Du wrote:
 sched: Provide sched class and priority change statistics to task
 
 The sched class and priority changes make substantial impact for
 a task, but we really have not a quantitative understanding of how
 frequent they are, which makes it diffcult to work on many cases
 especially some corner cases. We privide it at task level.
 
 Signed-off-by: Yuyang Du yuyang...@intel.com

Sure we can do that. I would think its fairly rare on regular kernels,
whilst fairly common on -RT kernels where PI is much more active.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread Byungchul Park
On Fri, Aug 14, 2015 at 08:59:02PM +0800, T. Zhou wrote:
 Hi,
 
 On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
  +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
  sched_entity *se)
  +{
  +   se-avg.last_update_time = cfs_rq-avg.last_update_time;
  +   cfs_rq-avg.load_avg += se-avg.load_avg;
  +   cfs_rq-avg.load_sum += se-avg.load_sum;
  +   cfs_rq-avg.util_avg += se-avg.util_avg;
  +   cfs_rq-avg.util_sum += se-avg.util_sum;
  +}
 
 I see this function is used in enqueue_entity_load_avg() also.
 In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
 as se-last_update_time in migration condition.
 Here, you use cfs_rq-avg.last_update_time as se-last_update_time.
 If se-last_update_time is different, next decay may be different too.
 Just from inspecting code, maybe some reasonable there?

hello zhou,

update_cfs_rq_load_avg() would update cfs_rq-avg.last_update_time to now,
which is returned from cfs_rq_clock_task(). :)

thanks,
byungchul

 
 Thanks
  +static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
  sched_entity *se)
  +{
  +   __update_load_avg(cfs_rq-avg.last_update_time, cpu_of(rq_of(cfs_rq)),
  +   se-avg, se-on_rq * scale_load_down(se-load.weight),
  +   cfs_rq-curr == se, NULL);
  +
  +   cfs_rq-avg.load_avg =
  +   max_t(long, cfs_rq-avg.load_avg - se-avg.load_avg, 0);
  +   cfs_rq-avg.load_sum =
  +   max_t(s64, cfs_rq-avg.load_sum - se-avg.load_sum, 0);
  +   cfs_rq-avg.util_avg =
  +   max_t(long, cfs_rq-avg.util_avg - se-avg.util_avg, 0);
  +   cfs_rq-avg.util_sum =
  +   max_t(s32, cfs_rq-avg.util_sum - se-avg.util_sum, 0);
  +}
  +
   /* Add the load generated by se into cfs_rq's load average */
   static inline void
   enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
  @@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, 
  struct sched_entity *se)
  u64 now = cfs_rq_clock_task(cfs_rq);
  int migrated = 0, decayed;
   
  -   if (sa-last_update_time == 0) {
  -   sa-last_update_time = now;
  +   if (sa-last_update_time == 0)
  migrated = 1;
  -   }
  -   else {
  +   else
  __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
  -   se-on_rq * scale_load_down(se-load.weight),
  -   cfs_rq-curr == se, NULL);
  -   }
  +   se-on_rq * scale_load_down(se-load.weight),
  +   cfs_rq-curr == se, NULL);
   
  decayed = update_cfs_rq_load_avg(now, cfs_rq);
   
  cfs_rq-runnable_load_avg += sa-load_avg;
  cfs_rq-runnable_load_sum += sa-load_sum;
   
  -   if (migrated) {
  -   cfs_rq-avg.load_avg += sa-load_avg;
  -   cfs_rq-avg.load_sum += sa-load_sum;
  -   cfs_rq-avg.util_avg += sa-util_avg;
  -   cfs_rq-avg.util_sum += sa-util_sum;
  -   }
  +   if (migrated)
  +   attach_entity_load_avg(cfs_rq, se);
   
  if (decayed || migrated)
  update_tg_load_avg(cfs_rq, 0);
  @@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct 
  task_struct *p)
   
   #ifdef CONFIG_SMP
  /* Catch up with the cfs_rq and remove our load when we leave */
  -   __update_load_avg(cfs_rq-avg.last_update_time, cpu_of(rq), se-avg,
  -   se-on_rq * scale_load_down(se-load.weight), cfs_rq-curr == 
  se, NULL);
  -
  -   cfs_rq-avg.load_avg =
  -   max_t(long, cfs_rq-avg.load_avg - se-avg.load_avg, 0);
  -   cfs_rq-avg.load_sum =
  -   max_t(s64, cfs_rq-avg.load_sum - se-avg.load_sum, 0);
  -   cfs_rq-avg.util_avg =
  -   max_t(long, cfs_rq-avg.util_avg - se-avg.util_avg, 0);
  -   cfs_rq-avg.util_sum =
  -   max_t(s32, cfs_rq-avg.util_sum - se-avg.util_sum, 0);
  +   detach_entity_load_avg(cfs_rq, se);
   #endif
   }
   
  @@ -7938,6 +7946,11 @@ static void switched_to_fair(struct rq *rq, struct 
  task_struct *p)
   */
  se-depth = se-parent ? se-parent-depth + 1 : 0;
   #endif
  +
  +#ifdef CONFIG_SMP
  +   /* synchronize task with its cfs_rq */
  +   attach_entity_load_avg(cfs_rq_of(p-se), p-se);
  +#endif
  if (!task_on_rq_queued(p))
  return;
   
  @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
  *p, int queued)
   
   #ifdef CONFIG_SMP
  /* synchronize task with its prev cfs_rq */
  -   if (!queued)
  -   __update_load_avg(cfs_rq-avg.last_update_time, 
  cpu_of(rq_of(cfs_rq)),
  -   se-avg, se-on_rq * 
  scale_load_down(se-load.weight),
  -   cfs_rq-curr == se, NULL);
  -
  -   /* remove our load when we leave */
  -   cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
  se-avg.load_avg, 0);
  -   cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
  se-avg.load_sum, 0);
  -   cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
  se-avg.util_avg, 0);
  -   

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-14 Thread T. Zhou
Hi,

On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
 +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
 sched_entity *se)
 +{
 + se-avg.last_update_time = cfs_rq-avg.last_update_time;
 + cfs_rq-avg.load_avg += se-avg.load_avg;
 + cfs_rq-avg.load_sum += se-avg.load_sum;
 + cfs_rq-avg.util_avg += se-avg.util_avg;
 + cfs_rq-avg.util_sum += se-avg.util_sum;
 +}

I see this function is used in enqueue_entity_load_avg() also.
In tip tree code, enqueue_entity_load_avg() uses cfs_rq_clock_task()
as se-last_update_time in migration condition.
Here, you use cfs_rq-avg.last_update_time as se-last_update_time.
If se-last_update_time is different, next decay may be different too.
Just from inspecting code, maybe some reasonable there?

Thanks
 +static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct 
 sched_entity *se)
 +{
 + __update_load_avg(cfs_rq-avg.last_update_time, cpu_of(rq_of(cfs_rq)),
 + se-avg, se-on_rq * scale_load_down(se-load.weight),
 + cfs_rq-curr == se, NULL);
 +
 + cfs_rq-avg.load_avg =
 + max_t(long, cfs_rq-avg.load_avg - se-avg.load_avg, 0);
 + cfs_rq-avg.load_sum =
 + max_t(s64, cfs_rq-avg.load_sum - se-avg.load_sum, 0);
 + cfs_rq-avg.util_avg =
 + max_t(long, cfs_rq-avg.util_avg - se-avg.util_avg, 0);
 + cfs_rq-avg.util_sum =
 + max_t(s32, cfs_rq-avg.util_sum - se-avg.util_sum, 0);
 +}
 +
  /* Add the load generated by se into cfs_rq's load average */
  static inline void
  enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 @@ -2717,27 +2742,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct 
 sched_entity *se)
   u64 now = cfs_rq_clock_task(cfs_rq);
   int migrated = 0, decayed;
  
 - if (sa-last_update_time == 0) {
 - sa-last_update_time = now;
 + if (sa-last_update_time == 0)
   migrated = 1;
 - }
 - else {
 + else
   __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
 - se-on_rq * scale_load_down(se-load.weight),
 - cfs_rq-curr == se, NULL);
 - }
 + se-on_rq * scale_load_down(se-load.weight),
 + cfs_rq-curr == se, NULL);
  
   decayed = update_cfs_rq_load_avg(now, cfs_rq);
  
   cfs_rq-runnable_load_avg += sa-load_avg;
   cfs_rq-runnable_load_sum += sa-load_sum;
  
 - if (migrated) {
 - cfs_rq-avg.load_avg += sa-load_avg;
 - cfs_rq-avg.load_sum += sa-load_sum;
 - cfs_rq-avg.util_avg += sa-util_avg;
 - cfs_rq-avg.util_sum += sa-util_sum;
 - }
 + if (migrated)
 + attach_entity_load_avg(cfs_rq, se);
  
   if (decayed || migrated)
   update_tg_load_avg(cfs_rq, 0);
 @@ -7911,17 +7929,7 @@ static void switched_from_fair(struct rq *rq, struct 
 task_struct *p)
  
  #ifdef CONFIG_SMP
   /* Catch up with the cfs_rq and remove our load when we leave */
 - __update_load_avg(cfs_rq-avg.last_update_time, cpu_of(rq), se-avg,
 - se-on_rq * scale_load_down(se-load.weight), cfs_rq-curr == 
 se, NULL);
 -
 - cfs_rq-avg.load_avg =
 - max_t(long, cfs_rq-avg.load_avg - se-avg.load_avg, 0);
 - cfs_rq-avg.load_sum =
 - max_t(s64, cfs_rq-avg.load_sum - se-avg.load_sum, 0);
 - cfs_rq-avg.util_avg =
 - max_t(long, cfs_rq-avg.util_avg - se-avg.util_avg, 0);
 - cfs_rq-avg.util_sum =
 - max_t(s32, cfs_rq-avg.util_sum - se-avg.util_sum, 0);
 + detach_entity_load_avg(cfs_rq, se);
  #endif
  }
  
 @@ -7938,6 +7946,11 @@ static void switched_to_fair(struct rq *rq, struct 
 task_struct *p)
*/
   se-depth = se-parent ? se-parent-depth + 1 : 0;
  #endif
 +
 +#ifdef CONFIG_SMP
 + /* synchronize task with its cfs_rq */
 + attach_entity_load_avg(cfs_rq_of(p-se), p-se);
 +#endif
   if (!task_on_rq_queued(p))
   return;
  
 @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
 *p, int queued)
  
  #ifdef CONFIG_SMP
   /* synchronize task with its prev cfs_rq */
 - if (!queued)
 - __update_load_avg(cfs_rq-avg.last_update_time, 
 cpu_of(rq_of(cfs_rq)),
 - se-avg, se-on_rq * 
 scale_load_down(se-load.weight),
 - cfs_rq-curr == se, NULL);
 -
 - /* remove our load when we leave */
 - cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
 se-avg.load_avg, 0);
 - cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
 se-avg.load_sum, 0);
 - cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
 se-avg.util_avg, 0);
 - cfs_rq-avg.util_sum = max_t(s32, cfs_rq-avg.util_sum - 
 se-avg.util_sum, 0);
 + detach_entity_load_avg(cfs_rq, se);
  #endif
   set_task_rq(p, task_cpu(p));
   se-depth = se-parent ? se-parent-depth + 1 

Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Peter Zijlstra
On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > 
> > yuyang said that switched_to don't need to consider task's load because it
> > can have meaningless value. but i think considering task's load is better
> > than leaving it unattended at all. and we can also use switched_to if we 
> > consider task's load in switched_to.
> 
> when did I say "don't need to consider..."?
> 
> Doing more does not mean better, or just trivial. BTW, the task switched_to
> does not have to be switched_from before. 

Correct, there's a few corner cases we need to consider.

However, I think we unconditionally call init_entity_runnable_average()
on all tasks, regardless of their 'initial' sched class, so it should
have a valid state.

Another thing to consider is the state being very stale, suppose it
started live as FAIR, ran for a bit, got switched to !FAIR by means of
sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
time and for some reason gets switched back to FAIR, we need to age and
or re-init things.

I _think_ we can use last_update_time for that, but I've not looked too
hard.

That is, age based on last_update_time, if all 0, reinit, or somesuch.


The most common case of switched_from()/switched_to() is Priority
Inheritance, and that typically results in very short lived stints as
!FAIR and the avg data should be still accurate by the time we return.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
> On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> > 
> > yuyang said that switched_to don't need to consider task's load because it
> > can have meaningless value. but i think considering task's load is better
> > than leaving it unattended at all. and we can also use switched_to if we 
> > consider task's load in switched_to.
> 
> when did I say "don't need to consider..."?
> 
> Doing more does not mean better, or just trivial. BTW, the task switched_to
> does not have to be switched_from before. 

i see.. i need to add initialization routine for fair class..

thanks,
byungchul

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Yuyang Du
On Thu, Aug 13, 2015 at 04:19:04PM +0900, Byungchul Park wrote:
> > >  #ifdef CONFIG_SMP
> > >   /* synchronize task with its prev cfs_rq */
> > > - if (!queued)
> > > - __update_load_avg(cfs_rq->avg.last_update_time, 
> > > cpu_of(rq_of(cfs_rq)),
> > > - >avg, se->on_rq * 
> > > scale_load_down(se->load.weight),
> > > - cfs_rq->curr == se, NULL);
> > > -
> > > - /* remove our load when we leave */
> > > - cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
> > > se->avg.load_avg, 0);
> > > - cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - 
> > > se->avg.load_sum, 0);
> > > - cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - 
> > > se->avg.util_avg, 0);
> > > - cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - 
> > > se->avg.util_sum, 0);
> > > + detach_entity_load_avg(cfs_rq, se);
> > >  #endif
> > 
> > You changed the logic.
> 
> yes, i changed it. but i think that calling __update_load_avg() is not
> a problem even in case of "queued == 1". so i didn't think that change
> seriously.
> 
> wrong? :(
> 

It is not a problem, but any good or maybe any bad? And I would suggest you
add the comment I gave you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Yuyang Du
On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
> 
> yuyang said that switched_to don't need to consider task's load because it
> can have meaningless value. but i think considering task's load is better
> than leaving it unattended at all. and we can also use switched_to if we 
> consider task's load in switched_to.

when did I say "don't need to consider..."?

Doing more does not mean better, or just trivial. BTW, the task switched_to
does not have to be switched_from before. 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 09:46:00AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> > @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
> > *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> > /* synchronize task with its prev cfs_rq */
> > -   if (!queued)
> > -   __update_load_avg(cfs_rq->avg.last_update_time, 
> > cpu_of(rq_of(cfs_rq)),
> > -   >avg, se->on_rq * 
> > scale_load_down(se->load.weight),
> > -   cfs_rq->curr == se, NULL);
> > -
> > -   /* remove our load when we leave */
> > -   cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
> > se->avg.load_avg, 0);
> > -   cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - 
> > se->avg.load_sum, 0);
> > -   cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - 
> > se->avg.util_avg, 0);
> > -   cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - 
> > se->avg.util_sum, 0);
> > +   detach_entity_load_avg(cfs_rq, se);
> >  #endif
> > set_task_rq(p, task_cpu(p));
> > se->depth = se->parent ? se->parent->depth + 1 : 0;
> > @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct 
> > *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> > /* Virtually synchronize task with its new cfs_rq */
> > -   p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> > -   cfs_rq->avg.load_avg += p->se.avg.load_avg;
> > -   cfs_rq->avg.load_sum += p->se.avg.load_sum;
> > -   cfs_rq->avg.util_avg += p->se.avg.util_avg;
> > -   cfs_rq->avg.util_sum += p->se.avg.util_sum;
> > +   attach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  }
> 
> Can't we go one further and do:
> 
> static void task_move_group_fair(struct task_struct *p)
> {
>   struct rq *rq = task_rq(p);
> 
>   switched_from_fair(rq, p);
>   set_task_rq(p, task_cpu(p);
>   switched_to_fair(rq, p);
> }

yeah, i think this is very nice idea which make code simple.
i will try it.

thanks,
byungchul

> 
> switched_from already does the vruntime and load_avg thing,
> switched_to should do the reverse, although it currently doesn't appear
> to put the load_avg back.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 09:46:00AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> > @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
> > *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> > /* synchronize task with its prev cfs_rq */
> > -   if (!queued)
> > -   __update_load_avg(cfs_rq->avg.last_update_time, 
> > cpu_of(rq_of(cfs_rq)),
> > -   >avg, se->on_rq * 
> > scale_load_down(se->load.weight),
> > -   cfs_rq->curr == se, NULL);
> > -
> > -   /* remove our load when we leave */
> > -   cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
> > se->avg.load_avg, 0);
> > -   cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - 
> > se->avg.load_sum, 0);
> > -   cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - 
> > se->avg.util_avg, 0);
> > -   cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - 
> > se->avg.util_sum, 0);
> > +   detach_entity_load_avg(cfs_rq, se);
> >  #endif
> > set_task_rq(p, task_cpu(p));
> > se->depth = se->parent ? se->parent->depth + 1 : 0;
> > @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct 
> > *p, int queued)
> >  
> >  #ifdef CONFIG_SMP
> > /* Virtually synchronize task with its new cfs_rq */
> > -   p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> > -   cfs_rq->avg.load_avg += p->se.avg.load_avg;
> > -   cfs_rq->avg.load_sum += p->se.avg.load_sum;
> > -   cfs_rq->avg.util_avg += p->se.avg.util_avg;
> > -   cfs_rq->avg.util_sum += p->se.avg.util_sum;
> > +   attach_entity_load_avg(cfs_rq, se);
> >  #endif
> >  }
> 
> Can't we go one further and do:
> 
> static void task_move_group_fair(struct task_struct *p)
> {
>   struct rq *rq = task_rq(p);
> 
>   switched_from_fair(rq, p);
>   set_task_rq(p, task_cpu(p);
>   switched_to_fair(rq, p);
> }

it looks nice, but i will think more..

> 
> switched_from already does the vruntime and load_avg thing,
> switched_to should do the reverse, although it currently doesn't appear
> to put the load_avg back.

yuyang said that switched_to don't need to consider task's load because it
can have meaningless value. but i think considering task's load is better
than leaving it unattended at all. and we can also use switched_to if we 
consider task's load in switched_to.

thanks,
byungchul

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Peter Zijlstra
On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
> *p, int queued)
>  
>  #ifdef CONFIG_SMP
>   /* synchronize task with its prev cfs_rq */
> - if (!queued)
> - __update_load_avg(cfs_rq->avg.last_update_time, 
> cpu_of(rq_of(cfs_rq)),
> - >avg, se->on_rq * 
> scale_load_down(se->load.weight),
> - cfs_rq->curr == se, NULL);
> -
> - /* remove our load when we leave */
> - cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
> se->avg.load_avg, 0);
> - cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - 
> se->avg.load_sum, 0);
> - cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - 
> se->avg.util_avg, 0);
> - cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - 
> se->avg.util_sum, 0);
> + detach_entity_load_avg(cfs_rq, se);
>  #endif
>   set_task_rq(p, task_cpu(p));
>   se->depth = se->parent ? se->parent->depth + 1 : 0;
> @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct 
> *p, int queued)
>  
>  #ifdef CONFIG_SMP
>   /* Virtually synchronize task with its new cfs_rq */
> - p->se.avg.last_update_time = cfs_rq->avg.last_update_time;
> - cfs_rq->avg.load_avg += p->se.avg.load_avg;
> - cfs_rq->avg.load_sum += p->se.avg.load_sum;
> - cfs_rq->avg.util_avg += p->se.avg.util_avg;
> - cfs_rq->avg.util_sum += p->se.avg.util_sum;
> + attach_entity_load_avg(cfs_rq, se);
>  #endif
>  }

Can't we go one further and do:

static void task_move_group_fair(struct task_struct *p)
{
struct rq *rq = task_rq(p);

switched_from_fair(rq, p);
set_task_rq(p, task_cpu(p);
switched_to_fair(rq, p);
}

switched_from already does the vruntime and load_avg thing,
switched_to should do the reverse, although it currently doesn't appear
to put the load_avg back.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 06:41:45AM +0800, Yuyang Du wrote:
> On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> > 
> > currently, a task load is synced with its cfs_rq, only when it
> > leaves from fair class. we also need to sync it with cfs_rq when
> > it returns back to fair class, too.
>  
> Syncing it at the time it is switched to fair is not necessary, because
> since last_update_time if it has ever been updated, the load has become
> random, IOW, useless. So we simply leave it unattended, and let itself
> merge in the system.

hello,

i agree with you almost. it would have a meaningless value over time,
while it has a meaningful value as soon as it leaved that cfs_rq.

however, IMHO, nobody know when a task is switched between sched classes.
i think it would be better that we consider that load rather than leave
it unattended, even though, of course, in both of cases __update_load_avg()
will dacay and fix it over time.

shouldn't we consider it?

1. case returning back to fair class very soon:
original code cannot reflect the task load to cfs_rq, while
patched code can reflect the task load to cfs_rq.

2. case returning back to fair class after long:
original code adds 0 to cfs_rq and let __update_load_avg() fix it, while
patched code adds a meaningless value to cfs_rq and let
__update_load_avg() fix it, afterall these become same.

> 
> >  
> >  #ifdef CONFIG_SMP
> > /* synchronize task with its prev cfs_rq */
> > -   if (!queued)
> > -   __update_load_avg(cfs_rq->avg.last_update_time, 
> > cpu_of(rq_of(cfs_rq)),
> > -   >avg, se->on_rq * 
> > scale_load_down(se->load.weight),
> > -   cfs_rq->curr == se, NULL);
> > -
> > -   /* remove our load when we leave */
> > -   cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
> > se->avg.load_avg, 0);
> > -   cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - 
> > se->avg.load_sum, 0);
> > -   cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - 
> > se->avg.util_avg, 0);
> > -   cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - 
> > se->avg.util_sum, 0);
> > +   detach_entity_load_avg(cfs_rq, se);
> >  #endif
> 
> You changed the logic.

yes, i changed it. but i think that calling __update_load_avg() is not
a problem even in case of "queued == 1". so i didn't think that change
seriously.

wrong? :(

thanks,
byungchul

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Yuyang Du
On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
> 
> currently, a task load is synced with its cfs_rq, only when it
> leaves from fair class. we also need to sync it with cfs_rq when
> it returns back to fair class, too.
 
Syncing it at the time it is switched to fair is not necessary, because
since last_update_time if it has ever been updated, the load has become
random, IOW, useless. So we simply leave it unattended, and let itself
merge in the system.

>  
>  #ifdef CONFIG_SMP
>   /* synchronize task with its prev cfs_rq */
> - if (!queued)
> - __update_load_avg(cfs_rq->avg.last_update_time, 
> cpu_of(rq_of(cfs_rq)),
> - >avg, se->on_rq * 
> scale_load_down(se->load.weight),
> - cfs_rq->curr == se, NULL);
> -
> - /* remove our load when we leave */
> - cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
> se->avg.load_avg, 0);
> - cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - 
> se->avg.load_sum, 0);
> - cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - 
> se->avg.util_avg, 0);
> - cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - 
> se->avg.util_sum, 0);
> + detach_entity_load_avg(cfs_rq, se);
>  #endif

You changed the logic.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 09:46:00AM +0200, Peter Zijlstra wrote:
 On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
  @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
  *p, int queued)
   
   #ifdef CONFIG_SMP
  /* synchronize task with its prev cfs_rq */
  -   if (!queued)
  -   __update_load_avg(cfs_rq-avg.last_update_time, 
  cpu_of(rq_of(cfs_rq)),
  -   se-avg, se-on_rq * 
  scale_load_down(se-load.weight),
  -   cfs_rq-curr == se, NULL);
  -
  -   /* remove our load when we leave */
  -   cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
  se-avg.load_avg, 0);
  -   cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
  se-avg.load_sum, 0);
  -   cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
  se-avg.util_avg, 0);
  -   cfs_rq-avg.util_sum = max_t(s32, cfs_rq-avg.util_sum - 
  se-avg.util_sum, 0);
  +   detach_entity_load_avg(cfs_rq, se);
   #endif
  set_task_rq(p, task_cpu(p));
  se-depth = se-parent ? se-parent-depth + 1 : 0;
  @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct 
  *p, int queued)
   
   #ifdef CONFIG_SMP
  /* Virtually synchronize task with its new cfs_rq */
  -   p-se.avg.last_update_time = cfs_rq-avg.last_update_time;
  -   cfs_rq-avg.load_avg += p-se.avg.load_avg;
  -   cfs_rq-avg.load_sum += p-se.avg.load_sum;
  -   cfs_rq-avg.util_avg += p-se.avg.util_avg;
  -   cfs_rq-avg.util_sum += p-se.avg.util_sum;
  +   attach_entity_load_avg(cfs_rq, se);
   #endif
   }
 
 Can't we go one further and do:
 
 static void task_move_group_fair(struct task_struct *p)
 {
   struct rq *rq = task_rq(p);
 
   switched_from_fair(rq, p);
   set_task_rq(p, task_cpu(p);
   switched_to_fair(rq, p);
 }

it looks nice, but i will think more..

 
 switched_from already does the vruntime and load_avg thing,
 switched_to should do the reverse, although it currently doesn't appear
 to put the load_avg back.

yuyang said that switched_to don't need to consider task's load because it
can have meaningless value. but i think considering task's load is better
than leaving it unattended at all. and we can also use switched_to if we 
consider task's load in switched_to.

thanks,
byungchul

 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 09:46:00AM +0200, Peter Zijlstra wrote:
 On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
  @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
  *p, int queued)
   
   #ifdef CONFIG_SMP
  /* synchronize task with its prev cfs_rq */
  -   if (!queued)
  -   __update_load_avg(cfs_rq-avg.last_update_time, 
  cpu_of(rq_of(cfs_rq)),
  -   se-avg, se-on_rq * 
  scale_load_down(se-load.weight),
  -   cfs_rq-curr == se, NULL);
  -
  -   /* remove our load when we leave */
  -   cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
  se-avg.load_avg, 0);
  -   cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
  se-avg.load_sum, 0);
  -   cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
  se-avg.util_avg, 0);
  -   cfs_rq-avg.util_sum = max_t(s32, cfs_rq-avg.util_sum - 
  se-avg.util_sum, 0);
  +   detach_entity_load_avg(cfs_rq, se);
   #endif
  set_task_rq(p, task_cpu(p));
  se-depth = se-parent ? se-parent-depth + 1 : 0;
  @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct 
  *p, int queued)
   
   #ifdef CONFIG_SMP
  /* Virtually synchronize task with its new cfs_rq */
  -   p-se.avg.last_update_time = cfs_rq-avg.last_update_time;
  -   cfs_rq-avg.load_avg += p-se.avg.load_avg;
  -   cfs_rq-avg.load_sum += p-se.avg.load_sum;
  -   cfs_rq-avg.util_avg += p-se.avg.util_avg;
  -   cfs_rq-avg.util_sum += p-se.avg.util_sum;
  +   attach_entity_load_avg(cfs_rq, se);
   #endif
   }
 
 Can't we go one further and do:
 
 static void task_move_group_fair(struct task_struct *p)
 {
   struct rq *rq = task_rq(p);
 
   switched_from_fair(rq, p);
   set_task_rq(p, task_cpu(p);
   switched_to_fair(rq, p);
 }

yeah, i think this is very nice idea which make code simple.
i will try it.

thanks,
byungchul

 
 switched_from already does the vruntime and load_avg thing,
 switched_to should do the reverse, although it currently doesn't appear
 to put the load_avg back.
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Yuyang Du
On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
 
 currently, a task load is synced with its cfs_rq, only when it
 leaves from fair class. we also need to sync it with cfs_rq when
 it returns back to fair class, too.
 
Syncing it at the time it is switched to fair is not necessary, because
since last_update_time if it has ever been updated, the load has become
random, IOW, useless. So we simply leave it unattended, and let itself
merge in the system.

  
  #ifdef CONFIG_SMP
   /* synchronize task with its prev cfs_rq */
 - if (!queued)
 - __update_load_avg(cfs_rq-avg.last_update_time, 
 cpu_of(rq_of(cfs_rq)),
 - se-avg, se-on_rq * 
 scale_load_down(se-load.weight),
 - cfs_rq-curr == se, NULL);
 -
 - /* remove our load when we leave */
 - cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
 se-avg.load_avg, 0);
 - cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
 se-avg.load_sum, 0);
 - cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
 se-avg.util_avg, 0);
 - cfs_rq-avg.util_sum = max_t(s32, cfs_rq-avg.util_sum - 
 se-avg.util_sum, 0);
 + detach_entity_load_avg(cfs_rq, se);
  #endif

You changed the logic.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Peter Zijlstra
On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
 @@ -8023,16 +8036,7 @@ static void task_move_group_fair(struct task_struct 
 *p, int queued)
  
  #ifdef CONFIG_SMP
   /* synchronize task with its prev cfs_rq */
 - if (!queued)
 - __update_load_avg(cfs_rq-avg.last_update_time, 
 cpu_of(rq_of(cfs_rq)),
 - se-avg, se-on_rq * 
 scale_load_down(se-load.weight),
 - cfs_rq-curr == se, NULL);
 -
 - /* remove our load when we leave */
 - cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
 se-avg.load_avg, 0);
 - cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
 se-avg.load_sum, 0);
 - cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
 se-avg.util_avg, 0);
 - cfs_rq-avg.util_sum = max_t(s32, cfs_rq-avg.util_sum - 
 se-avg.util_sum, 0);
 + detach_entity_load_avg(cfs_rq, se);
  #endif
   set_task_rq(p, task_cpu(p));
   se-depth = se-parent ? se-parent-depth + 1 : 0;
 @@ -8042,11 +8046,7 @@ static void task_move_group_fair(struct task_struct 
 *p, int queued)
  
  #ifdef CONFIG_SMP
   /* Virtually synchronize task with its new cfs_rq */
 - p-se.avg.last_update_time = cfs_rq-avg.last_update_time;
 - cfs_rq-avg.load_avg += p-se.avg.load_avg;
 - cfs_rq-avg.load_sum += p-se.avg.load_sum;
 - cfs_rq-avg.util_avg += p-se.avg.util_avg;
 - cfs_rq-avg.util_sum += p-se.avg.util_sum;
 + attach_entity_load_avg(cfs_rq, se);
  #endif
  }

Can't we go one further and do:

static void task_move_group_fair(struct task_struct *p)
{
struct rq *rq = task_rq(p);

switched_from_fair(rq, p);
set_task_rq(p, task_cpu(p);
switched_to_fair(rq, p);
}

switched_from already does the vruntime and load_avg thing,
switched_to should do the reverse, although it currently doesn't appear
to put the load_avg back.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 06:41:45AM +0800, Yuyang Du wrote:
 On Thu, Aug 13, 2015 at 02:55:55PM +0900, byungchul.p...@lge.com wrote:
  
  currently, a task load is synced with its cfs_rq, only when it
  leaves from fair class. we also need to sync it with cfs_rq when
  it returns back to fair class, too.
  
 Syncing it at the time it is switched to fair is not necessary, because
 since last_update_time if it has ever been updated, the load has become
 random, IOW, useless. So we simply leave it unattended, and let itself
 merge in the system.

hello,

i agree with you almost. it would have a meaningless value over time,
while it has a meaningful value as soon as it leaved that cfs_rq.

however, IMHO, nobody know when a task is switched between sched classes.
i think it would be better that we consider that load rather than leave
it unattended, even though, of course, in both of cases __update_load_avg()
will dacay and fix it over time.

shouldn't we consider it?

1. case returning back to fair class very soon:
original code cannot reflect the task load to cfs_rq, while
patched code can reflect the task load to cfs_rq.

2. case returning back to fair class after long:
original code adds 0 to cfs_rq and let __update_load_avg() fix it, while
patched code adds a meaningless value to cfs_rq and let
__update_load_avg() fix it, afterall these become same.

 
   
   #ifdef CONFIG_SMP
  /* synchronize task with its prev cfs_rq */
  -   if (!queued)
  -   __update_load_avg(cfs_rq-avg.last_update_time, 
  cpu_of(rq_of(cfs_rq)),
  -   se-avg, se-on_rq * 
  scale_load_down(se-load.weight),
  -   cfs_rq-curr == se, NULL);
  -
  -   /* remove our load when we leave */
  -   cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
  se-avg.load_avg, 0);
  -   cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
  se-avg.load_sum, 0);
  -   cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
  se-avg.util_avg, 0);
  -   cfs_rq-avg.util_sum = max_t(s32, cfs_rq-avg.util_sum - 
  se-avg.util_sum, 0);
  +   detach_entity_load_avg(cfs_rq, se);
   #endif
 
 You changed the logic.

yes, i changed it. but i think that calling __update_load_avg() is not
a problem even in case of queued == 1. so i didn't think that change
seriously.

wrong? :(

thanks,
byungchul

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Yuyang Du
On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
 
 yuyang said that switched_to don't need to consider task's load because it
 can have meaningless value. but i think considering task's load is better
 than leaving it unattended at all. and we can also use switched_to if we 
 consider task's load in switched_to.

when did I say don't need to consider...?

Doing more does not mean better, or just trivial. BTW, the task switched_to
does not have to be switched_from before. 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Byungchul Park
On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
 On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
  
  yuyang said that switched_to don't need to consider task's load because it
  can have meaningless value. but i think considering task's load is better
  than leaving it unattended at all. and we can also use switched_to if we 
  consider task's load in switched_to.
 
 when did I say don't need to consider...?
 
 Doing more does not mean better, or just trivial. BTW, the task switched_to
 does not have to be switched_from before. 

i see.. i need to add initialization routine for fair class..

thanks,
byungchul

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Yuyang Du
On Thu, Aug 13, 2015 at 04:19:04PM +0900, Byungchul Park wrote:
#ifdef CONFIG_SMP
 /* synchronize task with its prev cfs_rq */
   - if (!queued)
   - __update_load_avg(cfs_rq-avg.last_update_time, 
   cpu_of(rq_of(cfs_rq)),
   - se-avg, se-on_rq * 
   scale_load_down(se-load.weight),
   - cfs_rq-curr == se, NULL);
   -
   - /* remove our load when we leave */
   - cfs_rq-avg.load_avg = max_t(long, cfs_rq-avg.load_avg - 
   se-avg.load_avg, 0);
   - cfs_rq-avg.load_sum = max_t(s64, cfs_rq-avg.load_sum - 
   se-avg.load_sum, 0);
   - cfs_rq-avg.util_avg = max_t(long, cfs_rq-avg.util_avg - 
   se-avg.util_avg, 0);
   - cfs_rq-avg.util_sum = max_t(s32, cfs_rq-avg.util_sum - 
   se-avg.util_sum, 0);
   + detach_entity_load_avg(cfs_rq, se);
#endif
  
  You changed the logic.
 
 yes, i changed it. but i think that calling __update_load_avg() is not
 a problem even in case of queued == 1. so i didn't think that change
 seriously.
 
 wrong? :(
 

It is not a problem, but any good or maybe any bad? And I would suggest you
add the comment I gave you.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: sync with the cfs_rq when changing sched class

2015-08-13 Thread Peter Zijlstra
On Thu, Aug 13, 2015 at 10:15:28AM +0800, Yuyang Du wrote:
 On Thu, Aug 13, 2015 at 05:21:27PM +0900, Byungchul Park wrote:
  
  yuyang said that switched_to don't need to consider task's load because it
  can have meaningless value. but i think considering task's load is better
  than leaving it unattended at all. and we can also use switched_to if we 
  consider task's load in switched_to.
 
 when did I say don't need to consider...?
 
 Doing more does not mean better, or just trivial. BTW, the task switched_to
 does not have to be switched_from before. 

Correct, there's a few corner cases we need to consider.

However, I think we unconditionally call init_entity_runnable_average()
on all tasks, regardless of their 'initial' sched class, so it should
have a valid state.

Another thing to consider is the state being very stale, suppose it
started live as FAIR, ran for a bit, got switched to !FAIR by means of
sys_sched_setscheduler()/sys_sched_setattr() or similar, runs for a long
time and for some reason gets switched back to FAIR, we need to age and
or re-init things.

I _think_ we can use last_update_time for that, but I've not looked too
hard.

That is, age based on last_update_time, if all 0, reinit, or somesuch.


The most common case of switched_from()/switched_to() is Priority
Inheritance, and that typically results in very short lived stints as
!FAIR and the avg data should be still accurate by the time we return.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/