On Friday 13 Apr 2018 at 16:56:39 (-0700), Joel Fernandes wrote:
> Hi,
>
> On Fri, Apr 6, 2018 at 8:36 AM, Dietmar Eggemann
> <dietmar.eggem...@arm.com> wrote:
> > From: Thara Gopinath <thara.gopin...@linaro.org>
> >
> > Energy-aware scheduling should only operate when the system is not
> > overutilized. There must be cpu time available to place tasks based on
> > utilization in an energy-aware fashion, i.e. to pack tasks on
> > energy-efficient cpus without harming the overall throughput.
> >
> > In case the system operates above this tipping point the tasks have to
> > be placed based on task and cpu load in the classical way of spreading
> > tasks across as many cpus as possible.
> >
> > The point at which a system switches from being not overutilized to
> > being overutilized is called the tipping point.
> >
> > Such a tipping point indicator on a sched domain as the system
> > boundary is introduced here. As soon as one cpu of a sched domain is
> > overutilized the whole sched domain is declared overutilized as well.
> > A cpu becomes overutilized when its utilization is higher than 80%
> > (capacity_margin) of its capacity.
> >
> > The implementation takes advantage of the shared sched domain which is
> > shared across all per-cpu views of a sched domain level. The new
> > overutilized flag is placed in this shared sched domain.
> >
> > Load balancing is skipped in case the energy model is present and the
> > sched domain is not overutilized because under this condition the
> > predominantly load-per-capacity driven load-balancer should not
> > interfere with the energy-aware wakeup placement based on utilization.
> >
> > In case the total utilization of a sched domain is greater than the
> > total sched domain capacity the overutilized flag is set at the parent
> > sched domain level to let other sched groups help getting rid of the
> > overutilization of cpus.
> >
> > Signed-off-by: Thara Gopinath <thara.gopin...@linaro.org>
> > Signed-off-by: Dietmar Eggemann <dietmar.eggem...@arm.com>
> > ---
> >  include/linux/sched/topology.h |  1 +
> >  kernel/sched/fair.c            | 62 ++++++++++++++++++++++++++++++++++++++++--
> >  kernel/sched/sched.h           |  1 +
> >  kernel/sched/topology.c        | 12 +++-----
> >  4 files changed, 65 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index 26347741ba50..dd001c232646 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -72,6 +72,7 @@ struct sched_domain_shared {
> >         atomic_t        ref;
> >         atomic_t        nr_busy_cpus;
> >         int             has_idle_cores;
> > +       int             overutilized;
> >  };
> >
> >  struct sched_domain {
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0a76ad2ef022..6960e5ef3c14 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5345,6 +5345,28 @@ static inline void hrtick_update(struct rq *rq)
> >  }
> >  #endif
> >
> > +#ifdef CONFIG_SMP
> > +static inline int cpu_overutilized(int cpu);
> > +
> > +static inline int sd_overutilized(struct sched_domain *sd)
> > +{
> > +       return READ_ONCE(sd->shared->overutilized);
> > +}
> > +
> > +static inline void update_overutilized_status(struct rq *rq)
> > +{
> > +       struct sched_domain *sd;
> > +
> > +       rcu_read_lock();
> > +       sd = rcu_dereference(rq->sd);
> > +       if (sd && !sd_overutilized(sd) && cpu_overutilized(rq->cpu))
> > +               WRITE_ONCE(sd->shared->overutilized, 1);
> > +       rcu_read_unlock();
> > +}
> > +#else
> > +static inline void update_overutilized_status(struct rq *rq) {}
> > +#endif /* CONFIG_SMP */
> > +
> >  /*
> >   * The enqueue_task method is called before nr_running is
> >   * increased. Here we update the fair scheduling stats and
> > @@ -5394,8 +5416,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >                 update_cfs_group(se);
> >         }
> >
> > -       if (!se)
> > +       if (!se) {
> >                 add_nr_running(rq, 1);
> > +               update_overutilized_status(rq);
> > +       }
>
> I'm wondering if it makes sense to consider scenarios where other
> sched classes cause CPUs in the domain to go above the tipping point.
> In that case it also makes sense not to do EAS in that domain because
> of the overutilization.
>
> I guess task_fits uses cpu_util, which is CFS PELT only at the moment...
> so this may require some other method, like aggregating CFS PELT with
> RT PELT and the DL running bandwidth or something.
>
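(cpu_overutilized() is only forward-declared in this hunk; going by the
80% capacity_margin rule described in the changelog, its definition later
in the series presumably boils down to something like:

static inline int cpu_overutilized(int cpu)
{
	/* Overutilized once util exceeds ~80% of capacity (1024/1280). */
	return (capacity_of(cpu) * 1024) < (cpu_util(cpu) * capacity_margin);
}

)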
So at the moment, in cpu_overutilized() we compare cpu_util() to
capacity_of(), which should include RT and IRQ pressure IIRC. But you're
right, we might be able to do more here... Perhaps we could also use
cpu_util_dl(), which is available in sched.h now?
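E.g. something like the below (a rough, untested sketch just to
illustrate the idea; the exact aggregation, and where the cpu_util_dl()
helper is visible from, are open questions):

static inline int cpu_overutilized(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	/*
	 * RT and IRQ pressure are already reflected in capacity_of(),
	 * so only the DL running bandwidth is added on top of the CFS
	 * utilization before checking against the capacity margin.
	 */
	unsigned long util = cpu_util(cpu) + cpu_util_dl(rq);

	return (capacity_of(cpu) * 1024) < (util * capacity_margin);
}

That way a CPU carrying a lot of DL bandwidth would trip the tipping
point even when its CFS utilization alone still looks low.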