Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu

Nicolas Pitre Thu, 17 Apr 2014 09:22:06 -0700

On Thu, 17 Apr 2014, Daniel Lezcano wrote:

> On 04/17/2014 05:53 PM, Nicolas Pitre wrote:
> > On Thu, 17 Apr 2014, Daniel Lezcano wrote:
> >
> > > Ok, I refreshed the patchset, but before sending it out I would like to
> > > discuss the rationale for the changes and the policy, and change the
> > > patchset accordingly.
> > >
> > > What order should we choose if the CPUs are idle?
> > >
> > > Let's assume all cpus are idle on a dual socket quad core.
> > >
> > > Also, we can reasonably assume that if a cluster is in low power mode,
> > > the CPUs belonging to that cluster are in the same idle state (setting
> > > aside auto-promotion, over which we have no control).
> > >
> > > If the policy you mention above is 'aggressive power saving', we can
> > > follow these rules, in decreasing priority:
> > >
> > > 1. We want to avoid waking up an entire cluster
> > > => as the CPUs of a cluster are in the same idle state, choosing a CPU
> > > in a shallow state should guarantee that we won't wake up a cluster
> > > (unless no idle CPU in a shallow state is found).
> >
> > This is unclear to me.  Obviously, if an entire cluster is down, that
> > means all the CPUs it contains have been idle for a long time.  And
> > therefore they shouldn't be subject to selection unless there are no
> > other CPUs available.  Is that what you mean?
> 
> Yes, this is what I meant. But I also meant that, for the moment, we can
> leave aside the CPU topology and the coupled idle states: with the approach
> described above, the idle state is the same for all the CPUs belonging to a
> cluster, so we won't select a CPU in a cluster that is down (unless there
> are no other CPUs available).


CPU topology is needed to properly describe scheduling domains.  Whether 
we balance across domains or pack using as few domains as possible is a 
separate issue.  In other words, you shouldn't have to care about it in
this patch series.

And IMHO coupled C-states are a low-level mechanism that should remain
private to cpuidle; the scheduler shouldn't be aware of them.
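
To illustrate rule 1 as discussed above, here is a minimal sketch, not the
code from the patch series. The struct cpu_info type, its fields and the
pick_shallowest_idle_cpu() helper are hypothetical names invented for this
example, and it assumes a larger idle state index means a deeper state. It
picks the idle CPU in the shallowest state, so a CPU in a powered-down
cluster is only chosen when nothing shallower is available.

```c
#include <stddef.h>

/* Hypothetical per-CPU snapshot; not the kernel's actual data structures. */
struct cpu_info {
	int idle;        /* non-zero if the CPU is currently idle */
	int idle_state;  /* assumed: 0 = shallowest, larger = deeper */
};

/*
 * Return the index of the idle CPU in the shallowest idle state,
 * or -1 if no CPU is idle.  Ties go to the lowest CPU index.
 */
int pick_shallowest_idle_cpu(const struct cpu_info *cpus, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		if (best < 0 || cpus[i].idle_state < cpus[best].idle_state)
			best = (int)i;
	}
	return best;
}
```

With two quad-core clusters where CPUs 0-3 sit in a deep cluster-off state
and CPUs 4-7 in a shallow one, this returns CPU 4 and leaves the first
cluster alone.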

> > > 2. We want to avoid waking up a CPU which has not reached its target
> > > residency time (this will need some work to unify the cpuidle idle time
> > > and the idle task run time)
> > > => with the target residency and, as a first step, with the idle stamp,
> > > we can determine whether the CPU has slept long enough
> >
> > Agreed. However, right now, the scheduler does not have any
> > consideration for that.  So this should be done as a separate patch.
> 
> Yes, I thought that as a very first step we can rely on the idle stamp,
> with a big comment, until we unify the times. Or I can first unify the idle
> times and then take the target residency into account. This is to comply
> with Rafael's request to have the 'big picture'.

I agree, but that should be done incrementally.  Even without this 
consideration, what you proposed is already an improvement over the 
current state of affairs.
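
For reference, the residency check from rule 2 could look roughly like the
sketch below. This is only an illustration: struct idle_cpu, its fields and
slept_enough() are hypothetical names, not existing scheduler or cpuidle
interfaces, and the sketch assumes both times are in nanoseconds on the same
clock with now_ns never earlier than the idle stamp.

```c
#include <stdint.h>

/* Hypothetical view of an idle CPU; field names are made up for this sketch. */
struct idle_cpu {
	uint64_t idle_stamp_ns;        /* time at which the CPU went idle */
	uint64_t target_residency_ns;  /* target residency of its idle state */
};

/*
 * Return non-zero if the CPU has been idle for at least the target
 * residency of its current idle state, i.e. waking it up now does not
 * cut the idle period short.  Assumes now_ns >= cpu->idle_stamp_ns.
 */
int slept_enough(const struct idle_cpu *cpu, uint64_t now_ns)
{
	return now_ns - cpu->idle_stamp_ns >= cpu->target_residency_ns;
}
```

A selection loop could then prefer idle CPUs for which slept_enough() is
true and fall back to the others only when there is no such CPU.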


Nicolas