On Wed, May 11, 2016 at 12:55:56PM +0100, Matt Fleming wrote:
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7842,13 +7842,13 @@ static inline void set_cpu_sd_state_busy
> >     int cpu = smp_processor_id();
> >  
> >     rcu_read_lock();
> > -   sd = rcu_dereference(per_cpu(sd_busy, cpu));
> > +   sd = rcu_dereference(per_cpu(sd_llc, cpu));
> >  
> >     if (!sd || !sd->nohz_idle)
> >             goto unlock;
> >     sd->nohz_idle = 0;
> >  
> > -   atomic_inc(&sd->groups->sgc->nr_busy_cpus);
> > +   atomic_inc(&sd->shared->nr_busy_cpus);
> >  unlock:
> >     rcu_read_unlock();
> >  }
> 
> This breaks my POWER7 box which presumably doesn't have 
> SD_SHARE_PKG_RESOURCES,
> 

Hmm, PPC folks, what does your topology look like?

Currently your sched_domain_topology, as per arch/powerpc/kernel/smp.c,
seems to suggest your cores do not share cache at all.

https://en.wikipedia.org/wiki/POWER7 seems to agree, and states:

  "4 MB L3 cache per C1 core"

And
http://www-03.ibm.com/systems/resources/systems_power_software_i_perfmgmt_underthehood.pdf
also explicitly draws the L3 as per-core in its diagrams.

_however_, that same document describes L3 inter-core fill and lateral
cast-out, which sounds like the L3s work together to form a node-wide
caching system.

Do we want to model these co-operating L3 slices as a sort of
node-wide LLC for the purposes of the scheduler?

While we should definitely fix the assumption that an LLC exists (and I
also need to look at why it isn't set to the core domain instead),
the scheduler does try to scale things by 'assuming' LLC := node.

It does this for NOHZ, and the patches under discussion here would
do the same for idle-core state.

Would this make sense for power, or should we somehow think of something
else?
