Hi Barry, On 08/02/21 10:04, Song Bao Hua (Barry Song) wrote: >> -----Original Message----- >> From: Valentin Schneider [mailto:valentin.schnei...@arm.com]
> > Hi Valentin, > > While I like your approach, this will require more time > to evaluate possible influence as the approach also affects > all machines without 3-hops issue. So x86 platforms need to > be tested and benchmark is required. > > What about we firstly finish the review of "grandchild" approach > v2 and have a solution for kunpeng920 and Sun Fire X4600-M2 > while not impacting other machines which haven't 3-hops issues > first? > I figured I'd toss this out while the iron was hot (and I had the topology crud paged in), but I ultimately agree that it's better to first go with something that fixes the diameter > 2 topologies and leaves the other ones untouched, which is exactly what you have. > I would appreciate very much if you could comment on v2: > https://lore.kernel.org/lkml/20210203111201.20720-1-song.bao....@hisilicon.com/ > See my comment below on domain degeneration; with that taken care of I would say it's good to go. Have a look at what patch1+patch3 squashed together looks like, passing the right sd to init_overlap_sched_group() looks a bit neater IMO. >> +static struct sched_domain *find_node_domain(struct sched_domain *sd) >> +{ >> + struct sched_domain *parent; >> + >> + BUG_ON(!(sd->flags & SD_NUMA)); >> + >> + /* Get to the level above NODE */ >> + while (sd && sd->child) { >> + parent = sd; >> + sd = sd->child; >> + >> + if (!(sd->flags & SD_NUMA)) >> + break; >> + } >> + /* >> + * We're going to create cross topology level sched_group_capacity >> + * references. This can only work if the domains resulting from said >> + * levels won't be degenerated, as we need said sgc to be periodically >> + * updated: it needs to be attached to the local group of a domain >> + * that didn't get degenerated. >> + * >> + * Of course, groups aren't available yet, so we can't call the usual >> + * sd_degenerate(). Checking domain spans is the closest we get. >> + * Start from NODE's parent, and keep going up until we get a domain >> + * we're sure won't be degenerated. >> + */ >> + while (sd->parent && >> + cpumask_equal(sched_domain_span(sd), sched_domain_span(parent))) >> { >> + sd = parent; >> + parent = sd->parent; >> + } > > So this is because the sched_domain which doesn't contribute to scheduler > will be destroyed during cpu_attach_domain() since sd and parent span > the seam mask? > Yes; let's take your topology for instance: node 0 1 2 3 0: 10 12 20 22 1: 12 10 22 24 2: 20 22 10 12 3: 22 24 12 10 2 10 2 0 <---> 1 <---> 2 <---> 3 Domains for node1 will look like (before any fixes are applied): NUMA<=10: span=1 groups=(1) NUMA<=12: span=0-1 groups=(1)->(0) NUMA<=20: span=0-1 groups=(0,1) NUMA<=22: span=0-2 groups=(0,1)->(0,2-3) NUMA<=24: span=0-3 groups=(0-2)->(0,2-3) As you can see, the domain representing distance <= 20 will be degenerated (it has a single group). If we were to e.g. add some more nodes to the left of node0, then we would trigger the "grandchildren logic" for node1 and would end up creating a reference to node1 NUMA<=20's sgc, which is a mistake: that domain will be degenerated, and that sgc will never be updated. The right thing to do here would be reference node1 NUMA<=12's sgc, which the above snippet does. >> + >> + return parent; >> +} >> +