topology: fix sched groups on NUMA machines with mesh topology

Lauro Venancio Mon, 17 Apr 2017 07:41:23 -0700

On 04/14/2017 01:58 PM, Peter Zijlstra wrote:
> On Fri, Apr 14, 2017 at 01:38:13PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 13, 2017 at 10:56:08AM -0300, Lauro Ramos Venancio wrote:
>>> This patch constructs the sched groups from each CPU perspective. So, on
>>> a 4 nodes machine with ring topology, while nodes 0 and 2 keep the same
>>> groups as before [(3, 0, 1)(1, 2, 3)], nodes 1 and 3 have new groups
>>> [(0, 1, 2)(2, 3, 0)]. This allows moving tasks between any node 2-hops
>>> apart.
>> Ah,.. so after drawing pictures I see what went wrong; duh :-(
>>
>> An equivalent patch would be (if for_each_cpu_wrap() were exposed):
>>
>> @@ -521,11 +588,11 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
>>      struct cpumask *covered = sched_domains_tmpmask;
>>      struct sd_data *sdd = sd->private;
>>      struct sched_domain *sibling;
>> -    int i;
>> +    int i, wrap;
>>  
>>      cpumask_clear(covered);
>>  
>> -    for_each_cpu(i, span) {
>> +    for_each_cpu_wrap(i, span, cpu, wrap) {
>>              struct cpumask *sg_span;
>>  
>>              if (cpumask_test_cpu(i, covered))
>>
>>
>> We need to start iterating at @cpu, not start at 0 every time.
>>
>>
> OK, please have a look here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core


Looks good, but please hold these patches until patch 3 is applied.
Without it, the sched_group_capacity (sg->sgc) instance is not selected
correctly, and we have a significant performance regression on all NUMA
machines.

I will continue this discussion in the other thread.
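
For readers following the quoted diff, here is a minimal userspace sketch
of the wrapping walk it relies on: start at a given CPU, run to the end
of the span, then wrap around to 0. This is not the kernel's
for_each_cpu_wrap() implementation; mask_t and visit_order_wrap() are
hypothetical names, and a plain unsigned int stands in for struct cpumask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpumask: bit i set means CPU i is in the span. */
typedef unsigned int mask_t;

/*
 * Record the CPUs of `span` in the order a wrapping walk visits them:
 * begin at `start`, continue up to nbits - 1, then wrap to 0 and stop
 * just before `start`. Returns the number of CPUs written to `order`,
 * which must have room for at least `nbits` entries.
 */
static size_t visit_order_wrap(mask_t span, unsigned int nbits,
                               unsigned int start, unsigned int *order)
{
    size_t n = 0;

    /* This sketch only handles masks that fit in 32 bits. */
    assert(nbits > 0 && nbits <= 32 && start < nbits);

    for (unsigned int k = 0; k < nbits; k++) {
        unsigned int i = (start + k) % nbits;   /* wrap past the last bit */

        if (span & (1u << i))
            order[n++] = i;
    }
    return n;
}
```

With a full 4-CPU span (0xF), starting at CPU 1 visits 1, 2, 3, 0, while
starting at CPU 0 visits 0, 1, 2, 3, which is the order a walk that
always starts at the first CPU in the span would produce for every CPU.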

Re: [RFC 2/3] sched/topology: fix sched groups on NUMA machines with mesh topology
