Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs

2017-02-15 Thread Benjamin Serebrin
On Wed, Feb 15, 2017 at 1:49 PM, Michael S. Tsirkin  wrote:

> The logic is simple really. With #VCPUs == #queues we can reasonably
> assume this box is mostly doing networking, so we can set affinity
> the way we like. With #VCPUs > queues, clearly the VM is doing more stuff,
> so we need a userspace policy to take that into account;
> we don't know ourselves what the right thing to do is.
>
> Arguably for #VCPUs == #queues we are not always doing the right thing
> either, but I see this as an argument to move more smarts into the core
> kernel, not for adding more dumb heuristics in the driver.

Thanks for all the feedback, Michael.  We'll drop this patch and move
to user mode.
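
Roughly, that userspace policy boils down to writing the chosen CPU into each
queue interrupt's /proc/irq/<irq>/smp_affinity_list. A minimal sketch only,
with a made-up base IRQ number and a simple one-CPU-per-queue policy (neither
of which comes from this thread):

    #include <stdio.h>

    int main(void)
    {
            /* Made-up values: real code would discover the device's IRQs
             * (e.g. from /proc/interrupts) and the online CPU count. */
            const int first_irq = 24;
            const int nqueues = 32;

            for (int q = 0; q < nqueues; q++) {
                    char path[64];
                    FILE *f;

                    snprintf(path, sizeof(path),
                             "/proc/irq/%d/smp_affinity_list", first_irq + q);
                    f = fopen(path, "w");
                    if (!f)
                            continue;  /* IRQ absent or not writable */
                    fprintf(f, "%d\n", q);  /* pin queue q's interrupt to CPU q */
                    fclose(f);
            }
            return 0;
    }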

Thanks,
Ben


Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs

2017-02-15 Thread Benjamin Serebrin
On Wed, Feb 15, 2017 at 11:17 AM, Michael S. Tsirkin  wrote:

> Right. But userspace knows it's random at least. If the kernel supplies
> affinity, e.g. the way your patch does, userspace ATM accepts this as
> gospel.

The existing code supplies the same affinity gospels in the #vcpu == #queue
case today. And the patch (unless it has a bug in it) should not affect the
#vcpu == #queue case's behavior. I don't quite understand what property
we'd be changing with the patch.

Here's the same dump of smp_affinity_list, on a 16-VCPU machine with an
unmodified kernel:

0
0
1
1
2
2
[..]
15
15

And xps_cpus
0001
0002
[...]
8000

This patch causes the #vcpu != #queue case to follow the same pattern.
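
To make that concrete, here is a rough standalone sketch (illustrative only,
for a hypothetical 16-VCPU guest capped at 8 queue pairs) of the resulting
pattern; it simply mirrors the striping loop in the patch quoted later in
this thread:

    #include <stdio.h>

    int main(void)
    {
            const int ncpus = 16;   /* hypothetical VCPU count */
            const int nqueues = 8;  /* hypothetical queue pair cap */

            for (int q = 0; q < nqueues; q++) {
                    unsigned int xps = 0;

                    /* The patch pins queue q's interrupts to CPU q, and its
                     * XPS mask collects CPUs q, q + nqueues, q + 2*nqueues, ... */
                    for (int cpu = q; cpu < ncpus; cpu += nqueues)
                            xps |= 1u << cpu;
                    printf("queue %d: irq affinity -> cpu %d, xps_cpus -> %04x\n",
                           q, q, xps);
            }
            return 0;
    }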


Thanks again!
Ben


Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs

2017-02-15 Thread Benjamin Serebrin
On Wed, Feb 15, 2017 at 9:42 AM, Michael S. Tsirkin  wrote:
>
>
> > For pure network load, assigning each txqueue IRQ exclusively
> > to one of the cores that generates traffic on that queue is the
> > optimal layout in terms of load spreading. Irqbalance does
> > not have the XPS information to make this optimal decision.
>
> Try to add hints for it?
>
>
> > Overall system load affects this calculation both in the case of 1:1
> > mapping and in the case of uneven queue distribution. In both cases,
> > irqbalance is hopefully smart enough to migrate other non-pinned IRQs to
> > cpus with lower overall load.
>
> Not if everyone starts inserting hacks like this one in code.


It seems to me that the default behavior is equally "random" - why would we
want XPS striped across the cores the way it's done today?

What we're trying to do here is avoid the surprise cliff that guests will hit
when the queue count is limited to less than the VCPU count. That will happen
because we limit the queue pair count to 32. I'll happily push further
complexity to user mode.

If this won't fly, we can leave all of this behavior in user code. Michael,
would you prefer that I abandon this patch?

> That's another problem with this patch. If you care about hyperthreads
> you want an API to probe for that.


It's something of a happy accident that hyperthreads line up that way.
Keeping the topology knowledge out of the patch and in user space seems
cleaner; would you agree?
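
If it helps, probing that from user space is straightforward; a minimal
sketch, assuming the standard sysfs topology files and with error handling
kept to a bare minimum:

    #include <stdio.h>

    int main(void)
    {
            char path[128], line[256];

            /* Walk cpu0, cpu1, ... until the sysfs directories run out. */
            for (int cpu = 0; ; cpu++) {
                    snprintf(path, sizeof(path),
                             "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                             cpu);
                    FILE *f = fopen(path, "r");

                    if (!f)
                            break;
                    if (fgets(line, sizeof(line), f))
                            printf("cpu%d siblings: %s", cpu, line);
                    fclose(f);
            }
            return 0;
    }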

Thanks!
Ben


Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs

2017-02-14 Thread Benjamin Serebrin
On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin  wrote:

> IIRC irqbalance will bail out and avoid touching affinity
> if you set affinity from driver.  Breaking that's not nice.
> Pls correct me if I'm wrong.


I believe you're right that irqbalance will leave the affinity alone.

Irqbalance has had changes that may or may not be in the versions bundled with
various guests, and I don't have a definitive cross-correlation of irqbalance
version to guest version.  But in the existing code, the driver does
set affinity for #VCPUs==#queues, so that's been happening anyway.

The (original) intention of this patch was to extend the existing behavior
to the case where we limit queue counts, to avoid the surprising discontinuity
when #VCPU != #queues.

It's not obvious that it's wrong to cause irqbalance to leave these queues
alone: generally you want the interrupt to come to the core that caused the
work, for cache locality and to avoid lock contention.
Doing fancier things is outside the scope of this patch.

> Doesn't look like this will handle the case of num cpus < num queues well.

I believe it's correct.  The first #VCPUs queues will have one bit set in their
xps mask, and the remaining queues have no bits set.  That means each VCPU uses
its own assigned TX queue (and the TX interrupt comes back to that VCPU).
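
As a sanity check, a small userspace simulation of the patch's XPS striping
loop with hypothetical numbers (2 online CPUs, 4 queue pairs) prints one bit
for tx-0 and tx-1 and an empty mask for tx-2 and tx-3:

    #include <stdio.h>

    #define NCPUS   2  /* hypothetical online (and possible) CPUs */
    #define NQUEUES 4  /* hypothetical queue pairs */

    /* Stand-in for cpumask_next(): next online CPU, or NCPUS when exhausted. */
    static int next_cpu(int cpu)
    {
            return cpu + 1 < NCPUS ? cpu + 1 : NCPUS;
    }

    int main(void)
    {
            for (int q = 0; q < NQUEUES; q++) {
                    unsigned int mask = 0;
                    int skip = q;

                    /* Mirrors for_each_online_cpu(); note the body advances
                     * cpu itself, just as the patch's loop does. */
                    for (int cpu = next_cpu(-1); cpu < NCPUS; cpu = next_cpu(cpu)) {
                            while (skip--)
                                    cpu = next_cpu(cpu);
                            if (cpu < NCPUS)  /* the num_possible_cpus() check */
                                    mask |= 1u << cpu;
                            skip = NQUEUES - 1;
                    }
                    printf("tx-%d xps_cpus = %x\n", q, mask);
            }
            return 0;
    }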

Thanks again for the review!
Ben


Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-05 Thread Benjamin Serebrin
On Sun, Feb 5, 2017 at 11:24 PM, Jason Wang  wrote:
>
>
> On 2017-02-03 14:19, Ben Serebrin wrote:
>>
>> From: Benjamin Serebrin 
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts or the xps
>> aggregation interrupt.
>
>
> So this in fact is not an affinity fix for #cpus > 32 but adds affinity
> for #cpus != #queue pairs.

Fair enough.  I'll adjust the title line in the subsequent version.


>
>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that at a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>
>
> [...]
>
>> +   /* If there are more cpus than queues, then assign the queues'
>> +* interrupts to the first cpus until we run out.
>> +*/
>> i = 0;
>> for_each_online_cpu(cpu) {
>> +   if (i == vi->max_queue_pairs)
>> +   break;
>> virtqueue_set_affinity(vi->rq[i].vq, cpu);
>> virtqueue_set_affinity(vi->sq[i].vq, cpu);
>> -   netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>> i++;
>> }
>> +   /* Stripe the XPS affinities across the online CPUs.
>> +* Hyperthread pairs are typically assigned such that Linux's
>> +* CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
>> +* hyperthread twins to share TX queues, in the case where there are
>> +* more cpus than queues.
>
>
> Since we use combined queue pairs, why not use the same policy for RX?

XPS is for transmit only.


> Thanks
>
>
>> +*/
>> +   for (i = 0; i < vi->max_queue_pairs; i++) {
>> +   struct cpumask mask;
>> +   int skip = i;
>> +
>> +   cpumask_clear(&mask);
>> +   for_each_online_cpu(cpu) {
>> +   while (skip--)
>> +   cpu = cpumask_next(cpu, cpu_online_mask);
>> +   if (cpu < num_possible_cpus())
>> +   cpumask_set_cpu(cpu, &mask);
>> +   skip = vi->max_queue_pairs - 1;
>> +   }
>> +   netif_set_xps_queue(vi->dev, &mask, i);
>> +   }
>> +
>> vi->affinity_hint_set = true;
>>   }
>>
>
>
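
For what it's worth, plugging the 64-VCPU, 32-queue case into the striping
loop quoted above puts CPUs q and q + 32 (hyperthread twins under the assumed
layout) on TX queue q, which is the shape of the xps_cpus values in the test
output further down this thread. A tiny standalone sketch of those masks:

    #include <stdio.h>

    int main(void)
    {
            const int ncpus = 64, nqueues = 32;

            for (int q = 0; q < nqueues; q++) {
                    unsigned long long mask = 0;

                    for (int cpu = q; cpu < ncpus; cpu += nqueues)
                            mask |= 1ull << cpu;
                    /* Print roughly as sysfs does: 32-bit words, high word first. */
                    printf("tx-%d xps_cpus = %08llx,%08llx\n",
                           q, mask >> 32, mask & 0xffffffffull);
            }
            return 0;
    }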


Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

2017-02-03 Thread Benjamin Serebrin
Thanks, Michael, I'll put this text in the commit log:

XPS settings aren't writable from userspace, so the only way I know
to fix XPS is in the driver.

The interrupt affinity settings could be set by irqbalance, yes, but it's
not always enabled (by intention) in all guests. I just confirmed that on
an unpatched 64-VCPU VM with no irqbalance, all virtio-net interrupts end
up on core 0, and on a patched 64-VCPU VM, interrupts end up on the core
whose number matches the queue number. This patch is just extending the
existing behavior in the interrupt affinity assignment loop, and I think
it stands on its own.

It actually sparked some internal discussion about further intelligence
in XPS and interrupt affinity, and we'll do some experiments and come
back with further patches.

On Fri, Feb 3, 2017 at 7:07 AM, Michael S. Tsirkin  wrote:
> On Thu, Feb 02, 2017 at 10:19:05PM -0800, Ben Serebrin wrote:
>> From: Benjamin Serebrin 
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts or the xps
>> aggregation interrupt.
>>
>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that at a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>> 0-63
>> [...]
>> 0-63
>>
>> and we see all TX queues' xps_cpus affinitized to no cores:
>>
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
>> ,
>> [...]
>> ,
>>
>> With the fix, we see each queue assigned to a single core,
>> and xps affinity set to 1 unique cpu per TX queue.
>>
>> 64 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>>
>> 0-63
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29
>> 29
>> 30
>> 30
>> 31
>> 31
>> 0-63
>> 0-63
>> 0-63
>> 0-63
>>
>> cd /sys/class/net/eth0/queues
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done
>>
>> 0001,0001
>> 0002,0002
>> 0004,0004
>> 0008,0008
>> 0010,0010
>> 0020,0020
>> 0040,0040
>> 0080,0080
>> 0100,0100
>> 0200,0200
>> 0400,0400
>> 0800,0800
>> 1000,1000
>> 2000,2000
>> 4000,4000
>> 8000,8000
>> 0001,0001
>> 0002,0002
>> 0004,0004
>> 0008,0008
>> 0010,0010
>> 0020,0020
>> 0040,0040
>> 0080,0080
>> 0100,0100
>> 0200,0200
>> 0400,0400
>> 0800,0800
>> 1000,1000
>> 2000,2000
>> 4000,4000
>> 8000,8000
>>
>> 48 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>> 0-47
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29