Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
On Wed, Feb 15, 2017 at 1:49 PM, Michael S. Tsirkin wrote:
> The logic is simple really. With #VCPUs == #queues we can reasonably
> assume this box is mostly doing networking so we can set affinity
> the way we like. With VCPUs > queues clearly VM is doing more stuff
> so we need a userspace policy to take that into account,
> we don't know ourselves what is the right thing to do.
>
> Arguably for #VCPUs == #queues we are not always doing the right thing
> either but I see this as an argument to move more smarts
> into core kernel not for adding more dumb heuristics in the driver.

Thanks for all the feedback, Michael. We'll drop this patch and move to
user mode.

Thanks,
Ben
Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
On Wed, Feb 15, 2017 at 11:17 AM, Michael S. Tsirkin wrote:
> Right. But userspace knows it's random at least. If kernel supplies
> affinity e.g. the way your patch does, userspace ATM accepts this as a
> gospel.

The existing code supplies the same affinity gospels in the #vcpu == #queue
case today, and the patch (unless it has a bug in it) should not affect the
#vcpu == #queue case's behavior. I don't quite understand what property we'd
be changing with the patch.

Here's the same dump of smp_affinity_list, on a 16-VCPU machine with an
unmodified kernel:

0
0
1
1
2
2
[..]
15
15

And xps_cpus:

0001
0002
[...]
8000

This patch causes the #vcpu != #queue case to follow the same pattern.

Thanks again!
Ben
Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
On Wed, Feb 15, 2017 at 9:42 AM, Michael S. Tsirkin wrote:
>
> > For pure network load, assigning each txqueue IRQ exclusively
> > to one of the cores that generates traffic on that queue is the
> > optimal layout in terms of load spreading. Irqbalance does
> > not have the XPS information to make this optimal decision.
>
> Try to add hints for it?
>
> > Overall system load affects this calculation both in the case of 1:1
> > mapping and uneven queue distribution. In both cases, irqbalance
> > is hopefully smart enough to migrate other non-pinned IRQs to
> > cpus with lower overall load.
>
> Not if everyone starts inserting hacks like this one in code.

It seems to me that the default behavior is equally "random" - why would we
want XPS striped across the cores the way it's done today?

What we're trying to do here is avoid the surprise cliff that guests will hit
when the queue count is limited to less than the VCPU count. That will happen
because we limit the queue pair count to 32.

I'll happily push further complexity to user mode. If this won't fly, we can
leave all of this behavior in user code. Michael, would you prefer that I
abandon this patch?

> That's another problem with this patch. If you care about hyperthreads
> you want an API to probe for that.

It's something of a happy accident that hyperthreads line up that way.
Keeping the topology knowledge out of the patch and in user space seems
cleaner; would you agree?

Thanks!
Ben
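As a purely illustrative sketch of the kind of userspace topology probe
mentioned above (not something proposed in this patch), hyperthread siblings
can be read from the sysfs topology files rather than assumed from CPU
numbering. The loop below assumes CPUs are numbered contiguously from 0:

#include <stdio.h>

int main(void)
{
	char path[128], buf[64];
	int cpu;

	for (cpu = 0; ; cpu++) {
		/* thread_siblings_list names the hyperthread twins of this cpu */
		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
			 cpu);
		FILE *f = fopen(path, "r");
		if (!f)
			break;	/* stop at the first missing cpu directory */
		if (fgets(buf, sizeof(buf), f))
			printf("cpu%d siblings: %s", cpu, buf);
		fclose(f);
	}
	return 0;
}

A tool such as irqbalance, or a site-specific script, could use information
like this to build an affinity policy instead of relying on the
X / X + ncpus/2 assumption.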
Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin wrote:
> IIRC irqbalance will bail out and avoid touching affinity
> if you set affinity from driver. Breaking that's not nice.
> Pls correct me if I'm wrong.

I believe you're right that irqbalance will leave the affinity alone.
Irqbalance has had changes that may or may not be in the versions bundled
with various guests, and I don't have a definitive cross-correlation of
irqbalance version to guest version. But in the existing code, the driver
does set affinity for #VCPUs == #queues, so that's been happening anyway.

The (original) intention of this patch was to extend the existing behavior
to the case where we limit queue counts, to avoid the surprising
discontinuity when #VCPUs != #queues.

It's not obvious that it's wrong to cause irqbalance to leave these queues
alone: generally you want the interrupt to arrive on the core that caused
the work, for cache locality and to avoid lock contention. Doing fancier
things is outside the scope of this patch.

> Doesn't look like this will handle the case of num cpus < num queues well.

I believe it's correct. The first #VCPUs queues will each have one bit set
in their xps mask, and the remaining queues will have no bits set. That
means each VCPU uses its own assigned TX queue (and the TX interrupt comes
back to that VCPU).

Thanks again for the review!
Ben
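To illustrate the num cpus < num queues claim, here is a minimal userspace
simulation of the striping policy (my own sketch, assuming CPUs are numbered
0..ncpus-1; it is not the kernel code itself). With 4 CPUs and 8 queues it
prints a single set bit for tx-0 through tx-3 and an empty mask for tx-4
through tx-7:

#include <stdint.h>
#include <stdio.h>

/* Queue q is assigned cpus q, q + nqueues, q + 2*nqueues, ... */
static uint64_t xps_mask_for_queue(int q, int ncpus, int nqueues)
{
	uint64_t mask = 0;
	int cpu;

	for (cpu = q; cpu < ncpus; cpu += nqueues)
		mask |= 1ULL << cpu;
	return mask;
}

int main(void)
{
	const int ncpus = 4, nqueues = 8;	/* the num cpus < num queues case */
	int q;

	for (q = 0; q < nqueues; q++)
		printf("tx-%d xps mask %04llx\n", q,
		       (unsigned long long)xps_mask_for_queue(q, ncpus, nqueues));
	return 0;
}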
Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
On Sun, Feb 5, 2017 at 11:24 PM, Jason Wang wrote:
>
> On 2017-02-03 14:19, Ben Serebrin wrote:
>>
>> From: Benjamin Serebrin
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts or the xps
>> aggregation interrupt.
>
> So this in fact is not a affinity fixing for #cpus > 32 but adding affinity
> for #cpus != #queue pairs.

Fair enough. I'll adjust the title line in the subsequent version.

>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that at a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>
> [...]
>
>> +	/* If there are more cpus than queues, then assign the queues'
>> +	 * interrupts to the first cpus until we run out.
>> +	 */
>>  	i = 0;
>>  	for_each_online_cpu(cpu) {
>> +		if (i == vi->max_queue_pairs)
>> +			break;
>>  		virtqueue_set_affinity(vi->rq[i].vq, cpu);
>>  		virtqueue_set_affinity(vi->sq[i].vq, cpu);
>> -		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>>  		i++;
>>  	}
>> +	/* Stripe the XPS affinities across the online CPUs.
>> +	 * Hyperthread pairs are typically assigned such that Linux's
>> +	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
>> +	 * hyperthread twins to share TX queues, in the case where there are
>> +	 * more cpus than queues.
>
> Since we use combined queue pairs, why not use the same policy for RX?

XPS is for transmit only.

> Thanks
>
>> +	 */
>> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>> +		struct cpumask mask;
>> +		int skip = i;
>> +
>> +		cpumask_clear(&mask);
>> +		for_each_online_cpu(cpu) {
>> +			while (skip--)
>> +				cpu = cpumask_next(cpu, cpu_online_mask);
>> +			if (cpu < num_possible_cpus())
>> +				cpumask_set_cpu(cpu, &mask);
>> +			skip = vi->max_queue_pairs - 1;
>> +		}
>> +		netif_set_xps_queue(vi->dev, &mask, i);
>> +	}
>> +
>>  	vi->affinity_hint_set = true;
>>  }
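For what it's worth, here is an illustration-only sketch (not the kernel code
above) of which CPUs the striping comment implies would share each TX queue
on a 64-VCPU guest limited to 32 queue pairs: queue q gets CPUs q and q + 32,
which are hyperthread twins under the assumed "X and X + numcpus/2" numbering.

#include <stdio.h>

int main(void)
{
	const int ncpus = 64, nqueues = 32;	/* 64-VCPU guest, 32 queue pairs */
	int q, cpu;

	/* Same striping rule: queue q serves cpus q, q + nqueues, ... */
	for (q = 0; q < nqueues; q++) {
		printf("tx-%d:", q);
		for (cpu = q; cpu < ncpus; cpu += nqueues)
			printf(" cpu%d", cpu);
		printf("\n");
	}
	return 0;
}

This is consistent with the xps_cpus dumps quoted elsewhere in the thread,
where each TX queue's mask has two bits set.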
Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs
Thanks, Michael, I'll put this text in the commit log:

XPS settings aren't writable from userspace, so the only way I know to fix
XPS is in the driver. The interrupt affinity settings could be set by
irqbalance, yes, but it's not always enabled (by intention) in all guests.
I just confirmed that on an unpatched 64-VCPU VM with no irqbalance, all
virtio-net interrupts end up on core 0, and on a patched 64-VCPU VM,
interrupts end up on the core whose number matches the queue number.

This patch is just extending the existing behavior in the interrupt affinity
assignment loop, and I think it stands on its own. It actually sparked some
internal discussion about further intelligence in XPS and interrupt affinity,
and we'll do some experiments and come back with further patches.

On Fri, Feb 3, 2017 at 7:07 AM, Michael S. Tsirkin wrote:
> On Thu, Feb 02, 2017 at 10:19:05PM -0800, Ben Serebrin wrote:
>> From: Benjamin Serebrin
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts or the xps
>> aggregation interrupt.
>>
>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that at a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
>> 0-63
>> [...]
>> 0-63
>>
>> and we see all TX queues' xps_cpus affinitized to no cores:
>>
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
>> ,
>> [...]
>> ,
>>
>> With the fix, we see each queue assigned to a single core,
>> and xps affinity set to 1 unique cpu per TX queue.
>>
>> 64 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
>>
>> 0-63
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29
>> 29
>> 30
>> 30
>> 31
>> 31
>> 0-63
>> 0-63
>> 0-63
>> 0-63
>>
>> cd /sys/class/net/eth0/queues
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
>>
>> 0001,0001
>> 0002,0002
>> 0004,0004
>> 0008,0008
>> 0010,0010
>> 0020,0020
>> 0040,0040
>> 0080,0080
>> 0100,0100
>> 0200,0200
>> 0400,0400
>> 0800,0800
>> 1000,1000
>> 2000,2000
>> 4000,4000
>> 8000,8000
>> 0001,0001
>> 0002,0002
>> 0004,0004
>> 0008,0008
>> 0010,0010
>> 0020,0020
>> 0040,0040
>> 0080,0080
>> 0100,0100
>> 0200,0200
>> 0400,0400
>> 0800,0800
>> 1000,1000
>> 2000,2000
>> 4000,4000
>> 8000,8000
>>
>> 48 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list; done
>> 0-47
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29