Re: [net-next RFC PATCH 0/5] Series short description
On Thu, 15 Dec 2011 01:36:44 +, Ben Hutchings wrote:
> On Fri, 2011-12-09 at 16:01 +1030, Rusty Russell wrote:
> > On Wed, 7 Dec 2011 17:02:04 +, Ben Hutchings wrote:
> > > Most multi-queue controllers could support a kind of hash-based
> > > filtering for TCP/IP by adjusting the RSS indirection table. However,
> > > this table is usually quite small (64-256 entries). This means that
> > > hash collisions will be quite common and this can result in reordering.
> > > The same applies to the small table Jason has proposed for virtio-net.
> >
> > But this happens on real hardware today. Doing better than real hardware
> > is nice, but is it overkill?
>
> What do you mean, it happens on real hardware today? So far as I know,
> the only cases where we have dynamic adjustment of flow steering are in
> ixgbe (big table of hash filters, I think) and sfc (perfect filters).
> I don't think that anyone's currently doing flow steering with the RSS
> indirection table. (At least, not on Linux. I think that Microsoft was
> intending to do so on Windows, but I don't know whether they ever did.)

Thanks, I missed the word "could".

> > And can't you reorder even with perfect matching, since prior packets
> > will be on the old queue and more recent ones on the new queue? Does it
> > discard or requeue old ones? Or am I missing a trick?
>
> Yes, that is possible. RFS is careful to avoid such reordering by only
> changing the steering of a flow when none of its packets can be in a
> software receive queue. It is not generally possible to do the same for
> hardware receive queues. However, when the first condition is met it is
> likely that there won't be a whole lot of packets for that flow in the
> hardware receive queue either. (But if there are, then I think as a
> side-effect of commit 09994d1 RFS will repeatedly ask the driver to
> steer the flow. Which isn't ideal.)

Should be easy to test, but the question is: how hard should we fight to
maintain ordering? Dave?
It comes down to this. We can say in the spec that a virtio NIC which
offers VIRTIO_F_NET_RFS:

1) Must do perfect matching, with perfect ordering. This means you need
   perfect filters, and must handle inter-queue ordering if you change a
   filter (requeue packets?).
2) Must do perfect matching, but don't worry about ordering across changes.
3) Best-effort matching, with perfect ordering.
4) Best-effort matching, best-effort ordering.

For a perfect filtering setup, the virtio NIC needs to either say how many
filter slots it has, or have a way to fail an RFS request. For best effort,
you can simply ignore RFS requests or accept hash collisions, without
bothering the guest driver at all.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
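For the perfect-filtering options above, the key requirement is a filter table with a fixed number of slots where a steering request can be refused once the table is full. The sketch below illustrates that idea; VIRTIO_F_NET_RFS is only a proposed feature name from this thread, and every struct, function name, and size here is invented for illustration, not taken from any driver or spec:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: a perfect-filter table with a fixed slot count,
 * where a steering request fails once the table is full -- matching the
 * point that the device must either advertise how many slots it has or
 * be able to refuse an RFS request. */
#define N_FILTER_SLOTS 8

struct flow_key {                /* 5-tuple identifying a flow */
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct filter_slot {
    int in_use;
    struct flow_key key;
    uint16_t rxq;                /* queue this flow is steered to */
};

static struct filter_slot slots[N_FILTER_SLOTS];

static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Returns 0 on success, -1 when no slot is free (request fails). */
int rfs_request(const struct flow_key *key, uint16_t rxq)
{
    for (int i = 0; i < N_FILTER_SLOTS; i++) {
        if (slots[i].in_use && key_equal(&slots[i].key, key)) {
            slots[i].rxq = rxq;  /* re-steer an existing flow */
            return 0;
        }
    }
    for (int i = 0; i < N_FILTER_SLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].key = *key;
            slots[i].rxq = rxq;
            return 0;
        }
    }
    return -1;                   /* table full: guest driver must cope */
}
```

With a failure path like this, a guest driver can fall back to the default hash-based queue for flows it could not steer, which is exactly the "best effort" degradation Rusty describes.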
Re: [net-next RFC PATCH 0/5] Series short description
On Fri, 2011-12-09 at 16:01 +1030, Rusty Russell wrote:
> On Wed, 7 Dec 2011 17:02:04 +, Ben Hutchings wrote:
> > Solarflare controllers (sfc driver) have 8192 perfect filters for
> > TCP/IPv4 and UDP/IPv4 which can be used for flow steering. (The filters
> > are organised as a hash table, but matched based on 5-tuples.) I
> > implemented the 'accelerated RFS' interface in this driver.
> >
> > I believe the Intel 82599 controllers (ixgbe driver) have both
> > hash-based and perfect filter modes and the driver can be configured to
> > use one or the other. The driver has its own independent mechanism for
> > steering RX and TX flows which predates RFS; I don't know whether it
> > uses hash-based or perfect filters.
>
> Thanks for this summary (and Jason, too). I've fallen a long way behind
> NIC state-of-the-art.
>
> > Most multi-queue controllers could support a kind of hash-based
> > filtering for TCP/IP by adjusting the RSS indirection table. However,
> > this table is usually quite small (64-256 entries). This means that
> > hash collisions will be quite common and this can result in reordering.
> > The same applies to the small table Jason has proposed for virtio-net.
>
> But this happens on real hardware today. Doing better than real hardware
> is nice, but is it overkill?

What do you mean, it happens on real hardware today? So far as I know,
the only cases where we have dynamic adjustment of flow steering are in
ixgbe (big table of hash filters, I think) and sfc (perfect filters).
I don't think that anyone's currently doing flow steering with the RSS
indirection table. (At least, not on Linux. I think that Microsoft was
intending to do so on Windows, but I don't know whether they ever did.)

> And can't you reorder even with perfect matching, since prior packets
> will be on the old queue and more recent ones on the new queue? Does it
> discard or requeue old ones? Or am I missing a trick?

Yes, that is possible.
RFS is careful to avoid such reordering by only changing the steering of
a flow when none of its packets can be in a software receive queue. It is
not generally possible to do the same for hardware receive queues.
However, when the first condition is met it is likely that there won't
be a whole lot of packets for that flow in the hardware receive queue
either. (But if there are, then I think as a side-effect of commit
09994d1 RFS will repeatedly ask the driver to steer the flow. Which
isn't ideal.)

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
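The re-steering rule Ben describes can be sketched as follows. This is a simplified model, not the actual kernel code: the names (`sw_queue`, `flow`, `last_qtail`, `can_resteer`) are invented here, loosely inspired by the per-flow tail bookkeeping RFS uses:

```c
#include <assert.h>

/* Simplified sketch of the RFS rule: a flow may only be re-steered once
 * every packet previously enqueued for it has been drained from the old
 * software queue, so no in-flight packet can be overtaken. */
struct sw_queue {
    unsigned int head;       /* packets already processed */
    unsigned int tail;       /* packets enqueued so far */
};

struct flow {
    int cpu;                 /* current steering target */
    unsigned int last_qtail; /* queue tail when flow's last packet was enqueued */
};

/* Re-steering is safe when the old queue has consumed everything the
 * flow put there, i.e. head has advanced past last_qtail. */
int can_resteer(const struct flow *f, const struct sw_queue *oldq)
{
    return (int)(oldq->head - f->last_qtail) >= 0;
}

void try_resteer(struct flow *f, struct sw_queue *oldq, int new_cpu)
{
    if (can_resteer(f, oldq))
        f->cpu = new_cpu;    /* no in-flight packets: no reordering */
    /* otherwise leave the steering alone and retry later */
}
```

As Ben notes, this condition can only be checked against software queues; a hardware receive queue offers no equivalent visibility into which packets are still in flight.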
Re: [net-next RFC PATCH 0/5] Series short description
On Wed, 7 Dec 2011 17:02:04 +, Ben Hutchings wrote:
> Solarflare controllers (sfc driver) have 8192 perfect filters for
> TCP/IPv4 and UDP/IPv4 which can be used for flow steering. (The filters
> are organised as a hash table, but matched based on 5-tuples.) I
> implemented the 'accelerated RFS' interface in this driver.
>
> I believe the Intel 82599 controllers (ixgbe driver) have both
> hash-based and perfect filter modes and the driver can be configured to
> use one or the other. The driver has its own independent mechanism for
> steering RX and TX flows which predates RFS; I don't know whether it
> uses hash-based or perfect filters.

Thanks for this summary (and Jason, too). I've fallen a long way behind
NIC state-of-the-art.

> Most multi-queue controllers could support a kind of hash-based
> filtering for TCP/IP by adjusting the RSS indirection table. However,
> this table is usually quite small (64-256 entries). This means that
> hash collisions will be quite common and this can result in reordering.
> The same applies to the small table Jason has proposed for virtio-net.

But this happens on real hardware today. Doing better than real hardware
is nice, but is it overkill?

And can't you reorder even with perfect matching, since prior packets
will be on the old queue and more recent ones on the new queue? Does it
discard or requeue old ones? Or am I missing a trick?

Thanks,
Rusty.
Re: [net-next RFC PATCH 0/5] Series short description
On 12/08/2011 01:02 AM, Ben Hutchings wrote:
> On Wed, 2011-12-07 at 19:31 +0800, Jason Wang wrote:
>> On 12/07/2011 03:30 PM, Rusty Russell wrote:
>>> On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang wrote:
>>>> multiple queue virtio-net: flow steering through host/guest cooperation
>>>>
>>>> Hello all:
>>>>
>>>> This is a rough series that adds the guest/host cooperation of flow
>>>> steering support based on Krish Kumar's multiple queue virtio-net
>>>> driver patch 3/3 (http://lwn.net/Articles/467283/).
>>>
>>> Is there a real (physical) device which does this kind of thing? How do
>>> they do it? Can we copy them?
>>>
>>> Cheers,
>>> Rusty.
>>
>> As far as I see, ixgbe and sfc have similar but much more sophisticated
>> mechanisms.
>>
>> The idea was originally suggested by Ben and it was just borrowed from
>> those real physical NIC cards which can dispatch packets based on their
>> hash. All of these cards can filter the flow based on the hash of the
>> L2/L3/L4 header, and the stack would tell the card which queue this
>> flow should go to.
>
> Solarflare controllers (sfc driver) have 8192 perfect filters for
> TCP/IPv4 and UDP/IPv4 which can be used for flow steering. (The filters
> are organised as a hash table, but matched based on 5-tuples.) I
> implemented the 'accelerated RFS' interface in this driver.
>
> I believe the Intel 82599 controllers (ixgbe driver) have both
> hash-based and perfect filter modes and the driver can be configured to
> use one or the other. The driver has its own independent mechanism for
> steering RX and TX flows which predates RFS; I don't know whether it
> uses hash-based or perfect filters.

As far as I see, their driver predates RFS by binding the TX queue and
RX queue to the same CPU and adding a hash-based filter during packet
transmission.

> Most multi-queue controllers could support a kind of hash-based
> filtering for TCP/IP by adjusting the RSS indirection table. However,
> this table is usually quite small (64-256 entries). This means that
> hash collisions will be quite common and this can result in reordering.
> The same applies to the small table Jason has proposed for virtio-net.

Thanks for the clarification. Considering the hash is provided by the
host NIC or the host kernel, the collision rate is not fixed. A perfect
filter is more suitable then.

>> So in host, a simple hash-to-queue table is introduced in tap/macvtap
>> and in guest, the guest driver would tell the desired queue of a flow
>> by changing this table.
>
> I don't think accelerated RFS can work well without the use of perfect
> filtering or hash-based filtering with a very low rate of collisions.
>
> Ben.
Re: [net-next RFC PATCH 0/5] Series short description
On Wed, 2011-12-07 at 19:31 +0800, Jason Wang wrote:
> On 12/07/2011 03:30 PM, Rusty Russell wrote:
> > On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang wrote:
> >> multiple queue virtio-net: flow steering through host/guest cooperation
> >>
> >> Hello all:
> >>
> >> This is a rough series that adds the guest/host cooperation of flow
> >> steering support based on Krish Kumar's multiple queue virtio-net
> >> driver patch 3/3 (http://lwn.net/Articles/467283/).
> >
> > Is there a real (physical) device which does this kind of thing? How do
> > they do it? Can we copy them?
> >
> > Cheers,
> > Rusty.
>
> As far as I see, ixgbe and sfc have similar but much more sophisticated
> mechanisms.
>
> The idea was originally suggested by Ben and it was just borrowed from
> those real physical NIC cards which can dispatch packets based on their
> hash. All of these cards can filter the flow based on the hash of the
> L2/L3/L4 header, and the stack would tell the card which queue this
> flow should go to.

Solarflare controllers (sfc driver) have 8192 perfect filters for
TCP/IPv4 and UDP/IPv4 which can be used for flow steering. (The filters
are organised as a hash table, but matched based on 5-tuples.) I
implemented the 'accelerated RFS' interface in this driver.

I believe the Intel 82599 controllers (ixgbe driver) have both
hash-based and perfect filter modes and the driver can be configured to
use one or the other. The driver has its own independent mechanism for
steering RX and TX flows which predates RFS; I don't know whether it
uses hash-based or perfect filters.

Most multi-queue controllers could support a kind of hash-based
filtering for TCP/IP by adjusting the RSS indirection table. However,
this table is usually quite small (64-256 entries). This means that
hash collisions will be quite common and this can result in reordering.
The same applies to the small table Jason has proposed for virtio-net.
> So in host, a simple hash-to-queue table is introduced in tap/macvtap
> and in guest, the guest driver would tell the desired queue of a flow
> by changing this table.

I don't think accelerated RFS can work well without the use of perfect
filtering or hash-based filtering with a very low rate of collisions.

Ben.
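The collision problem with a small indirection table can be made concrete with a short sketch. Queue choice is just `table[hash % size]`, so any two flows whose hashes are congruent modulo the table size share a slot, and re-steering one silently drags the other along. The table size and names below are illustrative only, not from any particular driver:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of hash-based steering via a small RSS-style indirection
 * table: the queue for a packet is table[hash % size].  Flows whose
 * hashes collide modulo the table size are inseparable -- rewriting a
 * slot to move one flow also moves every flow that shares it. */
#define INDIR_SIZE 128           /* typical tables are 64-256 entries */

static uint8_t indir_table[INDIR_SIZE];

uint8_t pick_queue(uint32_t rxhash)
{
    return indir_table[rxhash % INDIR_SIZE];
}

void steer_flow(uint32_t rxhash, uint8_t queue)
{
    indir_table[rxhash % INDIR_SIZE] = queue;
}
```

This is why Ben argues accelerated RFS needs either perfect filters or a hash table large enough that such collisions are rare.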
Re: [net-next RFC PATCH 0/5] Series short description
On 12/07/2011 03:30 PM, Rusty Russell wrote:
> On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang wrote:
>> multiple queue virtio-net: flow steering through host/guest cooperation
>>
>> Hello all:
>>
>> This is a rough series that adds the guest/host cooperation of flow
>> steering support based on Krish Kumar's multiple queue virtio-net
>> driver patch 3/3 (http://lwn.net/Articles/467283/).
>
> Is there a real (physical) device which does this kind of thing? How do
> they do it? Can we copy them?
>
> Cheers,
> Rusty.

As far as I see, ixgbe and sfc have similar but much more sophisticated
mechanisms.

The idea was originally suggested by Ben and it was just borrowed from
those real physical NIC cards which can dispatch packets based on their
hash. All of these cards can filter the flow based on the hash of the
L2/L3/L4 header, and the stack would tell the card which queue this
flow should go to.

So in host, a simple hash-to-queue table is introduced in tap/macvtap
and in guest, the guest driver would tell the desired queue of a flow
by changing this table.
Re: [net-next RFC PATCH 0/5] Series short description
On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang wrote:
> multiple queue virtio-net: flow steering through host/guest cooperation
>
> Hello all:
>
> This is a rough series that adds the guest/host cooperation of flow
> steering support based on Krish Kumar's multiple queue virtio-net
> driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing? How do
they do it? Can we copy them?

Cheers,
Rusty.
[net-next RFC PATCH 0/5] Series short description
multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This is a rough series that adds the guest/host cooperation of flow
steering support based on Krish Kumar's multiple queue virtio-net
driver patch 3/3 (http://lwn.net/Articles/467283/).

The idea is simple: the backend passes the rxhash to the guest, and the
guest tells the backend the hash-to-queue mapping when necessary; the
backend can then choose the queue based on the hash value of the packet.
The table is just a page shared between userspace and the backend.

Patch 1 enables the ability to pass the rxhash through vnet_hdr to the
guest.

Patches 2,3 implement a very simple flow director for tap and macvtap.
The tap part is based on the multiqueue tap patches posted by me
(http://lwn.net/Articles/459270/).

Patch 4 implements a method for a virtio device to find the irq of a
specific virtqueue, in order to do device-specific interrupt
optimization.

Patch 5 is the part of the guest driver that uses accelerated RFS to
program the flow director, with some optimizations on irq affinity and
tx queue selection.

This is just a prototype that demonstrates the idea; there are still
things that need to be discussed:

- An alternative idea instead of the shared page is the ctrl vq; the
  reason a shared table is preferable is the delay of the ctrl vq itself.
- Optimization on irq affinity and tx queue selection

Comments are welcomed, thanks!
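The shared-page mechanism described above can be sketched as below. This is only an illustration of the idea, not the patch series' actual code: the table size, the fallback policy, and all names here are invented, and the real shared page and its layout live in the tap/macvtap patches:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the shared-page flow director: the guest writes a
 * hash-to-queue mapping into a page shared with the backend, and the
 * backend indexes it with the packet's rxhash to pick a receive queue. */
#define TABLE_ENTRIES 256        /* illustrative; a 4K page of u16s holds 2048 */
#define NUM_QUEUES    4

static uint16_t shared_table[TABLE_ENTRIES];  /* stands in for the shared page */

/* Guest side: record the desired queue for a flow's hash. */
void guest_set_mapping(uint32_t rxhash, uint16_t queue)
{
    shared_table[rxhash % TABLE_ENTRIES] = queue;
}

/* Backend side: choose the queue for an incoming packet, falling back
 * to plain hash spreading if the entry is out of range. */
uint16_t backend_pick_queue(uint32_t rxhash)
{
    uint16_t q = shared_table[rxhash % TABLE_ENTRIES];
    return q < NUM_QUEUES ? q : rxhash % NUM_QUEUES;
}
```

Because the guest updates the page directly, a mapping change takes effect on the next lookup with no ctrl-vq round trip, which is the latency argument made above for preferring the shared table.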
---

Jason Wang (5):
      virtio_net: passing rxhash through vnet_hdr
      tuntap: simple flow director support
      macvtap: flow director support
      virtio: introduce a method to get the irq of a specific virtqueue
      virtio-net: flow director support

 drivers/lguest/lguest_device.c |    8 ++
 drivers/net/macvlan.c          |    4 +
 drivers/net/macvtap.c          |   42 -
 drivers/net/tun.c              |  105 --
 drivers/net/virtio_net.c       |  189 +++-
 drivers/s390/kvm/kvm_virtio.c  |    6 +
 drivers/vhost/net.c            |   10 +-
 drivers/vhost/vhost.h          |    5 +
 drivers/virtio/virtio_mmio.c   |    8 ++
 drivers/virtio/virtio_pci.c    |   12 +++
 include/linux/if_macvlan.h     |    1
 include/linux/if_tun.h         |   11 ++
 include/linux/virtio_config.h  |    4 +
 include/linux/virtio_net.h     |   16 +++
 14 files changed, 377 insertions(+), 44 deletions(-)