Re: [net-next RFC PATCH 0/5] Series short description

2011-12-15 Thread Rusty Russell
On Thu, 15 Dec 2011 01:36:44 +, Ben Hutchings bhutchi...@solarflare.com 
wrote:
 On Fri, 2011-12-09 at 16:01 +1030, Rusty Russell wrote:
  On Wed, 7 Dec 2011 17:02:04 +, Ben Hutchings 
  bhutchi...@solarflare.com wrote:
   Most multi-queue controllers could support a kind of hash-based
   filtering for TCP/IP by adjusting the RSS indirection table.  However,
   this table is usually quite small (64-256 entries).  This means that
   hash collisions will be quite common and this can result in reordering.
   The same applies to the small table Jason has proposed for virtio-net.
  
  But this happens on real hardware today.  Doing better than real
  hardware is nice, but is it overkill?
 
 What do you mean, it happens on real hardware today?  So far as I know,
 the only cases where we have dynamic adjustment of flow steering are in
 ixgbe (big table of hash filters, I think) and sfc (perfect filters).
 I don't think that anyone's currently doing flow steering with the RSS
 indirection table.  (At least, not on Linux.  I think that Microsoft was
 intending to do so on Windows, but I don't know whether they ever did.)

Thanks, I missed the word "could".

  And can't you reorder even with perfect matching, since prior packets
  will be on the old queue and more recent ones on the new queue?  Does it
  discard or requeue old ones?  Or am I missing a trick?
 
 Yes, that is possible.  RFS is careful to avoid such reordering by only
 changing the steering of a flow when none of its packets can be in a
 software receive queue.  It is not generally possible to do the same for
 hardware receive queues.  However, when the first condition is met it is
 likely that there won't be a whole lot of packets for that flow in the
 hardware receive queue either.  (But if there are, then I think as a
 side-effect of commit 09994d1 RFS will repeatedly ask the driver to
 steer the flow.  Which isn't ideal.)

Should be easy to test, but the question is, how hard should we fight to
maintain ordering?  Dave?

It comes down to this.  We can say in the spec that a virtio nic which
offers VIRTIO_F_NET_RFS:

1) Must do perfect matching, with perfect ordering.  This means you need
   perfect filters, and must handle inter-queue ordering if you change a
   filter (requeue packets?)
2) Must do perfect matching, but don't worry about ordering across changes.
3) Best-effort matching, with perfect ordering.
4) Best-effort matching, best-effort ordering.

For a perfect filtering setup, the virtio nic needs to either say how
many filter slots it has, or have a way to fail an RFS request.  For
best effort, you can simply ignore RFS requests or accept hash
collisions, without bothering the guest driver at all.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net-next RFC PATCH 0/5] Series short description

2011-12-14 Thread Ben Hutchings
On Fri, 2011-12-09 at 16:01 +1030, Rusty Russell wrote:
 On Wed, 7 Dec 2011 17:02:04 +, Ben Hutchings bhutchi...@solarflare.com 
 wrote:
  Solarflare controllers (sfc driver) have 8192 perfect filters for
  TCP/IPv4 and UDP/IPv4 which can be used for flow steering.  (The filters
  are organised as a hash table, but matched based on 5-tuples.)  I
  implemented the 'accelerated RFS' interface in this driver.
  
  I believe the Intel 82599 controllers (ixgbe driver) have both
  hash-based and perfect filter modes and the driver can be configured to
  use one or the other.  The driver has its own independent mechanism for
  steering RX and TX flows which predates RFS; I don't know whether it
  uses hash-based or perfect filters.
 
 Thanks for this summary (and Jason, too).  I've fallen a long way behind
 NIC state-of-the-art.
  
  Most multi-queue controllers could support a kind of hash-based
  filtering for TCP/IP by adjusting the RSS indirection table.  However,
  this table is usually quite small (64-256 entries).  This means that
  hash collisions will be quite common and this can result in reordering.
  The same applies to the small table Jason has proposed for virtio-net.
 
 But this happens on real hardware today.  Doing better than real
 hardware is nice, but is it overkill?

What do you mean, it happens on real hardware today?  So far as I know,
the only cases where we have dynamic adjustment of flow steering are in
ixgbe (big table of hash filters, I think) and sfc (perfect filters).
I don't think that anyone's currently doing flow steering with the RSS
indirection table.  (At least, not on Linux.  I think that Microsoft was
intending to do so on Windows, but I don't know whether they ever did.)

 And can't you reorder even with perfect matching, since prior packets
 will be on the old queue and more recent ones on the new queue?  Does it
 discard or requeue old ones?  Or am I missing a trick?

Yes, that is possible.  RFS is careful to avoid such reordering by only
changing the steering of a flow when none of its packets can be in a
software receive queue.  It is not generally possible to do the same for
hardware receive queues.  However, when the first condition is met it is
likely that there won't be a whole lot of packets for that flow in the
hardware receive queue either.  (But if there are, then I think as a
side-effect of commit 09994d1 RFS will repeatedly ask the driver to
steer the flow.  Which isn't ideal.)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [net-next RFC PATCH 0/5] Series short description

2011-12-09 Thread Rusty Russell
On Wed, 7 Dec 2011 17:02:04 +, Ben Hutchings bhutchi...@solarflare.com 
wrote:
 Solarflare controllers (sfc driver) have 8192 perfect filters for
 TCP/IPv4 and UDP/IPv4 which can be used for flow steering.  (The filters
 are organised as a hash table, but matched based on 5-tuples.)  I
 implemented the 'accelerated RFS' interface in this driver.
 
 I believe the Intel 82599 controllers (ixgbe driver) have both
 hash-based and perfect filter modes and the driver can be configured to
 use one or the other.  The driver has its own independent mechanism for
 steering RX and TX flows which predates RFS; I don't know whether it
 uses hash-based or perfect filters.

Thanks for this summary (and Jason, too).  I've fallen a long way behind
NIC state-of-the-art.
 
 Most multi-queue controllers could support a kind of hash-based
 filtering for TCP/IP by adjusting the RSS indirection table.  However,
 this table is usually quite small (64-256 entries).  This means that
 hash collisions will be quite common and this can result in reordering.
 The same applies to the small table Jason has proposed for virtio-net.

But this happens on real hardware today.  Doing better than real
hardware is nice, but is it overkill?

And can't you reorder even with perfect matching, since prior packets
will be on the old queue and more recent ones on the new queue?  Does it
discard or requeue old ones?  Or am I missing a trick?

Thanks,
Rusty.


Re: [net-next RFC PATCH 0/5] Series short description

2011-12-08 Thread Jason Wang

On 12/08/2011 01:02 AM, Ben Hutchings wrote:

On Wed, 2011-12-07 at 19:31 +0800, Jason Wang wrote:

On 12/07/2011 03:30 PM, Rusty Russell wrote:

On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang jasow...@redhat.com wrote:

multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This is a rough series that adds guest/host cooperation for flow
steering support, based on Krish Kumar's multiple queue virtio-net
driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing?  How do
they do it?  Can we copy them?

Cheers,
Rusty.

As far as I can see, ixgbe and sfc have similar but much more
sophisticated mechanisms.

The idea was originally suggested by Ben and was borrowed from those
real physical NIC cards which can dispatch packets based on their
hash. All of these cards can filter flows based on the hash of the
L2/L3/L4 header, and the stack tells the card which queue a flow
should go to.

Solarflare controllers (sfc driver) have 8192 perfect filters for
TCP/IPv4 and UDP/IPv4 which can be used for flow steering.  (The filters
are organised as a hash table, but matched based on 5-tuples.)  I
implemented the 'accelerated RFS' interface in this driver.

I believe the Intel 82599 controllers (ixgbe driver) have both
hash-based and perfect filter modes and the driver can be configured to
use one or the other.  The driver has its own independent mechanism for
steering RX and TX flows which predates RFS; I don't know whether it
uses hash-based or perfect filters.


As far as I can see, their driver predates RFS; it binds the TX queue
and RX queue to the same CPU and adds a hash-based filter during packet
transmission.



Most multi-queue controllers could support a kind of hash-based
filtering for TCP/IP by adjusting the RSS indirection table.  However,
this table is usually quite small (64-256 entries).  This means that
hash collisions will be quite common and this can result in reordering.
The same applies to the small table Jason has proposed for virtio-net.



Thanks for the clarification. Since the hash may be provided by the
host NIC or the host kernel, the collision rate is not fixed. Perfect
filtering is more suitable, then.

So in the host, a simple hash-to-queue table was introduced in
tap/macvtap, and in the guest, the driver tells the backend the desired
queue for a flow by changing this table.

I don't think accelerated RFS can work well without the use of perfect
filtering or hash-based filtering with a very low rate of collisions.

Ben.





Re: [net-next RFC PATCH 0/5] Series short description

2011-12-07 Thread Rusty Russell
On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang jasow...@redhat.com wrote:
 multiple queue virtio-net: flow steering through host/guest cooperation
 
 Hello all:
 
 This is a rough series that adds guest/host cooperation for flow
 steering support, based on Krish Kumar's multiple queue virtio-net
 driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing?  How do
they do it?  Can we copy them?

Cheers,
Rusty.


Re: [net-next RFC PATCH 0/5] Series short description

2011-12-07 Thread Jason Wang

On 12/07/2011 03:30 PM, Rusty Russell wrote:

On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang jasow...@redhat.com wrote:

multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This is a rough series that adds guest/host cooperation for flow
steering support, based on Krish Kumar's multiple queue virtio-net
driver patch 3/3 (http://lwn.net/Articles/467283/).

Is there a real (physical) device which does this kind of thing?  How do
they do it?  Can we copy them?

Cheers,
Rusty.
As far as I can see, ixgbe and sfc have similar but much more
sophisticated mechanisms.


The idea was originally suggested by Ben and was borrowed from those
real physical NIC cards which can dispatch packets based on their
hash. All of these cards can filter flows based on the hash of the
L2/L3/L4 header, and the stack tells the card which queue a flow
should go to.


So in the host, a simple hash-to-queue table was introduced in
tap/macvtap, and in the guest, the driver tells the backend the desired
queue for a flow by changing this table.





Re: [net-next RFC PATCH 0/5] Series short description

2011-12-07 Thread Ben Hutchings
On Wed, 2011-12-07 at 19:31 +0800, Jason Wang wrote:
 On 12/07/2011 03:30 PM, Rusty Russell wrote:
  On Mon, 05 Dec 2011 16:58:37 +0800, Jason Wang jasow...@redhat.com wrote:
  multiple queue virtio-net: flow steering through host/guest cooperation
 
  Hello all:
 
  This is a rough series that adds guest/host cooperation for flow
  steering support, based on Krish Kumar's multiple queue virtio-net
  driver patch 3/3 (http://lwn.net/Articles/467283/).
  Is there a real (physical) device which does this kind of thing?  How do
  they do it?  Can we copy them?
 
  Cheers,
  Rusty.
 As far as I can see, ixgbe and sfc have similar but much more
 sophisticated mechanisms.
 
 The idea was originally suggested by Ben and was borrowed from those
 real physical NIC cards which can dispatch packets based on their
 hash. All of these cards can filter flows based on the hash of the
 L2/L3/L4 header, and the stack tells the card which queue a flow
 should go to.

Solarflare controllers (sfc driver) have 8192 perfect filters for
TCP/IPv4 and UDP/IPv4 which can be used for flow steering.  (The filters
are organised as a hash table, but matched based on 5-tuples.)  I
implemented the 'accelerated RFS' interface in this driver.

I believe the Intel 82599 controllers (ixgbe driver) have both
hash-based and perfect filter modes and the driver can be configured to
use one or the other.  The driver has its own independent mechanism for
steering RX and TX flows which predates RFS; I don't know whether it
uses hash-based or perfect filters.

Most multi-queue controllers could support a kind of hash-based
filtering for TCP/IP by adjusting the RSS indirection table.  However,
this table is usually quite small (64-256 entries).  This means that
hash collisions will be quite common and this can result in reordering.
The same applies to the small table Jason has proposed for virtio-net.

 So in the host, a simple hash-to-queue table was introduced in
 tap/macvtap, and in the guest, the driver tells the backend the desired
 queue for a flow by changing this table.

I don't think accelerated RFS can work well without the use of perfect
filtering or hash-based filtering with a very low rate of collisions.

Ben.




[net-next RFC PATCH 0/5] Series short description

2011-12-05 Thread Jason Wang
multiple queue virtio-net: flow steering through host/guest cooperation

Hello all:

This is a rough series that adds guest/host cooperation for flow
steering support, based on Krish Kumar's multiple queue virtio-net
driver patch 3/3 (http://lwn.net/Articles/467283/).

The idea is simple: the backend passes the rxhash to the guest, the
guest tells the backend the hash-to-queue mapping when necessary, and
the backend then chooses the queue based on the hash value of the
packet.  The table is just a page shared between userspace and the
backend.

Patch 1 enables passing the rxhash through vnet_hdr to the guest.
Patch 2,3 implement a very simple flow director for tap and
macvtap.  The tap part is based on the multiqueue tap patches posted by me
(http://lwn.net/Articles/459270/).
Patch 4 implements a method for a virtio device to find the irq of a
specific virtqueue, in order to do device-specific interrupt
optimization.
Patch 5 is the part of the guest driver that uses accelerated RFS to
program the flow director, with some optimizations on irq affinity
and tx queue selection.

This is just a prototype that demonstrates the idea; there are still
things that need to be discussed:

- An alternative to the shared page is the ctrl vq; the reason a
  shared table is preferable is the latency of the ctrl vq itself.
- Optimization on irq affinity and tx queue selection

Comments are welcomed, thanks!

---

Jason Wang (5):
  virtio_net: passing rxhash through vnet_hdr
  tuntap: simple flow director support
  macvtap: flow director support
  virtio: introduce a method to get the irq of a specific virtqueue
  virtio-net: flow director support


 drivers/lguest/lguest_device.c |8 ++
 drivers/net/macvlan.c  |4 +
 drivers/net/macvtap.c  |   42 -
 drivers/net/tun.c  |  105 --
 drivers/net/virtio_net.c   |  189 +++-
 drivers/s390/kvm/kvm_virtio.c  |6 +
 drivers/vhost/net.c|   10 +-
 drivers/vhost/vhost.h  |5 +
 drivers/virtio/virtio_mmio.c   |8 ++
 drivers/virtio/virtio_pci.c|   12 +++
 include/linux/if_macvlan.h |1 
 include/linux/if_tun.h |   11 ++
 include/linux/virtio_config.h  |4 +
 include/linux/virtio_net.h |   16 +++
 14 files changed, 377 insertions(+), 44 deletions(-)

-- 
Signature