Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-08 Thread Sridhar Samudrala

On 12/7/2011 3:02 AM, Jason Wang wrote:

On 12/06/2011 11:42 PM, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  
wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com
wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make 
sure the
packets of a flow were handled by the same guest vcpu and may be 
the same

vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost 
or vcpu

threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Yes. Also the current model of  a vhost thread per VM's interface 
doesn't help with packet steering

all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle 
packets to/from physical NIC's
TX/RX queues. Currently we have a single vhost thread for a VM's i/f 
that handles all the packets from

various flows coming from a multi-queue physical NIC.


Even if we have per-cpu workthread, only one socket is used to queue 
the packet then, so a multiple queue(sockets) tap/macvtap is still 
needed.
I think so.  We need per-cpu tap/macvtap sockets along with per-cpu 
vhost threads.

This will parallelize all the way from physical NIC to vhost.

Thanks
Sridhar



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Stefan Hajnoczi
On Wed, Dec 7, 2011 at 3:03 AM, Jason Wang jasow...@redhat.com wrote:
 On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

 On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com
  wrote:

 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

 The vcpus are just threads and may not be bound to physical CPUs, so
 what is the big picture here?  Is the guest even in the position to
 set the best queue mappings today?


 Not sure it could publish the best mapping but the idea is to make sure
 the
 packets of a flow were handled by the same guest vcpu and may be the same
 vhost thread in order to eliminate the packet reordering and lock
 contention. But this assumption does not take the bouncing of vhost or
 vcpu
 threads which would also affect the result.

 Okay, this is why I'd like to know what the big picture here is.  What
 solution are you proposing?  How are we going to have everything from
 guest application, guest kernel, host threads, and host NIC driver
 play along so we get the right steering up the entire stack.  I think
 there needs to be an answer to that before changing virtio-net to add
 any steering mechanism.


 Consider the complexity of the host nic each with their own steering
 features,  this series make the first step with minimal effort to try to let
 guest driver and host tap/macvtap co-operate like what physical nic does.
 There may be other method, but performance numbers is also needed to give
 the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take
individual steps without a good big picture because this will change
the virtio-net device specification.  If this turns out to be a dead
end then hosts will need to continue to support the interface forever
(legacy guests could still try to use it).  So please first explain
what the full stack picture is going to look like and how you think it
will lead to better performance.  You don't need to have all the code
or evidence, but just enough explanation so we see where this is all
going.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/06/2011 11:42 PM, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com
wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make 
sure the
packets of a flow were handled by the same guest vcpu and may be the 
same

vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost 
or vcpu

threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Yes. Also the current model of  a vhost thread per VM's interface 
doesn't help with packet steering

all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle 
packets to/from physical NIC's
TX/RX queues. Currently we have a single vhost thread for a VM's i/f 
that handles all the packets from

various flows coming from a multi-queue physical NIC.


Even if we have per-CPU worker threads, only one socket is used to queue the packets, so a multi-queue (multi-socket) tap/macvtap is still needed.


Thanks
Sridhar





Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/07/2011 07:10 AM, Sridhar Samudrala wrote:

On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:

On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com   
wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason 
Wangjasow...@redhat.com wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?
Not sure it could publish the best mapping but the idea is to make 
sure the
packets of a flow were handled by the same guest vcpu and may be 
the same

vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of 
vhost or vcpu

threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.



Yes. Also the current model of  a vhost thread per VM's interface
doesn't help with packet steering
all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle
packets to/from physical NIC's
TX/RX queues.
Currently we have a single vhost thread for a VM's i/f
that handles all the packets from
various flows coming from a multi-queue physical NIC.

Thanks
Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV
Yes. I tried this before we went ahead with per-interface vhost 
threading model.

At that time, per-cpu vhost  showed a regression with a single-VM and
per-vq vhost showed good performance improvements upto 8 VMs.

So  just making it per-cpu would not be enough. I think we may need a way
to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is the splitting of vhost thread 
into separate
threads for TX and RX. Shirley is doing some work in this area and she 
is seeing
perf. improvements as long as TX and RX threads are on the same 
cpu-socket.


I emulated this with my multi-queue series in the past; it looks like it damages single-stream performance, especially guest TX.


On the surface I'd say a single thread makes some sense
as long as guest uses a single queue.

But this may not be scalable long term when we want to support a large 
number of VMs each

having multiple virtio-net interfaces with multiple queues.

Thanks
Sridhar





Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Jason Wang

On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
[...]

  Consider the complexity of the host nic each with their own steering
  features,  this series make the first step with minimal effort to try to let
  guest driver and host tap/macvtap co-operate like what physical nic does.
  There may be other method, but performance numbers is also needed to give
  the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take
individual steps without a good big picture because this will change
the virtio-net device specification.  If this turns out to be a dead
end then hosts will need to continue to support the interface forever
(legacy guests could still try to use it).  So please first explain
what the full stack picture is going to look like and how you think it
will lead to better performance.  You don't need to have all the code
or evidence, but just enough explanation so we see where this is all
going.

I think I mentioned too little in the cover message.

There are not many changes from Krishna's series except the method of 
choosing an rx virtqueue. Since the original series uses different hash 
methods in the host (rxhash) and the guest (txhash), a different virtqueue 
could be chosen for a flow, which could lead packets of one flow to be 
handled by different vhost threads and vcpus. This may hurt performance.


This series tries to let one vhost thread process the packets of a flow, 
and also lets the packets be delivered directly to a vcpu local to the 
thread that processes the data. This is done by letting the guest tell 
the host the desired queue from which it wants to receive the packets of 
a given flow.


So passing the hash from host to guest is needed to get the same hash on 
both sides. A guest-programmable hash-to-queue table is then introduced, 
and the guest co-operates with the host through accelerated RFS in the 
guest.
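
As a rough illustration of the host side (the names below are made up for 
illustration; this is not code from the series or from the existing 
tap/macvtap):

/* Hypothetical sketch: how a host tap/macvtap backend might consult the
 * guest-programmed hash-to-queue table when picking the rx queue for an
 * incoming packet.  tap_dev, flow_table and tap_select_rxq() are
 * illustrative names only.
 */
#include <linux/skbuff.h>

#define TAP_HASH_MASK 0xFF	/* 256-entry table, as in the guest driver */

struct tap_dev {
	u16 flow_table[TAP_HASH_MASK + 1];	/* hash -> rx queue index */
	unsigned int numqueues;
};

static u16 tap_select_rxq(struct tap_dev *tap, struct sk_buff *skb)
{
	u32 rxhash = skb_get_rxhash(skb);	/* same hash the guest sees */

	if (!rxhash)
		return 0;			/* no hash: fall back to queue 0 */

	/* Honour the guest's mapping so one flow stays on one queue,
	 * i.e. one vhost thread and (ideally) one vcpu.
	 */
	return tap->flow_table[rxhash & TAP_HASH_MASK] % tap->numqueues;
}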







Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-07 Thread Stefan Hajnoczi
On Wed, Dec 7, 2011 at 12:10 PM, Jason Wang jasow...@redhat.com wrote:
 On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
 [...]

   Consider the complexity of the host nic each with their own steering
   features,  this series make the first step with minimal effort to try
  to let
   guest driver and host tap/macvtap co-operate like what physical nic
  does.
   There may be other method, but performance numbers is also needed to
  give
   the answer.

 I agree that performance results for this need to be shown.

 My original point is really that it's not a good idea to take
 individual steps without a good big picture because this will change
 the virtio-net device specification.  If this turns out to be a dead
 end then hosts will need to continue to support the interface forever
 (legacy guests could still try to use it).  So please first explain
 what the full stack picture is going to look like and how you think it
 will lead to better performance.  You don't need to have all the code
 or evidence, but just enough explanation so we see where this is all
 going.

 I think I mention too little in the cover message.

 There's no much changes with Krishna's series except the method that
 choosing a rx virtqueue. Since original series use different hash methods in
 host (rxhash) and guest (txhash), a different virtqueue were chose for a
 flow which could lead packets of a flow to be handled by different vhost
 thread and vcpu. This may damage the performance.

 This series tries to let one vhost thread to process the packets of a flow
 and also let the packets to be sent directly to a vcpu local to the thread
 process the data. This is done by letting guest tell the desired queue form
 which it want to receive the pakcet of a dedicated flow.

 So passing the hash from host to guest is needed to get the same hash in the
 two sides. Then a guest programmable hash to queue table were introduced and
 guest co-operate with the host through accelerate RFS in guest.

Thanks for the context, that helps me understand it a little better.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Stefan Hajnoczi
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang jasow...@redhat.com wrote:
 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com  wrote:

 +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
 +{
 +       struct virtnet_info *vi = netdev_priv(dev);
 +       struct virtio_device *vdev = vi->vdev;
 +
 +       if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
 +               vdev->config->set(vdev,
 +                                 offsetof(struct virtio_net_config_fd, addr),
 +                                 &pfn, sizeof(u32));

 Please use the virtio model (i.e. virtqueues) instead of shared
 memory.  Mapping a page breaks the virtio abstraction.


 Using control virtqueue is more suitable but there's are also some problems:

 One problem is the interface,  if we use control virtqueue, we need a
 interface between the backend and tap/macvtap to change the flow mapping.
 But qemu and vhost_net only know about the file descriptor, more
 informations or interfaces need to be exposed in order to let ethtool or
 ioctl work.

QEMU could map a shared page with tap/macvtap.  The difference
would be that the guest-host interface is still virtio and QEMU
pokes values into the shared page on behalf of the guest.

 Another problem is the delay introduced by ctrl vq, as the ctrl vq would be
 used in the critical path in guest and it use busy wait to get the response,
 the delay is not neglectable.

Then you need to find a better way of doing this.  Can the host
automatically associate the flow from the tx virtqueue packets are
transmitted on?  Does it make sense to add a virtio_net_hdr field that
updates the queue mapping?
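
For illustration only, such a field might look something like the sketch 
below; the flag value and field names are assumptions, not anything 
defined by the virtio spec or this series:

/* Purely illustrative layout for a per-packet steering hint; the
 * rxq_hint field and VIRTIO_NET_HDR_F_RXQ_HINT flag are assumptions.
 */
#include <linux/virtio_net.h>

#define VIRTIO_NET_HDR_F_RXQ_HINT	4	/* hypothetical flag value */

struct virtio_net_hdr_steer {
	struct virtio_net_hdr hdr;	/* existing per-packet header */
	__u16 rxq_hint;			/* guest's preferred rx queue for this flow */
};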

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Jason Wang

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang <jasow...@redhat.com> wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.


Using control virtqueue is more suitable but there's are also some problems:

One problem is the interface,  if we use control virtqueue, we need a
interface between the backend and tap/macvtap to change the flow mapping.
But qemu and vhost_net only know about the file descriptor, more
informations or interfaces need to be exposed in order to let ethtool or
ioctl work.

QEMU could provide map a shared page with tap/macvtap.  The difference
would be that the guest-host interface is still virtio and QEMU
pokes values into the shared page on behalf of the guest.


This makes sense.


Another problem is the delay introduced by ctrl vq, as the ctrl vq would be
used in the critical path in guest and it use busy wait to get the response,
the delay is not neglectable.

Then you need to find a better way of doing this.  Can the host
automatically associate the flow from the tx virtqueue packets are
transmitted on?  Does it make sense to add a virtio_net_hdr field that
updates the queue mapping?


It can, but it cannot properly handle the packet re-ordering caused by 
the movement of guest applications among guest cpus. One more problem 
with virtio_net_hdr is that we need to build an empty packet when there 
is no other packet to send.


One solution is to introduce a non-blocking command for the ctrl vq.

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


I'm not sure it could publish the best mapping, but the idea is to make 
sure the packets of a flow are handled by the same guest vcpu, and 
possibly the same vhost thread, in order to eliminate packet reordering 
and lock contention. But this assumption does not take into account the 
bouncing of vhost or vcpu threads, which would also affect the result.


Anyway, the mapping from the guest is an important reference.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Stefan Hajnoczi
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang jasow...@redhat.com wrote:
 On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

 On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com  wrote:

 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:
 The vcpus are just threads and may not be bound to physical CPUs, so
 what is the big picture here?  Is the guest even in the position to
 set the best queue mappings today?


 Not sure it could publish the best mapping but the idea is to make sure the
 packets of a flow were handled by the same guest vcpu and may be the same
 vhost thread in order to eliminate the packet reordering and lock
 contention. But this assumption does not take the bouncing of vhost or vcpu
 threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Sridhar Samudrala

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang <jasow...@redhat.com> wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make sure the
packets of a flow were handled by the same guest vcpu and may be the same
vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost or vcpu
threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Yes. Also, the current model of a vhost thread per VM interface doesn't 
help with packet steering all the way from the guest to the host 
physical NIC.

I think we need to have per-CPU vhost thread(s) that can handle packets 
to/from the physical NIC's TX/RX queues. Currently we have a single 
vhost thread per VM interface that handles all the packets from the 
various flows coming from a multi-queue physical NIC.

Thanks
Sridhar



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Michael S. Tsirkin
On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:
 On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
 On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:
 On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang <jasow...@redhat.com> wrote:
 On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
 On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
   wrote:
 The vcpus are just threads and may not be bound to physical CPUs, so
 what is the big picture here?  Is the guest even in the position to
 set the best queue mappings today?
 
 Not sure it could publish the best mapping but the idea is to make sure the
 packets of a flow were handled by the same guest vcpu and may be the same
 vhost thread in order to eliminate the packet reordering and lock
 contention. But this assumption does not take the bouncing of vhost or vcpu
 threads which would also affect the result.
 Okay, this is why I'd like to know what the big picture here is.  What
 solution are you proposing?  How are we going to have everything from
 guest application, guest kernel, host threads, and host NIC driver
 play along so we get the right steering up the entire stack.  I think
 there needs to be an answer to that before changing virtio-net to add
 any steering mechanism.
 
 
 Yes. Also the current model of  a vhost thread per VM's interface
 doesn't help with packet steering
 all the way from the guest to the host physical NIC.
 
 I think we need to have vhost thread(s) per-CPU that can handle
 packets to/from physical NIC's
 TX/RX queues.
 Currently we have a single vhost thread for a VM's i/f
 that handles all the packets from
 various flows coming from a multi-queue physical NIC.
 
 Thanks
 Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
   this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV
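
A rough sketch of step 2, using only stock workqueue APIs (the vhost_wq 
name and the queue-on-the-kicking-CPU policy are assumptions for 
illustration, not the code from the reverted commit):

/* Per-cpu work submission sketch: a bound workqueue plus queue_work_on()
 * so each virtqueue's work runs on the CPU that kicked it.
 */
#include <linux/workqueue.h>
#include <linux/smp.h>

static struct workqueue_struct *vhost_wq;

static int vhost_wq_init(void)
{
	/* A bound workqueue: work queued with queue_work_on() runs on the
	 * CPU it was queued for.
	 */
	vhost_wq = alloc_workqueue("vhost", 0, 0);
	return vhost_wq ? 0 : -ENOMEM;
}

static void vhost_work_queue_percpu(struct work_struct *work)
{
	/* Handle the virtqueue on the CPU that kicked it, so packets stay
	 * close to the vcpu that produced them.
	 */
	queue_work_on(smp_processor_id(), vhost_wq, work);
}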

On the surface I'd say a single thread makes some sense
as long as guest uses a single queue.

-- 
MST


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Ben Hutchings
On Tue, 2011-12-06 at 15:25 +0800, Jason Wang wrote:
 On 12/06/2011 04:42 AM, Ben Hutchings wrote:
[...]
  This is not a proper implementation of ndo_rx_flow_steer.  If you steer
  a flow by changing the RSS table this can easily cause packet reordering
  in other flows.  The filtering should be more precise, ideally matching
  exactly a single flow by e.g. VID and IP 5-tuple.
 
  I think you need to add a second hash table which records exactly which
  flow is supposed to be steered.  Also, you must call
  rps_may_expire_flow() to check whether an entry in this table may be
  replaced; otherwise you can cause packet reordering in the flow that was
  previously being steered.
 
  Finally, this function must return the table index it assigned, so that
  rps_may_expire_flow() works.
 
 Thanks for the explanation, how about document this briefly in scaling.txt?
[...]

I believe scaling.txt is intended for users/administrators, not
developers.

The documentation for implementers of accelerated RFS is in the comment
for struct net_device_ops and the commit message adding it.  But I
really should improve that comment.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Sridhar Samudrala

On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:

On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:

On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com   wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wangjasow...@redhat.com wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?

Not sure it could publish the best mapping but the idea is to make sure the
packets of a flow were handled by the same guest vcpu and may be the same
vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost or vcpu
threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.



Yes. Also the current model of  a vhost thread per VM's interface
doesn't help with packet steering
all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per-CPU that can handle
packets to/from physical NIC's
TX/RX queues.
Currently we have a single vhost thread for a VM's i/f
that handles all the packets from
various flows coming from a multi-queue physical NIC.

Thanks
Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV
Yes. I tried this before we went ahead with the per-interface vhost 
threading model.

At that time, per-cpu vhost showed a regression with a single VM, and
per-vq vhost showed good performance improvements up to 8 VMs.

So just making it per-cpu would not be enough. I think we may need a way
to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is splitting the vhost thread into 
separate threads for TX and RX. Shirley is doing some work in this area 
and she is seeing perf improvements as long as the TX and RX threads are 
on the same cpu-socket.


On the surface I'd say a single thread makes some sense
as long as guest uses a single queue.

But this may not be scalable long term when we want to support a large 
number of VMs, each having multiple virtio-net interfaces with multiple 
queues.

Thanks
Sridhar



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-06 Thread Jason Wang

On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 10:21 AM, Jason Wangjasow...@redhat.com  wrote:

On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:

On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang <jasow...@redhat.com> wrote:

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com
  wrote:

The vcpus are just threads and may not be bound to physical CPUs, so
what is the big picture here?  Is the guest even in the position to
set the best queue mappings today?


Not sure it could publish the best mapping but the idea is to make sure the
packets of a flow were handled by the same guest vcpu and may be the same
vhost thread in order to eliminate the packet reordering and lock
contention. But this assumption does not take the bouncing of vhost or vcpu
threads which would also affect the result.

Okay, this is why I'd like to know what the big picture here is.  What
solution are you proposing?  How are we going to have everything from
guest application, guest kernel, host threads, and host NIC driver
play along so we get the right steering up the entire stack.  I think
there needs to be an answer to that before changing virtio-net to add
any steering mechanism.


Considering the complexity of host nics, each with their own steering 
features, this series makes a first step with minimal effort to try to 
let the guest driver and host tap/macvtap co-operate the way a physical 
nic does. There may be other methods, but performance numbers are also 
needed to give the answer.


Stefan


[net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang
In order to let the packets of a flow be passed to the desired
guest cpu, we can co-operate with the device by programming the flow
director, which is just a hash-to-queue table.

This kind of co-operation is done through the accelerated RFS support:
a device-specific flow steering method, virtnet_fd(), is used to modify
the flow director based on the RFS mapping. The desired queue is
calculated through a reverse mapping of the irq affinity table. In order
to parallelize the ingress path, the irq affinity of each rx queue is
also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to
balance the TX load and reduce lock contention on the egress path, so
processor_id() is used for tx queue selection.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 drivers/net/virtio_net.c   |  165 +++-
 include/linux/virtio_net.h |6 ++
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d871f8..89bb5e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,10 @@
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
+#include <linux/highmem.h>
+#include <linux/cpu_rmap.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -40,6 +44,7 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX	2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
+#define TAP_HASH_MASK 0xFF
 
 struct virtnet_send_stats {
struct u64_stats_sync syncp;
@@ -89,6 +94,9 @@ struct receive_queue {
 
/* Active rx statistics */
struct virtnet_recv_stats __percpu *stats;
+
+   /* FIXME: per vector instead of per queue ?? */
+   cpumask_var_t affinity_mask;
 };
 
 struct virtnet_info {
@@ -110,6 +118,11 @@ struct virtnet_info {
 
/* Host will pass rxhash to us. */
bool has_rxhash;
+
+   /* A page of flow director */
+   struct page *fd_page;
+
+   cpumask_var_t affinity_mask;
 };
 
 struct skb_vnet_hdr {
@@ -386,6 +399,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
 	if (vi->has_rxhash)
 		skb->rxhash = hdr->rhdr.rxhash;
 
+	skb_record_rx_queue(skb, rq->vq->queue_index / 2);
 	netif_receive_skb(skb);
 	return;
 
@@ -722,6 +736,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));
+	}
+	return 0;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -1017,6 +1044,39 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;
+
+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif
+
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of the accelerate rfs which let the scheduler to
+	 * balance the load, it make sense to choose the tx queue also based on
+	 * the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
+
 static const struct net_device_ops virtnet_netdev = {
.ndo_open= virtnet_open,
.ndo_stop= virtnet_close,
@@ -1028,9 +1088,13 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_get_stats64 = virtnet_stats,
.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+   .ndo_select_queue= virtnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+   .ndo_rx_flow_steer   = virtnet_fd,
+#endif
 };
 
 static void virtnet_update_status(struct virtnet_info *vi)
@@ -1272,12 +1336,76 @@ static int virtnet_setup_vqs(struct virtnet_info *vi)
 

Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Stefan Hajnoczi
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang jasow...@redhat.com wrote:
 +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
 +{
 +       struct virtnet_info *vi = netdev_priv(dev);
 +       struct virtio_device *vdev = vi->vdev;
 +
 +       if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
 +               vdev->config->set(vdev,
 +                                 offsetof(struct virtio_net_config_fd, addr),
 +                                 &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.

Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Ben Hutchings
On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:
 In order to let the packets of a flow to be passed to the desired
 guest cpu, we can co-operate with devices through programming the flow
 director which was just a hash to queue table.
 
 This kinds of co-operation is done through the accelerate RFS support,
 a device specific flow sterring method virtnet_fd() is used to modify
 the flow director based on rfs mapping. The desired queue were
 calculated through reverse mapping of the irq affinity table. In order
 to parallelize the ingress path, irq affinity of rx queue were also
 provides by the driver.
 
 In addition to accelerate RFS, we can also use the guest scheduler to
 balance the load of TX and reduce the lock contention on egress path,
 so the processor_id() were used to tx queue selection.
[...]
 +#ifdef CONFIG_RFS_ACCEL
 +
 +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
 +	       u16 rxq_index, u32 flow_id)
 +{
 +	struct virtnet_info *vi = netdev_priv(net_dev);
 +	u16 *table = NULL;
 +
 +	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
 +		return -EPROTONOSUPPORT;

Why only IPv4?

 +	table = kmap_atomic(vi->fd_page);
 +	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
 +	kunmap_atomic(table);
 +
 +	return 0;
 +}
 +#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.
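
To sketch the shape of that, in the context of drivers/net/virtio_net.c 
(the per-slot bookkeeping below is invented for illustration; only 
rps_may_expire_flow() is the real helper being referred to):

/* Skeleton of the pattern described above, not the patch's code. */
#include <linux/netdevice.h>

#ifdef CONFIG_RFS_ACCEL
struct virtnet_flow {
	u32 flow_id;	/* RFS flow id currently steered via this slot */
	u16 rxq;	/* rx queue it was steered to */
	bool in_use;
};

static struct virtnet_flow flow_tbl[TAP_HASH_MASK + 1];

static int virtnet_rx_flow_steer(struct net_device *dev,
				 const struct sk_buff *skb,
				 u16 rxq_index, u32 flow_id)
{
	u16 idx = skb->rxhash & TAP_HASH_MASK;	/* filter/table slot */
	struct virtnet_flow *fl = &flow_tbl[idx];

	/* Don't replace a slot that still steers a live flow, otherwise
	 * that flow may be reordered.
	 */
	if (fl->in_use &&
	    !rps_may_expire_flow(dev, fl->rxq, fl->flow_id, idx))
		return -EBUSY;

	fl->flow_id = flow_id;
	fl->rxq = rxq_index;
	fl->in_use = true;
	/* ... update the host-visible hash-to-queue table here ... */

	/* Return the slot index so the stack can later query it through
	 * rps_may_expire_flow().
	 */
	return idx;
}
#endif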

 +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
 +{
 +	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
 +		  smp_processor_id();
 +
 +	/* As we make use of the accelerate rfs which let the scheduler to
 +	 * balance the load, it make sense to choose the tx queue also based on
 +	 * the processor id?
 +	 */
 +	while (unlikely(txq >= dev->real_num_tx_queues))
 +		txq -= dev->real_num_tx_queues;
 +	return txq;
 +}
[...]

Don't do this, let XPS handle it.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang

On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:

On Mon, Dec 5, 2011 at 8:59 AM, Jason Wangjasow...@redhat.com  wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared
memory.  Mapping a page breaks the virtio abstraction.


Using the control virtqueue is more suitable, but there are also some problems:

One problem is the interface: if we use the control virtqueue, we need 
an interface between the backend and tap/macvtap to change the flow 
mapping. But qemu and vhost_net only know about the file descriptor; 
more information or interfaces need to be exposed in order to let 
ethtool or ioctl work.


Another problem is the delay introduced by the ctrl vq: as the ctrl vq 
would be used in the critical path in the guest, and it uses busy 
waiting to get the response, the delay is not negligible.



Stefan


Re: [net-next RFC PATCH 5/5] virtio-net: flow director support

2011-12-05 Thread Jason Wang

On 12/06/2011 04:42 AM, Ben Hutchings wrote:

On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:

In order to let the packets of a flow to be passed to the desired
guest cpu, we can co-operate with devices through programming the flow
director which was just a hash to queue table.

This kinds of co-operation is done through the accelerate RFS support,
a device specific flow sterring method virtnet_fd() is used to modify
the flow director based on rfs mapping. The desired queue were
calculated through reverse mapping of the irq affinity table. In order
to parallelize the ingress path, irq affinity of rx queue were also
provides by the driver.

In addition to accelerate RFS, we can also use the guest scheduler to
balance the load of TX and reduce the lock contention on egress path,
so the processor_id() were used to tx queue selection.

[...]

+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;

Why only IPv4?


Oops, IPv6 should work also.

+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif

This is not a proper implementation of ndo_rx_flow_steer.  If you steer
a flow by changing the RSS table this can easily cause packet reordering
in other flows.  The filtering should be more precise, ideally matching
exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which
flow is supposed to be steered.  Also, you must call
rps_may_expire_flow() to check whether an entry in this table may be
replaced; otherwise you can cause packet reordering in the flow that was
previously being steered.

Finally, this function must return the table index it assigned, so that
rps_may_expire_flow() works.


Thanks for the explanation. How about documenting this briefly in scaling.txt?

+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of the accelerate rfs which let the scheduler to
+	 * balance the load, it make sense to choose the tx queue also based on
+	 * the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}

[...]

Don't do this, let XPS handle it.

Ben.


