Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/7/2011 3:02 AM, Jason Wang wrote:
On 12/06/2011 11:42 PM, Sridhar Samudrala wrote:
On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Not sure it could publish the best mapping, but the idea is to make sure the packets of a flow are handled by the same guest vcpu and maybe the same vhost thread, in order to eliminate packet reordering and lock contention. But this assumption does not take into account the bouncing of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is. What solution are you proposing? How are we going to have everything from the guest application, guest kernel, host threads, and host NIC driver play along so we get the right steering up the entire stack? I think there needs to be an answer to that before changing virtio-net to add any steering mechanism.

Yes. Also, the current model of a vhost thread per VM interface doesn't help with packet steering all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per CPU that can handle packets to/from the physical NIC's TX/RX queues. Currently we have a single vhost thread for a VM's interface that handles all the packets from various flows coming from a multi-queue physical NIC.

Even if we have per-cpu work threads, only one socket is used to queue the packets, so a multi-queue (multi-socket) tap/macvtap is still needed.

I think so. We need per-cpu tap/macvtap sockets along with per-cpu vhost threads. This will parallelize all the way from the physical NIC to vhost.

Thanks
Sridhar
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Wed, Dec 7, 2011 at 12:10 PM, Jason Wang wrote:
> On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
> [...]
>>> Consider the complexity of the host nics, each with their own steering
>>> features; this series makes the first step with minimal effort to try
>>> to let the guest driver and host tap/macvtap co-operate like a
>>> physical nic does. There may be other methods, but performance numbers
>>> are also needed to give the answer.
>>
>> I agree that performance results for this need to be shown.
>>
>> My original point is really that it's not a good idea to take
>> individual steps without a good big picture because this will change
>> the virtio-net device specification. If this turns out to be a dead
>> end then hosts will need to continue to support the interface forever
>> (legacy guests could still try to use it). So please first explain
>> what the full stack picture is going to look like and how you think it
>> will lead to better performance. You don't need to have all the code
>> or evidence, but just enough explanation so we see where this is all
>> going.
>
> I think I mentioned too little in the cover message.
>
> There are not many changes from Krishna's series except the method of
> choosing an rx virtqueue. Since the original series used different hash
> methods in the host (rxhash) and guest (txhash), a different virtqueue
> could be chosen for a flow, which could lead packets of a flow to be
> handled by different vhost threads and vcpus. This may damage the
> performance.
>
> This series tries to let one vhost thread process the packets of a flow
> and also lets those packets be sent directly to a vcpu local to the
> thread processing the data. This is done by letting the guest tell the
> host the desired queue from which it wants to receive the packets of a
> dedicated flow.
>
> So passing the hash from host to guest is needed to get the same hash
> on the two sides. Then a guest-programmable hash-to-queue table is
> introduced, and the guest co-operates with the host through accelerated
> RFS in the guest.

Thanks for the context, that helps me understand it a little better.

Stefan
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/07/2011 05:08 PM, Stefan Hajnoczi wrote:
[...]
>> Consider the complexity of the host nics, each with their own steering
>> features; this series makes the first step with minimal effort to try to
>> let the guest driver and host tap/macvtap co-operate like a physical nic
>> does. There may be other methods, but performance numbers are also
>> needed to give the answer.
>
> I agree that performance results for this need to be shown.
>
> My original point is really that it's not a good idea to take individual
> steps without a good big picture because this will change the virtio-net
> device specification. If this turns out to be a dead end then hosts will
> need to continue to support the interface forever (legacy guests could
> still try to use it). So please first explain what the full stack picture
> is going to look like and how you think it will lead to better
> performance. You don't need to have all the code or evidence, but just
> enough explanation so we see where this is all going.

I think I mentioned too little in the cover message.

There are not many changes from Krishna's series except the method of choosing an rx virtqueue. Since the original series used different hash methods in the host (rxhash) and guest (txhash), a different virtqueue could be chosen for a flow, which could lead packets of a flow to be handled by different vhost threads and vcpus. This may damage the performance.

This series tries to let one vhost thread process the packets of a flow and also lets those packets be sent directly to a vcpu local to the thread processing the data. This is done by letting the guest tell the host the desired queue from which it wants to receive the packets of a dedicated flow.

So passing the hash from host to guest is needed to get the same hash on the two sides. Then a guest-programmable hash-to-queue table is introduced, and the guest co-operates with the host through accelerated RFS in the guest.
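[To make the scheme concrete, a minimal sketch of the host-side lookup this design implies: the tap/macvtap backend consults the guest-programmed hash-to-queue page when picking an rx queue (socket) for an incoming packet. Only TAP_HASH_MASK comes from the posted patch; the function and parameter names are invented for illustration.]

        #define TAP_HASH_MASK 0xFF	/* from the posted patch */

        /* Hypothetical helper: pick the rx queue (socket) for a packet
         * using the hash-to-queue table the guest programmed through
         * the flow director page. */
        static u16 tap_select_rxq(const u16 *flow_dir_table, u32 rxhash,
                                  u16 numqueues)
        {
                u16 q = flow_dir_table[rxhash & TAP_HASH_MASK];

                /* Fall back to plain hashing when the guest has not
                 * programmed this slot or it points past the active
                 * queues. */
                if (q >= numqueues)
                        q = rxhash % numqueues;
                return q;
        }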
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/07/2011 07:10 AM, Sridhar Samudrala wrote:
On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:
On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:
On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Not sure it could publish the best mapping, but the idea is to make sure the packets of a flow are handled by the same guest vcpu and maybe the same vhost thread, in order to eliminate packet reordering and lock contention. But this assumption does not take into account the bouncing of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is. What solution are you proposing? How are we going to have everything from the guest application, guest kernel, host threads, and host NIC driver play along so we get the right steering up the entire stack? I think there needs to be an answer to that before changing virtio-net to add any steering mechanism.

Yes. Also, the current model of a vhost thread per VM interface doesn't help with packet steering all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per CPU that can handle packets to/from the physical NIC's TX/RX queues. Currently we have a single vhost thread for a VM's interface that handles all the packets from various flows coming from a multi-queue physical NIC.

Thanks
Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
   this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV

Yes. I tried this before we went ahead with the per-interface vhost threading model. At that time, per-cpu vhost showed a regression with a single VM, and per-vq vhost showed good performance improvements up to 8 VMs. So just making it per-cpu would not be enough. I think we may need a way to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is the splitting of the vhost thread into separate threads for TX and RX. Shirley is doing some work in this area and she is seeing perf improvements as long as the TX and RX threads are on the same cpu-socket.

I emulated this through my multi-queue series in the past; it looks like it damages the performance of a single stream, especially guest tx.

On the surface I'd say a single thread makes some sense as long as the guest uses a single queue.

But this may not be scalable long term when we want to support a large number of VMs, each having multiple virtio-net interfaces with multiple queues.

Thanks
Sridhar
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/06/2011 11:42 PM, Sridhar Samudrala wrote:
On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Not sure it could publish the best mapping, but the idea is to make sure the packets of a flow are handled by the same guest vcpu and maybe the same vhost thread, in order to eliminate packet reordering and lock contention. But this assumption does not take into account the bouncing of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is. What solution are you proposing? How are we going to have everything from the guest application, guest kernel, host threads, and host NIC driver play along so we get the right steering up the entire stack? I think there needs to be an answer to that before changing virtio-net to add any steering mechanism.

Yes. Also, the current model of a vhost thread per VM interface doesn't help with packet steering all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per CPU that can handle packets to/from the physical NIC's TX/RX queues. Currently we have a single vhost thread for a VM's interface that handles all the packets from various flows coming from a multi-queue physical NIC.

Even if we have per-cpu work threads, only one socket is used to queue the packets, so a multi-queue (multi-socket) tap/macvtap is still needed.

Thanks
Sridhar
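[A minimal sketch of the "multiple sockets per device" idea raised here, assuming a hypothetical per-device queue array; all names and the queue count are invented for illustration.]

        #define MAX_TAP_QUEUES 16	/* invented for illustration */

        /* Hypothetical multi-queue tap state: one socket per queue, so
         * per-cpu vhost threads don't all funnel packets through one
         * socket. */
        struct tap_queues {
                struct socket *socks[MAX_TAP_QUEUES];
                u16 numqueues;
        };

        static struct socket *tap_pick_sock(struct tap_queues *t, u32 rxhash)
        {
                /* Each flow hashes to a stable socket/queue pair. */
                return t->socks[rxhash % t->numqueues];
        }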
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Wed, Dec 7, 2011 at 3:03 AM, Jason Wang wrote:
> On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:
>> On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
>>> On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
>>>> On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
>>>>> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>>>>>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
>>>>
>>>> The vcpus are just threads and may not be bound to physical CPUs, so
>>>> what is the big picture here? Is the guest even in the position to
>>>> set the best queue mappings today?
>>>
>>> Not sure it could publish the best mapping, but the idea is to make
>>> sure the packets of a flow are handled by the same guest vcpu and
>>> maybe the same vhost thread, in order to eliminate packet reordering
>>> and lock contention. But this assumption does not take into account
>>> the bouncing of vhost or vcpu threads, which would also affect the
>>> result.
>>
>> Okay, this is why I'd like to know what the big picture here is. What
>> solution are you proposing? How are we going to have everything from
>> guest application, guest kernel, host threads, and host NIC driver
>> play along so we get the right steering up the entire stack? I think
>> there needs to be an answer to that before changing virtio-net to add
>> any steering mechanism.
>
> Consider the complexity of the host nics, each with their own steering
> features; this series makes the first step with minimal effort to try to
> let the guest driver and host tap/macvtap co-operate like a physical nic
> does. There may be other methods, but performance numbers are also
> needed to give the answer.

I agree that performance results for this need to be shown.

My original point is really that it's not a good idea to take individual steps without a good big picture because this will change the virtio-net device specification. If this turns out to be a dead end then hosts will need to continue to support the interface forever (legacy guests could still try to use it). So please first explain what the full stack picture is going to look like and how you think it will lead to better performance. You don't need to have all the code or evidence, but just enough explanation so we see where this is all going.

Stefan
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/06/2011 09:15 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Not sure it could publish the best mapping, but the idea is to make sure the packets of a flow are handled by the same guest vcpu and maybe the same vhost thread, in order to eliminate packet reordering and lock contention. But this assumption does not take into account the bouncing of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is. What solution are you proposing? How are we going to have everything from the guest application, guest kernel, host threads, and host NIC driver play along so we get the right steering up the entire stack? I think there needs to be an answer to that before changing virtio-net to add any steering mechanism.

Consider the complexity of the host nics, each with their own steering features; this series makes the first step with minimal effort to try to let the guest driver and host tap/macvtap co-operate like a physical nic does. There may be other methods, but performance numbers are also needed to give the answer.

Stefan
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/6/2011 8:14 AM, Michael S. Tsirkin wrote:
On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:
On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Not sure it could publish the best mapping, but the idea is to make sure the packets of a flow are handled by the same guest vcpu and maybe the same vhost thread, in order to eliminate packet reordering and lock contention. But this assumption does not take into account the bouncing of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is. What solution are you proposing? How are we going to have everything from the guest application, guest kernel, host threads, and host NIC driver play along so we get the right steering up the entire stack? I think there needs to be an answer to that before changing virtio-net to add any steering mechanism.

Yes. Also, the current model of a vhost thread per VM interface doesn't help with packet steering all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per CPU that can handle packets to/from the physical NIC's TX/RX queues. Currently we have a single vhost thread for a VM's interface that handles all the packets from various flows coming from a multi-queue physical NIC.

Thanks
Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
   this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV

Yes. I tried this before we went ahead with the per-interface vhost threading model. At that time, per-cpu vhost showed a regression with a single VM, and per-vq vhost showed good performance improvements up to 8 VMs. So just making it per-cpu would not be enough. I think we may need a way to schedule vcpu threads on the same cpu-socket as vhost.

Another aspect we need to look into is the splitting of the vhost thread into separate threads for TX and RX. Shirley is doing some work in this area and she is seeing perf improvements as long as the TX and RX threads are on the same cpu-socket.

On the surface I'd say a single thread makes some sense as long as the guest uses a single queue.

But this may not be scalable long term when we want to support a large number of VMs, each having multiple virtio-net interfaces with multiple queues.

Thanks
Sridhar
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Tue, 2011-12-06 at 15:25 +0800, Jason Wang wrote:
> On 12/06/2011 04:42 AM, Ben Hutchings wrote:
[...]
>> This is not a proper implementation of ndo_rx_flow_steer. If you steer
>> a flow by changing the RSS table this can easily cause packet
>> reordering in other flows. The filtering should be more precise,
>> ideally matching exactly a single flow by e.g. VID and IP 5-tuple.
>>
>> I think you need to add a second hash table which records exactly which
>> flow is supposed to be steered. Also, you must call
>> rps_may_expire_flow() to check whether an entry in this table may be
>> replaced; otherwise you can cause packet reordering in the flow that
>> was previously being steered.
>>
>> Finally, this function must return the table index it assigned, so that
>> rps_may_expire_flow() works.
>
> Thanks for the explanation; how about documenting this briefly in
> scaling.txt?
[...]

I believe scaling.txt is intended for users/administrators, not developers. The documentation for implementers of accelerated RFS is in the comment for struct net_device_ops and the commit message adding it. But I really should improve that comment.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
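[For readers following along, a hedged sketch of the shape Ben describes. The driver-private table (vi->rfs_table) and struct virtnet_rfs_entry are invented for illustration; only rps_may_expire_flow() (from the RFS core) and TAP_HASH_MASK (from the posted patch) are real.]

        #include <linux/netdevice.h>
        #include <linux/skbuff.h>

        struct virtnet_rfs_entry {
                u32 flow_id;	/* flow id the stack handed us */
                u16 rxq;	/* queue this flow was steered to */
                bool in_use;
        };

        static int virtnet_rx_flow_steer(struct net_device *dev,
                                         const struct sk_buff *skb,
                                         u16 rxq_index, u32 flow_id)
        {
                struct virtnet_info *vi = netdev_priv(dev);
                u16 index = skb->rxhash & TAP_HASH_MASK;
                struct virtnet_rfs_entry *e = &vi->rfs_table[index];

                /* Never displace a live steered flow: that would reorder
                 * its packets, the exact problem steering is meant to
                 * avoid. */
                if (e->in_use && e->flow_id != flow_id &&
                    !rps_may_expire_flow(dev, e->rxq, e->flow_id, index))
                        return -EBUSY;

                e->flow_id = flow_id;
                e->rxq = rxq_index;
                e->in_use = true;

                /* ...program the host-side filter for this flow here... */

                /* Return the table index so the stack can identify this
                 * filter in later rps_may_expire_flow() calls. */
                return index;
        }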
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Tue, Dec 06, 2011 at 07:42:54AM -0800, Sridhar Samudrala wrote:
> On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
>> On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
>>> On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
>>>> On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
>>>>> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>>>>>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
>>>>
>>>> The vcpus are just threads and may not be bound to physical CPUs, so
>>>> what is the big picture here? Is the guest even in the position to
>>>> set the best queue mappings today?
>>>
>>> Not sure it could publish the best mapping, but the idea is to make
>>> sure the packets of a flow are handled by the same guest vcpu and
>>> maybe the same vhost thread, in order to eliminate packet reordering
>>> and lock contention. But this assumption does not take into account
>>> the bouncing of vhost or vcpu threads, which would also affect the
>>> result.
>>
>> Okay, this is why I'd like to know what the big picture here is. What
>> solution are you proposing? How are we going to have everything from
>> guest application, guest kernel, host threads, and host NIC driver
>> play along so we get the right steering up the entire stack? I think
>> there needs to be an answer to that before changing virtio-net to add
>> any steering mechanism.
>
> Yes. Also, the current model of a vhost thread per VM interface
> doesn't help with packet steering all the way from the guest to the
> host physical NIC.
>
> I think we need to have vhost thread(s) per CPU that can handle
> packets to/from the physical NIC's TX/RX queues.
> Currently we have a single vhost thread for a VM's interface
> that handles all the packets from various flows coming from a
> multi-queue physical NIC.
>
> Thanks
> Sridhar

It's not hard to try that:
1. revert c23f3445e68e1db0e74099f264bc5ff5d55ebdeb
   this will convert our thread to a workqueue
2. convert the workqueue to a per-cpu one

It didn't work that well in the past, but YMMV

On the surface I'd say a single thread makes some sense as long as the guest uses a single queue.

--
MST
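[A rough sketch of what step 2 could look like once vhost work items run on a workqueue again (i.e. after the revert in step 1). vhost_wq and the helper names are invented; the only semantics relied on are that a bound workqueue is inherently per-cpu, since queue_work() runs the item on the submitting CPU.]

        #include <linux/workqueue.h>

        static struct workqueue_struct *vhost_wq;

        static int __init vhost_wq_init(void)
        {
                /* A bound (non-WQ_UNBOUND) workqueue is already per-cpu. */
                vhost_wq = alloc_workqueue("vhost", 0, 0);
                return vhost_wq ? 0 : -ENOMEM;
        }

        /* Called when an event (e.g. a socket becoming readable) fires on
         * some CPU; the handler then runs on that same CPU instead of
         * being shipped to a single per-VM kthread. */
        static void vhost_work_kick(struct work_struct *work)
        {
                queue_work(vhost_wq, work);
        }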
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/6/2011 5:15 AM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Not sure it could publish the best mapping, but the idea is to make sure the packets of a flow are handled by the same guest vcpu and maybe the same vhost thread, in order to eliminate packet reordering and lock contention. But this assumption does not take into account the bouncing of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is. What solution are you proposing? How are we going to have everything from the guest application, guest kernel, host threads, and host NIC driver play along so we get the right steering up the entire stack? I think there needs to be an answer to that before changing virtio-net to add any steering mechanism.

Yes. Also, the current model of a vhost thread per VM interface doesn't help with packet steering all the way from the guest to the host physical NIC.

I think we need to have vhost thread(s) per CPU that can handle packets to/from the physical NIC's TX/RX queues. Currently we have a single vhost thread for a VM's interface that handles all the packets from various flows coming from a multi-queue physical NIC.

Thanks
Sridhar
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Tue, Dec 6, 2011 at 10:21 AM, Jason Wang wrote:
> On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
>> On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
>>> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>>>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
>>
>> The vcpus are just threads and may not be bound to physical CPUs, so
>> what is the big picture here? Is the guest even in the position to
>> set the best queue mappings today?
>
> Not sure it could publish the best mapping, but the idea is to make sure
> the packets of a flow are handled by the same guest vcpu and maybe the
> same vhost thread, in order to eliminate packet reordering and lock
> contention. But this assumption does not take into account the bouncing
> of vhost or vcpu threads, which would also affect the result.

Okay, this is why I'd like to know what the big picture here is. What solution are you proposing? How are we going to have everything from the guest application, guest kernel, host threads, and host NIC driver play along so we get the right steering up the entire stack? I think there needs to be an answer to that before changing virtio-net to add any steering mechanism.

Stefan
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/06/2011 05:18 PM, Stefan Hajnoczi wrote:
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared memory. Mapping a page breaks the virtio abstraction.

Using the control virtqueue is more suitable, but there are also some problems:

One problem is the interface. If we use the control virtqueue, we need an interface between the backend and tap/macvtap to change the flow mapping. But qemu and vhost_net only know about the file descriptor; more information or interfaces need to be exposed in order to let ethtool or ioctl work.

QEMU could map a shared page with tap/macvtap. The difference would be that the guest<->host interface is still virtio and QEMU pokes values into the shared page on behalf of the guest.

This makes sense.

Another problem is the delay introduced by the ctrl vq. As the ctrl vq would be used in the critical path in the guest, and it uses busy waiting to get the response, the delay is not negligible.

Then you need to find a better way of doing this. Can the host automatically associate the flow from the tx virtqueue packets are transmitted on? Does it make sense to add a virtio_net_hdr field that updates the queue mapping?

It can, but it cannot properly handle the packet re-ordering caused by the movement of guest applications among guest cpus. One more problem with virtio_net_hdr is that we need to build an empty packet when there is no other packet to send. One solution is to introduce an unblock cmd for the ctrl vq.

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Not sure it could publish the best mapping, but the idea is to make sure the packets of a flow are handled by the same guest vcpu and maybe the same vhost thread, in order to eliminate packet reordering and lock contention. But this assumption does not take into account the bouncing of vhost or vcpu threads, which would also affect the result. Anyway, the mapping from the guest is an important reference.

Stefan
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Tue, Dec 6, 2011 at 6:33 AM, Jason Wang wrote:
> On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
>> On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
>>>
>>> +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
>>> +{
>>> +	struct virtnet_info *vi = netdev_priv(dev);
>>> +	struct virtio_device *vdev = vi->vdev;
>>> +
>>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
>>> +		vdev->config->set(vdev,
>>> +				  offsetof(struct virtio_net_config_fd, addr),
>>> +				  &pfn, sizeof(u32));
>>
>> Please use the virtio model (i.e. virtqueues) instead of shared
>> memory. Mapping a page breaks the virtio abstraction.
>
> Using the control virtqueue is more suitable, but there are also some
> problems:
>
> One problem is the interface. If we use the control virtqueue, we need
> an interface between the backend and tap/macvtap to change the flow
> mapping. But qemu and vhost_net only know about the file descriptor;
> more information or interfaces need to be exposed in order to let
> ethtool or ioctl work.

QEMU could map a shared page with tap/macvtap. The difference would be that the guest<->host interface is still virtio and QEMU pokes values into the shared page on behalf of the guest.

> Another problem is the delay introduced by the ctrl vq. As the ctrl vq
> would be used in the critical path in the guest, and it uses busy
> waiting to get the response, the delay is not negligible.

Then you need to find a better way of doing this.

Can the host automatically associate the flow from the tx virtqueue packets are transmitted on? Does it make sense to add a virtio_net_hdr field that updates the queue mapping?

The vcpus are just threads and may not be bound to physical CPUs, so what is the big picture here? Is the guest even in the position to set the best queue mappings today?

Stefan
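[To illustrate the second idea floated above, a hedged sketch of a steering hint carried in the tx header. The struct name and field are invented and are not part of any virtio spec; only struct virtio_net_hdr is the real existing header.]

        #include <linux/virtio_net.h>

        /* Hypothetical extension: the guest tags each tx packet with the
         * rx queue it wants the return traffic of this flow steered to,
         * so the host can update its mapping without a separate control
         * path.  The re-ordering and idle-flow problems Jason raises
         * elsewhere in the thread apply to exactly this scheme. */
        struct virtio_net_hdr_steer {
                struct virtio_net_hdr hdr;	/* existing per-packet header */
                __u16 rxq_hint;			/* desired rx virtqueue index */
        };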
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/06/2011 04:42 AM, Ben Hutchings wrote:
On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:

In order to let the packets of a flow be passed to the desired guest cpu, we can co-operate with devices through programming the flow director, which is just a hash-to-queue table.

This kind of co-operation is done through the accelerated RFS support: a device-specific flow steering method, virtnet_fd(), is used to modify the flow director based on the rfs mapping. The desired queue is calculated through a reverse mapping of the irq affinity table. In order to parallelize the ingress path, the irq affinity of rx queues is also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to balance the TX load and reduce the lock contention on the egress path, so processor_id() is used for tx queue selection.

[...]

+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;

Why only IPv4?

Oops, IPv6 should work also.

+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif

This is not a proper implementation of ndo_rx_flow_steer. If you steer a flow by changing the RSS table this can easily cause packet reordering in other flows. The filtering should be more precise, ideally matching exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which flow is supposed to be steered. Also, you must call rps_may_expire_flow() to check whether an entry in this table may be replaced; otherwise you can cause packet reordering in the flow that was previously being steered.

Finally, this function must return the table index it assigned, so that rps_may_expire_flow() works.

Thanks for the explanation; how about documenting this briefly in scaling.txt?

+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of the accelerated rfs which lets the scheduler
+	 * balance the load, does it make sense to choose the tx queue also
+	 * based on the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
[...]

Don't do this, let XPS handle it.

Ben.
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On 12/05/2011 06:55 PM, Stefan Hajnoczi wrote:
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared memory. Mapping a page breaks the virtio abstraction.

Using the control virtqueue is more suitable, but there are also some problems:

One problem is the interface. If we use the control virtqueue, we need an interface between the backend and tap/macvtap to change the flow mapping. But qemu and vhost_net only know about the file descriptor; more information or interfaces need to be exposed in order to let ethtool or ioctl work.

Another problem is the delay introduced by the ctrl vq. As the ctrl vq would be used in the critical path in the guest, and it uses busy waiting to get the response, the delay is not negligible.

Stefan
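[For comparison, a sketch of what the ctrl-vq route weighed above might look like on the guest side. The class/command values and the struct are invented for illustration; only the virtnet_send_command() pattern mirrors the existing driver, and its busy-wait for the host's ack is exactly the latency concern raised here.]

        #include <linux/scatterlist.h>

        #define VIRTIO_NET_CTRL_STEERING	5	/* invented class */
        #define VIRTIO_NET_CTRL_STEERING_SET	0	/* invented command */

        /* Hypothetical command payload: update one entry of the
         * hash-to-queue table instead of poking a shared page. */
        struct virtio_net_ctrl_steer {
                __u32 hash;	/* rxhash & TAP_HASH_MASK */
                __u16 queue;	/* desired rx virtqueue index */
        };

        static bool virtnet_steer_flow(struct virtnet_info *vi,
                                       u32 hash, u16 queue)
        {
                struct virtio_net_ctrl_steer s = {
                        .hash = hash,
                        .queue = queue,
                };
                struct scatterlist sg;

                sg_init_one(&sg, &s, sizeof(s));
                /* Blocks (busy-waits) until the device acknowledges. */
                return virtnet_send_command(vi, VIRTIO_NET_CTRL_STEERING,
                                            VIRTIO_NET_CTRL_STEERING_SET,
                                            &sg, 1, 0);
        }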
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Mon, 2011-12-05 at 16:59 +0800, Jason Wang wrote:
> In order to let the packets of a flow be passed to the desired guest
> cpu, we can co-operate with devices through programming the flow
> director, which is just a hash-to-queue table.
>
> This kind of co-operation is done through the accelerated RFS support:
> a device-specific flow steering method, virtnet_fd(), is used to modify
> the flow director based on the rfs mapping. The desired queue is
> calculated through a reverse mapping of the irq affinity table. In
> order to parallelize the ingress path, the irq affinity of rx queues is
> also provided by the driver.
>
> In addition to accelerated RFS, we can also use the guest scheduler to
> balance the TX load and reduce the lock contention on the egress path,
> so processor_id() is used for tx queue selection.
[...]
> +#ifdef CONFIG_RFS_ACCEL
> +
> +int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
> +	       u16 rxq_index, u32 flow_id)
> +{
> +	struct virtnet_info *vi = netdev_priv(net_dev);
> +	u16 *table = NULL;
> +
> +	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
> +		return -EPROTONOSUPPORT;

Why only IPv4?

> +	table = kmap_atomic(vi->fd_page);
> +	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
> +	kunmap_atomic(table);
> +
> +	return 0;
> +}
> +#endif

This is not a proper implementation of ndo_rx_flow_steer. If you steer a flow by changing the RSS table this can easily cause packet reordering in other flows. The filtering should be more precise, ideally matching exactly a single flow by e.g. VID and IP 5-tuple.

I think you need to add a second hash table which records exactly which flow is supposed to be steered. Also, you must call rps_may_expire_flow() to check whether an entry in this table may be replaced; otherwise you can cause packet reordering in the flow that was previously being steered.

Finally, this function must return the table index it assigned, so that rps_may_expire_flow() works.

> +static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
> +{
> +	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
> +		  smp_processor_id();
> +
> +	/* As we make use of the accelerated rfs which lets the scheduler
> +	 * balance the load, does it make sense to choose the tx queue also
> +	 * based on the processor id?
> +	 */
> +	while (unlikely(txq >= dev->real_num_tx_queues))
> +		txq -= dev->real_num_tx_queues;
> +	return txq;
> +}
[...]

Don't do this, let XPS handle it.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
Re: [net-next RFC PATCH 5/5] virtio-net: flow director support
On Mon, Dec 5, 2011 at 8:59 AM, Jason Wang wrote:
> +static int virtnet_set_fd(struct net_device *dev, u32 pfn)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtio_device *vdev = vi->vdev;
> +
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
> +		vdev->config->set(vdev,
> +				  offsetof(struct virtio_net_config_fd, addr),
> +				  &pfn, sizeof(u32));

Please use the virtio model (i.e. virtqueues) instead of shared memory. Mapping a page breaks the virtio abstraction.

Stefan
[net-next RFC PATCH 5/5] virtio-net: flow director support
In order to let the packets of a flow be passed to the desired guest cpu, we can co-operate with devices through programming the flow director, which is just a hash-to-queue table.

This kind of co-operation is done through the accelerated RFS support: a device-specific flow steering method, virtnet_fd(), is used to modify the flow director based on the rfs mapping. The desired queue is calculated through a reverse mapping of the irq affinity table. In order to parallelize the ingress path, the irq affinity of rx queues is also provided by the driver.

In addition to accelerated RFS, we can also use the guest scheduler to balance the TX load and reduce the lock contention on the egress path, so processor_id() is used for tx queue selection.

Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c   |  165 +++-
 include/linux/virtio_net.h |    6 ++
 2 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0d871f8..89bb5e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include

 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
@@ -40,6 +44,7 @@ module_param(gso, bool, 0444);

 #define VIRTNET_SEND_COMMAND_SG_MAX	2
 #define VIRTNET_DRIVER_VERSION "1.0.0"
+#define TAP_HASH_MASK 0xFF

 struct virtnet_send_stats {
 	struct u64_stats_sync syncp;
@@ -89,6 +94,9 @@ struct receive_queue {

 	/* Active rx statistics */
 	struct virtnet_recv_stats __percpu *stats;
+
+	/* FIXME: per vector instead of per queue ?? */
+	cpumask_var_t affinity_mask;
 };

 struct virtnet_info {
@@ -110,6 +118,11 @@ struct virtnet_info {

 	/* Host will pass rxhash to us. */
 	bool has_rxhash;
+
+	/* A page of flow director */
+	struct page *fd_page;
+
+	cpumask_var_t affinity_mask;
 };

 struct skb_vnet_hdr {
@@ -386,6 +399,7 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)

 	if (vi->has_rxhash)
 		skb->rxhash = hdr->rhdr.rxhash;
+	skb_record_rx_queue(skb, rq->vq->queue_index / 2);

 	netif_receive_skb(skb);
 	return;
@@ -722,6 +736,19 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }

+static int virtnet_set_fd(struct net_device *dev, u32 pfn)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtio_device *vdev = vi->vdev;
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_FD)) {
+		vdev->config->set(vdev,
+				  offsetof(struct virtio_net_config_fd, addr),
+				  &pfn, sizeof(u32));
+	}
+	return 0;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -1017,6 +1044,39 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }

+#ifdef CONFIG_RFS_ACCEL
+
+int virtnet_fd(struct net_device *net_dev, const struct sk_buff *skb,
+	       u16 rxq_index, u32 flow_id)
+{
+	struct virtnet_info *vi = netdev_priv(net_dev);
+	u16 *table = NULL;
+
+	if (skb->protocol != htons(ETH_P_IP) || !skb->rxhash)
+		return -EPROTONOSUPPORT;
+
+	table = kmap_atomic(vi->fd_page);
+	table[skb->rxhash & TAP_HASH_MASK] = rxq_index;
+	kunmap_atomic(table);
+
+	return 0;
+}
+#endif
+
+static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	int txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) :
+		  smp_processor_id();
+
+	/* As we make use of the accelerated rfs which lets the scheduler
+	 * balance the load, does it make sense to choose the tx queue also
+	 * based on the processor id?
+	 */
+	while (unlikely(txq >= dev->real_num_tx_queues))
+		txq -= dev->real_num_tx_queues;
+	return txq;
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop            = virtnet_close,
@@ -1028,9 +1088,13 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_get_stats64     = virtnet_stats,
 	.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+	.ndo_select_queue    = virtnet_select_queue,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = virtnet_netpoll,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer   = virtnet_fd,
+#endif
 };

 static void virtnet_update_status(struct virtnet_info *vi)
@@ -1272,12 +1336,76 @@ static int virtnet_setup_vqs(struct virtnet_info *vi)
 	return ret;
 }

+static int virtnet_init_rx_cpu_rmap(struct virtnet_info *vi)
+{
+#ifdef CONFIG_RFS_ACCEL