Re: [ovs-dev] [PATCH] Use batch process recv for tap and raw socket in netdev datapath
On Wed, Dec 18, 2019 at 10:44:21AM +0800, yang_y_yi wrote:
> Hi, William
>
> I used OVS DPDK to test it. You shouldn't add a tap interface to an OVS DPDK
> bridge if you use vdev to add the tap; virtio_user is meant for that, but it
> won't use this receive function to receive packets.

Right. I mean that if you already use OVS-DPDK, you can create the tap device
using something like

  ovs-vsctl -- set interface dpdk-p0 type=dpdk \
      options:dpdk-devargs=vdev:net_af_packet0,iface=dpdk-p0

Then you can get better veth performance, around 2.3 Gbps, without your patch.

William

> At 2019-12-17 02:55:50, "William Tu" wrote:
> >On Fri, Dec 06, 2019 at 02:09:24AM -0500, yang_y...@163.com wrote:
> >> From: Yi Yang
> >>
> >> Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
> >> just receive a single packet, which is very inefficient, per my test
> >> case which adds two tap ports or veth ports into an OVS bridge
> >> (datapath_type=netdev) and uses iperf3 to run a performance test
> >> between the two ports (they are put into different network namespaces).
> >>
> >> The result is as below:
> >>
> >>   tap:  295 Mbits/sec
> >>   veth: 207 Mbits/sec
> >>
> >> After I change netdev_linux_rxq_recv_tap and
> >> netdev_linux_rxq_recv_sock to use batch processing, the performance
> >> is boosted by about 7 times; here is the result:
> >>
> >>   tap:  1.96 Gbits/sec
> >>   veth: 1.47 Gbits/sec
> >>
> >> Undoubtedly this is a huge improvement, although it can't match the
> >> OVS kernel datapath yet.
> >>
> >> FYI: here is the result for the OVS kernel datapath:
> >>
> >>   tap:  37.2 Gbits/sec
> >>   veth: 36.3 Gbits/sec
> >>
> >> Note: performance results are highly dependent on the test machine;
> >> you shouldn't expect the same numbers on yours.
> >>
> >> Signed-off-by: Yi Yang
> >
> >Hi Yi Yang,
> >
> >Are you testing this using OVS-DPDK?
> >If you're using OVS-DPDK, then you should use DPDK's vdev to
> >open and attach the tap/veth device to OVS. I think you'll see much
> >better performance.
> >
> >The performance issue you pointed out only happens when using the
> >userspace datapath without the DPDK library, where afxdp is used.
> >I'm still looking for a better solution for faster interfaces
> >for veth (af_packet) and tap.
> >
> >Thanks
> >William
Re: [ovs-dev] [PATCH] Use batch process recv for tap and raw socket in netdev datapath
Hi, William

I used OVS DPDK to test it. You shouldn't add a tap interface to an OVS DPDK
bridge if you use vdev to add the tap; virtio_user is meant for that, but it
won't use this receive function to receive packets.

At 2019-12-17 02:55:50, "William Tu" wrote:
>On Fri, Dec 06, 2019 at 02:09:24AM -0500, yang_y...@163.com wrote:
>> From: Yi Yang
>>
>> Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
>> just receive a single packet, which is very inefficient, per my test
>> case which adds two tap ports or veth ports into an OVS bridge
>> (datapath_type=netdev) and uses iperf3 to run a performance test
>> between the two ports (they are put into different network namespaces).
>>
>> The result is as below:
>>
>>   tap:  295 Mbits/sec
>>   veth: 207 Mbits/sec
>>
>> After I change netdev_linux_rxq_recv_tap and
>> netdev_linux_rxq_recv_sock to use batch processing, the performance
>> is boosted by about 7 times; here is the result:
>>
>>   tap:  1.96 Gbits/sec
>>   veth: 1.47 Gbits/sec
>>
>> Undoubtedly this is a huge improvement, although it can't match the
>> OVS kernel datapath yet.
>>
>> FYI: here is the result for the OVS kernel datapath:
>>
>>   tap:  37.2 Gbits/sec
>>   veth: 36.3 Gbits/sec
>>
>> Note: performance results are highly dependent on the test machine;
>> you shouldn't expect the same numbers on yours.
>>
>> Signed-off-by: Yi Yang
>
>Hi Yi Yang,
>
>Are you testing this using OVS-DPDK?
>If you're using OVS-DPDK, then you should use DPDK's vdev to
>open and attach the tap/veth device to OVS. I think you'll see much
>better performance.
>
>The performance issue you pointed out only happens when using the
>userspace datapath without the DPDK library, where afxdp is used.
>I'm still looking for a better solution for faster interfaces
>for veth (af_packet) and tap.
>
>Thanks
>William
Re: [ovs-dev] [PATCH] Use batch process recv for tap and raw socket in netdev datapath
On Fri, Dec 06, 2019 at 02:09:24AM -0500, yang_y...@163.com wrote:
> From: Yi Yang
>
> Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
> just receive a single packet, which is very inefficient, per my test
> case which adds two tap ports or veth ports into an OVS bridge
> (datapath_type=netdev) and uses iperf3 to run a performance test
> between the two ports (they are put into different network namespaces).

Thanks for the patch!  This is an impressive performance improvement!

Each call to netdev_linux_batch_rxq_recv_sock() now calls malloc() 32
times.  This is expensive if only a few packets (or none) are received.
Maybe it doesn't matter, but I wonder whether it affects performance.

I think that no packets are freed on error.  Fix:

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 9cb45d5c7d29..3414a6495ced 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1198,6 +1198,7 @@ netdev_linux_batch_rxq_recv_sock(int fd, int mtu,
     if (retval < 0) {
         /* Save -errno to retval temporarily */
         retval = -errno;
+        i = 0;
         goto free_buffers;
     }
 

To get sparse to work, one must fold in the following:

diff --git a/include/sparse/sys/socket.h b/include/sparse/sys/socket.h
index 4178f57e2bda..e954ade714b5 100644
--- a/include/sparse/sys/socket.h
+++ b/include/sparse/sys/socket.h
@@ -27,6 +27,7 @@
 typedef unsigned short int sa_family_t;
 typedef __socklen_t socklen_t;
 
+struct timespec;
 
 struct sockaddr {
     sa_family_t sa_family;
@@ -171,4 +172,7 @@ int sockatmark(int);
 int socket(int, int, int);
 int socketpair(int, int, int, int[2]);
 
+int sendmmsg(int, struct mmsghdr *, unsigned int, int);
+int recvmmsg(int, struct mmsghdr *, unsigned int, int, struct timespec *);
+
 #endif  /* for sparse */
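As a side note on the allocation concern above, the following is a minimal,
self-contained sketch of one way to avoid a full burst of malloc() calls on
every receive: keep a per-queue buffer cache and re-allocate only the slots
that recvmmsg() actually filled. This is not something the thread proposes;
the rx_cache struct, the fixed BUF_SIZE, and the plain malloc()'d buffers are
illustrative stand-ins for OVS's dp_packet machinery, and error handling is
omitted for brevity.

#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BURST 32
#define BUF_SIZE 2048            /* illustrative; real code sizes this from the MTU */

struct rx_cache {
    void *bufs[BURST];           /* allocated once, reused across calls */
};

static void
rx_cache_init(struct rx_cache *c)
{
    for (int i = 0; i < BURST; i++) {
        c->bufs[i] = malloc(BUF_SIZE);
    }
}

/* Receives up to BURST packets with one recvmmsg() call.  Filled buffers are
 * handed to the caller through out[]/lens[], and only those slots are
 * re-allocated, so an idle queue performs no allocations at all. */
static int
rx_burst(int fd, struct rx_cache *c, void *out[BURST], size_t lens[BURST])
{
    struct iovec iovs[BURST];
    struct mmsghdr msgs[BURST];

    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BURST; i++) {
        iovs[i].iov_base = c->bufs[i];
        iovs[i].iov_len = BUF_SIZE;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    int n = recvmmsg(fd, msgs, BURST, MSG_DONTWAIT, NULL);
    if (n < 0) {
        return -errno;           /* nothing was consumed, nothing to refill */
    }

    for (int i = 0; i < n; i++) {
        out[i] = c->bufs[i];
        lens[i] = msgs[i].msg_len;
        c->bufs[i] = malloc(BUF_SIZE);   /* refill only the consumed slots */
    }
    return n;
}

Whether the saved allocations matter in practice would of course need the
measurement the review asks about.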
Re: [ovs-dev] [PATCH] Use batch process recv for tap and raw socket in netdev datapath
On Fri, Dec 06, 2019 at 02:09:24AM -0500, yang_y...@163.com wrote:
> From: Yi Yang
>
> Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
> just receive a single packet, which is very inefficient, per my test
> case which adds two tap ports or veth ports into an OVS bridge
> (datapath_type=netdev) and uses iperf3 to run a performance test
> between the two ports (they are put into different network namespaces).
>
> The result is as below:
>
>   tap:  295 Mbits/sec
>   veth: 207 Mbits/sec
>
> After I change netdev_linux_rxq_recv_tap and
> netdev_linux_rxq_recv_sock to use batch processing, the performance
> is boosted by about 7 times; here is the result:
>
>   tap:  1.96 Gbits/sec
>   veth: 1.47 Gbits/sec
>
> Undoubtedly this is a huge improvement, although it can't match the
> OVS kernel datapath yet.
>
> FYI: here is the result for the OVS kernel datapath:
>
>   tap:  37.2 Gbits/sec
>   veth: 36.3 Gbits/sec
>
> Note: performance results are highly dependent on the test machine;
> you shouldn't expect the same numbers on yours.
>
> Signed-off-by: Yi Yang

Hi Yi Yang,

Are you testing this using OVS-DPDK?
If you're using OVS-DPDK, then you should use DPDK's vdev to
open and attach the tap/veth device to OVS. I think you'll see much
better performance.

The performance issue you pointed out only happens when using the
userspace datapath without the DPDK library, where afxdp is used.
I'm still looking for a better solution for faster interfaces
for veth (af_packet) and tap.

Thanks
William
Re: [ovs-dev] [PATCH] Use batch process recv for tap and raw socket in netdev datapath
On Thu, Dec 5, 2019 at 11:09 PM wrote:
>
> From: Yi Yang
>
> Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
> just receive a single packet, which is very inefficient, per my test
> case which adds two tap ports or veth ports into an OVS bridge
> (datapath_type=netdev) and uses iperf3 to run a performance test
> between the two ports (they are put into different network namespaces).
>
> The result is as below:
>
>   tap:  295 Mbits/sec
>   veth: 207 Mbits/sec
>
> After I change netdev_linux_rxq_recv_tap and
> netdev_linux_rxq_recv_sock to use batch processing, the performance
> is boosted by about 7 times; here is the result:
>
>   tap:  1.96 Gbits/sec
>   veth: 1.47 Gbits/sec
>
> Undoubtedly this is a huge improvement, although it can't match the
> OVS kernel datapath yet.
>
> FYI: here is the result for the OVS kernel datapath:
>
>   tap:  37.2 Gbits/sec
>   veth: 36.3 Gbits/sec
>
> Note: performance results are highly dependent on the test machine;
> you shouldn't expect the same numbers on yours.

Hi Yi Yang,

Thanks for the patch, it's an amazing performance improvement.
I haven't reviewed the code yet, but Yifeng and I applied and tested this
patch. Using netdev-afxdp + a tap port, we do see performance improve from
300 Mbps to 2 Gbps in our testbed!

Will add more feedback next week.

William
[ovs-dev] [PATCH] Use batch process recv for tap and raw socket in netdev datapath
From: Yi Yang

Current netdev_linux_rxq_recv_tap and netdev_linux_rxq_recv_sock
just receive a single packet, which is very inefficient, per my test
case which adds two tap ports or veth ports into an OVS bridge
(datapath_type=netdev) and uses iperf3 to run a performance test
between the two ports (they are put into different network namespaces).

The result is as below:

  tap:  295 Mbits/sec
  veth: 207 Mbits/sec

After I change netdev_linux_rxq_recv_tap and
netdev_linux_rxq_recv_sock to use batch processing, the performance
is boosted by about 7 times; here is the result:

  tap:  1.96 Gbits/sec
  veth: 1.47 Gbits/sec

Undoubtedly this is a huge improvement, although it can't match the
OVS kernel datapath yet.

FYI: here is the result for the OVS kernel datapath:

  tap:  37.2 Gbits/sec
  veth: 36.3 Gbits/sec

Note: performance results are highly dependent on the test machine;
you shouldn't expect the same numbers on yours.

Signed-off-by: Yi Yang
---
 lib/netdev-linux.c | 166 ++---
 1 file changed, 108 insertions(+), 58 deletions(-)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index f8e59ba..9cb45d5 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1151,90 +1151,146 @@ auxdata_has_vlan_tci(const struct tpacket_auxdata *aux)
     return aux->tp_vlan_tci || aux->tp_status & TP_STATUS_VLAN_VALID;
 }
 
+/*
+ * Receive packets from raw socket in batch process for better performance,
+ * it can receive NETDEV_MAX_BURST packets at most once, the received
+ * packets are added into *batch. The return value is 0 or errno.
+ *
+ * It also used recvmmsg to reduce multiple syscalls overhead;
+ */
 static int
-netdev_linux_rxq_recv_sock(int fd, struct dp_packet *buffer)
+netdev_linux_batch_rxq_recv_sock(int fd, int mtu,
+                                 struct dp_packet_batch *batch)
 {
     size_t size;
     ssize_t retval;
-    struct iovec iov;
+    struct iovec iovs[NETDEV_MAX_BURST];
     struct cmsghdr *cmsg;
     union {
         struct cmsghdr cmsg;
         char buffer[CMSG_SPACE(sizeof(struct tpacket_auxdata))];
-    } cmsg_buffer;
-    struct msghdr msgh;
-
-    /* Reserve headroom for a single VLAN tag */
-    dp_packet_reserve(buffer, VLAN_HEADER_LEN);
-    size = dp_packet_tailroom(buffer);
-
-    iov.iov_base = dp_packet_data(buffer);
-    iov.iov_len = size;
-    msgh.msg_name = NULL;
-    msgh.msg_namelen = 0;
-    msgh.msg_iov = &iov;
-    msgh.msg_iovlen = 1;
-    msgh.msg_control = &cmsg_buffer;
-    msgh.msg_controllen = sizeof cmsg_buffer;
-    msgh.msg_flags = 0;
+    } cmsg_buffers[NETDEV_MAX_BURST];
+    struct mmsghdr mmsgs[NETDEV_MAX_BURST];
+    struct dp_packet *buffers[NETDEV_MAX_BURST];
+    int i;
+
+    for (i = 0; i < NETDEV_MAX_BURST; i++) {
+        buffers[i] = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
+                                                 DP_NETDEV_HEADROOM);
+        /* Reserve headroom for a single VLAN tag */
+        dp_packet_reserve(buffers[i], VLAN_HEADER_LEN);
+        size = dp_packet_tailroom(buffers[i]);
+        iovs[i].iov_base = dp_packet_data(buffers[i]);
+        iovs[i].iov_len = size;
+        mmsgs[i].msg_hdr.msg_name = NULL;
+        mmsgs[i].msg_hdr.msg_namelen = 0;
+        mmsgs[i].msg_hdr.msg_iov = &iovs[i];
+        mmsgs[i].msg_hdr.msg_iovlen = 1;
+        mmsgs[i].msg_hdr.msg_control = &cmsg_buffers[i];
+        mmsgs[i].msg_hdr.msg_controllen = sizeof cmsg_buffers[i];
+        mmsgs[i].msg_hdr.msg_flags = 0;
+    }
 
     do {
-        retval = recvmsg(fd, &msgh, MSG_TRUNC);
+        retval = recvmmsg(fd, mmsgs, NETDEV_MAX_BURST, MSG_TRUNC, NULL);
     } while (retval < 0 && errno == EINTR);
 
     if (retval < 0) {
-        return errno;
-    } else if (retval > size) {
-        return EMSGSIZE;
+        /* Save -errno to retval temporarily */
+        retval = -errno;
+        goto free_buffers;
     }
 
-    dp_packet_set_size(buffer, dp_packet_size(buffer) + retval);
-
-    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg; cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
-        const struct tpacket_auxdata *aux;
-
-        if (cmsg->cmsg_level != SOL_PACKET
-            || cmsg->cmsg_type != PACKET_AUXDATA
-            || cmsg->cmsg_len < CMSG_LEN(sizeof(struct tpacket_auxdata))) {
-            continue;
+    for (i = 0; i < retval; i++) {
+        if (mmsgs[i].msg_len < ETH_HEADER_LEN) {
+            break;
         }
 
-        aux = ALIGNED_CAST(struct tpacket_auxdata *, CMSG_DATA(cmsg));
-        if (auxdata_has_vlan_tci(aux)) {
-            struct eth_header *eth;
-            bool double_tagged;
+        dp_packet_set_size(buffers[i],
+                           dp_packet_size(buffers[i]) + mmsgs[i].msg_len);
+
+        for (cmsg = CMSG_FIRSTHDR(&mmsgs[i].msg_hdr); cmsg;
+             cmsg = CMSG_NXTHDR(&mmsgs[i].msg_hdr, cmsg)) {
+            const struct tpacket_auxdata *aux;
 
-            if (retval < ETH_HEADER_LEN) {
-                return EINVAL;
+
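The diff is cut off at this point in the archive. For readers following
along, the per-message loop that is starting above walks each received
message's control messages to recover a VLAN tag that the kernel stripped
and reported via PACKET_AUXDATA. Below is a small self-contained sketch of
that walk, not the missing hunk itself; the function name is invented, and
re-inserting the tag into the frame is left to the caller.

#include <linux/if_packet.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Scans the control messages attached to one received message.  If the
 * kernel stripped a VLAN tag and reported it through PACKET_AUXDATA,
 * returns true and stores the TCI in *vlan_tci; the caller then has to
 * push the tag back into the frame before handing the packet on. */
static bool
stripped_vlan_tci(struct msghdr *msgh, uint16_t *vlan_tci)
{
    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(msgh); cmsg;
         cmsg = CMSG_NXTHDR(msgh, cmsg)) {
        if (cmsg->cmsg_level != SOL_PACKET
            || cmsg->cmsg_type != PACKET_AUXDATA
            || cmsg->cmsg_len < CMSG_LEN(sizeof(struct tpacket_auxdata))) {
            continue;
        }

        const struct tpacket_auxdata *aux =
            (const struct tpacket_auxdata *) (void *) CMSG_DATA(cmsg);
        /* Same test as auxdata_has_vlan_tci() in the patch. */
        if (aux->tp_vlan_tci || (aux->tp_status & TP_STATUS_VLAN_VALID)) {
            *vlan_tci = aux->tp_vlan_tci;
            return true;
        }
    }
    return false;
}

Receiving the auxdata at all requires PACKET_AUXDATA to be enabled on the
socket with setsockopt() and each msg_hdr.msg_control to point at a
CMSG_SPACE(sizeof(struct tpacket_auxdata)) buffer, as the setup loop in the
patch above does.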