On 19.11.2019 17:16, Eelco Chaudron wrote: > > > On 7 Nov 2019, at 12:36, Ilya Maximets wrote: > >> Until now there was only two options for XDP mode in OVS: SKB or DRV. >> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'. >> >> Devices like 'veth' interfaces in Linux supports native XDP, but >> doesn't support zero-copy mode. This case can not be covered by >> existing API and we have to use slower generic XDP for such devices. >> There are few more issues, e.g. TCP is not supported in generic XDP >> mode for veth interfaces due to kernel limitations, however it is >> supported in native mode. >> >> This change introduces ability to use native XDP without zero-copy >> along with best-effort configuration option that enabled by default. >> In best-effort case OVS will sequentially try different modes starting >> from the fastest one and will choose the first acceptable for current >> interface. This will guarantee the best possible performance. >> >> If user will want to choose specific mode, it's still possible by >> setting the 'options:xdp-mode'. >> >> This change additionally changes the API by renaming the configuration >> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes >> themselves to be more user-friendly. >> >> The full list of currently supported modes: >> * native-with-zerocopy - former DRV >> * native - new one, DRV without zero-copy >> * generic - former SKB >> * best-effort - new one, chooses the best available from >> 3 above modes >> >> Since 'best-effort' is a default mode, users will not need to >> explicitely set 'xdp-mode' in most cases. >> >> TCP related tests enabled back in system afxdp testsuite, because >> 'best-effort' will choose 'native' mode for veth interfaces >> and this mode has no issues with TCP. > Patch in general looks good, two small comments inline.
Thanks for review. > > The only thing that bothers me is the worse performance of the TAP interface > with the new default config. Can we somehow keep the old behavior for TAP > interfaces? Could you check if TCP works over tap interfaces in generic mode? For me the point is that correctness is better than performance. I also hope that native implementation for tap will be improved over time. > >> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> >> --- >> >> With this patch I modified the user-visible API, but I think it's OK >> since it's still an experimental netdev. Comments are welcome. >> >> Version 2: >> * Rebased on current master. >> >> Documentation/intro/install/afxdp.rst | 54 ++++--- >> NEWS | 12 +- >> lib/netdev-afxdp.c | 223 ++++++++++++++++---------- >> lib/netdev-afxdp.h | 9 ++ >> lib/netdev-linux-private.h | 8 +- >> tests/system-afxdp-macros.at | 7 - >> vswitchd/vswitch.xml | 38 +++-- >> 7 files changed, 227 insertions(+), 124 deletions(-) >> >> diff --git a/Documentation/intro/install/afxdp.rst >> b/Documentation/intro/install/afxdp.rst >> index a136db0c9..937770ad0 100644 >> --- a/Documentation/intro/install/afxdp.rst >> +++ b/Documentation/intro/install/afxdp.rst >> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting:: >> make check-afxdp TESTSUITEFLAGS='1' >> >> .. note:: >> - Not all test cases pass at this time. Currenly all TCP related >> - tests, ex: using wget or http, are skipped due to XDP limitations >> - on veth. cvlan test is also skipped. >> + Not all test cases pass at this time. Currenly all cvlan tests are >> skipped >> + due to kernel issues. >> >> If a test case fails, check the log at:: >> >> @@ -177,33 +176,35 @@ in :doc:`general`:: >> ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev >> >> Make sure your device driver support AF_XDP, netdev-afxdp supports >> -the following additional options (see man ovs-vswitchd.conf.db for >> +the following additional options (see ``man ovs-vswitchd.conf.db`` for >> more details): >> >> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode. >> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``, >> + ``native`` or ``generic``. Defaults to ``best-effort``, i.e. best of >> + supported modes, so in most cases you don't need to change it. >> >> - * **use-need-wakeup**: default "true" if libbpf supports it, otherwise >> false. >> + * ``use-need-wakeup``: default ``true`` if libbpf supports it, >> + otherwise ``false``. >> >> For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device, >> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**. >> -The **xdpmode** can be "drv" or "skb":: >> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and >> +``n_rxq``:: >> >> ethtool -L enp2s0 combined 1 >> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10 >> ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ >> - options:n_rxq=1 options:xdpmode=drv \ >> - other_config:pmd-rxq-affinity="0:4" >> + other_config:pmd-rxq-affinity="0:4" >> >> Or, use 4 pmds/cores and 4 queues by doing:: >> >> ethtool -L enp2s0 combined 4 >> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36 >> ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ >> - options:n_rxq=4 options:xdpmode=drv \ >> - other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4" >> + options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4" >> >> .. note:: >> - pmd-rxq-affinity is optional. If not specified, system will auto-assign. >> + ``pmd-rxq-affinity`` is optional. If not specified, system will >> auto-assign. >> + ``n_rxq`` equals ``1`` by default. >> >> To validate that the bridge has successfully instantiated, you can use the:: >> >> @@ -214,12 +215,21 @@ Should show something like:: >> Port "ens802f0" >> Interface "ens802f0" >> type: afxdp >> - options: {n_rxq="1", xdpmode=drv} >> + options: {n_rxq="1"} >> >> Otherwise, enable debugging by:: >> >> ovs-appctl vlog/set netdev_afxdp::dbg >> >> +To check which XDP mode was chosen by ``best-effort``, you can look for >> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``:: >> + >> + # ovs-appctl dpctl/show >> + netdev@ovs-netdev: >> + <...> >> + port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true, >> + xdp-mode=best-effort, >> + xdp-mode-in-use=native-with-zerocopy) >> >> References >> ---------- >> @@ -323,8 +333,11 @@ Limitations/Known Issues >> #. Most of the tests are done using i40e single port. Multiple ports and >> also ixgbe driver also needs to be tested. >> #. No latency test result (TODO items) >> -#. Due to limitations of current upstream kernel, TCP and various offloading >> +#. Due to limitations of current upstream kernel, various offloading >> (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair). >> + Also, TCP is not working over virtual interfaces in generic XDP mode. >> + Some more information and possible workaround available `here >> + <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ >> . >> >> >> PVP using tap device >> @@ -335,8 +348,7 @@ First, start OVS, then add physical port:: >> ethtool -L enp2s0 combined 1 >> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10 >> ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ >> - options:n_rxq=1 options:xdpmode=drv \ >> - other_config:pmd-rxq-affinity="0:4" >> + options:n_rxq=1 other_config:pmd-rxq-affinity="0:4" >> >> Start a VM with virtio and tap device:: >> >> @@ -414,13 +426,11 @@ Create namespace and veth peer devices:: >> >> Attach the veth port to br0 (linux kernel mode):: >> >> - ovs-vsctl add-port br0 afxdp-p0 -- \ >> - set interface afxdp-p0 options:n_rxq=1 >> + ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 >> >> -Or, use AF_XDP with skb mode:: >> +Or, use AF_XDP:: >> >> - ovs-vsctl add-port br0 afxdp-p0 -- \ >> - set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb >> + ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp" >> >> Setup the OpenFlow rules:: >> >> diff --git a/NEWS b/NEWS >> index 88b818948..d5f476d6e 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -5,11 +5,19 @@ Post-v2.12.0 >> separate project. You can find it at >> https://github.com/ovn-org/ovn.git >> - Userspace datapath: >> + * Add option to enable, disable and query TCP sequence checking in >> + conntrack. >> + - AF_XDP: >> * New option 'use-need-wakeup' for netdev-afxdp to control enabling >> of corresponding 'need_wakeup' flag in AF_XDP rings. Enabled by >> default >> if supported by libbpf. >> - * Add option to enable, disable and query TCP sequence checking in >> - conntrack. >> + * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'. >> + Modes also updated. New values: >> + native-with-zerocopy - former DRV >> + native - new one, DRV without zero-copy >> + generic - former SKB >> + best-effort [default] - new one, chooses the best available from >> + 3 above modes >> >> v2.12.0 - 03 Sep 2019 >> --------------------- >> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c >> index af654d498..74dde219d 100644 >> --- a/lib/netdev-afxdp.c >> +++ b/lib/netdev-afxdp.c >> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS); >> #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base)) >> >> static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id, >> - int mode, bool >> use_need_wakeup); >> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode); >> + enum afxdp_mode mode, >> + bool use_need_wakeup, >> + bool report_socket_failures); >> +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode); >> static void xsk_destroy(struct xsk_socket_info *xsk); >> static int xsk_configure_all(struct netdev *netdev); >> static void xsk_destroy_all(struct netdev *netdev); >> >> +static struct { >> + const char *name; >> + uint32_t bind_flags; >> + uint32_t xdp_flags; >> +} xdp_modes[] = { >> + [OVS_AF_XDP_MODE_UNSPEC] = { >> + .name = "unspecified", .bind_flags = 0, .xdp_flags = 0, >> + }, >> + [OVS_AF_XDP_MODE_BEST_EFFORT] = { >> + .name = "best-effort", .bind_flags = 0, .xdp_flags = 0, >> + }, > > <nitpicking> Maybe format the two entries above the same as below, as it’s > more pleasing to my OCD brain </nitpicking> Not a big deal for me. I just wanted to save some space. Could be changed before applying. > >> + [OVS_AF_XDP_MODE_NATIVE_ZC] = { >> + .name = "native-with-zerocopy", >> + .bind_flags = XDP_ZEROCOPY, >> + .xdp_flags = XDP_FLAGS_DRV_MODE, >> + }, >> + [OVS_AF_XDP_MODE_NATIVE] = { >> + .name = "native", >> + .bind_flags = XDP_COPY, >> + .xdp_flags = XDP_FLAGS_DRV_MODE, >> + }, >> + [OVS_AF_XDP_MODE_GENERIC] = { >> + .name = "generic", >> + .bind_flags = XDP_COPY, >> + .xdp_flags = XDP_FLAGS_SKB_MODE, >> + }, >> +}; >> + >> struct unused_pool { >> struct xsk_umem_info *umem_info; >> int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */ >> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED) >> } >> >> static struct xsk_umem_info * >> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode) >> +xsk_configure_umem(void *buffer, uint64_t size) >> { >> struct xsk_umem_config uconfig; >> struct xsk_umem_info *umem; >> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int >> xdpmode) >> ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq, >> &uconfig); >> if (ret) { >> - VLOG_ERR("xsk_umem__create failed (%s) mode: %s", >> - ovs_strerror(errno), >> - xdpmode == XDP_COPY ? "SKB": "DRV"); >> + VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno)); >> free(umem); >> return NULL; >> } >> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int >> xdpmode) >> >> static struct xsk_socket_info * >> xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex, >> - uint32_t queue_id, int xdpmode, bool use_need_wakeup) >> + uint32_t queue_id, enum afxdp_mode mode, >> + bool use_need_wakeup, bool report_socket_failures) >> { >> struct xsk_socket_config cfg; >> struct xsk_socket_info *xsk; >> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, >> uint32_t ifindex, >> cfg.rx_size = CONS_NUM_DESCS; >> cfg.tx_size = PROD_NUM_DESCS; >> cfg.libbpf_flags = 0; >> - >> - if (xdpmode == XDP_ZEROCOPY) { >> - cfg.bind_flags = XDP_ZEROCOPY; >> - cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE; >> - } else { >> - cfg.bind_flags = XDP_COPY; >> - cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE; >> - } >> + cfg.bind_flags = xdp_modes[mode].bind_flags; >> + cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST; >> >> #ifdef HAVE_XDP_NEED_WAKEUP >> if (use_need_wakeup) { >> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, >> uint32_t ifindex, >> ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem, >> &xsk->rx, &xsk->tx, &cfg); >> if (ret) { >> - VLOG_ERR("xsk_socket__create failed (%s) mode: %s " >> - "use-need-wakeup: %s qid: %d", >> - ovs_strerror(errno), >> - xdpmode == XDP_COPY ? "SKB": "DRV", >> - use_need_wakeup ? "true" : "false", >> - queue_id); >> + VLOG(report_socket_failures ? VLL_ERR : VLL_DBG, >> + "xsk_socket__create failed (%s) mode: %s, " >> + "use-need-wakeup: %s, qid: %d", >> + ovs_strerror(errno), xdp_modes[mode].name, >> + use_need_wakeup ? "true" : "false", queue_id); >> free(xsk); >> return NULL; >> } >> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, >> uint32_t ifindex, >> } >> >> static struct xsk_socket_info * >> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode, >> - bool use_need_wakeup) >> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode, >> + bool use_need_wakeup, bool report_socket_failures) >> { >> struct xsk_socket_info *xsk; >> struct xsk_umem_info *umem; >> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode, >> memset(bufs, 0, NUM_FRAMES * FRAME_SIZE); >> >> /* Create AF_XDP socket. */ >> - umem = xsk_configure_umem(bufs, >> - NUM_FRAMES * FRAME_SIZE, >> - xdpmode); >> + umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE); >> if (!umem) { >> free_pagealign(bufs); >> return NULL; >> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode, >> >> VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem); >> >> - xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode, >> - use_need_wakeup); >> + xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode, >> + use_need_wakeup, report_socket_failures); >> if (!xsk) { >> /* Clean up umem and xpacket pool. */ >> if (xsk_umem__delete(umem->umem)) { >> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int >> xdpmode, >> return xsk; >> } >> >> +static int >> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id, >> + enum afxdp_mode mode, bool report_socket_failures) >> +{ >> + struct xsk_socket_info *xsk_info; >> + >> + VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.", >> + netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name, >> + dev->use_need_wakeup ? "true" : "false"); >> + xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup, >> + report_socket_failures); >> + if (!xsk_info) { >> + VLOG(report_socket_failures ? VLL_ERR : VLL_DBG, >> + "%s: Failed to create AF_XDP socket on queue %d in %s mode.", >> + netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name); >> + dev->xsks[queue_id] = NULL; >> + return -1; >> + } >> + dev->xsks[queue_id] = xsk_info; >> + atomic_init(&xsk_info->tx_dropped, 0); >> + xsk_info->outstanding_tx = 0; >> + xsk_info->available_rx = PROD_NUM_DESCS; >> + return 0; >> +} >> + >> + >> static int >> xsk_configure_all(struct netdev *netdev) >> { >> struct netdev_linux *dev = netdev_linux_cast(netdev); >> - struct xsk_socket_info *xsk_info; >> int i, ifindex, n_rxq, n_txq; >> + int qid = 0; >> >> ifindex = linux_get_ifindex(netdev_get_name(netdev)); >> >> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev) >> n_rxq = netdev_n_rxq(netdev); >> dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks); >> >> - /* Configure each queue. */ >> - for (i = 0; i < n_rxq; i++) { >> - VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.", >> - netdev_get_name(netdev), i, >> - dev->xdpmode == XDP_COPY ? "SKB" : "DRV", >> - dev->use_need_wakeup ? "true" : "false"); >> - xsk_info = xsk_configure(ifindex, i, dev->xdpmode, >> - dev->use_need_wakeup); >> - if (!xsk_info) { >> - VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i); >> - dev->xsks[i] = NULL; >> + if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) { >> + /* Trying to configure first queue with different modes to >> + * find the most suitable. */ >> + for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) { >> + if (!xsk_configure_queue(dev, ifindex, qid, i, >> + i == OVS_AF_XDP_MODE_MAX - 1)) { >> + dev->xdp_mode_in_use = i; >> + VLOG_INFO("%s: %s XDP mode will be in use.", >> + netdev_get_name(netdev), xdp_modes[i].name); >> + break; >> + } >> + } > > What if all modes fail? Guess it will use empty flags in > xsk_configure_socket(). I would suggest to do a check in > xsk_configure_socket() and bailout with a specific error? What about the code on the next line? It is here exactly for that purpose. Am I missing something? > >> + if (i == OVS_AF_XDP_MODE_MAX) { >> + VLOG_ERR("%s: Failed to detect suitable XDP mode.", >> + netdev_get_name(netdev)); >> + goto err; >> + } >> + qid++; >> + } else { >> + dev->xdp_mode_in_use = dev->xdp_mode; >> + } >> + >> + /* Configure remaining queues. */ >> + for (; qid < n_rxq; qid++) { >> + if (xsk_configure_queue(dev, ifindex, qid, >> + dev->xdp_mode_in_use, true)) { >> + VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.", >> + netdev_get_name(netdev), qid); >> goto err; >> } >> - dev->xsks[i] = xsk_info; >> - atomic_init(&xsk_info->tx_dropped, 0); >> - xsk_info->outstanding_tx = 0; >> - xsk_info->available_rx = PROD_NUM_DESCS; >> } <snip> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev