On 19.11.2019 17:16, Eelco Chaudron wrote:
> 
> 
> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> 
>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>
>> Devices like 'veth' interfaces in Linux supports native XDP, but
>> doesn't support zero-copy mode.  This case can not be covered by
>> existing API and we have to use slower generic XDP for such devices.
>> There are few more issues, e.g. TCP is not supported in generic XDP
>> mode for veth interfaces due to kernel limitations, however it is
>> supported in native mode.
>>
>> This change introduces ability to use native XDP without zero-copy
>> along with best-effort configuration option that enabled by default.
>> In best-effort case OVS will sequentially try different modes starting
>> from the fastest one and will choose the first acceptable for current
>> interface.  This will guarantee the best possible performance.
>>
>> If user will want to choose specific mode, it's still possible by
>> setting the 'options:xdp-mode'.
>>
>> This change additionally changes the API by renaming the configuration
>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>> themselves to be more user-friendly.
>>
>> The full list of currently supported modes:
>>   * native-with-zerocopy - former DRV
>>   * native               - new one, DRV without zero-copy
>>   * generic              - former SKB
>>   * best-effort          - new one, chooses the best available from
>>                            3 above modes
>>
>> Since 'best-effort' is a default mode, users will not need to
>> explicitely set 'xdp-mode' in most cases.
>>
>> TCP related tests enabled back in system afxdp testsuite, because
>> 'best-effort' will choose 'native' mode for veth interfaces
>> and this mode has no issues with TCP.
> Patch in general looks good, two small comments inline.

Thanks for review.

> 
> The only thing that bothers me is the worse performance of the TAP interface 
> with the new default config. Can we somehow keep the old behavior for TAP 
> interfaces?

Could you check if TCP works over tap interfaces in generic mode?
For me the point is that correctness is better than performance.
I also hope that native implementation for tap will be improved
over time.

> 
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>> ---
>>
>> With this patch I modified the user-visible API, but I think it's OK
>> since it's still an experimental netdev.  Comments are welcome.
>>
>> Version 2:
>>   * Rebased on current master.
>>
>>  Documentation/intro/install/afxdp.rst |  54 ++++---
>>  NEWS                                  |  12 +-
>>  lib/netdev-afxdp.c                    | 223 ++++++++++++++++----------
>>  lib/netdev-afxdp.h                    |   9 ++
>>  lib/netdev-linux-private.h            |   8 +-
>>  tests/system-afxdp-macros.at          |   7 -
>>  vswitchd/vswitch.xml                  |  38 +++--
>>  7 files changed, 227 insertions(+), 124 deletions(-)
>>
>> diff --git a/Documentation/intro/install/afxdp.rst 
>> b/Documentation/intro/install/afxdp.rst
>> index a136db0c9..937770ad0 100644
>> --- a/Documentation/intro/install/afxdp.rst
>> +++ b/Documentation/intro/install/afxdp.rst
>> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
>>    make check-afxdp TESTSUITEFLAGS='1'
>>
>>  .. note::
>> -   Not all test cases pass at this time. Currenly all TCP related
>> -   tests, ex: using wget or http, are skipped due to XDP limitations
>> -   on veth. cvlan test is also skipped.
>> +   Not all test cases pass at this time. Currenly all cvlan tests are 
>> skipped
>> +   due to kernel issues.
>>
>>  If a test case fails, check the log at::
>>
>> @@ -177,33 +176,35 @@ in :doc:`general`::
>>    ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>>
>>  Make sure your device driver support AF_XDP, netdev-afxdp supports
>> -the following additional options (see man ovs-vswitchd.conf.db for
>> +the following additional options (see ``man ovs-vswitchd.conf.db`` for
>>  more details):
>>
>> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
>> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
>> +   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best of
>> +   supported modes, so in most cases you don't need to change it.
>>
>> - * **use-need-wakeup**: default "true" if libbpf supports it, otherwise 
>> false.
>> + * ``use-need-wakeup``: default ``true`` if libbpf supports it,
>> +   otherwise ``false``.
>>
>>  For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
>> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
>> -The **xdpmode** can be "drv" or "skb"::
>> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
>> +``n_rxq``::
>>
>>    ethtool -L enp2s0 combined 1
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=1 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:4"
>> +                                   other_config:pmd-rxq-affinity="0:4"
>>
>>  Or, use 4 pmds/cores and 4 queues by doing::
>>
>>    ethtool -L enp2s0 combined 4
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=4 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>> +    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>>
>>  .. note::
>> -   pmd-rxq-affinity is optional. If not specified, system will auto-assign.
>> +   ``pmd-rxq-affinity`` is optional. If not specified, system will 
>> auto-assign.
>> +   ``n_rxq`` equals ``1`` by default.
>>
>>  To validate that the bridge has successfully instantiated, you can use the::
>>
>> @@ -214,12 +215,21 @@ Should show something like::
>>    Port "ens802f0"
>>     Interface "ens802f0"
>>        type: afxdp
>> -      options: {n_rxq="1", xdpmode=drv}
>> +      options: {n_rxq="1"}
>>
>>  Otherwise, enable debugging by::
>>
>>    ovs-appctl vlog/set netdev_afxdp::dbg
>>
>> +To check which XDP mode was chosen by ``best-effort``, you can look for
>> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
>> +
>> +  # ovs-appctl dpctl/show
>> +  netdev@ovs-netdev:
>> +    <...>
>> +    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
>> +                      xdp-mode=best-effort,
>> +                      xdp-mode-in-use=native-with-zerocopy)
>>
>>  References
>>  ----------
>> @@ -323,8 +333,11 @@ Limitations/Known Issues
>>  #. Most of the tests are done using i40e single port. Multiple ports and
>>     also ixgbe driver also needs to be tested.
>>  #. No latency test result (TODO items)
>> -#. Due to limitations of current upstream kernel, TCP and various offloading
>> +#. Due to limitations of current upstream kernel, various offloading
>>     (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
>> +   Also, TCP is not working over virtual interfaces in generic XDP mode.
>> +   Some more information and possible workaround available `here
>> +   <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ 
>> .
>>
>>
>>  PVP using tap device
>> @@ -335,8 +348,7 @@ First, start OVS, then add physical port::
>>    ethtool -L enp2s0 combined 1
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=1 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:4"
>> +    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
>>
>>  Start a VM with virtio and tap device::
>>
>> @@ -414,13 +426,11 @@ Create namespace and veth peer devices::
>>
>>  Attach the veth port to br0 (linux kernel mode)::
>>
>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>> -    set interface afxdp-p0 options:n_rxq=1
>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
>>
>> -Or, use AF_XDP with skb mode::
>> +Or, use AF_XDP::
>>
>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>> -    set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
>>
>>  Setup the OpenFlow rules::
>>
>> diff --git a/NEWS b/NEWS
>> index 88b818948..d5f476d6e 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -5,11 +5,19 @@ Post-v2.12.0
>>         separate project. You can find it at
>>         https://github.com/ovn-org/ovn.git
>>     - Userspace datapath:
>> +     * Add option to enable, disable and query TCP sequence checking in
>> +       conntrack.
>> +   - AF_XDP:
>>       * New option 'use-need-wakeup' for netdev-afxdp to control enabling
>>         of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by 
>> default
>>         if supported by libbpf.
>> -     * Add option to enable, disable and query TCP sequence checking in
>> -       conntrack.
>> +     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
>> +       Modes also updated.  New values:
>> +         native-with-zerocopy  - former DRV
>> +         native                - new one, DRV without zero-copy
>> +         generic               - former SKB
>> +         best-effort [default] - new one, chooses the best available from
>> +                                 3 above modes
>>
>>  v2.12.0 - 03 Sep 2019
>>  ---------------------
>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>> index af654d498..74dde219d 100644
>> --- a/lib/netdev-afxdp.c
>> +++ b/lib/netdev-afxdp.c
>> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
>>  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
>>
>>  static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
>> -                                             int mode, bool 
>> use_need_wakeup);
>> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
>> +                                             enum afxdp_mode mode,
>> +                                             bool use_need_wakeup,
>> +                                             bool report_socket_failures);
>> +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
>>  static void xsk_destroy(struct xsk_socket_info *xsk);
>>  static int xsk_configure_all(struct netdev *netdev);
>>  static void xsk_destroy_all(struct netdev *netdev);
>>
>> +static struct {
>> +    const char *name;
>> +    uint32_t bind_flags;
>> +    uint32_t xdp_flags;
>> +} xdp_modes[] = {
>> +    [OVS_AF_XDP_MODE_UNSPEC] = {
>> +        .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
>> +    },
>> +    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
>> +        .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
>> +    },
> 
> <nitpicking> Maybe format the two entries above the same as below, as it’s 
> more pleasing to my OCD brain </nitpicking>

Not a big deal for me.  I just wanted to save some space.
Could be changed before applying.

> 
>> +    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
>> +        .name = "native-with-zerocopy",
>> +        .bind_flags = XDP_ZEROCOPY,
>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>> +    },
>> +    [OVS_AF_XDP_MODE_NATIVE] = {
>> +        .name = "native",
>> +        .bind_flags = XDP_COPY,
>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>> +    },
>> +    [OVS_AF_XDP_MODE_GENERIC] = {
>> +        .name = "generic",
>> +        .bind_flags = XDP_COPY,
>> +        .xdp_flags = XDP_FLAGS_SKB_MODE,
>> +    },
>> +};
>> +
>>  struct unused_pool {
>>      struct xsk_umem_info *umem_info;
>>      int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
>> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
>>  }
>>
>>  static struct xsk_umem_info *
>> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>> +xsk_configure_umem(void *buffer, uint64_t size)
>>  {
>>      struct xsk_umem_config uconfig;
>>      struct xsk_umem_info *umem;
>> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int 
>> xdpmode)
>>      ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
>>                             &uconfig);
>>      if (ret) {
>> -        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
>> -                 ovs_strerror(errno),
>> -                 xdpmode == XDP_COPY ? "SKB": "DRV");
>> +        VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
>>          free(umem);
>>          return NULL;
>>      }
>> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int 
>> xdpmode)
>>
>>  static struct xsk_socket_info *
>>  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>> -                     uint32_t queue_id, int xdpmode, bool use_need_wakeup)
>> +                     uint32_t queue_id, enum afxdp_mode mode,
>> +                     bool use_need_wakeup, bool report_socket_failures)
>>  {
>>      struct xsk_socket_config cfg;
>>      struct xsk_socket_info *xsk;
>> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, 
>> uint32_t ifindex,
>>      cfg.rx_size = CONS_NUM_DESCS;
>>      cfg.tx_size = PROD_NUM_DESCS;
>>      cfg.libbpf_flags = 0;
>> -
>> -    if (xdpmode == XDP_ZEROCOPY) {
>> -        cfg.bind_flags = XDP_ZEROCOPY;
>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
>> -    } else {
>> -        cfg.bind_flags = XDP_COPY;
>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
>> -    }
>> +    cfg.bind_flags = xdp_modes[mode].bind_flags;
>> +    cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>>
>>  #ifdef HAVE_XDP_NEED_WAKEUP
>>      if (use_need_wakeup) {
>> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, 
>> uint32_t ifindex,
>>      ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
>>                               &xsk->rx, &xsk->tx, &cfg);
>>      if (ret) {
>> -        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
>> -                 "use-need-wakeup: %s qid: %d",
>> -                 ovs_strerror(errno),
>> -                 xdpmode == XDP_COPY ? "SKB": "DRV",
>> -                 use_need_wakeup ? "true" : "false",
>> -                 queue_id);
>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>> +             "xsk_socket__create failed (%s) mode: %s, "
>> +             "use-need-wakeup: %s, qid: %d",
>> +             ovs_strerror(errno), xdp_modes[mode].name,
>> +             use_need_wakeup ? "true" : "false", queue_id);
>>          free(xsk);
>>          return NULL;
>>      }
>> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, 
>> uint32_t ifindex,
>>  }
>>
>>  static struct xsk_socket_info *
>> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>> -              bool use_need_wakeup)
>> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
>> +              bool use_need_wakeup, bool report_socket_failures)
>>  {
>>      struct xsk_socket_info *xsk;
>>      struct xsk_umem_info *umem;
>> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>      memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
>>
>>      /* Create AF_XDP socket. */
>> -    umem = xsk_configure_umem(bufs,
>> -                              NUM_FRAMES * FRAME_SIZE,
>> -                              xdpmode);
>> +    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
>>      if (!umem) {
>>          free_pagealign(bufs);
>>          return NULL;
>> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>
>>      VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
>>
>> -    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
>> -                               use_need_wakeup);
>> +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
>> +                               use_need_wakeup, report_socket_failures);
>>      if (!xsk) {
>>          /* Clean up umem and xpacket pool. */
>>          if (xsk_umem__delete(umem->umem)) {
>> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int 
>> xdpmode,
>>      return xsk;
>>  }
>>
>> +static int
>> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
>> +                    enum afxdp_mode mode, bool report_socket_failures)
>> +{
>> +    struct xsk_socket_info *xsk_info;
>> +
>> +    VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
>> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
>> +             dev->use_need_wakeup ? "true" : "false");
>> +    xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
>> +                             report_socket_failures);
>> +    if (!xsk_info) {
>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>> +             "%s: Failed to create AF_XDP socket on queue %d in %s mode.",
>> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
>> +        dev->xsks[queue_id] = NULL;
>> +        return -1;
>> +    }
>> +    dev->xsks[queue_id] = xsk_info;
>> +    atomic_init(&xsk_info->tx_dropped, 0);
>> +    xsk_info->outstanding_tx = 0;
>> +    xsk_info->available_rx = PROD_NUM_DESCS;
>> +    return 0;
>> +}
>> +
>> +
>>  static int
>>  xsk_configure_all(struct netdev *netdev)
>>  {
>>      struct netdev_linux *dev = netdev_linux_cast(netdev);
>> -    struct xsk_socket_info *xsk_info;
>>      int i, ifindex, n_rxq, n_txq;
>> +    int qid = 0;
>>
>>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>>
>> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
>>      n_rxq = netdev_n_rxq(netdev);
>>      dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
>>
>> -    /* Configure each queue. */
>> -    for (i = 0; i < n_rxq; i++) {
>> -        VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
>> -                 netdev_get_name(netdev), i,
>> -                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
>> -                 dev->use_need_wakeup ? "true" : "false");
>> -        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
>> -                                 dev->use_need_wakeup);
>> -        if (!xsk_info) {
>> -            VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
>> -            dev->xsks[i] = NULL;
>> +    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
>> +        /* Trying to configure first queue with different modes to
>> +         * find the most suitable. */
>> +        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
>> +            if (!xsk_configure_queue(dev, ifindex, qid, i,
>> +                                     i == OVS_AF_XDP_MODE_MAX - 1)) {
>> +                dev->xdp_mode_in_use = i;
>> +                VLOG_INFO("%s: %s XDP mode will be in use.",
>> +                          netdev_get_name(netdev), xdp_modes[i].name);
>> +                break;
>> +            }
>> +        }
> 
> What if all modes fail? Guess it will use empty flags in 
> xsk_configure_socket(). I would suggest to do a check in 
> xsk_configure_socket() and bailout with a specific error?

What about the code on the next line?
It is here exactly for that purpose.  Am I missing something?

> 
>> +        if (i == OVS_AF_XDP_MODE_MAX) {
>> +            VLOG_ERR("%s: Failed to detect suitable XDP mode.",
>> +                     netdev_get_name(netdev));
>> +            goto err;
>> +        }
>> +        qid++;
>> +    } else {
>> +        dev->xdp_mode_in_use = dev->xdp_mode;
>> +    }
>> +
>> +    /* Configure remaining queues. */
>> +    for (; qid < n_rxq; qid++) {
>> +        if (xsk_configure_queue(dev, ifindex, qid,
>> +                                dev->xdp_mode_in_use, true)) {
>> +            VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
>> +                     netdev_get_name(netdev), qid);
>>              goto err;
>>          }
>> -        dev->xsks[i] = xsk_info;
>> -        atomic_init(&xsk_info->tx_dropped, 0);
>> -        xsk_info->outstanding_tx = 0;
>> -        xsk_info->available_rx = PROD_NUM_DESCS;
>>      }

<snip>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to