On Thu, Apr 25, 2019 at 8:09 AM Ilya Maximets <i.maxim...@samsung.com> wrote:

> Hi.
>
> This is not a full review. Just a bunch of thoughts.
>
> See inline.
>
> Best regards, Ilya Maximets.
>
> On 25.04.2019 2:47, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP is a new address family working together with eBPF/XDP.
> > A socket with AF_XDP family can receive and send raw packets
> > from an eBPF/XDP program attached to the netdev.
> > For a detailed introduction and configuration, see
> > Documentation/intro/install/afxdp.rst
> >
> > Signed-off-by: William Tu <u9012...@gmail.com>
> > Signed-off-by: Yi-Hung Wei <yihung....@gmail.com>
> > Co-authored-by: Yi-Hung Wei <yihung....@gmail.com>
> > ---
> > v1->v2:
> > - add a list to maintain unused umem elements
> > - remove copy from rx umem to ovs internal buffer
> > - use hugetlb to reduce misses (not much difference)
> > - use pmd mode netdev in OVS (huge performance improve)
> > - remove malloc dp_packet, instead put dp_packet in umem
> >
> > v2->v3:
> > - rebase on the OVS master, 7ab4b0653784
> >   ("configure: Check for more specific function to pull in pthread library.")
> > - remove the dependency on libbpf and dpif-bpf.
> >   instead, use the built-in XDP_ATTACH feature.
> > - data structure optimizations for better performance, see[1]
> > - more test cases support
> > v3: https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
> >
> > v3->v4:
> > - Use AF_XDP API provided by libbpf
> > - Remove the dependency on XDP_ATTACH kernel patch set
> > - Add documentation, bpf.rst
> >
> > v4->v5:
> > - rebase to master
> > - remove rfc, squash all into a single patch
> > - add --enable-afxdp, so by default, AF_XDP is not compiled
> > - add options: xdpmode=drv,skb
> > - add multiple queue and multiple PMD support, with options: n_rxq
> > - improve documentation, rename bpf.rst to af_xdp.rst
> >
> > v5->v6
> > - rebase to master, commit 0cdd5b13de91b98
> > - address errors from sparse and clang
> > - pass travis-ci test
> > - address feedback from Ben
> > - fix issues reported by 0-day robot
> > - improved documentation
> > ---
> >  Documentation/automake.mk             |   1 +
> >  Documentation/index.rst               |   1 +
> >  Documentation/intro/install/afxdp.rst | 366 +++++++++++++
> >  Documentation/intro/install/index.rst |   1 +
> >  acinclude.m4                          |  23 +
> >  configure.ac                          |   1 +
> >  lib/automake.mk                       |   7 +-
> >  lib/dp-packet.c                       |  16 +
> >  lib/dp-packet.h                       |  35 +-
> >  lib/dpif-netdev-perf.h                |  13 +
> >  lib/netdev-afxdp.c                    | 589 ++++++++++++++++++++
> >  lib/netdev-afxdp.h                    |  47 ++
> >  lib/netdev-linux.c                    |  89 +++-
> >  lib/netdev-linux.h                    |   1 +
> >  lib/netdev-provider.h                 |   1 +
> >  lib/netdev.c                          |   1 +
> >  lib/xdpsock.c                         | 210 ++++++++
> >  lib/xdpsock.h                         | 133 +++++
> >  tests/automake.mk                     |  17 +
> >  tests/system-afxdp-macros.at          | 153 ++++++
> >  tests/system-afxdp-testsuite.at       |  26 +
> >  tests/system-afxdp-traffic.at         | 978 ++++++++++++++++++++++++++++++++++
> >  22 files changed, 2703 insertions(+), 6 deletions(-)
> >  create mode 100644 Documentation/intro/install/afxdp.rst
> >  create mode 100644 lib/netdev-afxdp.c
> >  create mode 100644 lib/netdev-afxdp.h
> >  create mode 100644 lib/xdpsock.c
> >  create mode 100644 lib/xdpsock.h
> >  create mode 100644 tests/system-afxdp-macros.at
> >  create mode 100644 tests/system-afxdp-testsuite.at
> >  create mode 100644 tests/system-afxdp-traffic.at
> >
> > diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> > index 082438e09a33..11cc59efc881 100644
> > --- a/Documentation/automake.mk
> > +++ b/Documentation/automake.mk
> > @@ -10,6 +10,7 @@ DOC_SOURCE = \
> >       Documentation/intro/why-ovs.rst \
> >       Documentation/intro/install/index.rst \
> >       Documentation/intro/install/bash-completion.rst \
> > +     Documentation/intro/install/afxdp.rst \
> >       Documentation/intro/install/debian.rst \
> >       Documentation/intro/install/documentation.rst \
> >       Documentation/intro/install/distributions.rst \
> > diff --git a/Documentation/index.rst b/Documentation/index.rst
> > index 46261235c732..aa9e7c49f179 100644
> > --- a/Documentation/index.rst
> > +++ b/Documentation/index.rst
> > @@ -59,6 +59,7 @@ vSwitch? Start here.
> >    :doc:`intro/install/windows` |
> >    :doc:`intro/install/xenserver` |
> >    :doc:`intro/install/dpdk` |
> > +  :doc:`intro/install/afxdp` |
> >    :doc:`Installation FAQs <faq/releases>`
> >
> >  - **Tutorials:** :doc:`tutorials/faucet` |
> > diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> > new file mode 100644
> > index 000000000000..a1e3317bbdb5
> > --- /dev/null
> > +++ b/Documentation/intro/install/afxdp.rst
> > @@ -0,0 +1,366 @@
> > +..
> > +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> > +      not use this file except in compliance with the License. You may obtain
> > +      a copy of the License at
> > +
> > +          http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +      Unless required by applicable law or agreed to in writing, software
> > +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> > +      License for the specific language governing permissions and limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +
> > +========================
> > +Open vSwitch with AF_XDP
> > +========================
> > +
> > +This document describes how to build and install Open vSwitch using
> > +AF_XDP netdev.
> > +
> > +.. warning::
> > +  The AF_XDP support of Open vSwitch is considered 'experimental',
> > +  and it is not compiled in by default.
> > +
> > +Introduction
> > +------------
> > +AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
> > +built upon the eBPF and XDP technology.  It aims to have comparable
> > +performance to DPDK but cooperate better with the existing kernel networking
> > +stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> > +attached to the netdev, bypassing a couple of the Linux kernel's subsystems.
> > +As a result, an AF_XDP socket shows much better performance than AF_PACKET.
> > +For more details about AF_XDP, please see the Linux kernel's
> > +Documentation/networking/af_xdp.rst
> > +
> > +
> > +AF_XDP Netdev
> > +-------------
> > +OVS has a couple of netdev types, e.g., system, tap, or
> > +internal.  The AF_XDP feature adds a new netdev type called
> > +"afxdp", and implements its configuration, packet reception,
> > +and transmit functions.  Since the AF_XDP socket, xsk,
> > +operates in userspace, once ovs-vswitchd receives packets
> > +from xsk, the proposed architecture re-uses the existing
> > +userspace dpif-netdev datapath.  As a result, most of
> > +the packet processing happens in userspace instead of
> > +in the Linux kernel.
> > +
> > +::
> > +
> > +              |   +-------------------+
> > +              |   |    ovs-vswitchd   |<-->ovsdb-server
> > +              |   +-------------------+
> > +              |   |      ofproto      |<-->OpenFlow controllers
> > +              |   +--------+-+--------+
> > +              |   | netdev | |ofproto-|
> > +    userspace |   +--------+ |  dpif  |
> > +              |   | afxdp  | +--------+
> > +              |   | netdev | |  dpif  |
> > +              |   +---||---+ +--------+
> > +              |       ||     |  dpif- |
> > +              |       ||     | netdev |
> > +              |_      ||     +--------+
> > +                      ||
> > +               _  +---||-----+--------+
> > +              |   | AF_XDP prog +     |
> > +       kernel |   |   xsk_map         |
> > +              |_  +--------||---------+
> > +                           ||
> > +                        physical
> > +                           NIC
> > +
> > +
> > +Build requirements
> > +------------------
> > +
> > +In addition to the requirements described in :doc:`general`, building Open
> > +vSwitch with AF_XDP will require the following:
> > +
> > +- libbpf from kernel source tree (kernel 5.0.0 or later)
> > +
> > +- Linux kernel XDP support, with the following options (required)
> > +  ``_CONFIG_BPF=y``
> > +
> > +  ``_CONFIG_BPF_SYSCALL=y``
> > +
> > +  ``_CONFIG_XDP_SOCKETS=y``
> > +
> > +
> > +- The following optional Kconfig options are also recommended, but not
> > +  required:
> > +
> > +  ``_CONFIG_BPF_JIT=y`` (Performance)
> > +
> > +  ``_CONFIG_HAVE_BPF_JIT=y`` (Performance)
> > +
> > +  ``_CONFIG_XDP_SOCKETS_DIAG=y`` (Debugging)
> > +
> > +- If possible, run **./xdpsock -r -N -z -i <your device>** under
> > +  linux/samples/bpf.  This is an OVS-independent benchmark tool for AF_XDP.
> > +  It makes sure your basic kernel requirements are met for AF_XDP.
> > +
> > +
> > +Installing
> > +----------
> > +For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
> > +First, clone a recent version of Linux bpf-next tree::
> > +
> > +  git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> > +
> > +Second, go into the Linux source directory and build libbpf in the tools
> > +directory::
> > +
> > +  cd bpf-next/
> > +  cd tools/lib/bpf/
> > +  make && make install
> > +  make install_headers
> > +
> > +.. note::
> > +   Make sure xsk.h and bpf.h are installed in system's library path,
> > +   e.g. /usr/local/include/bpf/ or /usr/include/bpf/
> > +
> > +Make sure the libbpf.so is installed correctly::
> > +
> > +  ldconfig
> > +  ldconfig -p | grep libbpf
> > +
> > +
> > +Third, ensure the standard OVS requirements are installed and
> > +bootstrap/configure the package::
> > +
> > +  ./boot.sh && ./configure --enable-afxdp
> > +
> > +Finally, build and install OVS::
> > +
> > +  make && make install
> > +
> > +To kick start end-to-end autotesting::
> > +
> > +  uname -a # make sure having 5.0+ kernel
> > +  make check-afxdp
> > +
> > +If a test case fails, check the log at::
> > +
> > +  cat tests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log
> > +
> > +
> > +Setup AF_XDP netdev
> > +-------------------
> > +Before running OVS with AF_XDP, make sure the libbpf and libelf are
> > +set-up right::
> > +
> > +  ldd vswitchd/ovs-vswitchd
> > +
> > +Open vSwitch should be started using userspace datapath as described
> > +in :doc:`general`::
> > +
> > +  ovs-vswitchd --disable-system
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > +
> > +.. note::
> > +   OVS AF_XDP netdev is using the userspace datapath, the same datapath
> > +   as used by OVS-DPDK.  So it requires --disable-system for ovs-vswitchd
> > +   and datapath_type=netdev when adding a new bridge.
> > +
> > +Make sure your device supports AF_XDP.  To use 1 PMD (on core 4)
> > +on a 1-queue (queue 0) device, configure these options: **pmd-cpu-mask,
> > +pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
> > +
> > +  ethtool -L enp2s0 combined 1
> > +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> > +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> > +    options:n_rxq=1 options:xdpmode=drv \
> > +    other_config:pmd-rxq-affinity="0:4"
> > +
> > +Or, use 4 pmds/cores and 4 queues by doing::
> > +
> > +  ethtool -L enp2s0 combined 4
> > +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e
> > +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> > +    options:n_rxq=4 options:xdpmode=drv \
> > +    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
> > +
> > +To validate that the bridge has been successfully instantiated, run::
> > +
> > +  ovs-vsctl show
> > +
> > +which should show something like::
> > +
> > +  Port "ens802f0"
> > +   Interface "ens802f0"
> > +      type: afxdp
> > +      options: {n_rxq="1", xdpmode=drv}
> > +
> > +Otherwise, enable debug by::
> > +
> > +  ovs-appctl vlog/set netdev_afxdp::dbg
> > +
> > +
> > +References
> > +----------
> > +Most of the design details are described in the paper presented at
> > +Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
> > +section 4, and slides[2][4].
> > +"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
> > +to current and future AF_XDP work.
> > +
> > +
> > +[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
> > +
> > +[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
> > +
> > +[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
> > +
> > +[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
> > +
> > +
> > +Performance Tuning
> > +------------------
> > +The name of the game is to keep your CPU running in userspace, allowing PMD
> > +to keep polling the AF_XDP queues without any interference from the kernel.
> > +
> > +#. Make sure everything is in the same NUMA node (memory used by AF_XDP, pmd
> > +   running cores, device plug-in slot)
> > +
> > +#. Isolate your CPU by setting isolcpus in the grub configuration.
> > +
> > +#. IRQs should not be assigned to a pmd running core.
> > +
> > +#. The Spectre and Meltdown fixes increase the overhead of system calls.
> > +
> > +Debugging performance issue
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +While running the traffic, use the Linux perf tool to see where your CPU
> > +spends its cycles::
> > +
> > +  cd bpf-next/tools/perf
> > +  make
> > +  ./perf record -p `pidof ovs-vswitchd` sleep 10
> > +  ./perf report
> > +
> > +Measure your system call rate by doing::
> > +
> > +  pstree -p `pidof ovs-vswitchd`
> > +  strace -c -p <your pmd's PID>
> > +
> > +Or, use OVS pmd tool::
> > +
> > +  ovs-appctl dpif-netdev/pmd-stats-show
> > +
> > +
> > +Example Script
> > +--------------
> > +
> > +Below is a script using namespaces and veth peer::
> > +
> > +  #!/bin/bash
> > +  ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
> > +    --disable-system --detach
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 \
> > +    protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
> > +    fail-mode=secure datapath_type=netdev
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > +
> > +  ip netns add at_ns0
> > +  ovs-appctl vlog/set netdev_afxdp::dbg
> > +
> > +  ip link add p0 type veth peer name afxdp-p0
> > +  ip link set p0 netns at_ns0
> > +  ip link set dev afxdp-p0 up
> > +  ovs-vsctl add-port br0 afxdp-p0 -- \
> > +    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
> > +
> > +  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> > +  ip addr add "10.1.1.1/24" dev p0
> > +  ip link set dev p0 up
> > +  NS_EXEC_HEREDOC
> > +
> > +  ip netns add at_ns1
> > +  ip link add p1 type veth peer name afxdp-p1
> > +  ip link set p1 netns at_ns1
> > +  ip link set dev afxdp-p1 up
> > +
> > +  ovs-vsctl add-port br0 afxdp-p1 -- \
> > +    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
> > +  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
> > +  ip addr add "10.1.1.2/24" dev p1
> > +  ip link set dev p1 up
> > +  NS_EXEC_HEREDOC
> > +
> > +  ip netns exec at_ns0 ping -i .2 10.1.1.2
> > +
> > +
> > +Limitations/Known Issues
> > +------------------------
> > +#. Device's numa ID is always 0, need a way to find numa id from a netdev.
> > +#. No QoS support because AF_XDP netdev bypasses the Linux TC layer. A possible
> > +   work-around is to use OpenFlow meter action.
> > +#. An AF_XDP device that is added to a bridge, removed, and added again will fail.
> > +#. Most of the tests are done using i40e single port. Multiple ports and
> > +   the ixgbe driver also need to be tested.
> > +#. No latency test result (TODO items)
> > +
> > +
> > +make check-afxdp
> > +----------------
> > +When executing 'make check-afxdp', OVS creates namespaces, sets up AF_XDP on
> > +veth devices and kick-starts the testing.  So far we have the following test
> > +cases::
> > +
> > + AF_XDP netdev datapath-sanity
> > +
> > +  1: datapath - ping between two ports               ok
> > +  2: datapath - ping between two ports on vlan       ok
> > +  3: datapath - ping6 between two ports              ok
> > +  4: datapath - ping6 between two ports on vlan      ok
> > +  5: datapath - ping over vxlan tunnel               ok
> > +  6: datapath - ping over vxlan6 tunnel              ok
> > +  7: datapath - ping over gre tunnel                 ok
> > +  8: datapath - ping over erspan v1 tunnel           ok
> > +  9: datapath - ping over erspan v2 tunnel           ok
> > + 10: datapath - ping over ip6erspan v1 tunnel        ok
> > + 11: datapath - ping over ip6erspan v2 tunnel        ok
> > + 12: datapath - ping over geneve tunnel              ok
> > + 13: datapath - ping over geneve6 tunnel             ok
> > + 14: datapath - clone action                         ok
> > + 15: datapath - basic truncate action                ok
> > +
> > + conntrack
> > +
> > + 16: conntrack - controller                          ok
> > + 17: conntrack - force commit                        ok
> > + 18: conntrack - ct flush by 5-tuple                 ok
> > + 19: conntrack - IPv4 ping                           ok
> > + 20: conntrack - get_nconns and get/set_maxconns     ok
> > + 21: conntrack - IPv6 ping                           ok
> > +
> > + system-ovn
> > +
> > + 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
> > + 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
> > + 24: ovn -- multiple gateway routers, SNAT and DNAT  ok
> > + 25: ovn -- load-balancing                           ok
> > + 26: ovn -- load-balancing - same subnet.            ok
> > + 27: ovn -- load balancing in gateway router         ok
> > + 28: ovn -- multiple gateway routers, load-balancing ok
> > + 29: ovn -- load balancing in router with gateway router port ok
> > + 30: ovn -- DNAT and SNAT on distributed router - N/S ok
> > + 31: ovn -- DNAT and SNAT on distributed router - E/W ok
> > +
> > +
> > +Bug Reporting
> > +-------------
> > +
> > +Please report problems to d...@openvswitch.org.
> > diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst
> > index 3193c736cf17..c27a9c9d16ff 100644
> > --- a/Documentation/intro/install/index.rst
> > +++ b/Documentation/intro/install/index.rst
> > @@ -45,6 +45,7 @@ Installation from Source
> >     xenserver
> >     userspace
> >     dpdk
> > +   afxdp
> >
> >  Installation from Packages
> >  --------------------------
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 301aeb70d82a..d80f2494d514 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -221,6 +221,29 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
> >    ])
> >  ])
> >
> > +dnl OVS_CHECK_LINUX_AF_XDP
> > +dnl
> > +dnl Check both Linux kernel AF_XDP and libbpf support
> > +AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> > +  AC_MSG_CHECKING([whether AF_XDP is supported])
> > +  AC_ARG_ENABLE([afxdp],
> > +                [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
> > +                [], [enable_afxdp=no])
> > +  AC_CHECK_HEADER([bpf/libbpf.h],
> > +                  [HAVE_LIBBPF=yes],
> > +                  [HAVE_LIBBPF=no])
> > +  AC_CHECK_HEADER([linux/if_xdp.h],
> > +                  [HAVE_IF_XDP=yes],
> > +                  [HAVE_IF_XDP=no])
> > +  AM_CONDITIONAL([SUPPORT_AF_XDP],
> > +                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes && test "$HAVE_IF_XDP" = yes])
> > +  AM_COND_IF([SUPPORT_AF_XDP], [
> > +    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is available and enabled.])
> > +    LIBBPF_LDADD=" -lbpf -lelf"
> > +    AC_SUBST([LIBBPF_LDADD])
> > +  ])
> > +])
> > +
>
> I think that configure should fail if the required headers are missing.
> It's confusing that I explicitly enabled afxdp, but OVS was built without
> its support.
> One more thing: AC_MSG_CHECKING requires a subsequent AC_MSG_RESULT,
> otherwise the configure output will not look good.
>
> I suggest the following incremental:
>
> diff --git a/acinclude.m4 b/acinclude.m4
> index d80f2494d..c919af570 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -225,23 +225,26 @@ dnl OVS_CHECK_LINUX_AF_XDP
>  dnl
>  dnl Check both Linux kernel AF_XDP and libbpf support
>  AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> -  AC_MSG_CHECKING([whether AF_XDP is supported])
>    AC_ARG_ENABLE([afxdp],
>                  [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
>                  [], [enable_afxdp=no])
> -  AC_CHECK_HEADER([bpf/libbpf.h],
> -                  [HAVE_LIBBPF=yes],
> -                  [HAVE_LIBBPF=no])
> -  AC_CHECK_HEADER([linux/if_xdp.h],
> -                  [HAVE_IF_XDP=yes],
> -                  [HAVE_IF_XDP=no])
> -  AM_CONDITIONAL([SUPPORT_AF_XDP],
> -                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes && test "$HAVE_IF_XDP" = yes])
> -  AM_COND_IF([SUPPORT_AF_XDP], [
> -    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is available and enabled.])
> +  AC_MSG_CHECKING([whether AF_XDP is enabled])
> +  if test "$enable_afxdp" != yes; then
> +    AC_MSG_RESULT([no])
> +  else
> +    AC_MSG_RESULT([yes])
> +
> +    AC_CHECK_HEADER([bpf/libbpf.h], [],
> +      [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
> +
> +    AC_CHECK_HEADER([linux/if_xdp.h], [],
> +      [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
> +
> +    AC_DEFINE([HAVE_AF_XDP], [1],
> +              [Define to 1 if AF-XDP support is available and enabled.])
>      LIBBPF_LDADD=" -lbpf -lelf"
>      AC_SUBST([LIBBPF_LDADD])
> -  ])
> +  fi
>  ])
>
>  dnl OVS_CHECK_DPDK
> ---
>
>
> >  dnl OVS_CHECK_DPDK
> >  dnl
> >  dnl Configure DPDK source tree
> > diff --git a/configure.ac b/configure.ac
> > index 505e3d041e93..29c90b73f836 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
> >  OVS_CHECK_DOT
> >  OVS_CHECK_IF_DL
> >  OVS_CHECK_STRTOK_R
> > +OVS_CHECK_LINUX_AF_XDP
> >  AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
> >  AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
> >    [], [], [[#include <sys/stat.h>]])
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index cc5dccf39d6b..8b9df5635bbe 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
> >
> >  lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
> >  lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
> > +lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
> >
> >  if WIN32
> >  lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
> > @@ -327,7 +328,11 @@ lib_libopenvswitch_la_SOURCES = \
> >       lib/lldp/lldpd.c \
> >       lib/lldp/lldpd.h \
> >       lib/lldp/lldpd-structs.c \
> > -     lib/lldp/lldpd-structs.h
> > +     lib/lldp/lldpd-structs.h \
> > +     lib/xdpsock.c \
> > +     lib/xdpsock.h \
> > +     lib/netdev-afxdp.c \
> > +     lib/netdev-afxdp.h
>
> Maybe it's better to move all these files under #ifdef HAVE_AF_XDP ?
>
> >
> >  if WIN32
> >  lib_libopenvswitch_la_SOURCES += \
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index 0976a35e758b..a61552f72988 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -22,6 +22,9 @@
> >  #include "netdev-dpdk.h"
> >  #include "openvswitch/dynamic-string.h"
> >  #include "util.h"
> > +#ifdef HAVE_AF_XDP
> > +#include "xdpsock.h"
> > +#endif
> >
> >  static void
> >  dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source source)
> > @@ -122,6 +125,16 @@ dp_packet_uninit(struct dp_packet *b)
> >               * created as a dp_packet */
> >              free_dpdk_buf((struct dp_packet*) b);
> >  #endif
> > +        } else if (b->source == DPBUF_AFXDP) {
> > +#ifdef HAVE_AF_XDP
> > +            struct dp_packet_afxdp *xpacket;
> > +
> > +            xpacket = dp_packet_cast_afxdp(b);
> > +            if (xpacket->mpool) {
> > +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> > +            }
> > +#endif
>
> Why not use the same trick as we have for DPDK a few lines above?
> I.e., wrap this part in a function like 'free_afxdp_buf' and move it
> to netdev-afxdp.c?  You would not need to expose so many internals
> to generic code.  dp_packet_cast_afxdp() would also move there along
> with 'struct dp_packet_afxdp'.
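
A minimal sketch of the wrapper idea above, assuming simplified stand-in types: the struct layouts, the fixed-size pool, and the lock-free umem_elem_push() below are illustrative only and are not the definitions from the patch. The point is that generic code would only need the single free_afxdp_buf() symbol, in the same way it only needs free_dpdk_buf() for DPDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the OVS types used in the patch. */
enum dp_packet_source { DPBUF_MALLOC, DPBUF_AFXDP };

struct dp_packet {
    void *base_;                     /* First byte of the packet buffer. */
    enum dp_packet_source source;
};

struct umem_pool {
    void *array[8];                  /* Stack of free umem elements. */
    unsigned int index;              /* Number of elements on the stack. */
};

struct dp_packet_afxdp {
    struct umem_pool *mpool;
    struct dp_packet packet;
};

/* Simplified stand-in for the patch's umem_elem_push(): no locking. */
static void
umem_elem_push(struct umem_pool *pool, void *addr)
{
    assert(pool->index < sizeof pool->array / sizeof pool->array[0]);
    pool->array[pool->index++] = addr;
}

/* Would live in netdev-afxdp.c, so generic code never sees the layout. */
static struct dp_packet_afxdp *
dp_packet_cast_afxdp(const struct dp_packet *d)
{
    assert(d->source == DPBUF_AFXDP);
    return (struct dp_packet_afxdp *)
        ((char *) d - offsetof(struct dp_packet_afxdp, packet));
}

/* The only symbol dp-packet.c would call, mirroring free_dpdk_buf(). */
void
free_afxdp_buf(struct dp_packet *p)
{
    struct dp_packet_afxdp *xpacket = dp_packet_cast_afxdp(p);

    if (xpacket->mpool) {
        umem_elem_push(xpacket->mpool, p->base_);
    }
}
```

With something like this, both dp_packet_uninit() and dp_packet_delete() could call free_afxdp_buf(b) under the DPBUF_AFXDP branch instead of duplicating the cast-and-push logic.
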
>
> BTW, I hope, someday, I'll finally implement 'dp-packet-memory-provider'
> abstraction for OVS.
>

Hi Ilya,

Can you share more detail about this idea, dp-packet-memory-provider?
Why do we need it?

Thanks
William


>
> > +            return;
> >          }
> >      }
> >  }
> > @@ -248,6 +261,8 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
> >      case DPBUF_STACK:
> >          OVS_NOT_REACHED();
> >
> > +    case DPBUF_AFXDP:
> > +        OVS_NOT_REACHED();
>
> Some space required between cases.
>
> >      case DPBUF_STUB:
> >          b->source = DPBUF_MALLOC;
> >          new_base = xmalloc(new_allocated);
> > @@ -433,6 +448,7 @@ dp_packet_steal_data(struct dp_packet *b)
> >  {
> >      void *p;
> >      ovs_assert(b->source != DPBUF_DPDK);
> > +    ovs_assert(b->source != DPBUF_AFXDP);
> >
> >      if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) {
> >          p = dp_packet_data(b);
> > diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> > index a5e9ade1244a..774728eef330 100644
> > --- a/lib/dp-packet.h
> > +++ b/lib/dp-packet.h
> > @@ -25,6 +25,10 @@
> >  #include <rte_mbuf.h>
> >  #endif
> >
> > +#ifdef HAVE_AF_XDP
> > +#include "lib/xdpsock.h"
> > +#endif
> > +
> >  #include "netdev-dpdk.h"
> >  #include "openvswitch/list.h"
> >  #include "packets.h"
> > @@ -42,6 +46,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
> >      DPBUF_DPDK,                /* buffer data is from DPDK allocated memory.
> >                                  * ref to dp_packet_init_dpdk() in dp-packet.c.
> >                                  */
> > +    DPBUF_AFXDP,                /* buffer data from XDP frame */
>
> Please, move the comment one space left.
>
> >  };
> >
> >  #define DP_PACKET_CONTEXT_SIZE 64
> > @@ -89,6 +94,20 @@ struct dp_packet {
> >      };
> >  };
> >
> > +struct dp_packet_afxdp {
> > +    struct umem_pool *mpool;
> > +    struct dp_packet packet;
> > +};
> > +
> > +#if HAVE_AF_XDP
> > +static struct dp_packet_afxdp *
> > +dp_packet_cast_afxdp(const struct dp_packet *d OVS_UNUSED)
> > +{
> > +    ovs_assert(d->source == DPBUF_AFXDP);
> > +    return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
> > +}
> > +#endif
> > +
> >  static inline void *dp_packet_data(const struct dp_packet *);
> >  static inline void dp_packet_set_data(struct dp_packet *, void *);
> >  static inline void *dp_packet_base(const struct dp_packet *);
> > @@ -183,7 +202,21 @@ dp_packet_delete(struct dp_packet *b)
> >              free_dpdk_buf((struct dp_packet*) b);
> >              return;
> >          }
> > -
> > +        if (b->source == DPBUF_AFXDP) {
> > +#ifdef HAVE_AF_XDP
> > +            struct dp_packet_afxdp *xpacket;
> > +
> > +            /* if a packet is received from afxdp port,
> > +             * and tx to a system port. Then we need to
> > +             * push the rx umem back here
> > +             */
> > +            xpacket = dp_packet_cast_afxdp(b);
> > +            if (xpacket->mpool) {
> > +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> > +            }
> > +#endif
> > +            return;
> > +        }
> >          dp_packet_uninit(b);
> >          free(b);
> >      }
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> > index 859c05613ddf..e47cf73bf3c9 100644
> > --- a/lib/dpif-netdev-perf.h
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -198,6 +198,19 @@ cycles_counter_update(struct pmd_perf_stats *s)
> >  {
> >  #ifdef DPDK_NETDEV
> >      return s->last_tsc = rte_get_tsc_cycles();
> > +#elif HAVE_AF_XDP
> > +    union {
> > +        uint64_t tsc_64;
> > +        struct {
> > +            uint32_t lo_32;
> > +            uint32_t hi_32;
> > +        };
> > +    } tsc;
> > +    asm volatile("rdtsc" :
> > +             "=a" (tsc.lo_32),
> > +             "=d" (tsc.hi_32));
>
> We need to check that we're on an x86 machine.
> Otherwise the build should fail, I think.  For now, you may add the following
> code to the head of netdev-afxdp.c:
>
> #if !defined(__i386__) && !defined(__x86_64__)
> #error AF_XDP supported only for Linux on x86 or x86_64
> #endif
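
As an alternative to failing the build, the cycle counter itself could be guarded by architecture so that non-x86 builds fall through to the existing "return 0" branch. The sketch below only illustrates that guard; HAVE_RDTSC_SKETCH and read_tsc_or_zero() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: 1 only where the rdtsc instruction exists. */
#if defined(__i386__) || defined(__x86_64__)
#define HAVE_RDTSC_SKETCH 1
#else
#define HAVE_RDTSC_SKETCH 0
#endif

/* Returns the time stamp counter on x86, and 0 on every other architecture,
 * matching the fallback that cycles_counter_update() already has. */
static inline uint64_t
read_tsc_or_zero(void)
{
#if HAVE_RDTSC_SKETCH
    uint32_t lo, hi;

    /* rdtsc returns the low half in EAX and the high half in EDX. */
    __asm__ volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
#else
    return 0;
#endif
}
```

Combining the two 32-bit halves with a shift also avoids relying on the anonymous struct inside the union used in the patch.
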
>
> > +
> > +    return s->last_tsc = tsc.tsc_64;
> >  #else
> >      return s->last_tsc = 0;
> >  #endif
> > diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> > new file mode 100644
> > index 000000000000..4c71061fc102
> > --- /dev/null
> > +++ b/lib/netdev-afxdp.c
> > @@ -0,0 +1,589 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +#ifdef HAVE_AF_XDP
> > +#include "netdev-linux.h"
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <sys/types.h>
> > +#include <netinet/in.h>
> > +#include <arpa/inet.h>
> > +#include <inttypes.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/socket.h>
> > +#include <sys/utsname.h>
> > +#include <netpacket/packet.h>
> > +#include <net/if.h>
> > +#include <net/if_arp.h>
> > +#include <net/route.h>
> > +#include <poll.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +
> > +#include "coverage.h"
> > +#include "dp-packet.h"
> > +#include "dpif-netlink.h"
> > +#include "dpif-netdev.h"
> > +#include "openvswitch/dynamic-string.h"
> > +#include "fatal-signal.h"
> > +#include "hash.h"
> > +#include "openvswitch/hmap.h"
> > +#include "netdev-provider.h"
> > +#include "netdev-tc-offloads.h"
> > +#include "netdev-vport.h"
> > +#include "netlink-notifier.h"
> > +#include "netlink-socket.h"
> > +#include "netlink.h"
> > +#include "netnsid.h"
> > +#include "openvswitch/ofpbuf.h"
> > +#include "openflow/openflow.h"
> > +#include "ovs-atomic.h"
> > +#include "packets.h"
> > +#include "openvswitch/poll-loop.h"
> > +#include "rtnetlink.h"
> > +#include "openvswitch/shash.h"
> > +#include "socket-util.h"
> > +#include "sset.h"
> > +#include "tc.h"
> > +#include "timer.h"
> > +#include "unaligned.h"
> > +#include "openvswitch/vlog.h"
> > +#include "util.h"
> > +#include "netdev-afxdp.h"
> > +
> > +#include <linux/if_ether.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/types.h>
> > +#include <linux/ethtool.h>
> > +#include <linux/mii.h>
> > +#include <linux/rtnetlink.h>
> > +#include <linux/sockios.h>
> > +#include <linux/if_xdp.h>
> > +#include "xdpsock.h"
> > +
> > +#ifndef SOL_XDP
> > +#define SOL_XDP 283
> > +#endif
> > +#ifndef AF_XDP
> > +#define AF_XDP 44
> > +#endif
> > +#ifndef PF_XDP
> > +#define PF_XDP AF_XDP
> > +#endif
> > +
> > +VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
> > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > +
> > +#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
> > +#define UMEM2XPKT(base, i) \
> > +    ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base + \
> > +    i * sizeof(struct dp_packet_afxdp))
> > +
> > +static uint32_t opt_xdp_bind_flags = XDP_COPY;
> > +static uint32_t opt_xdp_flags =
> > +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> > +#ifdef USE_DRVMODE_DEFAULT
>
> If I define this, the build will fail, because both pairs of definitions
> get compiled. Should this be #ifdef/#else/#endif?
>
> > +static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY;
> > +static uint32_t opt_xdp_flags =
> > +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> > +#endif
> > +static uint32_t prog_id;
> > +
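
A minimal sketch of the #ifdef/#else/#endif arrangement asked about above,
so that exactly one pair of defaults is compiled whether or not
USE_DRVMODE_DEFAULT is defined. The fallback #defines are stand-ins that
mirror the kernel UAPI values, present only so the fragment builds on its
own; in the patch they would come from <linux/if_xdp.h> and
<linux/if_link.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values so the sketch builds without kernel headers.  They are
 * only used when the real headers have not already defined them. */
#ifndef XDP_COPY
#define XDP_COPY (1 << 1)
#endif
#ifndef XDP_ZEROCOPY
#define XDP_ZEROCOPY (1 << 2)
#endif
#ifndef XDP_FLAGS_UPDATE_IF_NOEXIST
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
#endif
#ifndef XDP_FLAGS_SKB_MODE
#define XDP_FLAGS_SKB_MODE (1U << 1)
#endif
#ifndef XDP_FLAGS_DRV_MODE
#define XDP_FLAGS_DRV_MODE (1U << 2)
#endif

/* Exactly one pair of definitions is compiled, whichever way the macro
 * is set, so there is no redefinition error. */
#ifdef USE_DRVMODE_DEFAULT
static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY;
static uint32_t opt_xdp_flags =
                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
#else
static uint32_t opt_xdp_bind_flags = XDP_COPY;
static uint32_t opt_xdp_flags =
                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
#endif
```
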
> > +static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t size)
> > +{
> > +    struct xsk_umem_info *umem;
> > +    int ret;
> > +    int i;
> > +
> > +    umem = xcalloc(1, sizeof(*umem));
> > +    if (!umem) {
> > +        VLOG_FATAL("xsk config umem failed (%s)", ovs_strerror(errno));
>
> xcalloc can't fail.
>
> > +    }
> > +
> > +    ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
> > +                           NULL);
> > +
> > +    if (ret) {
> > +        VLOG_FATAL("xsk umem create failed (%s) mode: %s",
> > +            ovs_strerror(errno),
> > +            opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV");
>
> Why FATAL? Can we just return NULL and fail
> netdev_linux_rxq_construct() instead?
>
> > +    }
> > +
> > +    umem->buffer = buffer;
> > +
> > +    /* set-up umem pool */
> > +    umem_pool_init(&umem->mpool, NUM_FRAMES);
> > +
> > +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> > +        struct umem_elem *elem;
> > +
> > +        elem = ALIGNED_CAST(struct umem_elem *,
> > +                            (char *)umem->buffer + i * FRAME_SIZE);
> > +        umem_elem_push(&umem->mpool, elem);
> > +    }
> > +
> > +    /* set-up metadata */
> > +    xpacket_pool_init(&umem->xpool, NUM_FRAMES);
> > +
> > +    VLOG_DBG("%s xpacket pool from %p to %p", __func__,
> > +              umem->xpool.array,
> > +              (char *)umem->xpool.array +
> > +              NUM_FRAMES * sizeof(struct dp_packet_afxdp));
> > +
> > +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct dp_packet *packet;
> > +
> > +        xpacket = UMEM2XPKT(umem->xpool.array, i);
> > +        xpacket->mpool = &umem->mpool;
> > +
> > +        packet = &xpacket->packet;
> > +        packet->source = DPBUF_AFXDP;
> > +    }
> > +
> > +    return umem;
> > +}
> > +
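
The two comments on xsk_configure_umem() above (no NULL check after
xcalloc, and returning NULL instead of VLOG_FATAL) could be sketched
roughly as follows. Every sketch_* name is hypothetical:
sketch_xcalloc() stands in for OVS's xcalloc(), sketch_umem_create() for
xsk_umem__create(), and sketch_rxq_construct() for the rxq construct
hook. This only shows the error-propagation shape, not the real code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for xcalloc(): aborts on allocation failure, so
 * callers never see NULL and need no check. */
static void *
sketch_xcalloc(size_t count, size_t size)
{
    void *p = calloc(count ? count : 1, size ? size : 1);

    if (!p) {
        abort();
    }
    return p;
}

struct sketch_umem {            /* Hypothetical, much-reduced umem wrapper. */
    void *buffer;
    size_t size;
};

/* Hypothetical stand-in for the umem creation call: returns 0 on success,
 * nonzero with errno set on failure. */
static int
sketch_umem_create(struct sketch_umem *umem, void *buffer, size_t size)
{
    if (!buffer || !size) {
        errno = EINVAL;
        return -1;
    }
    umem->buffer = buffer;
    umem->size = size;
    return 0;
}

/* Returns NULL (with errno set) on failure instead of aborting. */
static struct sketch_umem *
sketch_configure_umem(void *buffer, size_t size)
{
    struct sketch_umem *umem = sketch_xcalloc(1, sizeof *umem);

    if (sketch_umem_create(umem, buffer, size)) {
        int err = errno;        /* fprintf() and free() may clobber errno. */

        fprintf(stderr, "umem create failed (errno %d)\n", err);
        free(umem);
        errno = err;
        return NULL;
    }
    return umem;
}

/* Caller in the style of an rxq construct hook: returns 0 or a positive
 * errno value, so only this one port fails to come up. */
static int
sketch_rxq_construct(void *buffer, size_t size, struct sketch_umem **umemp)
{
    struct sketch_umem *umem = sketch_configure_umem(buffer, size);

    *umemp = umem;
    if (!umem) {
        return errno ? errno : EINVAL;
    }
    return 0;
}
```

With this shape a bad configuration surfaces as an error on the one port
being added rather than taking down the whole process.
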
> > +static struct xsk_socket_info *
> > +xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> > +                     uint32_t queue_id)
> > +{
> > +    struct xsk_socket_config cfg;
> > +    struct xsk_socket_info *xsk;
> > +    char devname[IF_NAMESIZE];
> > +    uint32_t idx;
> > +    int ret;
> > +    int i;
> > +
> > +    xsk = xcalloc(1, sizeof(*xsk));
> > +    if (!xsk) {
> > +        VLOG_FATAL("xsk calloc failed (%s)", ovs_strerror(errno));
>
> xcalloc can't fail.
>
> > +    }
> > +
> > +    xsk->umem = umem;
> > +    cfg.rx_size = CONS_NUM_DESCS;
> > +    cfg.tx_size = PROD_NUM_DESCS;
> > +    cfg.libbpf_flags = 0;
> > +    cfg.xdp_flags = opt_xdp_flags;
> > +    cfg.bind_flags = opt_xdp_bind_flags;
> > +
> > +    if (if_indextoname(ifindex, devname) == NULL) {
> > +        VLOG_FATAL("ifindex %d devname failed (%s)",
> > +                   ifindex, ovs_strerror(errno));
>
> Every little misconfiguration will lead to an abort. It's probably OK
> for an experimental feature, but I don't like this.
>
> > +    }
> > +
> > +    ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
> > +                             &xsk->rx, &xsk->tx, &cfg);
> > +    if (ret) {
> > +        VLOG_FATAL("xsk_socket_create failed (%s) mode: %s qid: %d",
> > +                   ovs_strerror(errno),
> > +                   opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV",
> > +                   queue_id);
> > +    }
> > +
> > +    /* make sure the XDP program is there */
> > +    ret = bpf_get_link_xdp_id(ifindex, &prog_id, opt_xdp_flags);
> > +    if (ret) {
> > +        VLOG_FATAL("get XDP prog ID failed (%s)", ovs_strerror(errno));
> > +    }
> > +
> > +    ret = xsk_ring_prod__reserve(&xsk->umem->fq,
> > +                                 PROD_NUM_DESCS,
> > +                                 &idx);
> > +    if (ret != PROD_NUM_DESCS) {
> > +        VLOG_FATAL("fq set-up failed (%s)", ovs_strerror(errno));
> > +    }
> > +
> > +    for (i = 0;
> > +         i < PROD_NUM_DESCS * FRAME_SIZE;
> > +         i += FRAME_SIZE) {
> > +        struct umem_elem *elem;
> > +        uint64_t addr;
> > +
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        addr = UMEM2DESC(elem, xsk->umem->buffer);
> > +
> > +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
> > +    }
> > +
> > +    xsk_ring_prod__submit(&xsk->umem->fq,
> > +                          PROD_NUM_DESCS);
> > +    return xsk;
> > +}
> > +
> > +struct xsk_socket_info *
> > +xsk_configure(int ifindex, int xdp_queue_id)
> > +{
> > +    struct xsk_socket_info *xsk;
> > +    struct xsk_umem_info *umem;
> > +    void *bufs;
> > +    int ret;
> > +
> > +    ret = posix_memalign(&bufs, getpagesize(),
> > +                         NUM_FRAMES * FRAME_SIZE);
>
> In the future we'll need to use HAVE_POSIX_MEMALIGN, probably.
>
> Do we need to clear the allocated memory?
>
> > +    ovs_assert(!ret);
> > +
> > +    /* Create sockets... */
> > +    umem = xsk_configure_umem(bufs,
> > +                              NUM_FRAMES * FRAME_SIZE);
> > +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id);
> > +    return xsk;
> > +}
> > +
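
On the posix_memalign() question above: if clearing the buffer turns out
to be wanted, one way to do it is sketched below. sketch_alloc_frames()
is a hypothetical helper, not something in the patch. It also returns
NULL on failure rather than asserting, and it leaves out the
HAVE_POSIX_MEMALIGN guard mentioned in the comment.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* For posix_memalign(). */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: allocates n_frames * frame_size bytes aligned to
 * the page size and zero-fills them.  Returns NULL on failure. */
static void *
sketch_alloc_frames(size_t n_frames, size_t frame_size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *bufs = NULL;

    if (page <= 0 || !n_frames || !frame_size
        || n_frames > SIZE_MAX / frame_size) {
        return NULL;
    }
    if (posix_memalign(&bufs, (size_t) page, n_frames * frame_size)) {
        return NULL;
    }
    /* posix_memalign() does not initialize the memory. */
    memset(bufs, 0, n_frames * frame_size);
    return bufs;
}
```
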
> > +static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
> > +{
> > +    struct ds ds = DS_EMPTY_INITIALIZER;
> > +    ds_put_hex_dump(&ds, buf, count, 0, false);
> > +    VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
> > +    ds_destroy(&ds);
> > +}
> > +
> > +void
> > +xsk_destroy(struct xsk_socket_info *xsk)
> > +{
> > +    struct xsk_umem *umem;
> > +
> > +    if (!xsk) {
> > +        return;
> > +    }
> > +
> > +    umem = xsk->umem->umem;
> > +    xsk_socket__delete(xsk->xsk);
> > +    (void)xsk_umem__delete(umem);
> > +
> > +    /* cleanup umem pool */
> > +    umem_pool_cleanup(&xsk->umem->mpool);
> > +
> > +    /* cleanup metadata pool */
> > +    xpacket_pool_cleanup(&xsk->umem->xpool);
> > +}
> > +
> > +static inline void OVS_UNUSED
> > +print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
> > +    struct xdp_statistics stat;
> > +    socklen_t optlen;
> > +
> > +    optlen = sizeof(stat);
>
> Please don't parenthesize the argument of sizeof if it's the name of a
> variable.
>
> > +    ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP, XDP_STATISTICS,
> > +                &stat, &optlen) == 0);
> > +
> > +    VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu",
> > +                     stat.rx_dropped,
> > +                     stat.rx_invalid_descs,
> > +                     stat.tx_invalid_descs);
> > +}
> > +
> > +int
> > +netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> > +                        char **errp OVS_UNUSED)
> > +{
> > +    const char *xdpmode;
> > +    int new_n_rxq;
> > +
> > +    /* TODO: add mutex lock */
> > +    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> > +
> > +    if (netdev->n_rxq != new_n_rxq) {
> > +
> > +        if (new_n_rxq > MAX_XSKQ) {
> > +            VLOG_WARN("set n_rxq %d too large", new_n_rxq);
> > +            goto out;
>
> Just return EINVAL.
>
> > +        }
> > +
> > +        netdev->n_rxq = new_n_rxq;
>
> This is wrong. You must not update netdev->n_rxq here. This should
> be done on reconfiguration.
>
> > +        VLOG_INFO("set AF_XDP device %s to %d n_rxq", netdev->name, new_n_rxq);
> > +        netdev_request_reconfigure(netdev);
> > +    }
> > +
> > +    xdpmode = smap_get(args, "xdpmode");
> > +    if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
> > +        if (opt_xdp_bind_flags != XDP_ZEROCOPY) {
> > +            opt_xdp_bind_flags = XDP_ZEROCOPY;
> > +            opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> > +        }
> > +        VLOG_INFO("AF_XDP device %s in ZC driver mode", netdev->name);
> > +    } else {
> > +        opt_xdp_bind_flags = XDP_COPY;
> > +        opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> > +        VLOG_INFO("AF_XDP device %s in SKB mode", netdev->name);
> > +    }
>
> Looks like changing "xdpmode" after the port has already been added will
> not work correctly. You probably need to either forbid this or
> implement a proper reconfiguration process.
>
> > +
> > +out:
> > +    return 0;
> > +}
> > +
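
The three comments on netdev_afxdp_set_config() above (return EINVAL,
don't touch n_rxq outside reconfiguration, handle "xdpmode" changes
properly) point towards a "record the request, apply it on reconfigure"
pattern. A minimal sketch under assumptions: struct sketch_dev and both
functions are hypothetical, the mode is kept per device rather than in
the patch's globals, and unknown mode strings are rejected, which the
patch does not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_XSKQ 16      /* Mirrors MAX_XSKQ in the patch. */

enum sketch_xdpmode { SKETCH_MODE_SKB, SKETCH_MODE_DRV };

/* Hypothetical per-device state. */
struct sketch_dev {
    int n_rxq;                          /* Active value. */
    enum sketch_xdpmode mode;           /* Active value. */
    int requested_n_rxq;                /* Applied on reconfigure. */
    enum sketch_xdpmode requested_mode; /* Applied on reconfigure. */
    bool reconfigure_pending;
};

/* Validates and records the request without touching the active values.
 * Returns 0 or a positive errno value. */
static int
sketch_set_config(struct sketch_dev *dev, int new_n_rxq, const char *xdpmode)
{
    enum sketch_xdpmode new_mode;

    if (new_n_rxq < 1) {
        new_n_rxq = 1;
    }
    if (new_n_rxq > SKETCH_MAX_XSKQ) {
        return EINVAL;
    }

    if (!xdpmode || !strcmp(xdpmode, "skb")) {
        new_mode = SKETCH_MODE_SKB;
    } else if (!strcmp(xdpmode, "drv")) {
        new_mode = SKETCH_MODE_DRV;
    } else {
        return EINVAL;
    }

    if (dev->requested_n_rxq != new_n_rxq
        || dev->requested_mode != new_mode) {
        dev->requested_n_rxq = new_n_rxq;
        dev->requested_mode = new_mode;
        dev->reconfigure_pending = true;  /* netdev_request_reconfigure(). */
    }
    return 0;
}

/* Reconfigure step: the only place where the active values change.  A real
 * implementation would tear down and re-create the sockets here. */
static void
sketch_reconfigure(struct sketch_dev *dev)
{
    dev->n_rxq = dev->requested_n_rxq;
    dev->mode = dev->requested_mode;
    dev->reconfigure_pending = false;
}
```
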
> > +int
> > +netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
> > +{
> > +    /* TODO: add mutex lock */
> > +    smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> > +    smap_add_format(args, "xdpmode", "%s",
> > +        opt_xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
> > +
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_numa_id(const struct netdev *netdev)
> > +{
> > +    /* FIXME: Get netdev's PCIe device ID, then find
> > +     * its NUMA node id.
> > +     */
> > +    VLOG_INFO("FIXME: Device %s always use numa id 0", netdev->name);
> > +    return 0;
> > +}
> > +
> > +void
> > +xsk_remove_xdp_program(uint32_t ifindex)
> > +{
> > +    uint32_t curr_prog_id = 0;
> > +
> > +    /* remove_xdp_program() */
> > +    if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, opt_xdp_flags)) {
> > +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> > +    }
> > +    if (prog_id == curr_prog_id) {
> > +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> > +    } else if (!curr_prog_id) {
> > +        VLOG_WARN("couldn't find a prog id on a given interface");
> > +    } else {
> > +        VLOG_WARN("program on interface changed, not removing");
> > +    }
> > +}
> > +
> > +/* Receive packet from AF_XDP socket */
> > +int
> > +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> > +                     struct dp_packet_batch *batch)
> > +{
> > +    unsigned int rcvd, i;
> > +    uint32_t idx_rx = 0, idx_fq = 0;
> > +    int ret = 0;
> > +
> > +    /* See if there is any packet on RX queue,
> > +     * if yes, idx_rx is the index having the packet.
> > +     */
> > +    rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> > +    if (!rcvd) {
> > +        return 0;
> > +    }
> > +
> > +    /* Form a dp_packet batch from descriptor in RX queue */
> > +    for (i = 0; i < rcvd; i++) {
> > +        uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> > +        uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
> > +        char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> > +        uint64_t index;
> > +
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct dp_packet *packet;
> > +
> > +        index = addr >> FRAME_SHIFT;
> > +        xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
> > +
> > +        packet = &xpacket->packet;
> > +        xpacket->mpool = &xsk->umem->mpool;
> > +
> > +        if (packet->source != DPBUF_AFXDP) {
> > +            /* FIXME: might be a bug */
>
> Need to log something here. Rate-limited.
>
> > +            continue;
> > +        }
> > +
> > +        /* Initialize the struct dp_packet */
> > +        if (opt_xdp_bind_flags == XDP_ZEROCOPY) {
> > +            dp_packet_set_base(packet, pkt - FRAME_HEADROOM);
> > +        } else {
> > +            /* SKB mode */
> > +            dp_packet_set_base(packet, pkt);
> > +        }
> > +        dp_packet_set_data(packet, pkt);
> > +        dp_packet_set_size(packet, len);
> > +
> > +        /* Add packet into batch, increase batch->count */
> > +        dp_packet_batch_add(batch, packet);
> > +
> > +        idx_rx++;
> > +    }
> > +
> > +    /* We've consume rcvd packets in RX, now re-fill the
> > +     * same number back to FILL queue.
> > +     */
> > +    for (i = 0; i < rcvd; i++) {
> > +        uint64_t index;
> > +        struct umem_elem *elem;
> > +
> > +        ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> > +        while (ret == 0) {
> > +            /* The FILL queue is full, so retry. (or skip)? */
> > +            ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> > +        }
> > +
> > +        /* Get one free umem, program it into FILL queue */
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> > +        ovs_assert((index & FRAME_SHIFT_MASK) == 0);
> > +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
> > +
> > +        idx_fq++;
> > +    }
> > +    xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
> > +
> > +    /* Release the RX queue */
> > +    xsk_ring_cons__release(&xsk->rx, rcvd);
> > +    xsk->rx_npkts += rcvd;
> > +
> > +#ifdef AFXDP_DEBUG
> > +    print_xsk_stat(xsk);
> > +#endif
> > +    return 0;
> > +}
> > +
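
For the "log something here, rate-limited" comment in the rx path above:
in-tree this would presumably be one of the VLOG_*_RL() macros with the
'rl' limiter the file already defines. To show what the rate limiting
buys, here is a stand-alone token-bucket sketch. struct sketch_rl and
both functions are hypothetical and are not OVS's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical token bucket: 'rate' tokens are added per second, up to
 * 'burst'.  Each emitted message costs one token. */
struct sketch_rl {
    unsigned int rate;
    unsigned int burst;
    unsigned int tokens;
    long long last_sec;
    unsigned long long n_dropped;   /* Messages suppressed so far. */
};

static bool
sketch_rl_allow(struct sketch_rl *rl, long long now_sec)
{
    if (now_sec > rl->last_sec) {
        unsigned long long t = rl->tokens
            + (unsigned long long) (now_sec - rl->last_sec) * rl->rate;

        rl->tokens = t > rl->burst ? rl->burst : (unsigned int) t;
        rl->last_sec = now_sec;
    }
    if (rl->tokens) {
        rl->tokens--;
        return true;
    }
    rl->n_dropped++;
    return false;
}

/* Logs the unexpected packet source unless the limiter says no.  Returns
 * true if the message was emitted. */
static bool
sketch_warn_bad_source(struct sketch_rl *rl, long long now_sec, int source)
{
    if (!sketch_rl_allow(rl, now_sec)) {
        return false;
    }
    fprintf(stderr, "unexpected dp_packet source %d on AF_XDP rx\n", source);
    return true;
}
```
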
> > +static void kick_tx(struct xsk_socket_info *xsk)
> > +{
> > +    int ret;
> > +
> > +    ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
> > +    if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY) {
> > +        return;
> > +    }
> > +}
> > +
> > +int
> > +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> > +                              struct dp_packet_batch *batch)
> > +{
> > +    uint32_t tx_done, idx_cq = 0;
> > +    struct dp_packet *packet;
> > +    uint32_t idx;
> > +    int j;
> > +
> > +    /* Make sure we have enough TX descs */
> > +    if (xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx) == 0) {
> > +        return -EAGAIN;
> > +    }
> > +
> > +    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct umem_elem *elem;
> > +        uint64_t index;
> > +
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        if (!elem) {
> > +            return -EAGAIN;
> > +        }
> > +
> > +        memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
> > +
> > +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> > +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
> > +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
> > +            = dp_packet_size(packet);
> > +
> > +        if (packet->source == DPBUF_AFXDP) {
> > +            xpacket = dp_packet_cast_afxdp(packet);
> > +            umem_elem_push(xpacket->mpool, dp_packet_base(packet));
> > +             /* Avoid freeing it twice at dp_packet_uninit */
> > +            xpacket->mpool = NULL;
>
> Why are you freeing packets here? 'netdev_linux_send' will do that for you.
>
> > +        }
> > +    }
> > +    xsk_ring_prod__submit(&xsk->tx, batch->count);
> > +    xsk->outstanding_tx += batch->count;
> > +
> > +retry:
> > +    kick_tx(xsk);
> > +
> > +    /* Process CQ */
>
> Maybe it's better to process the CQ on rx?
> We don't know when we'll be here next time, but we'll definitely
> call the rx function soon.
>
> > +    tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, &idx_cq);
> > +    if (tx_done > 0) {
> > +        xsk->outstanding_tx -= tx_done;
> > +        xsk->tx_npkts += tx_done;
> > +    }
> > +
> > +    /* Recycle back to umem pool */
> > +    for (j = 0; j < tx_done; j++) {
> > +        struct umem_elem *elem;
> > +        uint64_t addr;
> > +
> > +        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
> > +
> > +        elem = ALIGNED_CAST(struct umem_elem *,
> > +                            (char *)xsk->umem->buffer + addr);
> > +        umem_elem_push(&xsk->umem->mpool, elem);
> > +    }
> > +    xsk_ring_cons__release(&xsk->umem->cq, tx_done);
> > +
> > +    if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
> > +        /* If there are still a lot not transmitted,
> > +         * try harder.
> > +         */
> > +        goto retry;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
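
The "process CQ on rx" suggestion above amounts to factoring the
completion handling into a helper that the rx path calls on every poll.
A toy, single-threaded model of that shape is sketched below. struct
sketch_xsk and its rings are hypothetical and do not use the libbpf ring
API. They only show where the reclaim call would sit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 8      /* Hypothetical; must be a power of two. */

/* Toy model: a completion ring of frame addresses and a free-frame stack. */
struct sketch_xsk {
    uint64_t cq[SKETCH_RING_SIZE];
    uint32_t cq_prod, cq_cons;  /* Free-running producer/consumer indices. */
    uint64_t free_frames[SKETCH_RING_SIZE];
    uint32_t n_free;
    uint32_t outstanding_tx;
};

/* Moves up to 'max' completed frames from the completion ring back to the
 * free stack.  Returns the number of frames reclaimed. */
static uint32_t
sketch_reclaim_tx(struct sketch_xsk *xsk, uint32_t max)
{
    uint32_t done = 0;

    while (done < max && xsk->cq_cons != xsk->cq_prod
           && xsk->n_free < SKETCH_RING_SIZE) {
        uint64_t addr = xsk->cq[xsk->cq_cons++ & (SKETCH_RING_SIZE - 1)];

        xsk->free_frames[xsk->n_free++] = addr;
        done++;
    }
    xsk->outstanding_tx -= done < xsk->outstanding_tx
                           ? done : xsk->outstanding_tx;
    return done;
}

/* Rx path: reclaim completions first, since rx is polled regularly, then
 * receive as usual.  Returns the number of tx frames reclaimed. */
static uint32_t
sketch_rx(struct sketch_xsk *xsk)
{
    uint32_t reclaimed = sketch_reclaim_tx(xsk, SKETCH_RING_SIZE);

    /* ... peek the RX ring, build the batch, refill the FILL ring ... */
    return reclaimed;
}
```

The send path could then just submit descriptors and kick the socket,
leaving the frames to be recycled on the next rx poll.
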
> > +#else
> > +#include "openvswitch/compiler.h"
> > +#include "netdev-afxdp.h"
> > +
> > +struct xsk_socket_info *
> > +xsk_configure(int ifindex OVS_UNUSED, int xdp_queue_id OVS_UNUSED)
> > +{
> > +    return NULL;
> > +}
> > +
> > +void
> > +xsk_destroy(struct xsk_socket_info *xsk OVS_UNUSED)
> > +{
> > +}
> > +
> > +int
> > +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk OVS_UNUSED,
> > +                     struct dp_packet_batch *batch OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk OVS_UNUSED,
> > +                              struct dp_packet_batch *batch OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_set_config(struct netdev *netdev OVS_UNUSED,
> > +                        const struct smap *args OVS_UNUSED,
> > +                        char **errp OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_config(const struct netdev *netdev OVS_UNUSED,
> > +                        struct smap *args OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_numa_id(const struct netdev *netdev OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +#endif
> > diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> > new file mode 100644
> > index 000000000000..ea05612a7c0f
> > --- /dev/null
> > +++ b/lib/netdev-afxdp.h
> > @@ -0,0 +1,47 @@
> > +/*
> > + * Copyright (c) 2018 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef NETDEV_AFXDP_H
> > +#define NETDEV_AFXDP_H 1
> > +
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +
> > +/* These functions are Linux AF_XDP specific, so they should be used directly
> > + * only by Linux-specific code. */
> > +#define MAX_XSKQ 16
> > +struct netdev;
> > +struct xsk_socket_info;
> > +struct xdp_umem;
> > +struct dp_packet_batch;
> > +struct smap;
> > +
> > +struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id);
> > +void xsk_destroy(struct xsk_socket_info *xsk);
> > +
> > +int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> > +                         struct dp_packet_batch *batch);
> > +
> > +int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> > +                                  struct dp_packet_batch *batch);
> > +
> > +void xsk_remove_xdp_program(uint32_t ifindex);
> > +int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> > +                            char **errp);
> > +int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args);
> > +int netdev_afxdp_get_numa_id(const struct netdev *netdev);
> > +
> > +#endif /* netdev-afxdp.h */
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index f75d73fd39f8..337760ca3333 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -75,6 +75,7 @@
> >  #include "unaligned.h"
> >  #include "openvswitch/vlog.h"
> >  #include "util.h"
> > +#include "netdev-afxdp.h"
> >
> >  VLOG_DEFINE_THIS_MODULE(netdev_linux);
> >
> > @@ -531,6 +532,7 @@ struct netdev_linux {
> >
> >      /* LAG information. */
> >      bool is_lag_master;         /* True if the netdev is a LAG master. */
> > +    struct xsk_socket_info *xsk[MAX_XSKQ]; /* af_xdp socket */
> >  };
> >
> >  struct netdev_rxq_linux {
> > @@ -580,12 +582,18 @@ is_netdev_linux_class(const struct netdev_class *netdev_class)
> >  }
> >
> >  static bool
> > +is_afxdp_netdev(const struct netdev *netdev)
> > +{
> > +    return netdev_get_class(netdev) == &netdev_afxdp_class;
> > +}
> > +
> > +static bool
> >  is_tap_netdev(const struct netdev *netdev)
> >  {
> >      return netdev_get_class(netdev) == &netdev_tap_class;
> >  }
> >
> > -static struct netdev_linux *
> > +struct netdev_linux *
> >  netdev_linux_cast(const struct netdev *netdev)
> >  {
> >      ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
> > @@ -1084,6 +1092,25 @@ netdev_linux_destruct(struct netdev *netdev_)
> >          atomic_count_dec(&miimon_cnt);
> >      }
> >
> > +#if HAVE_AF_XDP
> > +    if (is_afxdp_netdev(netdev_)) {
> > +        int ifindex;
> > +        int ret, i;
> > +
> > +        ret = get_ifindex(netdev_, &ifindex);
> > +        if (ret) {
> > +            VLOG_ERR("get ifindex error");
> > +        } else {
> > +            for (i = 0; i < MAX_XSKQ; i++) {
> > +                if (netdev->xsk[i]) {
> > +                    VLOG_INFO("destroy xsk[%d]", i);
> > +                    xsk_destroy(netdev->xsk[i]);
> > +                }
> > +            }
> > +            xsk_remove_xdp_program(ifindex);
> > +        }
> > +    }
> > +#endif
> >      ovs_mutex_destroy(&netdev->mutex);
> >  }
> >
> > @@ -1113,6 +1140,32 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
> >      rx->is_tap = is_tap_netdev(netdev_);
> >      if (rx->is_tap) {
> >          rx->fd = netdev->tap_fd;
> > +    } else if (is_afxdp_netdev(netdev_)) {
> > +#if HAVE_AF_XDP
> > +        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
> > +        int ifindex;
> > +        int xdp_queue_id = rxq_->queue_id;
> > +        struct xsk_socket_info *xsk;
> > +
> > +        if (setrlimit(RLIMIT_MEMLOCK, &r)) {
> > +            VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
> > +                      ovs_strerror(errno));
> > +            ovs_assert(0);
> > +        }
> > +
> > +        VLOG_DBG("%s: %s: queue=%d configuring xdp sock",
> > +                  __func__, netdev_->name, xdp_queue_id);
> > +
> > +        /* Get ethernet device index. */
> > +        error = get_ifindex(&netdev->up, &ifindex);
> > +        if (error) {
> > +            goto error;
> > +        }
> > +
> > +        xsk = xsk_configure(ifindex, xdp_queue_id);
> > +        netdev->xsk[xdp_queue_id] = xsk;
> > +        rx->fd = xsk_socket__fd(xsk->xsk); /* for netdev layer to poll */
> > +#endif
> >      } else {
> >          struct sockaddr_ll sll;
> >          int ifindex, val;
> > @@ -1318,9 +1371,16 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
> >  {
> >      struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
> >      struct netdev *netdev = rx->up.netdev;
> > -    struct dp_packet *buffer;
> > +    struct dp_packet *buffer = NULL;
> >      ssize_t retval;
> >      int mtu;
> > +    struct netdev_linux *netdev_ = netdev_linux_cast(netdev);
> > +
> > +    if (is_afxdp_netdev(netdev)) {
> > +        int qid = rxq_->queue_id;
> > +
> > +        return netdev_linux_rxq_xsk(netdev_->xsk[qid], batch);
> > +    }
> >
> >      if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
> >          mtu = ETH_PAYLOAD_MAX;
> > @@ -1329,6 +1389,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
> >      /* Assume Ethernet port. No need to set packet_type. */
> >      buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
> >                                             DP_NETDEV_HEADROOM);
> > +
> >      retval = (rx->is_tap
> >                ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
> >                : netdev_linux_rxq_recv_sock(rx->fd, buffer));
> > @@ -1473,14 +1534,15 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,
> >   * The kernel maintains a packet transmission queue, so the caller is not
> >   * expected to do additional queuing of packets. */
> >  static int
> > -netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
> > +netdev_linux_send(struct netdev *netdev_, int qid,
> >                    struct dp_packet_batch *batch,
> >                    bool concurrent_txq OVS_UNUSED)
> >  {
> >      int error = 0;
> >      int sock = 0;
> >
> > -    if (!is_tap_netdev(netdev_)) {
> > +    if (!is_tap_netdev(netdev_) &&
> > +        !is_afxdp_netdev(netdev_)) {
> >          if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) {
> >              error = EOPNOTSUPP;
> >              goto free_batch;
> > @@ -1499,6 +1561,10 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
> >          }
> >
> >          error = netdev_linux_sock_batch_send(sock, ifindex, batch);
> > +    } else if (is_afxdp_netdev(netdev_)) {
> > +        struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> > +
> > +        error = netdev_linux_afxdp_batch_send(netdev->xsk[qid], batch);
> >      } else {
> >          error = netdev_linux_tap_batch_send(netdev_, batch);
> >      }
> > @@ -3323,6 +3389,7 @@ const struct netdev_class netdev_linux_class = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      LINUX_FLOW_OFFLOAD_API,
> >      .type = "system",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct,
> >      .get_stats = netdev_linux_get_stats,
> >      .get_features = netdev_linux_get_features,
> > @@ -3333,6 +3400,7 @@ const struct netdev_class netdev_linux_class = {
> >  const struct netdev_class netdev_tap_class = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      .type = "tap",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct_tap,
> >      .get_stats = netdev_tap_get_stats,
> >      .get_features = netdev_linux_get_features,
> > @@ -3343,10 +3411,23 @@ const struct netdev_class netdev_internal_class = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      LINUX_FLOW_OFFLOAD_API,
> >      .type = "internal",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct,
> >      .get_stats = netdev_internal_get_stats,
> >      .get_status = netdev_internal_get_status,
> >  };
> > +
> > +const struct netdev_class netdev_afxdp_class = {
> > +    NETDEV_LINUX_CLASS_COMMON,
> > +    .type = "afxdp",
> > +    .is_pmd = true,
> > +    .construct = netdev_linux_construct,
> > +    .get_stats = netdev_linux_get_stats,
> > +    .get_status = netdev_linux_get_status,
> > +    .set_config = netdev_afxdp_set_config,
> > +    .get_config = netdev_afxdp_get_config,
> > +    .get_numa_id = netdev_afxdp_get_numa_id,
> > +};
> >
> >
> >  #define CODEL_N_QUEUES 0x0000
> > diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
> > index 17ca9120168a..afcb20ee8d0a 100644
> > --- a/lib/netdev-linux.h
> > +++ b/lib/netdev-linux.h
> > @@ -28,6 +28,7 @@ struct netdev;
> >  int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
> >                                    const char *flag_name, bool enable);
> >  int linux_get_ifindex(const char *netdev_name);
> > +struct netdev_linux *netdev_linux_cast(const struct netdev *netdev);
> >
> >  #define LINUX_FLOW_OFFLOAD_API                          \
> >     .flow_flush = netdev_tc_flow_flush,                  \
> > diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> > index fb0c27e6e8e8..5bf041316503 100644
> > --- a/lib/netdev-provider.h
> > +++ b/lib/netdev-provider.h
> > @@ -902,6 +902,7 @@ extern const struct netdev_class netdev_linux_class;
> >  #endif
> >  extern const struct netdev_class netdev_internal_class;
> >  extern const struct netdev_class netdev_tap_class;
> > +extern const struct netdev_class netdev_afxdp_class;
> >
> >  #ifdef  __cplusplus
> >  }
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 7d7ecf6f0946..c30016b34033 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -145,6 +145,7 @@ netdev_initialize(void)
> >          netdev_register_provider(&netdev_linux_class);
> >          netdev_register_provider(&netdev_internal_class);
> >          netdev_register_provider(&netdev_tap_class);
> > +        netdev_register_provider(&netdev_afxdp_class);
> >          netdev_vport_tunnel_register();
> >  #endif
> >  #if defined(__FreeBSD__) || defined(__NetBSD__)
> > diff --git a/lib/xdpsock.c b/lib/xdpsock.c
> > new file mode 100644
> > index 000000000000..f9fe94b9e36a
> > --- /dev/null
> > +++ b/lib/xdpsock.c
> > @@ -0,0 +1,210 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +#include <config.h>
> > +#include <ctype.h>
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <stdarg.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <sys/stat.h>
> > +#include <sys/types.h>
> > +#include <syslog.h>
> > +#include <time.h>
> > +#include <unistd.h>
> > +#include "openvswitch/vlog.h"
> > +#include "async-append.h"
> > +#include "coverage.h"
> > +#include "dirs.h"
> > +#include "ovs-thread.h"
> > +#include "sat-math.h"
> > +#include "socket-util.h"
> > +#include "svec.h"
> > +#include "syslog-direct.h"
> > +#include "syslog-libc.h"
> > +#include "syslog-provider.h"
> > +#include "timeval.h"
> > +#include "unixctl.h"
> > +#include "util.h"
> > +#include "ovs-atomic.h"
> > +#include "openvswitch/compiler.h"
> > +#include "dp-packet.h"
> > +
> > +#ifdef HAVE_AF_XDP
> > +#include "xdpsock.h"
> > +
> > +static inline void ovs_spinlock_init(ovs_spinlock_t *sl)
> > +{
> > +    sl->locked = 0;
> > +}
> > +
> > +static inline void ovs_spin_lock(ovs_spinlock_t *sl)
> > +{
> > +    int exp = 0;
> > +
> > +    while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
> > +                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
> > +        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) {
>
>
> These atomics are compiler specific. Please use:
>
>     while (!atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
>                                                     memory_order_acquire,
>                                                     memory_order_relaxed)) {
>         locked = 1;
>         while (locked) {
>             atomic_read_relaxed(&sl->locked, &locked);
>         }
>         exp = 0;
>     }
>
> > +            ;
> > +        }
> > +        exp = 0;
> > +    }
> > +}
> > +
> > +static inline void ovs_spin_unlock(ovs_spinlock_t *sl)
> > +{
> > +    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
>
>     atomic_store_explicit(&sl->locked, 0, memory_order_release);
>
> > +}
> > +
> > +static inline int OVS_UNUSED ovs_spin_trylock(ovs_spinlock_t *sl)
> > +{
> > +    int exp = 0;
> > +    return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
> > +              0, /* disallow spurious failure */
> > +               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
>
>
>     return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
>                                                    memory_order_acquire,
>                                                    memory_order_relaxed);
>
>
> > +}
> > +
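
The three spinlock suggestions above (lock, unlock, trylock without the
compiler-specific __atomic builtins) are collected below as one
stand-alone sketch. It uses plain C11 <stdatomic.h> as a stand-in for the
OVS atomic wrappers so that it compiles on its own. struct
sketch_spinlock and the sketch_* functions are hypothetical names, not
the patch's ovs_spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type: 0 means unlocked, 1 means locked. */
struct sketch_spinlock {
    atomic_int locked;
};

static inline void
sketch_spin_init(struct sketch_spinlock *sl)
{
    atomic_init(&sl->locked, 0);
}

static inline void
sketch_spin_lock(struct sketch_spinlock *sl)
{
    int exp = 0;

    while (!atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed)) {
        /* Spin on relaxed loads until the lock looks free, then retry the
         * compare-and-swap. */
        while (atomic_load_explicit(&sl->locked, memory_order_relaxed)) {
            ;
        }
        exp = 0;    /* A failed CAS overwrote 'exp' with the current value. */
    }
}

static inline void
sketch_spin_unlock(struct sketch_spinlock *sl)
{
    atomic_store_explicit(&sl->locked, 0, memory_order_release);
}

/* Returns true if the lock was taken, false if it was already held. */
static inline bool
sketch_spin_trylock(struct sketch_spinlock *sl)
{
    int exp = 0;

    return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}
```

The strong compare-exchange matches the "disallow spurious failure"
intent of the original trylock.
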
> > +void
> > +__umem_elem_push_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
> > +{
> > +    void *ptr;
> > +
> > +    if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ptr = &umemp->array[umemp->index];
> > +    memcpy(ptr, addrs, n * sizeof(void *));
> > +    umemp->index += n;
> > +}
> > +
> > +inline void
> > +__umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> > +{
> > +    umemp->array[umemp->index++] = addr;
> > +}
> > +
> > +void
> > +umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> > +{
> > +
> > +    if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
> > +        /* stack is full */
> > +        /* it's possible that one umem gets pushed twice,
> > +         * because actions=1,2,3... multiple ports?
> > +        */
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
> > +
> > +    ovs_spin_lock(&umemp->mutex);
> > +    __umem_elem_push(umemp, addr);
> > +    ovs_spin_unlock(&umemp->mutex);
> > +}
> > +
> > +void
> > +__umem_elem_pop_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
> > +{
> > +    void *ptr;
> > +
> > +    umemp->index -= n;
> > +
> > +    if (OVS_UNLIKELY(umemp->index < 0)) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ptr = &umemp->array[umemp->index];
> > +    memcpy(addrs, ptr, n * sizeof(void *));
> > +}
> > +
> > +inline void *
> > +__umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    return umemp->array[--umemp->index];
> > +}
> > +
> > +void *
> > +umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    void *ptr;
> > +
> > +    ovs_spin_lock(&umemp->mutex);
> > +    ptr = __umem_elem_pop(umemp);
> > +    ovs_spin_unlock(&umemp->mutex);
> > +
> > +    return ptr;
> > +}
> > +
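In the quoted code, umem_elem_push() tests umemp->index before taking the
lock, and umem_elem_pop() does not test for an empty pool. The sketch below
shows one way to keep both bounds checks inside the critical section. It is
an illustration under stated assumptions, not the patch's implementation:
struct ptr_stack and all ptr_stack_*() names are hypothetical, and the lock
is a bare C11 atomic_bool rather than ovs_spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bounded LIFO of frame addresses. */
struct ptr_stack {
    atomic_bool locked;     /* Minimal spinlock guarding the fields below. */
    size_t top;             /* Number of pointers currently stored. */
    size_t size;            /* Capacity of 'array'. */
    void **array;
};

static bool
ptr_stack_init(struct ptr_stack *s, size_t size)
{
    s->array = calloc(size, sizeof *s->array);
    if (!s->array) {
        return false;
    }
    atomic_init(&s->locked, false);
    s->top = 0;
    s->size = size;
    return true;
}

static void
ptr_stack_lock(struct ptr_stack *s)
{
    while (atomic_exchange_explicit(&s->locked, true, memory_order_acquire)) {
        continue;
    }
}

static void
ptr_stack_unlock(struct ptr_stack *s)
{
    atomic_store_explicit(&s->locked, false, memory_order_release);
}

/* Returns false instead of aborting when the stack is full. */
static bool
ptr_stack_push(struct ptr_stack *s, void *p)
{
    bool ok;

    ptr_stack_lock(s);
    ok = s->top < s->size;      /* Checked while holding the lock. */
    if (ok) {
        s->array[s->top++] = p;
    }
    ptr_stack_unlock(s);
    return ok;
}

/* Returns NULL when the stack is empty. */
static void *
ptr_stack_pop(struct ptr_stack *s)
{
    void *p;

    ptr_stack_lock(s);
    p = s->top ? s->array[--s->top] : NULL;
    ptr_stack_unlock(s);
    return p;
}

static int
ptr_stack_selftest(void)
{
    struct ptr_stack s;
    int a, b, c;
    bool ok;

    if (!ptr_stack_init(&s, 2)) {
        return 0;
    }
    ok = ptr_stack_push(&s, &a) && ptr_stack_push(&s, &b);
    ok = ok && !ptr_stack_push(&s, &c);         /* Full: rejected. */
    ok = ok && ptr_stack_pop(&s) == (void *) &b; /* LIFO order. */
    ok = ok && ptr_stack_pop(&s) == (void *) &a;
    ok = ok && ptr_stack_pop(&s) == NULL;        /* Empty. */
    free(s.array);
    return ok;
}
```

Whether a full or empty pool should abort (as OVS_NOT_REACHED() does in the
patch) or return an error to the caller is a separate design choice; the
sketch returns an error only to make the checks observable.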
> > +void **
> > +__umem_pool_alloc(unsigned int size)
> > +{
> > +    void *bufs;
> > +
> > +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> > +                              size * sizeof(void *)) == 0);
> > +    memset(bufs, 0, size * sizeof(void *));
> > +    return (void **)bufs;
> > +}
> > +
> > +unsigned int
> > +umem_elem_count(struct umem_pool *mpool)
> > +{
> > +    return mpool->index;
> > +}
> > +
> > +int
> > +umem_pool_init(struct umem_pool *umemp OVS_UNUSED, unsigned int size)
> > +{
> > +    umemp->array = __umem_pool_alloc(size);
> > +    if (!umemp->array) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    umemp->size = size;
> > +    umemp->index = 0;
> > +    ovs_spinlock_init(&umemp->mutex);
> > +    return 0;
> > +}
> > +
> > +void
> > +umem_pool_cleanup(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    free(umemp->array);
> > +}
> > +
> > +/* AF_XDP metadata init/destroy */
> > +int
> > +xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
> > +{
> > +    void *bufs;
> > +
> > +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> > +                              size * sizeof(struct dp_packet_afxdp)) == 0);
> > +    memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
> > +
> > +    xp->array = bufs;
> > +    xp->size = size;
> > +    return 0;
> > +}
> > +
> > +void
> > +xpacket_pool_cleanup(struct xpacket_pool *xp)
> > +{
> > +    free(xp->array);
> > +}
> > +#else   /* !HAVE_AF_XDP below */
> > +#endif
> > diff --git a/lib/xdpsock.h b/lib/xdpsock.h
> > new file mode 100644
> > index 000000000000..cb64befe7dba
> > --- /dev/null
> > +++ b/lib/xdpsock.h
> > @@ -0,0 +1,133 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +#ifndef XDPSOCK_H
> > +#define XDPSOCK_H 1
> > +#include <errno.h>
> > +#include <getopt.h>
> > +#include <libgen.h>
> > +#include <linux/bpf.h>
> > +#include <linux/if_link.h>
> > +#include <linux/if_xdp.h>
> > +#include <linux/if_ether.h>
> > +#include <net/if.h>
> > +#include <signal.h>
> > +#include <stdbool.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <net/ethernet.h>
> > +#include <sys/resource.h>
> > +#include <sys/socket.h>
> > +#include <sys/mman.h>
> > +#include <time.h>
> > +#include <unistd.h>
> > +#include <pthread.h>
> > +#include <locale.h>
> > +#include <sys/types.h>
> > +#include <poll.h>
> > +#include <bpf/libbpf.h>
> > +
> > +#include "ovs-atomic.h"
> > +#include "openvswitch/thread.h"
> > +
> > +/* bpf/xsk.h uses the following macros not defined in OVS,
> > + * so re-define them before include.
> > + */
> > +#define unlikely OVS_UNLIKELY
> > +#define likely OVS_LIKELY
> > +#define barrier() __asm__ __volatile__("": : :"memory")
> > +#define smp_rmb() barrier()
> > +#define smp_wmb() barrier()
>
> These barriers are also x86-specific. We'll need to fix that in
> the future before removing the build constraints.
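One possible direction for the portability issue, sketched under the
assumption that C11 fences are an acceptable substitute: an acquire fence
in place of smp_rmb() and a release fence in place of smp_wmb(). This is
not what bpf/xsk.h prescribes; struct mailbox, mailbox_put(), mailbox_get()
and mailbox_selftest() are hypothetical names used only to show the pairing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed portable replacements built on C11 fences. */
#define barrier()  atomic_signal_fence(memory_order_seq_cst)
#define smp_rmb()  atomic_thread_fence(memory_order_acquire)
#define smp_wmb()  atomic_thread_fence(memory_order_release)

/* Hypothetical one-slot mailbox: payload is published via 'ready'. */
struct mailbox {
    uint64_t payload;
    atomic_uint ready;
};

static void
mailbox_put(struct mailbox *m, uint64_t value)
{
    m->payload = value;
    smp_wmb();                  /* Payload write ordered before the flag. */
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

static int
mailbox_get(struct mailbox *m, uint64_t *value)
{
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed)) {
        return 0;
    }
    smp_rmb();                  /* Flag read ordered before the payload. */
    *value = m->payload;
    return 1;
}

static int
mailbox_selftest(void)
{
    struct mailbox m;
    uint64_t v = 0;
    int empty, full;

    m.payload = 0;
    atomic_init(&m.ready, 0);
    empty = mailbox_get(&m, &v);
    mailbox_put(&m, 42);
    full = mailbox_get(&m, &v);
    assert(empty == 0);
    assert(full == 1);
    return v == 42;
}
```

On x86 these fences should cost no more than the compiler barrier they
replace, while weakly ordered architectures get the hardware fence they
need.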
>
> > +#include <bpf/xsk.h>
> > +
> > +#define FRAME_HEADROOM  XDP_PACKET_HEADROOM
> > +#define FRAME_SIZE      XSK_UMEM__DEFAULT_FRAME_SIZE
> > +#define BATCH_SIZE      NETDEV_MAX_BURST
> > +#define FRAME_SHIFT     XSK_UMEM__DEFAULT_FRAME_SHIFT
> > +#define FRAME_SHIFT_MASK    ((1<<FRAME_SHIFT)-1)
> > +
> > +#define NUM_FRAMES  1024
> > +#define PROD_NUM_DESCS 128
> > +#define CONS_NUM_DESCS 128
> > +
> > +#ifdef USE_XSK_DEFAULT
> > +#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
> > +#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
> > +#endif
> > +
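FRAME_SHIFT_MASK is what umem_elem_push() uses to assert that an address is
frame-aligned. A small sketch of that arithmetic follows, assuming a frame
shift of 11 (2048-byte frames); the real value comes from
XSK_UMEM__DEFAULT_FRAME_SHIFT in bpf/xsk.h and may differ. The DEMO_*
macros and frame_*() helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FRAME_SHIFT 11                       /* Assumed value. */
#define DEMO_FRAME_SIZE  (UINT64_C(1) << DEMO_FRAME_SHIFT)
#define DEMO_FRAME_MASK  (DEMO_FRAME_SIZE - 1)    /* Low 11 bits set. */

/* True if 'addr' is the first byte of a frame. */
static bool
frame_is_aligned(uint64_t addr)
{
    return (addr & DEMO_FRAME_MASK) == 0;
}

/* Start of the frame that contains 'addr'. */
static uint64_t
frame_base(uint64_t addr)
{
    return addr & ~DEMO_FRAME_MASK;
}

/* Zero-based index of the frame that contains 'addr'. */
static uint64_t
frame_index(uint64_t addr)
{
    return addr >> DEMO_FRAME_SHIFT;
}
```

With these assumed values, offset 4100 lies in the frame starting at 4096,
which is frame index 2.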
> > +typedef struct {
> > +    volatile int locked;
>
> atomic_int locked;
>
> or atomic_bool.
>
> > +} ovs_spinlock_t;
> > +
> > +/* LIFO ptr_array */
> > +struct umem_pool {
> > +    int index;      /* point to top */
> > +    unsigned int size;
> > +    ovs_spinlock_t mutex;
> > +    void **array;   /* a pointer array */
> > +};
> > +
> > +/* array-based dp_packet_afxdp */
> > +struct xpacket_pool {
> > +    unsigned int size;
> > +    struct dp_packet_afxdp **array;
> > +};
> > +
> > +struct xsk_umem_info {
> > +    struct umem_pool mpool;
> > +    struct xpacket_pool xpool;
> > +    struct xsk_ring_prod fq;
> > +    struct xsk_ring_cons cq;
> > +    struct xsk_umem *umem;
> > +    void *buffer;
> > +};
> > +
> > +struct xsk_socket_info {
> > +    struct xsk_ring_cons rx;
> > +    struct xsk_ring_prod tx;
> > +    struct xsk_umem_info *umem;
> > +    struct xsk_socket *xsk;
> > +    unsigned long rx_npkts;
> > +    unsigned long tx_npkts;
> > +    unsigned long prev_rx_npkts;
> > +    unsigned long prev_tx_npkts;
> > +    uint32_t outstanding_tx;
> > +};
> > +
> > +struct umem_elem_head {
> > +    unsigned int index;
> > +    struct ovs_mutex mutex;
> > +    uint32_t n;
> > +};
> > +
> > +struct umem_elem {
> > +    struct umem_elem *next;
> > +};
> > +
> > +void __umem_elem_push(struct umem_pool *umemp, void *addr);
> > +void umem_elem_push(struct umem_pool *umemp, void *addr);
> > +void *__umem_elem_pop(struct umem_pool *umemp);
> > +void *umem_elem_pop(struct umem_pool *umemp);
> > +void **__umem_pool_alloc(unsigned int size);
> > +int umem_pool_init(struct umem_pool *umemp, unsigned int size);
> > +void umem_pool_cleanup(struct umem_pool *umemp);
> > +unsigned int umem_elem_count(struct umem_pool *mpool);
> > +void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n);
> > +void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n);
> > +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
> > +void xpacket_pool_cleanup(struct xpacket_pool *xp);
> > +
> > +#endif
> > diff --git a/tests/automake.mk b/tests/automake.mk
> > index ea16532dd2a0..715cef9a6b3b 100644
> > --- a/tests/automake.mk
> > +++ b/tests/automake.mk
> > @@ -4,12 +4,14 @@ EXTRA_DIST += \
> >       $(SYSTEM_TESTSUITE_AT) \
> >       $(SYSTEM_KMOD_TESTSUITE_AT) \
> >       $(SYSTEM_USERSPACE_TESTSUITE_AT) \
> > +     $(SYSTEM_AFXDP_TESTSUITE_AT) \
> >       $(SYSTEM_OFFLOADS_TESTSUITE_AT) \
> >       $(SYSTEM_DPDK_TESTSUITE_AT) \
> >       $(OVSDB_CLUSTER_TESTSUITE_AT) \
> >       $(TESTSUITE) \
> >       $(SYSTEM_KMOD_TESTSUITE) \
> >       $(SYSTEM_USERSPACE_TESTSUITE) \
> > +     $(SYSTEM_AFXDP_TESTSUITE) \
> >       $(SYSTEM_OFFLOADS_TESTSUITE) \
> >       $(SYSTEM_DPDK_TESTSUITE) \
> >       $(OVSDB_CLUSTER_TESTSUITE) \
> > @@ -158,6 +160,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
> >       tests/system-userspace-macros.at \
> >       tests/system-userspace-packet-type-aware.at
> >
> > +SYSTEM_AFXDP_TESTSUITE_AT = \
> > +     tests/system-afxdp-testsuite.at \
> > +     tests/system-afxdp-traffic.at \
> > +     tests/system-afxdp-macros.at
> > +
> >  SYSTEM_TESTSUITE_AT = \
> >       tests/system-common-macros.at \
> >       tests/system-ovn.at \
> > @@ -182,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
> >  TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
> >  SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
> >  SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
> > +SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
> >  SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
> >  SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
> >  OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
> > @@ -315,6 +323,11 @@ check-system-userspace: all
> >       set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> >       "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> >
> > +check-afxdp: all
> > +     $(MAKE) install
> > +     set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
> > +     "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> > +
> >  check-offloads: all
> >       set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> >       "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> > @@ -352,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
> >       $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> >       $(AM_V_at)mv $@.tmp $@
> >
> > +$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
> > +     $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> > +     $(AM_V_at)mv $@.tmp $@
> > +
> >  $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
> >       $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> >       $(AM_V_at)mv $@.tmp $@
> > diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> > new file mode 100644
> > index 000000000000..2c58c2d6554b
> > --- /dev/null
> > +++ b/tests/system-afxdp-macros.at
> > @@ -0,0 +1,153 @@
> > +# _ADD_BR([name])
> > +#
> > +# Expands into the proper ovs-vsctl commands to create a bridge with the
> > +# appropriate type and properties
> > +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]])
> > +
> > +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
> > +#
> > +# Creates a database and starts ovsdb-server, starts ovs-vswitchd
> > +# connected to that database, calls ovs-vsctl to create a bridge named
> > +# br0 with predictable settings, passing 'vsctl-args' as additional
> > +# commands to ovs-vsctl.  If 'vsctl-args' causes ovs-vsctl to provide
> > +# output (e.g. because it includes "create" commands) then 'vsctl-output'
> > +# specifies the expected output after filtering through uuidfilt.
> > +m4_define([OVS_TRAFFIC_VSWITCHD_START],
> > +  [
> > +   export OVS_PKGDATADIR=$(`pwd`)
> > +   _OVS_VSWITCHD_START([--disable-system])
> > +   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
> > +])
> > +
> > +# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
> > +#
> > +# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
> > +# for messages with severity WARN or higher and signaling an error if any
> > +# is present.  The optional WHITELIST may contain shell-quoted "sed"
> > +# commands to delete any warnings that are actually expected, e.g.:
> > +#
> > +#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
> > +#
> > +# 'extra_cmds' are shell commands to be executed after OVS_VSWITCHD_STOP() is
> > +# invoked. They can be used to perform additional cleanups such as namespace
> > +# removal.
> > +m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
> > +  [OVS_VSWITCHD_STOP([dnl
> > +$1";/netdev_linux.*obtaining netdev stats via vport failed/d
> > +/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d
> > +/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
> > +/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
> > +"])
> > +   AT_CHECK([:; $2])
> > +  ])
> > +
> > +m4_define([ADD_VETH_AFXDP],
> > +    [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
> > +      CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
> > +      AT_CHECK([ip link set $1 netns $2])
> > +      AT_CHECK([ip link set dev ovs-$1 up])
> > +      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
> > +                set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
> > +      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
> > +      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
> > +      if test -n "$5"; then
> > +        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
> > +      fi
> > +      if test -n "$6"; then
> > +        NS_CHECK_EXEC([$2], [ip route add default via $6])
> > +      fi
> > +      on_exit 'ip link del ovs-$1'
> > +    ]
> > +)
> > +
> > +# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
> > +#
> > +# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
> > +m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
> > +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
> > +     AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
> > +     AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
> > +    ]
> > +)
> > +
> > +# CONFIGURE_VETH_OFFLOADS([VETH])
> > +#
> > +# Disable TX offloads for veths.  The userspace datapath uses the AF_PACKET
> > +# socket to receive packets for veths.  Unfortunately, the AF_PACKET socket
> > +# doesn't play well with offloads:
> > +# 1. GSO packets are received without segmentation and therefore discarded.
> > +# 2. Packets with offloaded partial checksum are received with the wrong
> > +#    checksum, therefore discarded by the receiver.
> > +#
> > +# By disabling tx offloads in the non-OVS side of the veth peer we make sure
> > +# that the AF_PACKET socket will not receive bad packets.
> > +#
> > +# This is a workaround, and should be removed when offloads are properly
> > +# supported in netdev-linux.
> > +m4_define([CONFIGURE_VETH_OFFLOADS],
> > +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
> > +)
> > +
> > +# CHECK_CONNTRACK()
> > +#
> > +# Perform requirements checks for running conntrack tests.
> > +#
> > +m4_define([CHECK_CONNTRACK],
> > +    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
> > +)
> > +
> > +# CHECK_CONNTRACK_ALG()
> > +#
> > +# Perform requirements checks for running conntrack ALG tests. The userspace
> > +# supports FTP and TFTP.
> > +#
> > +m4_define([CHECK_CONNTRACK_ALG])
> > +
> > +# CHECK_CONNTRACK_FRAG()
> > +#
> > +# Perform requirements checks for running conntrack fragmentation tests.
> > +# The userspace doesn't support fragmentation yet, so skip the tests.
> > +m4_define([CHECK_CONNTRACK_FRAG],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CONNTRACK_LOCAL_STACK()
> > +#
> > +# Perform requirements checks for running conntrack tests with local stack.
> > +# While the kernel connection tracker automatically passes all the connection
> > +# tracking state from an internal port to the OpenvSwitch kernel module, there
> > +# is simply no way of doing that with the userspace, so skip the tests.
> > +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CONNTRACK_NAT()
> > +#
> > +# Perform requirements checks for running conntrack NAT tests. The userspace
> > +# datapath supports NAT.
> > +#
> > +m4_define([CHECK_CONNTRACK_NAT])
> > +
> > +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
> > +#
> > +# Perform requirements checks for running ovs-dpctl flush-conntrack by
> > +# conntrack 5-tuple test. The userspace datapath does not support
> > +# this feature yet.
> > +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CT_DPIF_SET_GET_MAXCONNS()
> > +#
> > +# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
> > +# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature.
> > +m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
> > +
> > +# CHECK_CT_DPIF_GET_NCONNS()
> > +#
> > +# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
> > +# userspace datapath does support this feature.
> > +m4_define([CHECK_CT_DPIF_GET_NCONNS])
> > diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at
> > new file mode 100644
> > index 000000000000..538c0d15d556
> > --- /dev/null
> > +++ b/tests/system-afxdp-testsuite.at
> > @@ -0,0 +1,26 @@
> > +AT_INIT
> > +
> > +AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
> > +
> > +Licensed under the Apache License, Version 2.0 (the "License");
> > +you may not use this file except in compliance with the License.
> > +You may obtain a copy of the License at:
> > +
> > +    http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +Unless required by applicable law or agreed to in writing, software
> > +distributed under the License is distributed on an "AS IS" BASIS,
> > +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > +See the License for the specific language governing permissions and
> > +limitations under the License.])
> > +
> > +m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
> > +
> > +m4_include([tests/ovs-macros.at])
> > +m4_include([tests/ovsdb-macros.at])
> > +m4_include([tests/ofproto-macros.at])
> > +m4_include([tests/system-afxdp-macros.at])
> > +m4_include([tests/system-common-macros.at])
> > +
> > +m4_include([tests/system-afxdp-traffic.at])
> > +m4_include([tests/system-ovn.at])
> > diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at
> > new file mode 100644
> > index 000000000000..26f72acf48ef
> > --- /dev/null
> > +++ b/tests/system-afxdp-traffic.at
> > @@ -0,0 +1,978 @@
> > +AT_BANNER([AF_XDP netdev datapath-sanity])
> > +
> > +AT_SETUP([datapath - ping between two ports])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ulimit -l unlimited
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping between two ports on vlan])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
> > +ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping6 between two ports])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> > +dnl waiting, we get occasional failures due to the following error:
> > +dnl "connect: Cannot assign requested address"
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping6 between two ports on vlan])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
> > +ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
> > +
> > +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> > +dnl waiting, we get occasional failures due to the following error:
> > +dnl "connect: Cannot assign requested address"
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over vxlan tunnel])
> > +OVS_CHECK_VXLAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24],
> > +                  [id 0 dstport 4789])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over vxlan6 tunnel])
> > +OVS_CHECK_VXLAN_UDP6ZEROCSUM()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> > +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
> > +                   [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over gre tunnel])
> > +OVS_CHECK_GRE()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over erspan v1 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7])
> > +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over erspan v2 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1 options:erspan_hwid=0x7])
> > +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> > +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [
> 10.1.1.100/24],
> > +                [options:key=123 options:erspan_ver=1
> options:erspan_idx=0x7])
> > +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> > +                   [10.1.1.1/24], [local fc00:100::1 seq key 123
> erspan_ver 1 erspan 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> > +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [
> 10.1.1.100/24],
> > +                [options:key=121 options:erspan_ver=2
> options:erspan_dir=0 options:erspan_hwid=0x7])
> > +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> > +                   [10.1.1.1/24],
> > +                   [local fc00:100::1 seq key 121 erspan_ver 2
> erspan_dir ingress erspan_hwid 0x7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over geneve tunnel])
> > +OVS_CHECK_GENEVE()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24
> ])
> > +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24],
> > +                  [vni 0])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over geneve6 tunnel])
> > +OVS_CHECK_GENEVE_UDP6ZEROCSUM()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> > +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [
> 10.1.1.1/24],
> > +                   [vni 0 udp6zerocsumtx udp6zerocsumrx])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - clone action])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
> > +                    -- set interface ovs-p1 ofport_request=2])
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1 actions=NORMAL
> > +priority=10
> in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst),
> output:2
> > +priority=10
> in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst,
> controller), output:1
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
> > +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> > +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> > +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - basic truncate action])
> > +AT_SKIP_IF([test $HAVE_NC = no])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +
> > +dnl Create p0 and ovs-p0(1)
> > +ADD_NAMESPACES(at_ns0)
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11])
> > +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
> > +
> > +dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1
> > +AT_CHECK([ip link add p1 type veth peer name ovs-p1])
> > +on_exit 'ip link del ovs-p1'
> > +AT_CHECK([ip link set dev ovs-p1 up])
> > +AT_CHECK([ip link set dev p1 up])
> > +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1
> ofport_request=2])
> > +dnl Use p1 to check the truncated packet
> > +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1
> ofport_request=3])
> > +
> > +dnl Create p2(5) and ovs-p2(4)
> > +AT_CHECK([ip link add p2 type veth peer name ovs-p2])
> > +on_exit 'ip link del ovs-p2'
> > +AT_CHECK([ip link set dev ovs-p2 up])
> > +AT_CHECK([ip link set dev p2 up])
> > +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2
> ofport_request=4])
> > +dnl Use p2 to check the truncated packet
> > +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2
> ofport_request=5])
> > +
> > +dnl basic test
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_DATA([flows.txt], [dnl
> > +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=1 dl_dst=e6:66:c1:22:22:22
> actions=output(port=2,max_len=100),output:4
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +dnl use this file as payload file for ncat
> > +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2>
> /dev/null])
> > +on_exit 'rm -f payload200.bin'
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl packet with truncated size
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" |  sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=100
> > +])
> > +dnl packet with original size
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=242
> > +])
> > +
> > +dnl more complicated output actions
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_DATA([flows.txt], [dnl
> > +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=1 dl_dst=e6:66:c1:22:22:22
> actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl 100 + 100 + 242 + min(65535,242) = 684
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=684
> > +])
> > +dnl 242 + 100 + min(242,200) = 542
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=542
> > +])
> > +
> > +dnl SLOW_ACTION: disable kernel datapath truncate support
> > +dnl Repeat the test above, but exercise the SLOW_ACTION code path
> > +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
> > +
> > +dnl SLOW_ACTION test1: check datapath actions
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +AT_CHECK([ovs-appctl ofproto/trace br0
> "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"],
> [0], [stdout])
> > +AT_CHECK([tail -3 stdout], [0],
> > +[Datapath actions:
> trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
> > +This flow is handled by the userspace slow path because it:
> > +  - Uses action(s) not supported by datapath.
> > +])
> > +
> > +dnl SLOW_ACTION test2: check actual packet truncate
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl 100 + 100 + 242 + min(65535,242) = 684
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=684
> > +])
> > +
> > +dnl 242 + 100 + min(242,200) = 542
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=542
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +
> > +AT_BANNER([conntrack])
> > +
> > +AT_SETUP([conntrack - controller])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
> ofproto_dpif_upcall:dbg])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(commit),controller
> > +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> > +priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +AT_CAPTURE_FILE([ofctl_monitor.log])
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +
> > +dnl Send an unsolicited reply from port 2. This should be dropped.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
> '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> > +
> > +dnl OK, now start a new connection from port 1.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1
> ct\(commit\),controller
> '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
> > +
> > +dnl Now try a reply from port 2.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
> '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> > +
> > +dnl Check this output. We only see the latter two packets, not the first.
> > +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> > +NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action)
> data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2
> udp_csum:0
> > +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42
> ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2
> (via action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1
> udp_csum:0
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - force commit])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
> ofproto_dpif_upcall:dbg])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(force,commit),controller
> > +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> >
> +priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
> > +table=1,in_port=2,ct_state=+trk,udp,action=controller
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +AT_CAPTURE_FILE([ofctl_monitor.log])
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +
> > +dnl Send an unsolicited reply from port 2. This should be dropped.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +dnl OK, now start a new connection from port 1.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +
> > +dnl Now try a reply from port 2.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +
> > +dnl Check this output. We only see the latter two packets, not the first.
> > +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> > +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via
> action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2
> udp_csum:0
> > +NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42
> ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2
> (via action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1
> udp_csum:0
> > +])
> > +
> > +dnl
> > +dnl Check that the directionality has been changed by force commit.
> > +dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
> > +])
> > +
> > +dnl OK, now send another packet from port 1 and see that it switches again
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - ct flush by 5-tuple])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(commit),2
> > +priority=100,in_port=2,udp,action=ct(zone=5,commit),1
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Test UDP from port 1
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [1], [dnl
> > +])
> > +
> > +dnl Test UDP from port 2
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [0], [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5
> 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > +])
> > +
> > +dnl Test ICMP traffic
> > +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [0], [stdout])
> > +AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
> >
> +icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
> > +])
> > +
> > +ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
> >
> +ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [1], [dnl
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - IPv4 ping])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> > +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +dnl Pings from ns1->ns0 should fail.
> > +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
> FORMAT_PING], [0], [dnl
> > +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - get_nconns and get/set_maxconns])
> > +CHECK_CONNTRACK()
> > +CHECK_CT_DPIF_SET_GET_MAXCONNS()
> > +CHECK_CT_DPIF_GET_NCONNS()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> > +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
> > +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> > +1
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +3000000
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
> > +setting maxconns successful
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +10
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> > +0
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +10
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - IPv6 ping])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +
> > +dnl ICMPv6 echo request and reply go to table 1.  The rest of the traffic goes
> > +dnl through normal action.
> > +table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
> > +table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
> > +table=0,priority=1,action=normal
> > +
> > +dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
> > +table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
> > +table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
> > +table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
> > +table=1,priority=1,action=drop
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> > +
> > +dnl The above ping creates state in the connection tracker.  We're not
> > +dnl interested in that state.
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +dnl Pings from ns1->ns0 should fail.
> > +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 |
> FORMAT_PING], [0], [dnl
> > +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> > +])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0],
> [dnl
> >
> +icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> >
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
