The patch series introduces AF_XDP support for OVS netdev.
AF_XDP is a new address family working together with eBPF.
In short, a socket with AF_XDP family can receive and send
packets from an eBPF/XDP program attached to the netdev.
For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst

OVS has several netdev types, e.g., system, tap, and
internal.  The patch first adds a new netdev type called
"afxdp" and implements its configuration, packet reception,
and transmit functions.  Since the AF_XDP socket (xsk)
operates in userspace, once ovs-vswitchd receives packets
from the xsk, the proposed architecture reuses the existing
userspace dpif-netdev datapath.  As a result, most of
the packet processing happens in userspace instead of in the
Linux kernel.

Architecture
============
               _
              |   +-------------------+
              |   |    ovs-vswitchd   |<-->ovsdb-server
              |   +-------------------+
              |   |      ofproto      |<-->OpenFlow controllers
              |   +--------+-+--------+ 
              |   | netdev | |ofproto-|
    userspace |   +--------+ |  dpif  |
              |   | netdev | +--------+
              |   |provider| |  dpif  |
              |   +---||---+ +--------+
              |       ||     |  dpif- |
              |       ||     | netdev |
              |_      ||     +--------+  
                      ||         
               _  +---||-----+--------+
              |   | af_xdp prog +     |
       kernel |   |   xsk_map         |
              |_  +--------||---------+
                           ||
                        physical
                           NIC

To get started, create an OVS userspace bridge using dpif-netdev
by setting the datapath_type to netdev:
# ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev

Then attach a Linux netdev with type afxdp:
# ovs-vsctl add-port br0 afxdp-p0 -- \
    set interface afxdp-p0 type="afxdp"

Most of the implementation follows the AF_XDP sample code
in the Linux kernel, samples/bpf/xdpsock_user.c.

Configuration
=============
When a new afxdp netdev is added to OVS, the patch performs
the following configuration steps:
1) attach the afxdp program and map to the netdev (see bpf/xdp.h)
2) create an AF_XDP socket (XSK) for this netdev
3) allocate a virtually contiguous memory region, called umem, and
   register this memory with the XSK
4) set up the rx/tx rings and the umem's fill/completion rings
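
Steps 3) and 4) can be illustrated with a small userspace-only model.
This is a sketch under stated assumptions, not the code in the patch:
the names umem_model, addr_ring, PAGE_ALIGN, FRAME_SIZE and NUM_FRAMES
are hypothetical, and no kernel interface is called.  In the real setup
the umem and rings are registered with the kernel through setsockopt()
options such as XDP_UMEM_REG and XDP_UMEM_FILL_RING, and the rings are
shared with the kernel rather than private arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_ALIGN 4096u  /* assumed page size for this sketch */
#define FRAME_SIZE 2048u  /* assumed umem frame size */
#define NUM_FRAMES 64u    /* assumed frame count; must be a power of two */

/* Userspace-only model of a umem: one contiguous, page-aligned region. */
struct umem_model {
    uint8_t *area;
    size_t size;
};

/* Model of a fill/completion ring: a circular queue of frame offsets. */
struct addr_ring {
    uint64_t desc[NUM_FRAMES];
    uint32_t prod;  /* free-running producer index */
    uint32_t cons;  /* free-running consumer index */
};

/* Step 3 (model): allocate the contiguous region that holds all frames. */
static int umem_model_init(struct umem_model *u)
{
    assert(u != NULL);
    u->size = (size_t)FRAME_SIZE * NUM_FRAMES;
    u->area = aligned_alloc(PAGE_ALIGN, u->size);
    return u->area ? 0 : -1;
}

/* Enqueue one frame offset; returns -1 when the ring is full. */
static int ring_push(struct addr_ring *r, uint64_t addr)
{
    if (r->prod - r->cons == NUM_FRAMES) {
        return -1;
    }
    r->desc[r->prod & (NUM_FRAMES - 1)] = addr;
    r->prod++;
    return 0;
}

/* Dequeue one frame offset; returns -1 when the ring is empty. */
static int ring_pop(struct addr_ring *r, uint64_t *addr)
{
    if (r->prod == r->cons) {
        return -1;
    }
    *addr = r->desc[r->cons & (NUM_FRAMES - 1)];
    r->cons++;
    return 0;
}

/* Step 4 (model): hand every frame to the fill ring so that the
 * receive side has buffers to place packets into. */
static uint32_t fill_ring_populate(struct addr_ring *fq)
{
    uint32_t n = 0;

    for (uint32_t i = 0; i < NUM_FRAMES; i++) {
        if (ring_push(fq, (uint64_t)i * FRAME_SIZE) != 0) {
            break;
        }
        n++;
    }
    return n;
}
```

Frames are identified by their byte offset into the umem, which is why
the ring entries are plain 64-bit addresses rather than pointers.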

Packet Flow
===========
Currently, the af_xdp program loaded to the netdev does nothing
but forward the packet to queue id 0.

The patch simplifies the buffer/ring management by copying each
received packet from the umem to OVS's internal buffer.  When
sending the packet out to another netdev, it copies the packet
into that netdev's umem.

An AF_XDP packet forwarded from one netdev (ovs-p0) to another
netdev (ovs-p1) goes through the following path:
1) the xdp program at ovs-p0 copies the packet to the kernel (SKB_MODE)
2) the kernel maps the packet to the userspace umem
3) ovs-vswitchd receives the packet from ovs-p0 and copies it to an
   internal packet buffer
4) ovs-vswitchd copies the packet to the umem of ovs-p1, then kick_tx
5) the kernel copies the packet from the umem to the ovs-p1 tx queue
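
The two userspace copies in steps 3) and 4) can be sketched as below.
This is an illustrative model only, assuming each umem is a flat byte
array and a frame is addressed by its byte offset; pkt_buf,
rx_copy_from_umem, tx_copy_to_umem and FRAME_SIZE are hypothetical names
and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 2048u  /* assumed umem frame size */

/* Stand-in for the internal packet buffer used by ovs-vswitchd. */
struct pkt_buf {
    uint8_t data[FRAME_SIZE];
    size_t len;
};

/* Step 3 (model): copy a received frame out of the rx port's umem. */
static int rx_copy_from_umem(const uint8_t *umem, uint64_t addr, size_t len,
                             struct pkt_buf *pkt)
{
    assert(umem != NULL && pkt != NULL);
    if (len > FRAME_SIZE) {
        return -1;  /* does not fit in one frame */
    }
    memcpy(pkt->data, umem + addr, len);
    pkt->len = len;
    return 0;
}

/* Step 4 (model): copy the packet into a frame of the tx port's umem. */
static int tx_copy_to_umem(uint8_t *umem, uint64_t addr,
                           const struct pkt_buf *pkt)
{
    assert(umem != NULL && pkt != NULL);
    if (pkt->len > FRAME_SIZE) {
        return -1;
    }
    memcpy(umem + addr, pkt->data, pkt->len);
    return 0;
}
```

Keeping a private copy between the two umems means a frame can be
returned to its fill ring as soon as the rx copy is done, which is the
simplification in buffer management that the extra copies buy.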

Since the total number of copies between two ports is 4,
the performance is expected to be poor.
Hopefully, with AF_XDP zero-copy mode, copies 1) and 5) can
be removed, and in ovs-vswitchd it should be possible to
combine copies 3) and 4) into a single copy.  So the best case
would be one copy between two netdevs.
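
The copy count above can be tallied per step.  The helper below is a
hypothetical illustration (path_opts and copies_between_ports are not
part of the patch); it only encodes the arithmetic of the five-step
path, in which step 2) is a mapping and not a copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knobs describing how the forwarding path is set up. */
struct path_opts {
    bool zero_copy;              /* AF_XDP zero-copy mode on both ports */
    bool merge_userspace_copies; /* steps 3) and 4) done as one copy */
};

/* Count the packet copies on the path from one port to the other. */
static int copies_between_ports(struct path_opts o)
{
    int copies = 0;

    copies += o.zero_copy ? 0 : 1;              /* step 1: xdp -> kernel */
    /* step 2 maps the packet into the umem: no copy */
    copies += o.merge_userspace_copies ? 1 : 2; /* steps 3 and 4 */
    copies += o.zero_copy ? 0 : 1;              /* step 5: umem -> tx queue */

    assert(copies >= 1 && copies <= 4);
    return copies;
}
```

With both options off the function returns 4, matching the current
path; with both on it returns 1, the best case.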

Test Framework
==============
Running
# make check-afxdp
starts two end-to-end tests using veth pairs and namespaces:

AFXDP netdev datapath-sanity
  1: datapath - ping between two ports               ok
  2: datapath - http between two ports               ok

The patch series is based on the ovs-ebpf implementation.
A copy is available at: https://github.com/williamtu/ovs-ebpf/
branch afxdp-v1

William Tu (3):
  afxdp: add ebpf code for afxdp and xskmap.
  netdev-linux: add new netdev type afxdp.
  tests: add afxdp test cases.

 acinclude.m4                    |   1 +
 bpf/api.h                       |   6 +
 bpf/helpers.h                   |   2 +
 bpf/maps.h                      |  12 +
 bpf/xdp.h                       |  34 +-
 lib/automake.mk                 |   3 +-
 lib/bpf.c                       |  41 ++-
 lib/bpf.h                       |   6 +-
 lib/dpif-netdev.c               |  74 +++-
 lib/if_xdp.h                    |  79 +++++
 lib/netdev-dummy.c              |   1 +
 lib/netdev-linux.c              | 741 +++++++++++++++++++++++++++++++++++++++-
 lib/netdev-provider.h           |   2 +
 lib/netdev-vport.c              |   4 +
 lib/netdev.c                    |  11 +
 lib/netdev.h                    |   1 +
 tests/automake.mk               |  17 +
 tests/ofproto-macros.at         |   1 +
 tests/system-afxdp-macros.at    | 148 ++++++++
 tests/system-afxdp-testsuite.at |  25 ++
 tests/system-afxdp-traffic.at   |  38 +++
 vswitchd/bridge.c               |   1 +
 22 files changed, 1228 insertions(+), 20 deletions(-)
 create mode 100644 lib/if_xdp.h
 create mode 100644 tests/system-afxdp-macros.at
 create mode 100644 tests/system-afxdp-testsuite.at
 create mode 100644 tests/system-afxdp-traffic.at

-- 
2.7.4

