Hi Eelco, Thanks for all the feedbacks. There are some issues in driver, some in libbpf, and some in my implementation. I will work on it ASAP.
On Fri, May 17, 2019 at 3:23 AM Eelco Chaudron <echau...@redhat.com> wrote: > > Hi William, > > First a list of issues I found during some basic testing... > > - When I restart or stop OVS (using the systemctl interface as found in > RHEL) it does not clean up the BFP program causing the restart to fail: > > 2019-05-10T09:12:11.384Z|00042|netdev_afxdp|ERR|AF_XDP device eno1 > reconfig fails > 2019-05-10T09:12:11.384Z|00043|dpif_netdev|ERR|Failed to set > interface eno1 new configuration > > I need to manually run "ip link set dev eno1 xdp off" to make it > recover. I think this is a bug in libbpf, see [PATCH bpf 0/2] libbpf: fixes for AF_XDP teardown > > > - When I remove a bridge, I get an emer in the revalidator: > > 2019-05-10T09:40:34.401Z|00045|netdev_afxdp|INFO|remove xdp program > > 2019-05-10T09:40:34.652Z|00001|util(revalidator49)|EMER|lib/poll-loop.c:111: > assertion !fd != !wevent failed in poll_create_node() > > Easy to replicate with this: > > $ ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0 > datapath_type=netdev > $ ovs-vsctl add-port ovs_pvp_br0 eno1 -- set interface eno1 > type="afxdp" options:xdpmode=drv > $ ovs-vsctl del-br ovs_pvp_br0 > Thanks I can reproduce it. Will make sure to fix it . > > - High pmd usage on the statistics, even with no packets is this > expected? > > $ ovs-appctl dpif-netdev/pmd-rxq-show > pmd thread numa_id 0 core_id 1: > isolated : false > port: dpdk0 queue-id: 0 pmd usage: 0 % > port: eno1 queue-id: 0 pmd usage: 49 % > > It goes up slowly and gets stuck at 49% > > > - When doing the PVP testing I noticed that the physical port has odd/no > tx statistics: > > $ ovs-ofctl dump-ports ovs_pvp_br0 > OFPST_PORT reply (xid=0x2): 3 ports > port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, > crc=0 > tx pkts=0, bytes=0, drop=0, errs=0, coll=0 > port eno1: rx pkts=103256197, bytes=6195630508, drop=0, errs=0, > frame=0, over=0, crc=0 > tx pkts=0, bytes=19789272440056, drop=0, errs=0, coll=0 > port tapVM: rx pkts=4043, bytes=501278, drop=0, errs=0, frame=0, > over=0, crc=0 > tx pkts=4058, bytes=502504, drop=0, errs=0, coll=0 > I think the ixgbe driver has some issue. If you run skb-mode, I think the stats are correct. See patch [1/2] ixgbe: fix AF_XDP tx byte count > > - Packets larger than 1028 bytes are dropped. Guess this needs to be > fixed, and we need to state that jumbo frames are not supported. Are you > planning on adding this? > > Currently I can find not mentioning of MTU limitation in the > documentation, or any code to prevent it from being changed above the > supported limit. > > > - ovs-vswitchd is still crashing or stops forwarding packets when trying > to do > PVP testing with Qemu that has a TAP interface doing XDP and running > packets > at wire speed to the 10G interface. > > When trying with lower volume packets it seems to work, so with 1% > traffic > rate, it forwards packets without any problems (148,771 pps). If I go > to > 10% the first couple of packet pass, then it stops forwarding. If > it's not > crashing I still see packets being received by eno1 flow rules, but > no > packets make it to the VM. > > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x00000000009b2505 in netdev_linux_afxdp_batch_send (xsk=0x0, > batch=batch@entry=0x7fc928005570) at lib/netdev-afxdp.c:654 > 654 ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count, > (void **)elems_pop); > [Current thread is 1 (Thread 0x7fc95e734700 (LWP 3926))] > Missing separate debuginfos, use: dnf debuginfo-install > openvswitch2.11-2.11.0-0.20190129gitd3a10db.el8fdb.x86_64 > (gdb) bt > #0 0x00000000009b2505 in netdev_linux_afxdp_batch_send (xsk=0x0, > batch=batch@entry=0x7fc928005570) at lib/netdev-afxdp.c:654 > #1 0x00000000009a1850 in netdev_linux_send (netdev_=0x2f7f540, > qid=<optimized out>, batch=0x7fc928005570, concurrent_txq=<optimized > out>) at lib/netdev-linux.c:1486 > #2 0x0000000000906051 in netdev_send (netdev=<optimized out>, > qid=qid@entry=0, batch=batch@entry=0x7fc928005570, > concurrent_txq=concurrent_txq@entry=true) > at lib/netdev.c:797 > #3 0x00000000008d2c94 in dp_netdev_pmd_flush_output_on_port > (pmd=pmd@entry=0x7fc95e735010, p=p@entry=0x7fc928005540) at > lib/dpif-netdev.c:4185 > #4 0x00000000008d2faf in dp_netdev_pmd_flush_output_packets > (pmd=pmd@entry=0x7fc95e735010, force=force@entry=false) at > lib/dpif-netdev.c:4225 > #5 0x00000000008db317 in dp_netdev_pmd_flush_output_packets > (force=false, pmd=0x7fc95e735010) at lib/dpif-netdev.c:4280 > #6 dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fc95e735010, > rxq=0x2f36c50, port_no=1) at lib/dpif-netdev.c:4280 > #7 0x00000000008db67d in pmd_thread_main (f_=<optimized out>) at > lib/dpif-netdev.c:5446 > #8 0x000000000095c96d in ovsthread_wrapper (aux_=<optimized out>) > at lib/ovs-thread.c:352 > #9 0x00007fc9789d62de in start_thread () from > /lib64/libpthread.so.0 > #10 0x00007fc97817ba63 in clone () from /lib64/libc.so.6 > > > - make check-afxpd is failing for me, however, make check-kernel works > fine. > Did not dive into it too much, but it fails here for all test cases, > this is the same build I use for testing. > > ./system-afxdp-traffic.at:4: ovs-vsctl -- add-br br0 -- set Bridge > br0 datapath_type=netdev > protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 > fail-mode=secure -- > --- /dev/null 2019-05-16 09:09:33.445562692 -0400 > +++ > /root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/at-groups/1/stderr > 2019-05-17 > 05:46:20.506814939 -0400 > @@ -0,0 +1,2 @@ > +ovs-vsctl: Error detected while setting up 'br0'. See ovs-vswitchd > log for details. > +ovs-vsctl: The default log directory is > "/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01". > ovsdb-server.log: > > 2019-05-17T09:46:20.437Z|00001|vlog|INFO|opened log file > /root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/ovsdb-server.log > > 2019-05-17T09:46:20.441Z|00002|ovsdb_server|INFO|ovsdb-server (Open > vSwitch) 2.11.90 > ovs-vswitchd.log: > > 2019-05-17T09:46:20.461Z|00001|vlog|INFO|opened log file > /root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/ovs-vswitchd.log > > 2019-05-17T09:46:20.462Z|00002|ovs_numa|INFO|Discovered 28 CPU cores > on NUMA node 0 > > 2019-05-17T09:46:20.462Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes > and 28 CPU cores > > > 2019-05-17T09:46:20.462Z|00004|reconnect|INFO|unix:/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/db.sock: > connecting... > > > 2019-05-17T09:46:20.462Z|00005|reconnect|INFO|unix:/root/home/OVS_master_DPDK_v18.11/ovs_github/tests/system-afxdp-testsuite.dir/01/db.sock: > connected > > 2019-05-17T09:46:20.465Z|00006|bridge|INFO|ovs-vswitchd (Open > vSwitch) 2.11.90 > > 2019-05-17T09:46:20.505Z|00007|netdev_linux|WARN|ovs-netdev: > creating tap device failed: Device or resource busy I think the tap device is not cleared from previous settings. Or do #ip link del dev ovs-netdev then check again. > > 2019-05-17T09:46:20.508Z|00008|dpif|WARN|datapath ovs-netdev already > exists but cannot be opened: No such device > > 2019-05-17T09:46:20.508Z|00009|ofproto_dpif|ERR|failed to open > datapath of type netdev: No such device > > 2019-05-17T09:46:20.508Z|00010|ofproto|ERR|failed to open datapath > br0: No such device > > 2019-05-17T09:46:20.508Z|00011|bridge|ERR|failed to create bridge > br0: No such device > 1. system-afxdp-traffic.at:3: FAILED (system-afxdp-traffic.at:4) > > > > > The following might be useful when combining DPDK and AF_XDP: > > Currently, DPDK and AF_XDP polling can be combined on a single PMD > thread, it > might be nice to have an option to not do this, i.e. have separate > PMD > threads for each type. I know we can do this with assigning specific > PMDs to > queues, but this will disable auto-balancing. This will also help > later if > we would add poll() mode support for AF_XDP. > > > Other review comments see inline below. I reviewed the code, not the > unit tests or automake changes. > <snip> > > +.. note:: > > + OVS AF_XDP netdev is using the userspace datapath, the same > > datapath > > + as used by OVS-DPDK. So it requires --disable-system for > > ovs-vswitchd > > + and datapath_type=netdev when adding a new bridge. > > As mentioned earlier offline I think --disable-system can be removed as > the Kernel and userspace datapath can be run at the same time. > Yes, thanks > > + > > +Make sure your device driver support AF_XDP, and to use 1 PMD (on > > core 4) > > +on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask, > > +pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or > > "skb":: > > Wondering how options:xdpmode should operate without it being specified? > I would prefer that if the option is not specified it would try drv, and > if it fails fallback to skb. > by default it is using skb mode. I prefer skb-mode as default since it has less driver-related issues. > We need to add these new options to the vswitch.xml file OK > > > + > > + ethtool -L enp2s0 combined 1 > > + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10 > > + ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" > > \ > > + options:n_rxq=1 options:xdpmode=drv \ > > + other_config:pmd-rxq-affinity="0:4" > > + > > +Or, use 4 pmds/cores and 4 queues by doing:: > > + > > + ethtool -L enp2s0 combined 4 > > + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36 > > + ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" > > \ > > + options:n_rxq=4 options:xdpmode=drv \ > > + other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4" > > + > > Add some text that pmd-rxq-affinity is not a requirement, the system > will auto (re)assign. > Also, note that cores used by pmd-rxq-affinity are not shared/used by > floating PMDs. Good point, thanks > > > +To validate that the bridge has successfully instantiated, you can > > use the:: > > + > > + ovs-vsctl show > > + > > +should show something like:: > > + > > + Port "ens802f0" > > + Interface "ens802f0" > > + type: afxdp > > + options: {n_rxq="1", xdpmode=drv} > > + > > +Otherwise, enable debug by:: > > + > > + ovs-appctl vlog/set netdev_afxdp::dbg > > + > > +References > > +---------- > > +Most of the design details are described in the paper presented at > > +Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1], > > +section 4, and slides[2][4]. > > +"The Path to DPDK Speeds for AF XDP"[3] gives a very good > > introduction > > +about AF_XDP current and future work. > > + > > + > > +[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf > > + > > +[2] > > http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf > > + > > +[3] > > http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf > > + > > +[4] > > https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp > > + > > + > > +Performance Tuning > > +------------------ > > +The name of the game is to keep your CPU running in userspace, > > allowing PMD > > +to keep polling the AF_XDP queues without any interferences from > > kernel. > > + > > +#. Make sure everything is in the same NUMA node (memory used by > > AF_XDP, pmd > > + running cores, device plug-in slot) > > How can you do this? The code is not taking care of NUMA, and memory is > allocated with posix_memalign so no idea which NUMA node it gets > allocated. right... I'm hoping that users can be aware of this and run ovs-vswitchd using taskset -p <cpu mask> So running the process on the correct NUMA node. > > > +#. Isolate your CPU by doing isolcpu at grub configure. > > + > > +#. IRQ should not set to pmd running core. > > + > > +#. The Spectre and Meltdown fixes increase the overhead of system > > calls. > > + > > Maybe be more consistent, either one or two newlines before a heading? OK <snip> > > + > > +Create OpenFlow rules:: > > + > > + ovs-vsctl add-port br0 tap0 > > Maybe add tap as XDP or else it will be an AF_PACKET interface polling > in the main thread. I think it should work (XDP on tap). Let me try. What's your concern here about polling AF_PACKET? > > > + ovs-ofctl del-flows br0 > > + ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0" > > + ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0" > > + > > + > > +Attach the veth port to br0 (linux kernel mode):: > > + > > + ovs-vsctl add-port br0 afxdp-p0 -- \ > > + set interface afxdp-p0 options:n_rxq=1 options:xdpmode=skb > > + > > Remove the xdpmode=skb above... Also, see above on the PF_PACKET > interface in the bridge_run(), > I would advise against using this, and you might want to remove it. OK > > > + > > +Or, use AF_XDP with skb mode:: > > + > > + ovs-vsctl add-port br0 afxdp-p0 -- \ > > + set interface afxdp-p0 type="afxdp" options:n_rxq=1 > > options:xdpmode=skb > > + > > +Setup the OpenFlow rules:: > > + <snip> > > diff --git a/lib/dp-packet.c b/lib/dp-packet.c > > index 0976a35e758b..7d086dc5e860 100644 > > --- a/lib/dp-packet.c > > +++ b/lib/dp-packet.c > > @@ -22,6 +22,9 @@ > > #include "netdev-dpdk.h" > > #include "openvswitch/dynamic-string.h" > > #include "util.h" > > +#ifdef HAVE_AF_XDP > > +#include "netdev-afxdp.h" > > +#endif > > Why the protection above? You do not do this in netdev-linux.c. > Maybe you should move the #ifdef HAVE_AF_XDP inside the include file? > OK will fix it > > static void > > dp_packet_init__(struct dp_packet *b, size_t allocated, enum > > dp_packet_source source) > > @@ -59,6 +62,27 @@ dp_packet_use(struct dp_packet *b, void *base, > > size_t allocated) > > dp_packet_use__(b, base, allocated, DPBUF_MALLOC); > > } > > <snip> > > --- /dev/null > > +++ b/lib/netdev-afxdp.c > > @@ -0,0 +1,727 @@ > > +/* > > + * Copyright (c) 2018, 2019 Nicira, Inc. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, > > software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > > implied. > > + * See the License for the specific language governing permissions > > and > > + * limitations under the License. > > + */ > > + > > +#if !defined(__i386__) && !defined(__x86_64__) > > +#error AF_XDP supported only for Linux on x86 or x86_64 > > Any reason why we do not support PPC and ARM? > > > +#endif > > + > > +#include <config.h> > > + > > +#include "netdev-linux-private.h" > > +#include "netdev-linux.h" > > Swap the two above, see comment in netdev-linux-private.h > > > +#include "netdev-afxdp.h" > > + > > +#include <arpa/inet.h> > > +#include <errno.h> > > +#include <fcntl.h> > > +#include <inttypes.h> > > +#include <linux/if_ether.h> > > +#include <linux/if_tun.h> > > +#include <linux/types.h> > > +#include <linux/ethtool.h> > > +#include <linux/mii.h> > > +#include <linux/rtnetlink.h> > > +#include <linux/sockios.h> > > +#include <linux/if_xdp.h> > > +#include <net/if.h> > > +#include <net/if_arp.h> > > +#include <net/route.h> > > +#include <netinet/in.h> > > +#include <netpacket/packet.h> > > +#include <poll.h> > > +#include <stdlib.h> > > +#include <string.h> > > +#include <sys/ioctl.h> > > +#include <sys/types.h> > > +#include <sys/socket.h> > > +#include <sys/utsname.h> > > +#include <unistd.h> > > + > > Some of these includes are included by netdev-linux(-private).h already > so why not remove them? OK will remove them. > > > +#include "coverage.h" > > +#include "dp-packet.h" > > +#include "dpif-netlink.h" > > +#include "dpif-netdev.h" > > +#include "fatal-signal.h" > > +#include "hash.h" > > +#include "netdev-provider.h" > > +#include "netdev-tc-offloads.h" > > +#include "netdev-vport.h" > > +#include "netlink-notifier.h" > > +#include "netlink-socket.h" > > +#include "netlink.h" > > +#include "netnsid.h" > > +#include "openflow/openflow.h" > > +#include "openvswitch/dynamic-string.h" > > +#include "openvswitch/hmap.h" > > +#include "openvswitch/ofpbuf.h" > > +#include "openvswitch/poll-loop.h" > > +#include "openvswitch/vlog.h" > > +#include "openvswitch/shash.h" > > +#include "ovs-atomic.h" > > +#include "packets.h" > > +#include "rtnetlink.h" > > +#include "socket-util.h" > > +#include "sset.h" > > +#include "tc.h" > > +#include "timer.h" > > +#include "unaligned.h" > > +#include "util.h" > > +#include "xdpsock.h" > > + > > +#ifndef SOL_XDP > > +#define SOL_XDP 283 > > +#endif > > +#ifndef AF_XDP > > +#define AF_XDP 44 > > +#endif > > +#ifndef PF_XDP > > +#define PF_XDP AF_XDP > > +#endif > > Do we really need to include the above? Or should we update the install > instruction to move them over from the kernel headers? > I think we only need SOL_XDP. > > + > > +VLOG_DEFINE_THIS_MODULE(netdev_afxdp); > > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); > > + > > +#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char > > *)base)) > > +#define UMEM2XPKT(base, i) \ > > + ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base > > + \ > > + i * sizeof(struct dp_packet_afxdp)) > > + <snip> Some comments here about umem and queue are discussed later with Ilya. Will address them together. > > +int > > +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk, > > + struct dp_packet_batch *batch) > > +{ > > See Ilya's comment on thread safety on the ring APIs. > > > + struct umem_elem *elems_pop[BATCH_SIZE]; > > + struct umem_elem *elems_push[BATCH_SIZE]; > > + uint32_t tx_done, idx_cq = 0; > > + struct dp_packet *packet; > > + uint32_t idx = 0; > > + int j, ret, retry_count = 0; > > + const int max_retry = 4; > > + > > + ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count, (void > > **)elems_pop); > > + if (OVS_UNLIKELY(ret)) { > > + return EAGAIN; > > + } > > + > > + /* Make sure we have enough TX descs */ > > + ret = xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx); > > + if (OVS_UNLIKELY(ret == 0)) { > > + umem_elem_push_n(&xsk->umem->mpool, batch->count, (void > > **)elems_pop); > > + return EAGAIN; > > + } > > + > > + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > > + struct umem_elem *elem; > > + uint64_t index; > > + > > + elem = elems_pop[i]; > > + /* Copy the packet to the umem we just pop from umem pool. > > + * We can avoid this copy if the packet and the pop umem > > + * are located in the same umem. > > + */ > > The comment mentions the copy can be avoided, but it's not implemented > in the code, is this correct or was something removed? > > > + memcpy(elem, dp_packet_data(packet), dp_packet_size(packet)); it's not implemented yet. now it's always making a copy > > + > > + index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer); > > + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index; > > + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len > > + = dp_packet_size(packet); > > + } > > + xsk_ring_prod__submit(&xsk->tx, batch->count); > > + xsk->outstanding_tx += batch->count; > > + > > + ret = kick_tx(xsk); > > + if (OVS_UNLIKELY(ret)) { > > + umem_elem_push_n(&xsk->umem->mpool, batch->count, (void > > **)elems_pop); > > + VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s", > > + ovs_strerror(ret)); > > + return ret; > > I think we should still try to recover the CQ below, even on failure. > > > + } > > + > > +retry: > > + /* Process CQ */ > > + tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, > > &idx_cq); > > + if (tx_done > 0) { > > + xsk->outstanding_tx -= tx_done; > > + xsk->tx_npkts += tx_done; > > + } > > + > > + /* Recycle back to umem pool */ > > + for (j = 0; j < tx_done; j++) { > > + struct umem_elem *elem; > > + uint64_t addr; > > + > > + addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++); > > + > > + elem = ALIGNED_CAST(struct umem_elem *, > > + (char *)xsk->umem->buffer + addr); > > + elems_push[j] = elem; > > + } > > + > > + ret = umem_elem_push_n(&xsk->umem->mpool, tx_done, (void > > **)elems_push); > > + ovs_assert(ret == 0); > > + > > + xsk_ring_cons__release(&xsk->umem->cq, tx_done); > > + > > + if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) > > { > > + /* If there are still a lot not transmitted, try harder. */ > > + if (retry_count++ > max_retry) { > > + return 0; > > + } > > + goto retry; > > + } > > + > > I think the code above is causing my lockup at wire speed mentioned > above... > I guess the retry_count expires every transmit sending packets to the > TAP interface. > No all buffers are used... This is causing the umem_elem_pop_n() in the > beginning to fail, hence the buffers are never returned! > > Guess we might need some reclaim in the beginning, or maybe even in the > rx loop? right, let me re-work this part of code. > > + return 0; > > +} > > diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h > > new file mode 100644 > > index 000000000000..6518d8fca0b5 > > --- /dev/null > > +++ b/lib/netdev-afxdp.h > > @@ -0,0 +1,53 @@ > > +/* > > + * Copyright (c) 2018 Nicira, Inc. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, > > software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > > implied. > > + * See the License for the specific language governing permissions > > and > > + * limitations under the License. > > + */ > > + > > +#ifndef NETDEV_AFXDP_H > > +#define NETDEV_AFXDP_H 1 > > + > > +#include <stdint.h> > > +#include <stdbool.h> > > + > > +/* These functions are Linux AF_XDP specific, so they should be used > > directly > > + * only by Linux-specific code. */ > > Extra enter? > > > +#define MAX_XSKQ 16 > > Extra enter? > OK > > +struct netdev; > > +struct xsk_socket_info; > > +struct xdp_umem; > > +struct dp_packet_batch; > > +struct smap; > > +struct dp_packet; > > + > > +struct dp_packet_afxdp * dp_packet_cast_afxdp(const struct dp_packet > > *d); > > + > > +int xsk_configure_all(struct netdev *netdev); > > + > > +void xsk_destroy_all(struct netdev *netdev); > > + > > +int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk, > > + struct dp_packet_batch *batch); > > + > > +int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk, > > + struct dp_packet_batch *batch); > > + > > +int netdev_afxdp_set_config(struct netdev *netdev, const struct smap > > *args, > > + char **errp); > > +int netdev_afxdp_get_config(const struct netdev *netdev, struct smap > > *args); > > +int netdev_afxdp_get_numa_id(const struct netdev *netdev); > > + > > +void free_afxdp_buf(struct dp_packet *p); > > +void free_afxdp_buf_batch(struct dp_packet_batch *batch); > > +int netdev_afxdp_reconfigure(struct netdev *netdev); > > +#endif /* netdev-afxdp.h */ > > diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h > > new file mode 100644 > > index 000000000000..3dd3d902b3c4 > > --- /dev/null > > +++ b/lib/netdev-linux-private.h > > @@ -0,0 +1,124 @@ > > +/* > > + * Copyright (c) 2019 Nicira, Inc. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, > > software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > > implied. > > + * See the License for the specific language governing permissions > > and > > + * limitations under the License. > > + */ > > + > > +#ifndef NETDEV_LINUX_PRIVATE_H > > +#define NETDEV_LINUX_PRIVATE_H 1 > > + > > +#include <config.h> > > + > > +#include <linux/filter.h> > > +#include <linux/gen_stats.h> > > +#include <linux/if_ether.h> > > +#include <linux/if_tun.h> > > +#include <linux/types.h> > > +#include <linux/ethtool.h> > > +#include <linux/mii.h> > > +#include <stdint.h> > > +#include <stdbool.h> > > + > > +#include "netdev-provider.h" > > +#include "netdev-tc-offloads.h" > > +#include "netdev-vport.h" > > +#include "openvswitch/thread.h" > > +#include "ovs-atomic.h" > > +#include "timer.h" > > Why include all the above? They where just added to netdev-linux.h, so > if you make sure you include netdev-lunux.h before -private it should > work out. > > > + > > +#if HAVE_AF_XDP > > +#include "netdev-afxdp.h" > > +#endif > > See earlier comment > > > + > > +/* These functions are Linux specific, so they should be used > > directly only by > > + * Linux-specific code. */ > > + > > +struct netdev; > > + > > +int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t > > flag, > > + const char *flag_name, bool > > enable); > > +int linux_get_ifindex(const char *netdev_name); > > + > > These functions are now both specified in netdev-linux.h and > netdev-linux-private.h > > > +#define LINUX_FLOW_OFFLOAD_API \ > > + .flow_flush = netdev_tc_flow_flush, \ > > + .flow_dump_create = netdev_tc_flow_dump_create, \ > > + .flow_dump_destroy = netdev_tc_flow_dump_destroy, \ > > + .flow_dump_next = netdev_tc_flow_dump_next, \ > > + .flow_put = netdev_tc_flow_put, \ > > + .flow_get = netdev_tc_flow_get, \ > > + .flow_del = netdev_tc_flow_del, \ > > + .init_flow_api = netdev_tc_init_flow_api > > + > > Same here, this define is in both include files. > > > +struct netdev_linux { > > + struct netdev up; > > + > > + /* Protects all members below. */ > > + struct ovs_mutex mutex; > > + > > + unsigned int cache_valid; > > + > > + bool miimon; /* Link status of last poll. */ > > + long long int miimon_interval; /* Miimon Poll rate. Disabled if > > <= 0. */ > > + struct timer miimon_timer; > > + > > + int netnsid; /* Network namespace ID. */ > > + /* The following are figured out "on demand" only. They are only > > valid > > + * when the corresponding VALID_* bit in 'cache_valid' is set. */ > > + int ifindex; > > + struct eth_addr etheraddr; > > + int mtu; > > + unsigned int ifi_flags; > > + long long int carrier_resets; > > + uint32_t kbits_rate; /* Policing data. */ > > + uint32_t kbits_burst; > > + int vport_stats_error; /* Cached error code from > > vport_get_stats(). > > + 0 or an errno value. */ > > + int netdev_mtu_error; /* Cached error code from SIOCGIFMTU > > + * or SIOCSIFMTU. > > + */ > > + int ether_addr_error; /* Cached error code from set/get > > etheraddr. */ > > + int netdev_policing_error; /* Cached error code from set > > policing. */ > > + int get_features_error; /* Cached error code from > > ETHTOOL_GSET. */ > > + int get_ifindex_error; /* Cached error code from > > SIOCGIFINDEX. */ > > + > > + enum netdev_features current; /* Cached from ETHTOOL_GSET. */ > > + enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */ > > + enum netdev_features supported; /* Cached from ETHTOOL_GSET. */ > > + > > + struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. > > */ > > + struct tc *tc; > > + > > + /* For devices of class netdev_tap_class only. */ > > + int tap_fd; > > + bool present; /* If the device is present in the > > namespace */ > > + uint64_t tx_dropped; /* tap device can drop if the iface > > is down */ > > + > > + /* LAG information. */ > > + bool is_lag_master; /* True if the netdev is a LAG > > master. */ > > + > > + /* AF_XDP information */ > > +#ifdef HAVE_AF_XDP > > + struct xsk_socket_info *xsk[MAX_XSKQ]; > > + int requested_n_rxq; > > + int xdpmode, requested_xdpmode; /* detect mode changed */ > > + int xdp_flags, xdp_bind_flags; > > +#endif > > +}; > > + > > +static struct netdev_linux * > > +netdev_linux_cast(const struct netdev *netdev) > > +{ > > In the original definition there was an assert() here, was it removed by > accident? > netdev_linux_rxq_xsk > > + return CONTAINER_OF(netdev, struct netdev_linux, up); > > +} > > + > > +#endif /* netdev-linux-private.h */ > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > > index f75d73fd39f8..1f190406d145 100644 > > --- a/lib/netdev-linux.c > > +++ b/lib/netdev-linux.c > > @@ -17,6 +17,7 @@ > > #include <config.h> > > > > #include "netdev-linux.h" > > +#include "netdev-linux-private.h" > > > > #include <errno.h> > > #include <fcntl.h> > > @@ -54,6 +55,7 @@ > > #include "fatal-signal.h" > > #include "hash.h" > > #include "openvswitch/hmap.h" > > +#include "netdev-afxdp.h" > > #include "netdev-provider.h" > > #include "netdev-tc-offloads.h" > > #include "netdev-vport.h" > > @@ -487,51 +489,6 @@ static int tc_calc_cell_log(unsigned int mtu); > > static void tc_fill_rate(struct tc_ratespec *rate, uint64_t bps, int > > mtu); > > static int tc_calc_buffer(unsigned int Bps, int mtu, uint64_t > > burst_bytes); > > > > -struct netdev_linux { > > - struct netdev up; > > - > > - /* Protects all members below. */ > > - struct ovs_mutex mutex; > > - > > - unsigned int cache_valid; > > - > > - bool miimon; /* Link status of last poll. */ > > - long long int miimon_interval; /* Miimon Poll rate. Disabled if > > <= 0. */ > > - struct timer miimon_timer; > > - > > - int netnsid; /* Network namespace ID. */ > > - /* The following are figured out "on demand" only. They are only > > valid > > - * when the corresponding VALID_* bit in 'cache_valid' is set. */ > > - int ifindex; > > - struct eth_addr etheraddr; > > - int mtu; > > - unsigned int ifi_flags; > > - long long int carrier_resets; > > - uint32_t kbits_rate; /* Policing data. */ > > - uint32_t kbits_burst; > > - int vport_stats_error; /* Cached error code from > > vport_get_stats(). > > - 0 or an errno value. */ > > - int netdev_mtu_error; /* Cached error code from SIOCGIFMTU > > or SIOCSIFMTU. */ > > - int ether_addr_error; /* Cached error code from set/get > > etheraddr. */ > > - int netdev_policing_error; /* Cached error code from set > > policing. */ > > - int get_features_error; /* Cached error code from > > ETHTOOL_GSET. */ > > - int get_ifindex_error; /* Cached error code from > > SIOCGIFINDEX. */ > > - > > - enum netdev_features current; /* Cached from ETHTOOL_GSET. */ > > - enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */ > > - enum netdev_features supported; /* Cached from ETHTOOL_GSET. */ > > - > > - struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. > > */ > > - struct tc *tc; > > - > > - /* For devices of class netdev_tap_class only. */ > > - int tap_fd; > > - bool present; /* If the device is present in the > > namespace */ > > - uint64_t tx_dropped; /* tap device can drop if the iface > > is down */ > > - > > - /* LAG information. */ > > - bool is_lag_master; /* True if the netdev is a LAG > > master. */ > > -}; > > > > struct netdev_rxq_linux { > > struct netdev_rxq up; > > @@ -579,18 +536,23 @@ is_netdev_linux_class(const struct netdev_class > > *netdev_class) > > return netdev_class->run == netdev_linux_run; > > } > > > > +#if HAVE_AF_XDP > > static bool > > -is_tap_netdev(const struct netdev *netdev) > > +is_afxdp_netdev(const struct netdev *netdev) > > { > > - return netdev_get_class(netdev) == &netdev_tap_class; > > + return netdev_get_class(netdev) == &netdev_afxdp_class; > > } > > - > > -static struct netdev_linux * > > -netdev_linux_cast(const struct netdev *netdev) > > +#else > > +static bool > > +is_afxdp_netdev(const struct netdev *netdev OVS_UNUSED) > > { > > - ovs_assert(is_netdev_linux_class(netdev_get_class(netdev))); > > - > > - return CONTAINER_OF(netdev, struct netdev_linux, up); > > + return false; > > +} > > +#endif > > +static bool > > +is_tap_netdev(const struct netdev *netdev) > > +{ > > + return netdev_get_class(netdev) == &netdev_tap_class; > > } > > > > static struct netdev_rxq_linux * > > @@ -1084,6 +1046,11 @@ netdev_linux_destruct(struct netdev *netdev_) > > atomic_count_dec(&miimon_cnt); > > } > > > > +#if HAVE_AF_XDP > > + if (is_afxdp_netdev(netdev_)) { > > + xsk_destroy_all(netdev_); > > + } > > +#endif > > Think you can remove the HAVE_AF_XDP here, as you do not use it below > either. Yes > > > ovs_mutex_destroy(&netdev->mutex); > > } > > > > @@ -1113,7 +1080,7 @@ netdev_linux_rxq_construct(struct netdev_rxq > > *rxq_) > > rx->is_tap = is_tap_netdev(netdev_); > > if (rx->is_tap) { > > rx->fd = netdev->tap_fd; > > - } else { > > + } else if (!is_afxdp_netdev(netdev_)) { > > struct sockaddr_ll sll; > > int ifindex, val; > > /* Result of tcpdump -dd inbound */ > > @@ -1318,10 +1285,18 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, > > struct dp_packet_batch *batch, > > { > > struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_); > > struct netdev *netdev = rx->up.netdev; > > - struct dp_packet *buffer; > > + struct dp_packet *buffer = NULL; > > ssize_t retval; > > int mtu; > > > > +#if HAVE_AF_XDP > > Think this #if HAVE_AF_XDP can be removed as the compiler should > optimize out the if (false). > > > + if (is_afxdp_netdev(netdev)) { > > + struct netdev_linux *dev = netdev_linux_cast(netdev); > > + int qid = rxq_->queue_id; > > + > > + return netdev_linux_rxq_xsk(dev->xsk[qid], batch); > > + } > > +#endif > > if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) { > > mtu = ETH_PAYLOAD_MAX; > > } > > @@ -1329,6 +1304,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, > > struct dp_packet_batch *batch, > > /* Assume Ethernet port. No need to set packet_type. */ > > buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu, > > DP_NETDEV_HEADROOM); > > + > > retval = (rx->is_tap > > ? netdev_linux_rxq_recv_tap(rx->fd, buffer) > > : netdev_linux_rxq_recv_sock(rx->fd, buffer)); > > @@ -1480,7 +1456,8 @@ netdev_linux_send(struct netdev *netdev_, int > > qid OVS_UNUSED, > > int error = 0; > > int sock = 0; > > > > - if (!is_tap_netdev(netdev_)) { > > + if (!is_tap_netdev(netdev_) && > > + !is_afxdp_netdev(netdev_)) { > > if > > (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) { > > error = EOPNOTSUPP; > > goto free_batch; > > @@ -1499,6 +1476,36 @@ netdev_linux_send(struct netdev *netdev_, int > > qid OVS_UNUSED, > > } > > > > error = netdev_linux_sock_batch_send(sock, ifindex, batch); > > +#if HAVE_AF_XDP > > Same here remove the #if HAVE_AF_XDP > > > + } else if (is_afxdp_netdev(netdev_)) { > > + struct netdev_linux *dev = netdev_linux_cast(netdev_); > > + struct dp_packet_afxdp *xpacket; > > + struct umem_pool *first_mpool; > > + struct dp_packet *packet; > > + > > + error = netdev_linux_afxdp_batch_send(dev->xsk[qid], batch); > > + > > + /* all packets must come frome the same umem pool > > + * and has DPBUF_AFXDP type, otherwise free on-by-one > > + */ > > + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > > + if (packet->source != DPBUF_AFXDP) { > > + goto free_batch; > > + } > > + > > + xpacket = dp_packet_cast_afxdp(packet); > > + if (i == 0) { > > + first_mpool = xpacket->mpool; > > + continue; > > + } > > + if (xpacket->mpool != first_mpool) { > > + goto free_batch; > > + } > > + } > > Why do not we not move all the packet type checks to > free_afxdp_buf_batch()? Here I plan to move them into the afxdp-specific send function. So it won't be part of netdev_linux_send(), hopefully it's more clean. > > > + /* free in batch */ > > + free_afxdp_buf_batch(batch); > > + return error; > > +#endif > > } else { > > error = netdev_linux_tap_batch_send(netdev_, batch); > > } > > @@ -3323,6 +3330,7 @@ const struct netdev_class netdev_linux_class = { > > NETDEV_LINUX_CLASS_COMMON, > > LINUX_FLOW_OFFLOAD_API, > > .type = "system", > > + .is_pmd = false, > > .construct = netdev_linux_construct, > > .get_stats = netdev_linux_get_stats, > > .get_features = netdev_linux_get_features, > > @@ -3333,6 +3341,7 @@ const struct netdev_class netdev_linux_class = { > > const struct netdev_class netdev_tap_class = { > > NETDEV_LINUX_CLASS_COMMON, > > .type = "tap", > > + .is_pmd = false, > > .construct = netdev_linux_construct_tap, > > .get_stats = netdev_tap_get_stats, > > .get_features = netdev_linux_get_features, > > @@ -3343,10 +3352,26 @@ const struct netdev_class > > netdev_internal_class = { > > NETDEV_LINUX_CLASS_COMMON, > > LINUX_FLOW_OFFLOAD_API, > > .type = "internal", > > + .is_pmd = false, > > .construct = netdev_linux_construct, > > .get_stats = netdev_internal_get_stats, > > .get_status = netdev_internal_get_status, > > }; > > + > > +#ifdef HAVE_AF_XDP > > +const struct netdev_class netdev_afxdp_class = { > > + NETDEV_LINUX_CLASS_COMMON, > > + .type = "afxdp", > > + .is_pmd = true, > > + .construct = netdev_linux_construct, > > + .get_stats = netdev_linux_get_stats, > > + .get_status = netdev_linux_get_status, > > + .set_config = netdev_afxdp_set_config, > > + .get_config = netdev_afxdp_get_config, > > + .reconfigure = netdev_afxdp_reconfigure, > > + .get_numa_id = netdev_afxdp_get_numa_id, > > +}; > > +#endif > > > > > > #define CODEL_N_QUEUES 0x0000 > > diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h > > index 17ca9120168a..b812e64cb078 100644 > > --- a/lib/netdev-linux.h > > +++ b/lib/netdev-linux.h > > @@ -19,6 +19,20 @@ > > > > #include <stdint.h> > > #include <stdbool.h> > > +#include <linux/filter.h> > > +#include <linux/gen_stats.h> > > +#include <linux/if_ether.h> > > +#include <linux/if_tun.h> > > +#include <linux/types.h> > > +#include <linux/ethtool.h> > > +#include <linux/mii.h> > > + > > +#include "netdev-provider.h" > > +#include "netdev-tc-offloads.h" > > +#include "netdev-vport.h" > > +#include "openvswitch/thread.h" > > +#include "ovs-atomic.h" > > +#include "timer.h" > > Is there a reason why you move all these includes here? If there is you > might as well remove the duplicates from .c files that include > netdev-linux.h, for example, netdev-linux.c <snip> Will work on this part later. > > +static inline void > > +ovs_spin_unlock(ovs_spinlock_t *sl) > > +{ > > + atomic_store_explicit(&sl->locked, 0, memory_order_release); > > +} > > + > > +static inline int OVS_UNUSED > > +ovs_spin_trylock(ovs_spinlock_t *sl) > > +{ > > + int exp = 0; > > + return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, > > 1, > > + memory_order_acquire, > > + memory_order_relaxed); > > +} > > Move spinlock function out to a common file > OK, I plan to add lib/spinlock.h and move to it. > > + > > +inline int > > +__umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs) > > +{ > > + void *ptr; > > + > > + if (OVS_UNLIKELY(umemp->index + n > umemp->size)) { > > This is a stack overflow > > > + return -ENOMEM; > > + } > > + > > + ptr = &umemp->array[umemp->index]; > > + memcpy(ptr, addrs, n * sizeof(void *)); > > + umemp->index += n; > > + > > + return 0; > > +} > > + > > +int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs) > > +{ > > + int ret; > > + > > + ovs_spin_lock(&umemp->mutex); > > + ret = __umem_elem_push_n(umemp, n, addrs); > > + ovs_spin_unlock(&umemp->mutex); > > + > > + return ret; > > +} > > + > > +inline void > > +__umem_elem_push(struct umem_pool *umemp, void *addr) > > +{ > > + umemp->array[umemp->index++] = addr; > > +} > > + > > +void > > +umem_elem_push(struct umem_pool *umemp, void *addr) > > +{ > > + > > + if (OVS_UNLIKELY(umemp->index >= umemp->size)) { > > + /* stack is overflow, this should not happen */ > > + OVS_NOT_REACHED(); > > + } > > Should this not be moved after the spinlock, i.e. to __umem_elem_push OK > > > + > > + ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0); > > + > > + ovs_spin_lock(&umemp->mutex); > > + __umem_elem_push(umemp, addr); > > + ovs_spin_unlock(&umemp->mutex); > > +} > > + > > +inline int > > +__umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs) > > +{ > > + void *ptr; > > + > > + if (OVS_UNLIKELY(umemp->index - n < 0)) { > > + return -ENOMEM; > > + } > > + > > + umemp->index -= n; > > + ptr = &umemp->array[umemp->index]; > > + memcpy(addrs, ptr, n * sizeof(void *)); > > + > > + return 0; > > +} > > + > > +int > > +umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs) > > +{ > > + int ret; > > + > > + ovs_spin_lock(&umemp->mutex); > > + ret = __umem_elem_pop_n(umemp, n, addrs); > > + ovs_spin_unlock(&umemp->mutex); > > + > > + return ret; > > +} > > + > > +inline void * > > +__umem_elem_pop(struct umem_pool *umemp) > > +{ > > There is no check here to see if there are actual any elements left, > like there is for pop_n, > so we could corrupt memory/umem_pool > Yes, I will add a check. > > + return umemp->array[--umemp->index]; > > +} > > + > > +void * > > +umem_elem_pop(struct umem_pool *umemp) > > +{ > > + void *ptr; > > + > > + ovs_spin_lock(&umemp->mutex); > > + ptr = __umem_elem_pop(umemp); > > + ovs_spin_unlock(&umemp->mutex); > > + > > + return ptr; > > +} > > + > > +void ** > > +__umem_pool_alloc(unsigned int size) > > +{ > > + void *bufs; > > + > > + ovs_assert(posix_memalign(&bufs, getpagesize(), > > + size * sizeof(void *)) == 0); > > We should not assert, just return NULL here. > > > + memset(bufs, 0, size * sizeof(void *)); > > + return (void **)bufs; > > +} > > + > > +unsigned int > > +umem_elem_count(struct umem_pool *mpool) > > +{ > > + return mpool->index; > > +} > > + > > +int > > +umem_pool_init(struct umem_pool *umemp, unsigned int size) > > +{ > > + umemp->array = __umem_pool_alloc(size); > > + if (!umemp->array) { > > + OVS_NOT_REACHED(); > > If NULL is returned return ENOMEM > > > + } > > + > > + umemp->size = size; > > + umemp->index = 0; > > + ovs_spinlock_init(&umemp->mutex); > > + return 0; > > +} > > + > > +void > > +umem_pool_cleanup(struct umem_pool *umemp) > > +{ > > + free(umemp->array); > umemp->array = NULL; > > +} > > + > > +/* AF_XDP metadata init/destroy */ > > +int > > +xpacket_pool_init(struct xpacket_pool *xp, unsigned int size) > > +{ > > + void *bufs; > > + > > + /* TODO: check HAVE_POSIX_MEMALIGN */ > > Guess the above needs to be done > > > + ovs_assert(posix_memalign(&bufs, getpagesize(), > > + size * sizeof(struct dp_packet_afxdp)) > > == 0); > > We should not assert, just return false > > > + memset(bufs, 0, size * sizeof(struct dp_packet_afxdp)); > > + > > + xp->array = bufs; > > + xp->size = size; > > + return 0; > > +} > > + > > +void > > +xpacket_pool_cleanup(struct xpacket_pool *xp) > > +{ > > + free(xp->array); > xp->array = NULL; > > +} > > diff --git a/lib/xdpsock.h b/lib/xdpsock.h > > new file mode 100644 > > index 000000000000..aabaa8e5df24 > > --- /dev/null > > +++ b/lib/xdpsock.h > > @@ -0,0 +1,123 @@ > > +/* > > + * Copyright (c) 2018, 2019 Nicira, Inc. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, > > software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > > implied. > > + * See the License for the specific language governing permissions > > and > > + * limitations under the License. > > + */ > > + > > +#ifndef XDPSOCK_H > > +#define XDPSOCK_H 1 > > + > > +#include <bpf/libbpf.h> > > +#include <bpf/xsk.h> > > +#include <errno.h> > > +#include <getopt.h> > > +#include <libgen.h> > > +#include <linux/bpf.h> > > +#include <linux/if_link.h> > > +#include <linux/if_xdp.h> > > +#include <linux/if_ether.h> > > +#include <locale.h> > > +#include <net/if.h> > > +#include <poll.h> > > +#include <pthread.h> > > +#include <signal.h> > > +#include <stdbool.h> > > +#include <stdio.h> > > +#include <stdlib.h> > > +#include <string.h> > > +#include <sys/resource.h> > > +#include <sys/socket.h> > > +#include <sys/types.h> > > +#include <sys/mman.h> > > +#include <time.h> > > +#include <unistd.h> > > + > > +#include "openvswitch/thread.h" > > +#include "ovs-atomic.h" > > + > > +#define FRAME_HEADROOM XDP_PACKET_HEADROOM > > +#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE > > +#define BATCH_SIZE NETDEV_MAX_BURST > > Move this item to the bottom, so you have FRAME specific define's first > > > +#define FRAME_SHIFT XSK_UMEM__DEFAULT_FRAME_SHIFT > > +#define FRAME_SHIFT_MASK ((1 << FRAME_SHIFT) - 1) > > + > > +#define NUM_FRAMES 4096 > > Should we add a note/check to make sure this value is a power of 2? Sure, will do it. > > > +#define PROD_NUM_DESCS 512 > > +#define CONS_NUM_DESCS 512 > > + > > +#ifdef USE_XSK_DEFAULT > > +#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS > > +#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS > > +#endif > > Any reason for having this? Should we use the default values? They are > 4x larger than you have, did it make any difference in performance > results? > We could make it configurable like for DPDK, using the > n_txq_desc/n_rxq_desc option. Will add this option later. > > > + > > +typedef struct { > > + atomic_int locked; > > +} ovs_spinlock_t; > > + > > Think we should move the ovs_spinlock code and includes to some global > place, maybe util or thread ok, will move to lib/spinlock.h > > > +/* LIFO ptr_array */ > > +struct umem_pool { > > + int index; /* point to top */ > > + unsigned int size; > > + ovs_spinlock_t mutex; > > + void **array; /* a pointer array, point to umem buf */ > > +}; > > + > > +/* array-based dp_packet_afxdp */ > > +struct xpacket_pool { > > + unsigned int size; > > + struct dp_packet_afxdp **array; > > +}; > > + > > +struct xsk_umem_info { > > + struct umem_pool mpool; > > + struct xpacket_pool xpool; > > + struct xsk_ring_prod fq; > > + struct xsk_ring_cons cq; > > + struct xsk_umem *umem; > > + void *buffer; > > +}; > > + > > +struct xsk_socket_info { > > + struct xsk_ring_cons rx; > > + struct xsk_ring_prod tx; > > + struct xsk_umem_info *umem; > > + struct xsk_socket *xsk; > > + unsigned long rx_npkts; > > + unsigned long tx_npkts; > > + unsigned long prev_rx_npkts; > > + unsigned long prev_tx_npkts; > > + uint32_t outstanding_tx; > > +}; > > + > > +struct umem_elem { > > + struct umem_elem *next; > > +}; > > + > > +void __umem_elem_push(struct umem_pool *umemp, void *addr); > > +void umem_elem_push(struct umem_pool *umemp, void *addr); > > +int __umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs); > > +int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs); > > + > > +void *__umem_elem_pop(struct umem_pool *umemp); > > +void *umem_elem_pop(struct umem_pool *umemp); > > +int __umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs); > > +int umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs); > > + > > +void **__umem_pool_alloc(unsigned int size); > > +int umem_pool_init(struct umem_pool *umemp, unsigned int size); > > +void umem_pool_cleanup(struct umem_pool *umemp); > > +unsigned int umem_elem_count(struct umem_pool *mpool); > > +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size); > > +void xpacket_pool_cleanup(struct xpacket_pool *xp); > > + > > Think all the __umem_* function are only used internally so they should > be come static and be removed here. > OK. Regards, William _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev