Thank you, Darrell and Ilya! Yes, I thought about wrapping the pthread_spin_ calls in a generic API.
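A minimal sketch of what such a wrapper could look like. The names nm_spin, nm_spin_init, nm_spin_lock, nm_spin_trylock, nm_spin_unlock and nm_spin_destroy are hypothetical and exist neither in OVS nor in the patch below; the sketch also assumes the lock is only shared between threads of one process, so it uses PTHREAD_PROCESS_PRIVATE.

```c
/* Hypothetical spinlock wrapper: callers never touch pthread_spin_*
 * directly, so a platform without POSIX spinlocks would only need a
 * different implementation of these five functions. */
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct nm_spin {
    pthread_spinlock_t lock;
};

/* Aborts with a message if a pthread call reported an error. */
static void
nm_spin_check(int error, const char *where)
{
    if (error) {
        fprintf(stderr, "%s failed: %s\n", where, strerror(error));
        abort();
    }
}

static inline void
nm_spin_init(struct nm_spin *s)
{
    assert(s != NULL);
    nm_spin_check(pthread_spin_init(&s->lock, PTHREAD_PROCESS_PRIVATE),
                  "pthread_spin_init");
}

static inline void
nm_spin_lock(struct nm_spin *s)
{
    nm_spin_check(pthread_spin_lock(&s->lock), "pthread_spin_lock");
}

/* Returns 0 if the lock was taken, EBUSY if it is already held. */
static inline int
nm_spin_trylock(struct nm_spin *s)
{
    int error = pthread_spin_trylock(&s->lock);

    if (error != EBUSY) {
        nm_spin_check(error, "pthread_spin_trylock");
    }
    return error;
}

static inline void
nm_spin_unlock(struct nm_spin *s)
{
    nm_spin_check(pthread_spin_unlock(&s->lock), "pthread_spin_unlock");
}

static inline void
nm_spin_destroy(struct nm_spin *s)
{
    nm_spin_check(pthread_spin_destroy(&s->lock), "pthread_spin_destroy");
}
```

With something like this, the tx_lock field in the netdev could be declared as a struct nm_spin, and the send path would call nm_spin_lock()/nm_spin_unlock() instead of the pthread functions.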
I'll review my patch with your tips and send it again correctly. Thanks again!

Alessandro

2018-03-28 18:36 GMT+02:00 Darrell Ball <dlu...@gmail.com>:

> I hit send too quick Alessandro; one clarification inline
>
> On Wed, Mar 28, 2018 at 9:13 AM, Darrell Ball <dlu...@gmail.com> wrote:
>
>> Another aspect (besides what Ilya mentioned) you might want to check is
>> to look at OVS patchwork for your patches,
>> after you submit, and check that they are there, firstly.
>> Also check that they look like other accepted patches overall and for
>> chunks of similar code constructs.
>>
>> https://patchwork.ozlabs.org/project/openvswitch/list/
>>
>> Check that your patches can be applied on top of an updated master branch
>> of OVS.
>>
>> I did a quick pass over the raw diff and noticed that in many cases you
>> are already using lots of OVS apis which is good.
>>
>> A few pointers:
>> 1/ Try to use inline functions as much as possible, instead of macros
>> 2/ Think about portability - Don't use direct calls to pthread_ apis for
>> example
>>
>
> I am specifically referring to the locking apis, like pthread_spin_
>
> 3/ Create wrappers for new locks that use generic OVS lock apis
>> 4/ Clearly describe any build dependencies, if any, in the install guide
>> documentation.
>> 5/ Think about portability for parts of the code and look how that is
>> handled in other cases.
>> 6/ I think it would be helpful for you to describe one or more use cases
>> for netmap, for the general user.
>> 7/ Think about testing and see what we can do to automate - we have
>> system tests that run with
>> make check-kmod and make check-system-userspace
>> Existing files are tests/system-traffic.at and tests/system-ovn.at,
>> which is shared for Linux and userspace datapath
>> 8/ You might want to describe some tests results, including performance
>> numbers in the cover letter.
>> >> Cheers Darrell >> >> >> On Wed, Mar 28, 2018 at 1:50 AM, Alessandro Rosetti < >> alessandro.rose...@gmail.com> wrote: >> >>> Hi Darrell, Ilya and everyone else, >>> >>> I'm contacting you since you were interested. >>> I've posted the patch that implements netmap in OVS attaching the file >>> in the mail, did I do it wrong? >>> https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/345371.html >>> >>> I'm posting it inline now, >>> sorry for the mess! >>> >>> Alessandro. >>> >>> ---------------------------------------------------------------------- >>> >>> diff --git a/acinclude.m4 b/acinclude.m4 >>> index d61e37a5e..d9dd9fbd1 100644 >>> --- a/acinclude.m4 >>> +++ b/acinclude.m4 >>> @@ -341,6 +341,36 @@ AC_DEFUN([OVS_CHECK_DPDK], [ >>> AM_CONDITIONAL([DPDK_NETDEV], test "$DPDKLIB_FOUND" = true) >>> ]) >>> >>> +dnl OVS_CHECK_NETMAP >>> +dnl >>> +dnl Check netmap >>> +AC_DEFUN([OVS_CHECK_NETMAP], [ >>> + AC_ARG_WITH([netmap], >>> + [AC_HELP_STRING([--with-netmap], [Enable NETMAP])], >>> + [have_netmap=true]) >>> + AC_MSG_CHECKING([whether netmap datapath is enabled]) >>> + >>> + if test "$have_netmap" != true || test "$with_netmap" = no; then >>> + AC_MSG_RESULT([no]) >>> + else >>> + AC_MSG_RESULT([yes]) >>> + NETMAP_FOUND=false >>> + AC_LINK_IFELSE( >>> + [AC_LANG_PROGRAM([#include <net/if.h> >>> + #include<netinet/in.h> >>> + #include<net/netmap.h> >>> + #include<net/netmap_user.h>], [])], >>> + [NETMAP_FOUND=true]) >>> + if $NETMAP_FOUND; then >>> + AC_DEFINE([NETMAP_NETDEV], [1], [NETMAP datapath is enabled.]) >>> + else >>> + AC_MSG_ERROR([Could not find NETMAP headers]) >>> + fi >>> + fi >>> + >>> + AM_CONDITIONAL([NETMAP_NETDEV], test "$NETMAP_FOUND" = true) >>> +]) >>> + >>> dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH]) >>> dnl >>> dnl Greps FILE for REGEX. If it matches, runs IF-MATCH, otherwise >>> IF-NO-MATCH. >>> @@ -900,7 +930,7 @@ dnl with or without modifications, as long as this >>> notice is preserved. 
>>> >>> AC_DEFUN([_OVS_CHECK_CC_OPTION], [dnl >>> m4_define([ovs_cv_name], [ovs_cv_[]m4_translit([$1], [-= ], [__])])dnl >>> - AC_CACHE_CHECK([whether $CC accepts $1], [ovs_cv_name], >>> + AC_CACHE_CHECK([whether $CC accepts $1], [ovs_cv_name], >>> [ovs_save_CFLAGS="$CFLAGS" >>> dnl Include -Werror in the compiler options, because without >>> -Werror >>> dnl clang's GCC-compatible compiler driver does not return a >>> failure >>> @@ -951,7 +981,7 @@ dnl OVS_ENABLE_OPTION([OPTION]) >>> dnl Check whether the given C compiler OPTION is accepted. >>> dnl If so, add it to WARNING_FLAGS. >>> dnl Example: OVS_ENABLE_OPTION([-Wdeclaration-after-statement]) >>> -AC_DEFUN([OVS_ENABLE_OPTION], >>> +AC_DEFUN([OVS_ENABLE_OPTION], >>> [OVS_CHECK_CC_OPTION([$1], [WARNING_FLAGS="$WARNING_FLAGS $1"]) >>> AC_SUBST([WARNING_FLAGS])]) >>> >>> diff --git a/configure.ac b/configure.ac >>> index 9940a1a45..24cd4718c 100644 >>> --- a/configure.ac >>> +++ b/configure.ac >>> @@ -180,6 +180,7 @@ AC_SUBST(KARCH) >>> OVS_CHECK_LINUX >>> OVS_CHECK_LINUX_TC >>> OVS_CHECK_DPDK >>> +OVS_CHECK_NETMAP >>> OVS_CHECK_PRAGMA_MESSAGE >>> AC_SUBST([OVS_CFLAGS]) >>> AC_SUBST([OVS_LDFLAGS]) >>> diff --git a/lib/automake.mk b/lib/automake.mk >>> index 5c26e0f33..4ccd9e22a 100644 >>> --- a/lib/automake.mk >>> +++ b/lib/automake.mk >>> @@ -134,12 +134,14 @@ lib_libopenvswitch_la_SOURCES = \ >>> lib/namemap.c \ >>> lib/netdev-dpdk.h \ >>> lib/netdev-dummy.c \ >>> + lib/netdev-netmap.h \ >>> lib/netdev-provider.h \ >>> lib/netdev-vport.c \ >>> lib/netdev-vport.h \ >>> lib/netdev-vport-private.h \ >>> lib/netdev.c \ >>> lib/netdev.h \ >>> + lib/netmap.h \ >>> lib/netflow.h \ >>> lib/netlink.c \ >>> lib/netlink.h \ >>> @@ -403,6 +405,15 @@ lib_libopenvswitch_la_SOURCES += \ >>> lib/dpdk-stub.c >>> endif >>> >>> +if NETMAP_NETDEV >>> +lib_libopenvswitch_la_SOURCES += \ >>> + lib/netmap.c \ >>> + lib/netdev-netmap.c >>> +else >>> +lib_libopenvswitch_la_SOURCES += \ >>> + lib/netmap-stub.c >>> +endif >>> + >>> if 
WIN32 >>> lib_libopenvswitch_la_SOURCES += \ >>> lib/dpif-netlink.c \ >>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c >>> index 443c22504..e917e6d6a 100644 >>> --- a/lib/dp-packet.c >>> +++ b/lib/dp-packet.c >>> @@ -92,6 +92,7 @@ dp_packet_use_const(struct dp_packet *b, const void >>> *data, size_t size) >>> dp_packet_set_size(b, size); >>> } >>> >>> + >>> /* Initializes 'b' as an empty dp_packet that contains the 'allocated' >>> bytes. >>> * DPDK allocated dp_packet and *data is allocated from one continous >>> memory >>> * region as part of memory pool, so in memory data start right after >>> @@ -105,6 +106,19 @@ dp_packet_init_dpdk(struct dp_packet *b, size_t >>> allocated) >>> b->source = DPBUF_DPDK; >>> } >>> >>> +/* Initializes 'b' as a dp_packet whose data points to a netmap buffer >>> of size >>> + * 'size' bytes. */ >>> +#ifdef NETMAP_NETDEV >>> +void >>> +dp_packet_init_netmap(struct dp_packet *b, void *data, size_t size) >>> +{ >>> + b->source = DPBUF_NETMAP; >>> + dp_packet_set_base(b, data); >>> + dp_packet_set_data(b, data); >>> + dp_packet_set_size(b, size); >>> +} >>> +#endif >>> + >>> /* Initializes 'b' as an empty dp_packet with an initial capacity of >>> 'size' >>> * bytes. */ >>> void >>> @@ -125,6 +139,11 @@ dp_packet_uninit(struct dp_packet *b) >>> /* If this dp_packet was allocated by DPDK it must have been >>> * created as a dp_packet */ >>> free_dpdk_buf((struct dp_packet*) b); >>> +#endif >>> + } else if (b->source == DPBUF_NETMAP) { >>> +#ifdef NETMAP_NETDEV >>> + /* If this dp_packet was allocated by NETMAP, release it. 
*/ >>> + netmap_free_packet(b); >>> #endif >>> } >>> } >>> @@ -241,6 +260,9 @@ dp_packet_resize__(struct dp_packet *b, size_t >>> new_headroom, size_t new_tailroom >>> case DPBUF_DPDK: >>> OVS_NOT_REACHED(); >>> >>> + case DPBUF_NETMAP: >>> + OVS_NOT_REACHED(); >>> + >>> case DPBUF_MALLOC: >>> if (new_headroom == dp_packet_headroom(b)) { >>> new_base = xrealloc(dp_packet_base(b), new_allocated); >>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h >>> index 21c8ca525..bd7832533 100644 >>> --- a/lib/dp-packet.h >>> +++ b/lib/dp-packet.h >>> @@ -26,6 +26,7 @@ >>> #endif >>> >>> #include "netdev-dpdk.h" >>> +#include "netdev-netmap.h" >>> #include "openvswitch/list.h" >>> #include "packets.h" >>> #include "util.h" >>> @@ -42,6 +43,7 @@ enum OVS_PACKED_ENUM dp_packet_source { >>> DPBUF_DPDK, /* buffer data is from DPDK allocated >>> memory. >>> * ref to dp_packet_init_dpdk() in >>> dp-packet.c. >>> */ >>> + DPBUF_NETMAP, /* Buffers are from netmap allocated >>> memory. */ >>> }; >>> >>> #define DP_PACKET_CONTEXT_SIZE 64 >>> @@ -60,6 +62,9 @@ struct dp_packet { >>> uint32_t size_; /* Number of bytes in use. */ >>> uint32_t rss_hash; /* Packet hash. */ >>> bool rss_hash_valid; /* Is the 'rss_hash' valid? */ >>> +#endif >>> +#ifdef NETMAP_NETDEV >>> + uint32_t buf_idx; /* Netmap slot index. */ >>> #endif >>> enum dp_packet_source source; /* Source of memory allocated as >>> 'base'. 
*/ >>> >>> @@ -115,6 +120,7 @@ void dp_packet_use_stub(struct dp_packet *, void *, >>> size_t); >>> void dp_packet_use_const(struct dp_packet *, const void *, size_t); >>> >>> void dp_packet_init_dpdk(struct dp_packet *, size_t allocated); >>> +void dp_packet_init_netmap(struct dp_packet *, void *, size_t); >>> >>> void dp_packet_init(struct dp_packet *, size_t); >>> void dp_packet_uninit(struct dp_packet *); >>> @@ -173,6 +179,13 @@ dp_packet_delete(struct dp_packet *b) >>> * created as a dp_packet */ >>> free_dpdk_buf((struct dp_packet*) b); >>> return; >>> + } else if (b->source == DPBUF_NETMAP) { >>> + /* It was allocated by a netdev_netmap, it will be marked >>> + * for reuse. */ >>> +#ifdef NETMAP_NETDEV >>> + netmap_free_packet(b); >>> +#endif >>> + return; >>> } >>> >>> dp_packet_uninit(b); >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index b07fc6b8b..af81c992b 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -4119,11 +4119,14 @@ reload: >>> >>> /* List port/core affinity */ >>> for (i = 0; i < poll_cnt; i++) { >>> - VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n", >>> - pmd->core_id, netdev_rxq_get_name(poll_list[ >>> i].rxq->rx), >>> - netdev_rxq_get_queue_id(poll_list[i].rxq->rx)); >>> - /* Reset the rxq current cycles counter. */ >>> - dp_netdev_rxq_set_cycles(poll_list[i].rxq, >>> RXQ_CYCLES_PROC_CURR, 0); >>> + VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n", >>> + pmd->core_id, netdev_rxq_get_name(poll_list[ >>> i].rxq->rx), >>> + netdev_rxq_get_queue_id(poll_list[i].rxq->rx)); >>> + /* Reset the rxq current cycles counter. 
*/ >>> + dp_netdev_rxq_set_cycles(poll_list[i].rxq, >>> RXQ_CYCLES_PROC_CURR, 0); >>> +#ifdef NETMAP_NETDEV >>> + netmap_init_port(poll_list[i].rxq->rx); >>> +#endif >>> } >>> >>> if (!poll_cnt) { >>> diff --git a/lib/netdev-netmap.c b/lib/netdev-netmap.c >>> new file mode 100644 >>> index 000000000..87b292895 >>> --- /dev/null >>> +++ b/lib/netdev-netmap.c >>> @@ -0,0 +1,1014 @@ >>> +#include <config.h> >>> + >>> +#include <errno.h> >>> +#include <math.h> >>> +#include <net/if.h> >>> +#include <netinet/in.h> >>> +#include <net/netmap.h> >>> +#define NETMAP_WITH_LIBS >>> +#include <net/netmap_user.h> >>> +#include <sys/ioctl.h> >>> +#include <sys/syscall.h> >>> + >>> +#include "dpif.h" >>> +#include "netdev.h" >>> +#include "netdev-provider.h" >>> +#include "netmap.h" >>> +#include "netdev-netmap.h" >>> +#include "openvswitch/list.h" >>> +#include "openvswitch/poll-loop.h" >>> +#include "openvswitch/vlog.h" >>> +#include "ovs-thread.h" >>> +#include "packets.h" >>> +#include "smap.h" >>> + >>> +#define DP_BLOCK_SIZE NETDEV_MAX_BURST * 2 >>> +#define DEFAULT_RSYNC_INTVAL 5 >>> + >>> +VLOG_DEFINE_THIS_MODULE(netdev_netmap); >>> + >>> +static struct vlog_rate_limit rl OVS_UNUSED = VLOG_RATE_LIMIT_INIT(5, >>> 100); >>> + >>> +struct netdev_netmap { >>> + struct netdev up; >>> + struct nm_desc *nmd; >>> + >>> + uint64_t timestamp; >>> + uint32_t rxsync_intval; >>> + >>> + struct ovs_list list_node; >>> + long tid; >>> + struct nm_alloc *nma; >>> + >>> + struct ovs_mutex mutex OVS_ACQ_AFTER(netmap_mutex); >>> + pthread_spinlock_t tx_lock; >>> + >>> + struct netdev_stats stats; >>> + struct eth_addr hwaddr; >>> + enum netdev_flags flags; >>> + >>> + int mtu; >>> + int requested_mtu; >>> +}; >>> + >>> +struct netdev_rxq_netmap { >>> + struct netdev_rxq up; >>> +}; >>> + >>> +static void netdev_netmap_destruct(struct netdev *netdev); >>> + >>> +static bool >>> +is_netmap_class(const struct netdev_class *class) >>> +{ >>> + return class->destruct == netdev_netmap_destruct; 
>>> +} >>> + >>> +static struct netdev_netmap * >>> +netdev_netmap_cast(const struct netdev *netdev) >>> +{ >>> + ovs_assert(is_netmap_class(netdev_get_class(netdev))); >>> + return CONTAINER_OF(netdev, struct netdev_netmap, up); >>> +} >>> + >>> +static struct netdev_rxq_netmap * >>> +netdev_rxq_netmap_cast(const struct netdev_rxq *rx) >>> +{ >>> + ovs_assert(is_netmap_class(netdev_get_class(rx->netdev))); >>> + return CONTAINER_OF(rx, struct netdev_rxq_netmap, up); >>> +} >>> + >>> +static struct ovs_mutex netmap_mutex = OVS_MUTEX_INITIALIZER; >>> + >>> +/* Blocks are used to store DP_BLOCK_SIZE preallocated netmap >>> dp_packets. >>> + * During receive operation, dp_packets are allocated by moving them >>> from a >>> + * block to a dp_batch. A block is refilled when packets are freed. >>> + * Each netmap dp_packet has source type set to DPBUF_NETMAP, with >>> buf_idx >>> + * identifying a netmap buffer. Packets in the blocks (or in flight >>> within OVS) >>> + * are not attached to any netmap ring, i.e. their buf_idx is not >>> stored in >>> + * any netmap slot. On receive or transmit, the netmap buffer owned by a >>> + * dp_packet is swapped with one attached to a receive/transmit ring >>> slot, >>> + * by simply swapping the buf_idx values. */ >>> +struct nm_block { >>> + struct ovs_list node; /* Blocks can be chained >>> + * in a list. */ >>> + struct dp_packet* packets[DP_BLOCK_SIZE]; /* Array of dp_packets. */ >>> + uint16_t idx; /* Array index of the >>> current >>> + * packet. */ >>> +}; >>> + >>> +enum nm_block_type { >>> + NM_BLOCK_TYPE_PUT = 0, >>> + NM_BLOCK_TYPE_GET = 1, >>> +}; >>> + >>> +/* Global data structures of the netmap dp_packet allocator. */ >>> +static struct nm_runtime { >>> + struct ovs_list port_list; /* List of all netmap netdevs. */ >>> + struct ovs_list block_list[2]; /* Lists for dp_packet blocks: one >>> for >>> + * empty and one for full ones. 
*/ >>> + void *mem; >>> + uint16_t memid; >>> + uint32_t memsize; >>> + uint32_t nextrabufs; >>> +} nmr = { 0 }; >>> + >>> +/* Each thread uses a pair of blocks for allocations and deallocations. >>> */ >>> +struct nm_alloc { >>> + struct nm_block *block[2]; /* Blocks used by TX/RX to >>> allocate/dealloacte >>> + * dp_packets. */ >>> +}; >>> + >>> +/* Thread local allocators for packet allocations/dellocations */ >>> +DEFINE_STATIC_PER_THREAD_DATA(struct nm_alloc, nma, { 0 }); >>> +#define NMA nma_get() >>> +#define PUTB nma_get()->block[NM_BLOCK_TYPE_PUT] >>> +#define GETB nma_get()->block[NM_BLOCK_TYPE_GET] >>> + >>> +/* Creates a new block. >>> + * The block can be empty or initialized with new dp_packets associated >>> to >>> + * netmap buffers not attached to a netmap ring. */ >>> +static struct nm_block* >>> +nm_block_new(struct nm_desc *nmd) { >>> + struct nm_block *block; >>> + >>> + block = xmalloc(sizeof(struct nm_block)); >>> + block->idx = 0; >>> + ovs_list_init(&block->node); >>> + >>> + if (nmd) { >>> + struct dp_packet *packet; >>> + struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, 0); >>> + uint32_t idx = nmd->nifp->ni_bufs_head; >>> + >>> + for (int i = 0; idx && i < DP_BLOCK_SIZE; >>> + i++, idx = *(uint32_t *)NETMAP_BUF(ring, idx)) { >>> + packet = dp_packet_new(0); >>> + packet->buf_idx = idx; >>> + packet->source = DPBUF_NETMAP; >>> + block->packets[block->idx++] = packet; >>> + } >>> + >>> + nmd->nifp->ni_bufs_head = idx; >>> + } >>> + >>> + return block; >>> +} >>> + >>> +/* Swaps blocks from nm_runtime in order to replace the current block >>> with >>> + * an empty or full block. >>> + * if we want GETB to be swapped with a block filled with dp_packets we >>> will >>> + * speciry NM_BLOCK_TYPE_GET. >>> + * if we want PUTB to be swapped with a block filled with dp_packets we >>> will >>> + * speciry NM_BLOCK_TYPE_PUT. 
*/ >>> +static void >>> +nm_block_swap_global(enum nm_block_type type) { >>> + struct nm_block **bselect = NULL; >>> + struct nm_block *bswap = NULL, *btmp; >>> + >>> + ovs_mutex_lock(&netmap_mutex); >>> + >>> + bselect = &(NMA->block[type]); >>> + >>> + /* Try to pop a block form the correct list */ >>> + if (!ovs_list_is_empty(&nmr.block_list[type])) { >>> + bswap = CONTAINER_OF(ovs_list_pop_front(&nmr.block_list[type]), >>> + struct nm_block, node); >>> + } else { >>> + bswap = nm_block_new(NULL); >>> + } >>> + >>> + /* Swap blocks. */ >>> + if (OVS_LIKELY(bswap)) { >>> + btmp = *bselect; >>> + *bselect = bswap; >>> + /* If the current block is empty it will be pushed to the empty >>> list >>> + * and viceversa if it not empty. */ >>> + type = btmp->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT; >>> + ovs_list_push_back(&nmr.block_list[type], &btmp->node); >>> + } >>> + >>> + ovs_mutex_unlock(&netmap_mutex); >>> +} >>> + >>> +/* Swap the two blocks of the local allocator. */ >>> +static void >>> +nm_block_swap_local(void) { >>> + struct nm_block* block = GETB; >>> + GETB = PUTB; >>> + PUTB = block; >>> +} >>> + >>> +/* Frees a block from memory. >>> + * If nmd is specified we will return extra buffers to this >>> + * nm_desc if the block contains any dp_packet. */ >>> +static void >>> +nm_block_free(struct nm_block* b, struct nm_desc *nmd) { >>> + if (b) { >>> + if (nmd) { >>> + struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, 0); >>> + >>> + for (int i = 0; i < b->idx; i++) { >>> + struct dp_packet *packet = b->packets[i]; >>> + if (packet) { >>> + uint32_t *e = (uint32_t *) NETMAP_BUF(ring, >>> packet->buf_idx); >>> + *e = nmd->nifp->ni_bufs_head; >>> + nmd->nifp->ni_bufs_head = packet->buf_idx; >>> + free(packet); >>> + } >>> + } >>> + } >>> + >>> + free(b); >>> + } >>> +} >>> + >>> +/* Set up the port by checking if any other port has already been >>> opened. >>> + * Prepare blocks of dp_packets. 
*/ >>> +static int >>> +netmap_setup_port(struct nm_desc *nmd) { >>> + ovs_mutex_lock(&netmap_mutex); >>> + >>> + if (ovs_list_size(&nmr.port_list)) { >>> + /* Netmap memory has already been set up, check if the new port >>> uses >>> + * the same memid */ >>> + if (nmr.memid != nmd->req.nr_arg2) { >>> + VLOG_WARN("unable to add this port, it has a new mem_id >>> (%x->%x)", >>> + nmr.memid, nmd->req.nr_arg2); >>> + ovs_mutex_unlock(&netmap_mutex); >>> + return 1; >>> + } >>> + } else { >>> + /* We are initializing the first Netmap port: setup Netmap >>> memory >>> + * to this process. */ >>> + nmr.memid = nmd->req.nr_arg2; >>> + nmr.memsize = nmd->req.nr_memsize; >>> + nmr.mem = mmap(0, nmr.memsize, PROT_WRITE | PROT_READ, >>> + MAP_SHARED, nmd->fd, 0); >>> + >>> + if (nmr.mem == MAP_FAILED) { >>> + VLOG_WARN("mmap has failed!"); >>> + ovs_mutex_unlock(&netmap_mutex); >>> + return 1; >>> + } >>> + } >>> + >>> + /* Now we can set up the following nmd fields */ >>> + { >>> + struct netmap_if *nifp; >>> + >>> + nmd->memsize = nmr.memsize; >>> + nmd->mem = nmr.mem; >>> + nifp = NETMAP_IF(nmd->mem, nmd->req.nr_offset); >>> + *(struct netmap_if **)(uintptr_t)&(nmd->nifp) = nifp; >>> + } >>> + >>> + /* Allocate a number of blocks containing dp_packets. The total >>> number >>> + * of extrabuffers to be used is multiple of the blocksize */ >>> + uint32_t nextrabufs = nmd->req.nr_arg3 & ~(DP_BLOCK_SIZE-1); >>> + struct nm_block *block; >>> + for (int i = 0 ; i < (nextrabufs/DP_BLOCK_SIZE); i++) { >>> + block = nm_block_new(nmd); >>> + ovs_list_push_back(&nmr.block_list[NM_BLOCK_TYPE_GET], >>> &block->node); >>> + } >>> + >>> + ovs_mutex_unlock(&netmap_mutex); >>> + >>> + return 0; >>> +} >>> + >>> +/* This function initializes some variables and has to be called in the >>> pmd >>> + * thread reload. >>> + * Thanks to this we can initialize thread local blocks and recognize >>> + * if there are other ports using our thread-id. 
*/ >>> +void >>> +netmap_init_port(struct netdev_rxq *rxq) { >>> + >>> + ovs_mutex_lock(&netmap_mutex); >>> + >>> + if(is_netmap_class(netdev_get_class(rxq->netdev))) { >>> + struct netdev_netmap *dev = netdev_netmap_cast(rxq->netdev); >>> + dev->tid = syscall(SYS_gettid); >>> + dev->nma = NMA; >>> + } >>> + >>> + /* We need to initialize new blocks in the local allocator */ >>> + if (!GETB) { >>> + GETB = nm_block_new(NULL); >>> + } >>> + >>> + if (!PUTB) { >>> + PUTB = nm_block_new(NULL); >>> + } >>> + >>> + ovs_mutex_unlock(&netmap_mutex); >>> +} >>> + >>> +/* This function is called upon dp_packet deallocation. The pointer is >>> not >>> + * dellocated but saved in a nm_block that has free space. */ >>> +void >>> +netmap_free_packet(struct dp_packet* packet) { >>> + struct nm_block* block = PUTB; >>> + >>> + if (OVS_UNLIKELY(block->idx == (DP_BLOCK_SIZE - 1))) { >>> + block = GETB; >>> + if (OVS_UNLIKELY(block->idx == (DP_BLOCK_SIZE - 1))) { >>> + nm_block_swap_global(NM_BLOCK_TYPE_PUT); >>> + block = PUTB; >>> + } >>> + } >>> + >>> + block->packets[block->idx++] = packet; >>> +} >>> + >>> +/* Allocate 'n' dp_packets to the batch. This operation might require >>> + * multiple memcpy operations. If no thread local nm_block has data we >>> need >>> + * to ask for a new block to the nm_runtime. */ >>> +static int >>> +netmap_alloc_packets(struct dp_packet_batch* b, size_t n) { >>> + struct nm_block* block; >>> + size_t step, tot = 0, s; >>> + >>> + for (step = 0; step < 3; step++) { >>> + block = GETB; >>> + s = MIN(n, block->idx); >>> + memcpy(&b->packets[tot], &block->packets[block->idx - s], >>> + s * sizeof(struct dp_packet*)); >>> + block->idx -= s; >>> + tot += s; >>> + n -= s; >>> + >>> + if (n == 0) { >>> + break; >>> + } else if (OVS_LIKELY(step == 0)) { >>> + nm_block_swap_local(); >>> + } else { >>> + nm_block_swap_global(NM_BLOCK_TYPE_GET); >>> + } >>> + } >>> + >>> + return tot; >>> +} >>> + >>> +/* Set up some values from the configuration. 
*/ >>> +void >>> +netmap_init_config(const struct smap *ovs_other_config) { >>> + nmr.nextrabufs = (uint32_t) >>> + smap_get_int(ovs_other_config, "netmap-nextrabufs", >>> DP_BLOCK_SIZE); >>> + >>> + nmr.nextrabufs &= ~(DP_BLOCK_SIZE-1); >>> + >>> + VLOG_INFO("nextrabufs: %d", nmr.nextrabufs); >>> +} >>> + >>> +static struct netdev_rxq * >>> +netdev_netmap_rxq_alloc(void) >>> +{ >>> + struct netdev_rxq_netmap *rx = xzalloc(sizeof *rx); >>> + return &rx->up; >>> +} >>> + >>> +static int >>> +netdev_netmap_rxq_construct(struct netdev_rxq *rxq OVS_UNUSED) >>> +{ >>> + /* Nothing to do here */ >>> + return 0; >>> +} >>> + >>> +static void >>> +netdev_netmap_rxq_destruct(struct netdev_rxq *rxq OVS_UNUSED) >>> +{ >>> + /* Nothing to do here */ >>> + return; >>> +} >>> + >>> +static void >>> +netdev_netmap_rxq_dealloc(struct netdev_rxq *rxq) >>> +{ >>> + struct netdev_rxq_netmap *rx = netdev_rxq_netmap_cast(rxq); >>> + free(rx); >>> +} >>> + >>> +static struct netdev * >>> +netdev_netmap_alloc(void) >>> +{ >>> + struct netdev_netmap *dev; >>> + >>> + dev = (struct netdev_netmap *) xzalloc(sizeof *dev); >>> + if (dev) { >>> + return &dev->up; >>> + } >>> + >>> + return NULL; >>> +} >>> + >>> +static int >>> +netdev_netmap_construct(struct netdev *netdev) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + const char *ifname = netdev_get_name(netdev); >>> + >>> + struct nmreq req; >>> + memset(&req, 0 , sizeof(req)); >>> + req.nr_arg3 = nmr.nextrabufs; >>> + >>> + /* Open Netmap port requesting a number of extrabuffers. We also >>> avoid to >>> + * mmap netmap memory here. 
*/ >>> + dev->nmd = nm_open(ifname, &req, NM_OPEN_NO_MMAP, NULL); >>> + >>> + if (!dev->nmd) { >>> + if (!errno) { >>> + VLOG_WARN("opening port \"%s\" failed: not a netmap port", >>> ifname); >>> + } else { >>> + VLOG_WARN("opening port \"%s\" failed: %s", ifname, >>> + ovs_strerror(errno)); >>> + } >>> + return EINVAL; >>> + } else { >>> + VLOG_INFO("opening port \"%s\"", ifname); >>> + } >>> + >>> + /* Check if we have enough extra buffers to create a nm_block. */ >>> + if (dev->nmd->req.nr_arg3 < DP_BLOCK_SIZE) { >>> + VLOG_WARN("not enough extra buffers(%d/%d), closing port", >>> + dev->nmd->req.nr_arg3, DP_BLOCK_SIZE); >>> + nm_close(dev->nmd); >>> + return EINVAL; >>> + } >>> + >>> + /* Possibly mmap netmap memory, initialize the nm_desc, nm_runtime. >>> + * Allocate some nm_blocks using the extrabuffers given to this >>> port. */ >>> + if (netmap_setup_port(dev->nmd)) { >>> + VLOG_WARN("could not setup \"%s\" port", ifname); >>> + nm_close(dev->nmd); >>> + return EINVAL; >>> + } >>> + >>> + ovs_list_init(&dev->list_node); >>> + ovs_mutex_lock(&netmap_mutex); >>> + ovs_list_push_front(&nmr.port_list, &dev->list_node); >>> + ovs_mutex_unlock(&netmap_mutex); >>> + >>> + ovs_mutex_init(&dev->mutex); >>> + pthread_spin_init(&dev->tx_lock, PTHREAD_PROCESS_SHARED); >>> + eth_addr_random(&dev->hwaddr); >>> + dev->flags = NETDEV_UP | NETDEV_PROMISC; >>> + dev->timestamp = netmap_rdtsc(); >>> + dev->rxsync_intval = DEFAULT_RSYNC_INTVAL; >>> + dev->requested_mtu = NETMAP_RXRING(dev->nmd->nifp, 0)->nr_buf_size; >>> + netdev_request_reconfigure(netdev); >>> + >>> + return 0; >>> +} >>> + >>> +static void >>> +netdev_netmap_destruct(struct netdev *netdev) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + struct nm_block* b; >>> + >>> + ovs_mutex_lock(&netmap_mutex); >>> + VLOG_INFO("closing port \"%s\"", (const char*) >>> netdev_get_name(netdev)); >>> + >>> + ovs_list_remove(&dev->list_node); >>> + >>> + /* A netmap netdev is being removed. 
>>> + * If this is the last netmap port we remove all blocks. */ >>> + if (!ovs_list_size(&nmr.port_list)) { >>> + LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_PUT]) >>> { >>> + nm_block_free(b, dev->nmd); >>> + } >>> + >>> + LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_GET]) >>> { >>> + nm_block_free(b, dev->nmd); >>> + } >>> + } else { >>> + struct netdev_netmap *d; >>> + enum nm_block_type type; >>> + int last_thread_port = true; >>> + >>> + /* Check if there are other netmap ports using the same thread >>> id. */ >>> + LIST_FOR_EACH(d, list_node, &nmr.port_list) { >>> + if (dev->tid == d->tid) { >>> + last_thread_port = false; >>> + break; >>> + } >>> + } >>> + >>> + /* If there are no ports using this thread id we return thread >>> local >>> + * blocks to the global allocator nm_runtime. */ >>> + if (last_thread_port) { >>> + b = dev->nma->block[NM_BLOCK_TYPE_PUT]; >>> + type = b->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT; >>> + ovs_list_push_front(&nmr.block_list[type], &b->node); >>> + dev->nma->block[NM_BLOCK_TYPE_PUT] = NULL; >>> + >>> + b = dev->nma->block[NM_BLOCK_TYPE_GET]; >>> + type = b->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT; >>> + ovs_list_push_front(&nmr.block_list[type], &b->node); >>> + dev->nma->block[NM_BLOCK_TYPE_GET] = NULL; >>> + } >>> + >>> + /* We will now try to free a number of blocks equal to the >>> blocks >>> + * allocated when the port was created. >>> + * Each block is then freed returning the extra bufs to the >>> nm_desc. 
*/ >>> + int nblocks = nmr.nextrabufs / DP_BLOCK_SIZE; >>> + LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_GET]) >>> { >>> + nm_block_free(b, dev->nmd); >>> + if (!--nblocks) { >>> + break; >>> + } >>> + } >>> + >>> + if (!ovs_list_is_empty(&nmr.block_list[NM_BLOCK_TYPE_PUT])) { >>> + struct ovs_list *list_node = ovs_list_pop_front( >>> + &nmr.block_list[NM_BLOCK_TYPE_ >>> PUT]); >>> + b = CONTAINER_OF(list_node, struct nm_block, node); >>> + nm_block_free(b, dev->nmd); >>> + } >>> + } >>> + >>> + ovs_mutex_unlock(&netmap_mutex); >>> + >>> + /* Now we can close the port. */ >>> + nm_close(dev->nmd); >>> +} >>> + >>> +static void >>> +netdev_netmap_dealloc(struct netdev *netdev) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_destroy(&dev->mutex); >>> + pthread_spin_destroy(&dev->tx_lock); >>> + >>> + free(dev); >>> +} >>> + >>> +static int >>> +netdev_netmap_class_init(void) >>> +{ >>> + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; >>> + >>> + if (ovsthread_once_start(&once)) { >>> + ovs_list_init(&nmr.block_list[NM_BLOCK_TYPE_PUT]); >>> + ovs_list_init(&nmr.block_list[NM_BLOCK_TYPE_GET]); >>> + ovs_list_init(&nmr.port_list); >>> + ovsthread_once_done(&once); >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_reconfigure(struct netdev *netdev) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + int err = 0; >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + >>> + if (dev->mtu == dev->requested_mtu) { >>> + /* Reconfiguration is unnecessary */ >>> + goto out; >>> + } >>> + >>> + dev->mtu = dev->requested_mtu; >>> + netdev_change_seq_changed(netdev); >>> + >>> +out: >>> + ovs_mutex_unlock(&dev->mutex); >>> + return err; >>> +} >>> + >>> +static int >>> +netdev_netmap_get_config(const struct netdev *netdev, struct smap >>> *args) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + 
smap_add_format(args, "mtu", "%d", dev->mtu); >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_set_config(struct netdev *netdev, const struct smap >>> *args, >>> + char **errp OVS_UNUSED) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + dev->rxsync_intval = smap_get_int(args, "rxsync-intval", >>> + DEFAULT_RSYNC_INTVAL); >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static inline void >>> +netmap_rxsync(struct netdev_netmap *dev) >>> +{ >>> + uint64_t now = netmap_rdtsc(); >>> + unsigned int diff = TSC2US(now - dev->timestamp); >>> + >>> + if (diff < dev->rxsync_intval) { >>> + /* skipping rxsync */ >>> + return; >>> + } >>> + >>> + ioctl(dev->nmd->fd, NIOCRXSYNC, NULL); >>> + >>> + /* update current timestamp */ >>> + dev->timestamp = now; >>> +} >>> + >>> +static inline void >>> +netmap_swap_slot(struct dp_packet *packet, struct netmap_slot *s) { >>> + uint32_t idx; >>> + >>> + idx = s->buf_idx; >>> + s->buf_idx = packet->buf_idx; >>> + s->flags |= NS_BUF_CHANGED; >>> + packet->buf_idx = idx; >>> +} >>> + >>> +static int >>> +netdev_netmap_send(struct netdev *netdev, int qid OVS_UNUSED, >>> + struct dp_packet_batch *batch, bool concurrent_txq) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + struct nm_desc *nmd = dev->nmd; >>> + uint16_t r, nrings = dev->nmd->nifp->ni_tx_rings; >>> + uint32_t budget = batch->count, count = 0; >>> + bool again = false; >>> + >>> + if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { >>> + dp_packet_delete_batch(batch, true); >>> + return 0; >>> + } >>> + >>> + if (OVS_UNLIKELY(concurrent_txq)) { >>> + pthread_spin_lock(&dev->tx_lock); >>> + } >>> + >>> +try_again: >>> + for (r = 0; r < nrings; r++) { >>> + struct netmap_ring *ring; >>> + uint32_t head, space; >>> + >>> + ring = NETMAP_TXRING(nmd->nifp, nmd->cur_tx_ring); >>> + space = nm_ring_space(ring); /* 
Available slots in this ring. */ >>> + head = ring->head; >>> + >>> + if (space > budget) { >>> + space = budget; >>> + } >>> + budget -= space; >>> + >>> + /* Transmit as much as possible in this ring. */ >>> + while (space--) { >>> + struct netmap_slot *ts = &ring->slot[head]; >>> + struct dp_packet *packet = batch->packets[count++]; >>> + >>> + ts->len = dp_packet_get_send_len(packet); >>> + >>> + if (OVS_UNLIKELY(packet->source != DPBUF_NETMAP)) { >>> + /* send packet copying data to the netmap slot */ >>> + memcpy(NETMAP_BUF(ring, ts->buf_idx), >>> + dp_packet_data(packet), ts->len); >>> + } else { >>> + /* send packet using zerocopy */ >>> + netmap_swap_slot(packet, ts); >>> + } >>> + >>> + head = nm_ring_next(ring, head); >>> + } >>> + >>> + ring->head = ring->cur = head; >>> + >>> + /* We may have exhausted the budget */ >>> + if (OVS_LIKELY(!budget)) { >>> + break; >>> + } >>> + >>> + /* We still have packets to send, select next ring. */ >>> + if (OVS_UNLIKELY(++dev->nmd->cur_tx_ring == nrings)) { >>> + nmd->cur_tx_ring = 0; >>> + } >>> + } >>> + >>> + ioctl(dev->nmd->fd, NIOCTXSYNC, NULL); >>> + >>> + if (OVS_UNLIKELY(!count && !again)) { >>> + again = true; >>> + goto try_again; >>> + } >>> + >>> + dp_packet_delete_batch(batch, true); >>> + >>> + if (OVS_UNLIKELY(concurrent_txq)) { >>> + pthread_spin_unlock(&dev->tx_lock); >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch >>> *batch) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(rxq->netdev); >>> + struct nm_desc *nmd = dev->nmd; >>> + uint16_t r, nrings = nmd->nifp->ni_rx_rings; >>> + uint32_t budget = 0; >>> + >>> + if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { >>> + return EAGAIN; >>> + } >>> + >>> + /* check how much we can receive */ >>> + for (r = nmd->first_rx_ring; r < nrings; r++) { >>> + budget += nm_ring_space(NETMAP_RXRING(nmd->nifp, r)); >>> + } >>> + >>> + /* sync if there is no packet */ >>> 
+ if (budget == 0) { >>> + netmap_rxsync(dev); >>> + return EAGAIN; >>> + } >>> + >>> + /* allocate the batch */ >>> + budget = netmap_alloc_packets(batch, MIN(budget, NETDEV_MAX_BURST)); >>> + >>> + for (r = 0; r < nrings; r++) { >>> + struct netmap_ring *ring; >>> + uint32_t head, space; >>> + >>> + ring = NETMAP_RXRING(nmd->nifp, nmd->cur_rx_ring); >>> + head = ring->head; >>> + space = nm_ring_space(ring); >>> + >>> + if (space > budget) { >>> + space = budget; >>> + } >>> + budget -= space; >>> + >>> + /* Receive as much as possible from this ring. */ >>> + while (space--) { >>> + struct netmap_slot *rs = &ring->slot[head]; >>> + struct dp_packet *packet = batch->packets[batch->count++]; >>> + dp_packet_init_netmap(packet, NETMAP_BUF(ring, rs->buf_idx), >>> + rs->len); >>> + /* receiving from a netmap port we can always zero copy >>> here. */ >>> + netmap_swap_slot(packet, rs); >>> + head = nm_ring_next(ring, head); >>> + } >>> + >>> + ring->cur = ring->head = head; >>> + >>> + /* check if the batch has been filled. */ >>> + if (!budget) { >>> + break; >>> + } >>> + >>> + /* batch isn't full, try to receive on other rings. */ >>> + if (OVS_UNLIKELY(++nmd->cur_rx_ring == nrings)) { >>> + nmd->cur_rx_ring = 0; >>> + } >>> + } >>> + >>> + dp_packet_batch_init_packet_fields(batch); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_get_ifindex(const struct netdev *netdev) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + /* Calculate hash from the netdev name. Ensure that ifindex is a >>> 24-bit >>> + * positive integer to meet RFC 2863 recommendations.
>>> + */ >>> + int ifindex = hash_string(netdev->name, 0) % 0xfffffe + 1; >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return ifindex; >>> +} >>> + >>> +static int >>> +netdev_netmap_get_mtu(const struct netdev *netdev, int *mtu) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + *mtu = dev->mtu; >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_set_mtu(struct netdev *netdev, int mtu) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + if (mtu > NETMAP_RXRING(dev->nmd->nifp, 0)->nr_buf_size >>> + || mtu < ETH_HEADER_LEN) { >>> + VLOG_WARN("%s: unsupported MTU %d\n", dev->up.name, mtu); >>> + return EINVAL; >>> + } >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + if (dev->requested_mtu != mtu) { >>> + dev->requested_mtu = mtu; >>> + netdev_request_reconfigure(netdev); >>> + } >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_set_etheraddr(struct netdev *netdev, const struct >>> eth_addr mac) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + dev->hwaddr = mac; >>> + netdev_change_seq_changed(netdev); >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_get_etheraddr(const struct netdev *netdev, struct >>> eth_addr *mac) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + *mac = dev->hwaddr; >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_update_flags(struct netdev *netdev, >>> + enum netdev_flags off, enum netdev_flags on, >>> + enum netdev_flags *old_flagsp) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + >>> + if ((off | on) & ~(NETDEV_UP | 
NETDEV_PROMISC)) { >>> + ovs_mutex_unlock(&dev->mutex); /* do not leak the lock on the error path */ >>> + return EINVAL; >>> + } >>> + >>> + *old_flagsp = dev->flags; >>> + dev->flags |= on; >>> + dev->flags &= ~off; >>> + >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_get_carrier(const struct netdev *netdev, bool *carrier) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + *carrier = true; >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_get_stats(const struct netdev *netdev, struct >>> netdev_stats *stats) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + stats->tx_packets = dev->stats.tx_packets; >>> + stats->tx_bytes = dev->stats.tx_bytes; >>> + stats->rx_packets = dev->stats.rx_packets; >>> + stats->rx_bytes = dev->stats.rx_bytes; >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +static int >>> +netdev_netmap_get_status(const struct netdev *netdev, struct smap >>> *args) >>> +{ >>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev); >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + smap_add_format(args, "mtu", "%d", dev->mtu); >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + return 0; >>> +} >>> + >>> +#define NETDEV_NETMAP_CLASS(NAME, PMD, INIT, CONSTRUCT, DESTRUCT, >>> SET_CONFIG, \ >>> + SET_TX_MULTIQ, SEND, SEND_WAIT, GET_CARRIER, GET_STATS, >>> GET_FEATURES, \ >>> + GET_STATUS, RECONFIGURE, RXQ_RECV, RXQ_WAIT) \ >>> +{ \ >>> + NAME, \ >>> + PMD, /* is_pmd */ \ >>> + INIT, /* init */ \ >>> + NULL, /* netdev_netmap_run */ \ >>> + NULL, /* netdev_netmap_wait */ \ >>> + netdev_netmap_alloc, \ >>> + CONSTRUCT, \ >>> + DESTRUCT, \ >>> + netdev_netmap_dealloc, \ >>> + netdev_netmap_get_config, \ >>> + SET_CONFIG, \ >>> + NULL, /* get_tunnel_config */ \ >>> + NULL, /* build header */ \ >>> + NULL, /* push header */ \ >>> + NULL, /* pop header */ \ >>> + NULL, /* get
numa id */ \ >>> + SET_TX_MULTIQ, /* tx multiq */ \ >>> + SEND, /* send */ \ >>> + SEND_WAIT, \ >>> + netdev_netmap_set_etheraddr, \ >>> + netdev_netmap_get_etheraddr, \ >>> + netdev_netmap_get_mtu, \ >>> + netdev_netmap_set_mtu, \ >>> + netdev_netmap_get_ifindex, \ >>> + GET_CARRIER, \ >>> + NULL, /* get_carrier_resets */ \ >>> + NULL, /* get_miimon */ \ >>> + GET_STATS, \ >>> + NULL, /* get_custom_stats */ \ >>> + \ >>> + NULL, /* get_features */ \ >>> + NULL, /* set_advertisements */ \ >>> + NULL, /* get_pt_mode */ \ >>> + \ >>> + NULL, /* set_policing */ \ >>> + NULL, /* get_qos_types */ \ >>> + NULL, /* get_qos_capabilities */ \ >>> + NULL, /* get_qos */ \ >>> + NULL, /* set_qos */ \ >>> + NULL, /* get_queue */ \ >>> + NULL, /* set_queue */ \ >>> + NULL, /* delete_queue */ \ >>> + NULL, /* get_queue_stats */ \ >>> + NULL, /* queue_dump_start */ \ >>> + NULL, /* queue_dump_next */ \ >>> + NULL, /* queue_dump_done */ \ >>> + NULL, /* dump_queue_stats */ \ >>> + \ >>> + NULL, /* set_in4 */ \ >>> + NULL, /* get_addr_list */ \ >>> + NULL, /* add_router */ \ >>> + NULL, /* get_next_hop */ \ >>> + GET_STATUS, \ >>> + NULL, /* arp_lookup */ \ >>> + \ >>> + netdev_netmap_update_flags, \ >>> + RECONFIGURE, \ >>> + \ >>> + netdev_netmap_rxq_alloc, \ >>> + netdev_netmap_rxq_construct, \ >>> + netdev_netmap_rxq_destruct, \ >>> + netdev_netmap_rxq_dealloc, \ >>> + RXQ_RECV, \ >>> + RXQ_WAIT, \ >>> + NULL, /* rxq_drain */ \ >>> + NO_OFFLOAD_API \ >>> +} >>> + >>> +static const struct netdev_class netmap_class = >>> + NETDEV_NETMAP_CLASS( >>> + "netmap", >>> + true, >>> + netdev_netmap_class_init, >>> + netdev_netmap_construct, >>> + netdev_netmap_destruct, >>> + netdev_netmap_set_config, >>> + NULL, >>> + netdev_netmap_send, >>> + NULL, >>> + netdev_netmap_get_carrier, >>> + netdev_netmap_get_stats, >>> + NULL, >>> + netdev_netmap_get_status, >>> + netdev_netmap_reconfigure, >>> + netdev_netmap_rxq_recv, >>> + NULL); >>> + >>> +void >>> +netdev_netmap_register(void) >>> +{ 
>>> + netdev_register_provider(&netmap_class); >>> +} >>> diff --git a/lib/netdev-netmap.h b/lib/netdev-netmap.h >>> new file mode 100644 >>> index 000000000..49fe8c319 >>> --- /dev/null >>> +++ b/lib/netdev-netmap.h >>> @@ -0,0 +1,13 @@ >>> +#ifndef NETDEV_NETMAP_H >>> +#define NETDEV_NETMAP_H >>> + >>> +struct netdev_rxq; >>> +struct smap; >>> +struct dp_packet; >>> + >>> +void netmap_init_port(struct netdev_rxq *); >>> +void netmap_init_config(const struct smap *); >>> +void netmap_free_packet(struct dp_packet *); >>> +void netdev_netmap_register(void); >>> + >>> +#endif /* netdev-netmap.h */ >>> diff --git a/lib/netmap-stub.c b/lib/netmap-stub.c >>> new file mode 100644 >>> index 000000000..62f7a06b8 >>> --- /dev/null >>> +++ b/lib/netmap-stub.c >>> @@ -0,0 +1,21 @@ >>> +#include <config.h> >>> +#include "netmap.h" >>> + >>> +#include "smap.h" >>> +#include "ovs-thread.h" >>> +#include "openvswitch/vlog.h" >>> + >>> +VLOG_DEFINE_THIS_MODULE(netmap); >>> + >>> +void >>> +netmap_init(const struct smap *ovs_other_config) >>> +{ >>> + if (smap_get_bool(ovs_other_config, "netmap-init", false)) { >>> + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; >>> + >>> + if (ovsthread_once_start(&once)) { >>> + VLOG_ERR("NETMAP not supported in this copy of Open >>> vSwitch."); >>> + ovsthread_once_done(&once); >>> + } >>> + } >>> +} >>> diff --git a/lib/netmap.c b/lib/netmap.c >>> new file mode 100644 >>> index 000000000..b4147e0ad >>> --- /dev/null >>> +++ b/lib/netmap.c >>> @@ -0,0 +1,76 @@ >>> +#include <config.h> >>> + >>> +#include <fcntl.h> >>> +#include <pthread.h> >>> +#include <stdio.h> >>> +#include <sys/time.h> /* timersub */ >>> +#include <stdlib.h> >>> +#include <string.h> >>> +#include <stdint.h> >>> +#include <unistd.h> /* read() */ >>> + >>> +#include "dirs.h" >>> +#include "netdev-netmap.h" >>> +#include "netmap.h" >>> +#include "openvswitch/vlog.h" >>> +#include "smap.h" >>> + >>> +VLOG_DEFINE_THIS_MODULE(netmap); >>> + >>> +/* initialize to 
avoid a division by 0 */ >>> +uint64_t netmap_ticks_per_second = 1000000000; /* set by calibrate_tsc >>> */ >>> + >>> +/* >>> + * do an idle loop to compute the clock speed. We expect >>> + * a constant TSC rate and locked on all CPUs. >>> + * Returns ticks per second >>> + */ >>> +static uint64_t >>> +netmap_calibrate_tsc(void) >>> +{ >>> + struct timeval a, b; >>> + uint64_t ta_0, ta_1, tb_0, tb_1, dmax = ~0; >>> + uint64_t da, db, cy = 0; >>> + int i; >>> + for (i=0; i < 3; i++) { >>> + ta_0 = netmap_rdtsc(); >>> + gettimeofday(&a, NULL); >>> + ta_1 = netmap_rdtsc(); >>> + usleep(20000); >>> + tb_0 = netmap_rdtsc(); >>> + gettimeofday(&b, NULL); >>> + tb_1 = netmap_rdtsc(); >>> + da = ta_1 - ta_0; >>> + db = tb_1 - tb_0; >>> + if (da + db < dmax) { >>> + cy = (b.tv_sec - a.tv_sec)*1000000 + b.tv_usec - a.tv_usec; >>> + cy = (double)(tb_0 - ta_1)*1000000/(double)cy; >>> + dmax = da + db; >>> + } >>> + } >>> + netmap_ticks_per_second = cy; >>> + return cy; >>> +} >>> + >>> +void >>> +netmap_init(const struct smap *ovs_other_config) >>> +{ >>> + static bool enabled = false; >>> + >>> + if (enabled || !ovs_other_config) { >>> + return; >>> + } >>> + >>> + if (smap_get_bool(ovs_other_config, "netmap-init", false)) { >>> + static struct ovsthread_once once_enable = >>> OVSTHREAD_ONCE_INITIALIZER; >>> + if (ovsthread_once_start(&once_enable)) { >>> + netmap_calibrate_tsc(); >>> + netmap_init_config(ovs_other_config); >>> + netdev_netmap_register(); >>> + enabled = true; >>> + ovsthread_once_done(&once_enable); >>> + VLOG_INFO("NETMAP Enabled"); >>> + } >>> + } else >>> + VLOG_INFO_ONCE("NETMAP Disabled - Use other_config:netmap-init >>> to enable"); >>> +} >>> diff --git a/lib/netmap.h b/lib/netmap.h >>> new file mode 100644 >>> index 000000000..34ff7b7a2 >>> --- /dev/null >>> +++ b/lib/netmap.h >>> @@ -0,0 +1,27 @@ >>> +#ifndef NETMAP_H >>> +#define NETMAP_H >>> + >>> +#include <stdint.h> >>> + >>> +extern uint64_t netmap_ticks_per_second; >>> +#define US2TSC(x) 
((x)*netmap_ticks_per_second/1000000UL) >>> +#define TSC2US(x) ((x)*1000000UL/netmap_ticks_per_second) >>> + >>> +#if 0 /* gcc intrinsic */ >>> +#include <x86intrin.h> >>> +#define rdtsc __rdtsc >>> +#else >>> +static inline uint64_t >>> +netmap_rdtsc(void) >>> +{ >>> + uint32_t hi, lo; >>> + __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi)); >>> + return (uint64_t)lo | ((uint64_t)hi << 32); >>> +} >>> +#endif >>> + >>> +struct smap; >>> + >>> +void netmap_init(const struct smap *ovs_other_config); >>> + >>> +#endif /* netmap.h */ >>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c >>> index d90997e3a..2dfcbb7f6 100644 >>> --- a/vswitchd/bridge.c >>> +++ b/vswitchd/bridge.c >>> @@ -38,6 +38,7 @@ >>> #include "mac-learning.h" >>> #include "mcast-snooping.h" >>> #include "netdev.h" >>> +#include "netmap.h" >>> #include "nx-match.h" >>> #include "ofproto/bond.h" >>> #include "ofproto/ofproto.h" >>> @@ -2977,6 +2978,7 @@ bridge_run(void) >>> if (cfg) { >>> netdev_set_flow_api_enabled(&cfg->other_config); >>> dpdk_init(&cfg->other_config); >>> + netmap_init(&cfg->other_config); >>> } >>> >>> /* Initialize the ofproto library. This only needs to run once, but >>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml >>> index f899a1976..f6dd6e7b6 100644 >>> --- a/vswitchd/vswitch.xml >>> +++ b/vswitchd/vswitch.xml >>> @@ -217,6 +217,46 @@ >>> </p> >>> </column> >>> >>> + <column name="other_config" key="netmap-init" >>> + type='{"type": "boolean"}'> >>> + <p> >>> + Set this value to <code>true</code> to enable runtime support >>> for >>> + NETMAP ports. The vswitch must have compile-time support for >>> NETMAP as >>> + well. >>> + </p> >>> + <p> >>> + The default value is <code>false</code>. Changing this value >>> requires >>> + restarting the daemon >>> + </p> >>> + <p> >>> + If this value is <code>false</code> at startup, any netmap >>> ports which >>> + are configured in the bridge will fail. 
>>> + </p> >>> + </column> >>> + >>> + <column name="other_config" key="netmap-nextrabufs" >>> + type='{"type": "integer", "minInteger": 32}'> >>> + <p> >>> + Specifies the number of extra buffers to be requested from >>> netmap >>> + when opening each netmap port. >>> + </p> >>> + <p> >>> + Each packet received or transmitted by OVS from/to a netmap >>> port >>> + needs an extra buffer. The OVS netmap runtime needs at >>> least a >>> + batch's worth of extra buffers (32 packets) for each port to >>> function >>> + properly. More extra buffers may be necessary if OVS >>> temporarily >>> + stores netmap buffers within its internal queues. >>> + </p> >>> + </column> >>> + >>> + <column name="other_config" key="rxsync-intval" >>> + type='{"type": "integer", "minInteger": 0}'> >>> + <p> >>> + Specifies the minimum time (in microseconds) between two >>> + consecutive rxsync calls issued on a netmap port. >>> + </p> >>> + </column> >>> + >>> <column name="other_config" key="dpdk-init" >>> type='{"type": "boolean"}'> >>> <p> >>> >>> >>> 2018-03-20 15:07 GMT+01:00 Alessandro Rosetti < >>> alessandro.rose...@gmail.com>: >>> >>>> Hi Darrell, >>>> >>>> I'm developing netmap support for my thesis and I hope it will make it >>>> for OVS 2.10. >>>> In the next few days I'm going to post the first prototype patch, which is >>>> almost ready. >>>> >>>> Thanks to you, >>>> Alessandro >>>> >>>> On 19 Mar 2018 9:26 pm, "Darrell Ball" <dlu...@gmail.com> wrote: >>>> >>>>> Hi Alessandro >>>>> >>>>> I also think this would be interesting. >>>>> Is netmap integration actively being worked on for OVS 2.10? >>>>> >>>>> Thanks Darrell >>>>> >>>>> On Wed, Feb 7, 2018 at 9:19 AM, Ilya Maximets <i.maxim...@samsung.com> >>>>> wrote: >>>>> >>>>>> > Hi, >>>>>> >>>>>> Hi, Alessandro. >>>>>> >>>>>> > >>>>>> > My name is Alessandro Rosetti, and I'm currently adding netmap >>>>>> support to >>>>>> > ovs, following an approach similar to DPDK.
>>>>>> >>>>>> Good to know that someone started to work on this. IMHO, it's a good >>>>>> idea. >>>>>> I also wanted to try to implement this someday, but did not have much time. >>>>>> >>>>>> > >>>>>> > I've created a new netdev: netdev_netmap that uses the pmd >>>>>> infrastructure. >>>>>> > The prototype I have seems to work fine (I still need to tune >>>>>> performance, >>>>>> > test optional features, and test more complex topologies.) >>>>>> >>>>>> Cool. Looking forward to your RFC patch-set. >>>>>> >>>>>> > >>>>>> > I have a question about the lifetime of dp_packets. >>>>>> > Is there any guarantee that the dp_packets allocated in a receive >>>>>> callback >>>>>> > (e.g. netdev_netmap_rxq_recv) are consumed by OVS (e.g. dropped, >>>>>> cloned, or >>>>>> > sent to other ports) **before** a subsequent call to the receive >>>>>> callback >>>>>> > (on the same port)? >>>>>> > Or is it possible for dp_packets to be stored somewhere (e.g. in an >>>>>> OVS >>>>>> > internal queue) and live across subsequent invocations of the >>>>>> receive >>>>>> > callback that allocated them? >>>>>> >>>>>> I think that there was never such a guarantee, but recent changes in the >>>>>> userspace >>>>>> datapath completely ruined this assumption. I mean the output packet >>>>>> batching support. >>>>>> >>>>>> Please refer to the following commits for details: >>>>>> 009e003 2017-12-14 | dpif-netdev: Output packet batching. >>>>>> c71ea3c 2018-01-15 | dpif-netdev: Time based output batching. >>>>>> 00adb8d 2018-01-15 | docs: Describe output packet batching in DPDK >>>>>> guide. >>>>>> >>>>>> > >>>>>> > I need to know if this is the case to check that my current >>>>>> prototype is >>>>>> > safe. >>>>>> > I use per-port pre-allocation of dp_packets, for maximum >>>>>> performance. I've >>>>>> > seen that DPDK uses its internal allocator to allocate and >>>>>> deallocate >>>>>> > dp_packets, but netmap does not expose one.
>>>>>> > Each packet received with netmap is created as a new type of dp_packet: >>>>>> > DPBUF_NETMAP. The data points to a netmap buffer (preallocated by >>>>>> the >>>>>> > kernel). >>>>>> > When I receive data (netdev_netmap_rxq_recv) I reuse the dp_packets, >>>>>> > updating the internal pointer and a couple of additional >>>>>> pieces of information >>>>>> > stored inside the dp_packet. >>>>>> > When I have to send data I use zero copy if dp_packet is >>>>>> DPBUF_NETMAP and >>>>>> > copy if it's not. >>>>>> > >>>>>> > Thanks for the help! >>>>>> > Alessandro. >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> dev mailing list >>>>>> d...@openvswitch.org >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
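For readers following the thread, the send-side rule described above (zero copy when the dp_packet is DPBUF_NETMAP, copy otherwise) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: every sketch_* name, the buffer pool, and the struct layouts are hypothetical stand-ins for struct netmap_slot, struct dp_packet and the netmap buffer area, and the buf_changed field merely models netmap's NS_BUF_CHANGED slot flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are NOT the real
 * struct netmap_slot / struct dp_packet definitions. */
enum sketch_source { SKETCH_MALLOC, SKETCH_NETMAP };

struct sketch_slot {
    uint32_t buf_idx;     /* Buffer currently owned by the ring slot. */
    uint16_t len;         /* Length of the frame to transmit. */
    bool buf_changed;     /* Models netmap's NS_BUF_CHANGED flag. */
};

struct sketch_packet {
    enum sketch_source source;
    uint32_t buf_idx;     /* Only meaningful for SKETCH_NETMAP packets. */
    const void *data;     /* Start of the frame. */
    uint16_t len;
};

#define SKETCH_BUF_SIZE 2048
#define SKETCH_NBUFS 8

/* Hypothetical buffer pool standing in for the netmap shared memory. */
static uint8_t sketch_pool[SKETCH_NBUFS][SKETCH_BUF_SIZE];

/* Zero-copy path: the packet and the slot exchange buffer indices, so the
 * slot now points at the frame and the packet keeps a spare buffer. */
static void
sketch_swap_slot(struct sketch_packet *p, struct sketch_slot *s)
{
    uint32_t tmp = s->buf_idx;

    s->buf_idx = p->buf_idx;
    p->buf_idx = tmp;
    p->data = sketch_pool[p->buf_idx];
    s->buf_changed = true;
}

/* Queues one packet on a TX slot.  Returns true if zero copy was used. */
static bool
sketch_tx_one(struct sketch_packet *p, struct sketch_slot *s)
{
    assert(p->len <= SKETCH_BUF_SIZE);
    s->len = p->len;

    if (p->source == SKETCH_NETMAP) {
        sketch_swap_slot(p, s);
        return true;
    }

    /* Packet memory is not a netmap buffer: copy it into the slot's buffer. */
    memcpy(sketch_pool[s->buf_idx], p->data, p->len);
    return false;
}
```

The swap is also why the lifetime question matters: after a zero-copy send or receive the packet holds a buffer that used to belong to the ring, so that buffer has to stay valid for as long as OVS keeps the dp_packet queued.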