Hi Mark,
David had some comments about the NEWS file, and I found an issue on Windows below. On Tue, Jul 06, 2021 at 05:31:11AM -0400, Mark Gray wrote: > The Open vSwitch kernel module uses the upcall mechanism to send > packets from kernel space to user space when it misses in the kernel > space flow table. The upcall sends packets via a Netlink socket. > Currently, a Netlink socket is created for every vport. In this way, > there is a 1:1 mapping between a vport and a Netlink socket. > When a packet is received by a vport, if it needs to be sent to > user space, it is sent via the corresponding Netlink socket. > > This mechanism, with various iterations of the corresponding user > space code, has seen some limitations and issues: > > * On systems with a large number of vports, there is correspondingly > a large number of Netlink sockets which can limit scaling. > (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) > * Packet reordering on upcalls. > (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) > * A thundering herd issue. > (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) > > This patch introduces an alternative, feature-negotiated, upcall > mode using a per-cpu dispatch rather than a per-vport dispatch. > > In this mode, the Netlink socket to be used for the upcall is > selected based on the CPU of the thread that is executing the upcall. > In this way, it resolves the issues above as: > > a) The number of Netlink sockets scales with the number of CPUs > rather than the number of vports. > b) Ordering per-flow is maintained as packets are distributed to > CPUs based on mechanisms such as RSS and flows are distributed > to a single user space thread. > c) Packets from a flow can only wake up one user space thread. > > Reported-at: https://bugzilla.redhat.com/1844576 > Signed-off-by: Mark Gray <mark.d.g...@redhat.com> > --- > > Notes: > v1 - Reworked based on Flavio's comments: > * change DISPATCH_MODE_PER_CPU() to inline function > * add `ovs-appctl` command to check dispatch mode for datapaths > * fixed issue with userspace actions (tested using `ovs-ofctl > monitor br0 65534 -P nxt_packet_in`) > * update documentation as requested > v2 - Reworked based on Flavio's comments: > * Used dpif_netlink_upcall_per_cpu() for check in > dpif_netlink_set_handler_pids() > * Added macro for (ignored) Netlink PID > * Fixed indentation issue > * Added NEWS entry > * Added section to ovs-vswitchd.8 man page > v4 - Reworked based on Flavio's comments: > * Cleaned up log message when dispatch mode is set > > NEWS | 7 +- > .../linux/compat/include/linux/openvswitch.h | 7 + > lib/automake.mk | 1 + > lib/dpif-netdev.c | 1 + > lib/dpif-netlink-unixctl.man | 6 + > lib/dpif-netlink.c | 464 ++++++++++++++++-- > lib/dpif-provider.h | 32 +- > lib/dpif.c | 17 + > lib/dpif.h | 1 + > ofproto/ofproto-dpif-upcall.c | 51 +- > ofproto/ofproto.c | 12 - > vswitchd/ovs-vswitchd.8.in | 1 + > vswitchd/vswitch.xml | 23 +- > 13 files changed, 526 insertions(+), 97 deletions(-) > create mode 100644 lib/dpif-netlink-unixctl.man > > diff --git a/NEWS b/NEWS > index a2a2dcf95d7d..80b13e358685 100644 > --- a/NEWS > +++ b/NEWS > @@ -29,7 +29,12 @@ Post-v2.15.0 > - ovsdb-tool: > * New option '--election-timer' to the 'create-cluster' command to set > the > leader election timer during cluster creation. > - > + - Per-cpu upcall dispatching: > + * ovs-vswitchd will configure the kernel module using per-cpu dispatch > + mode (if available). 
This changes the way upcalls are delivered to > user > + space in order to resolve a number of issues with per-vport dispatch. > + The new debug appctl command `dpif-netlink/dispatch-mode` > + will return the current dispatch mode for each datapath. > > v2.15.0 - 15 Feb 2021 > --------------------- > diff --git a/datapath/linux/compat/include/linux/openvswitch.h > b/datapath/linux/compat/include/linux/openvswitch.h > index 875de20250ce..f29265df055e 100644 > --- a/datapath/linux/compat/include/linux/openvswitch.h > +++ b/datapath/linux/compat/include/linux/openvswitch.h > @@ -89,6 +89,8 @@ enum ovs_datapath_cmd { > * set on the datapath port (for OVS_ACTION_ATTR_MISS). Only valid on > * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should > * not be sent. > + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when > + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set. > * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the > * datapath. Always present in notifications. > * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for > the > @@ -105,6 +107,8 @@ enum ovs_datapath_attr { > OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */ > OVS_DP_ATTR_USER_FEATURES, /* OVS_DP_F_* */ > OVS_DP_ATTR_PAD, > + OVS_DP_ATTR_PAD2, > + OVS_DP_ATTR_PER_CPU_PIDS, /* Netlink PIDS to receive upcalls */ > __OVS_DP_ATTR_MAX > }; > > @@ -146,6 +150,9 @@ struct ovs_vport_stats { > /* Allow tc offload recirc sharing */ > #define OVS_DP_F_TC_RECIRC_SHARING (1 << 2) > > +/* Allow per-cpu dispatch of upcalls */ > +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3) > + > /* Fixed logical ports. */ > #define OVSP_LOCAL ((__u32)0) > > diff --git a/lib/automake.mk b/lib/automake.mk > index db90175918fc..192db5d87ce3 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -556,6 +556,7 @@ MAN_FRAGMENTS += \ > lib/memory-unixctl.man \ > lib/netdev-dpdk-unixctl.man \ > lib/dpif-netdev-unixctl.man \ > + lib/dpif-netlink-unixctl.man \ > lib/ofp-version.man \ > lib/ovs.tmac \ > lib/ovs-replay.man \ > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 026b52d27d45..b9f7bb86055a 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -8548,6 +8548,7 @@ const struct dpif_class dpif_netdev_class = { > dpif_netdev_operate, > NULL, /* recv_set */ > NULL, /* handlers_set */ > + NULL, /* number_handlers_required */ > dpif_netdev_set_config, > dpif_netdev_queue_to_priority, > NULL, /* recv */ > diff --git a/lib/dpif-netlink-unixctl.man b/lib/dpif-netlink-unixctl.man > new file mode 100644 > index 000000000000..ce5c47e85c80 > --- /dev/null > +++ b/lib/dpif-netlink-unixctl.man > @@ -0,0 +1,6 @@ > +.SS "DPIF-NETLINK COMMANDS" > +These commands are used to expose internal information of the "dpif-netlink" > +kernel space datapath. > +. > +.IP "\fBdpif-netlink/dispatch-mode\fR" > +Displays the "dispatch-mode" for all datapaths. > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c > index f92905dd83fd..8f97ccf49ad0 100644 > --- a/lib/dpif-netlink.c > +++ b/lib/dpif-netlink.c > @@ -84,6 +84,9 @@ enum { MAX_PORTS = USHRT_MAX }; > #define EPOLLEXCLUSIVE (1u << 28) > #endif > > +/* This PID is not used by the kernel datapath when using dispatch per CPU, > + * but it is required to be set (not zero). */ > +#define DPIF_NETLINK_PER_CPU_PID UINT32_MAX > struct dpif_netlink_dp { > /* Generic Netlink header. */ > uint8_t cmd; > @@ -98,6 +101,8 @@ struct dpif_netlink_dp { > const struct ovs_dp_stats *stats; /* OVS_DP_ATTR_STATS. 
*/ > const struct ovs_dp_megaflow_stats *megaflow_stats; > /* OVS_DP_ATTR_MEGAFLOW_STATS.*/ > + const uint32_t *upcall_pids; /* OVS_DP_ATTR_PER_CPU_PIDS */ > + uint32_t n_upcall_pids; > }; > > static void dpif_netlink_dp_init(struct dpif_netlink_dp *); > @@ -113,6 +118,10 @@ static int dpif_netlink_dp_get(const struct dpif *, > static int > dpif_netlink_set_features(struct dpif *dpif_, uint32_t new_features); > > +static void > +dpif_netlink_unixctl_dispatch_mode(struct unixctl_conn *conn, int argc, > + const char *argv[], void *aux); > + > struct dpif_netlink_flow { > /* Generic Netlink header. */ > uint8_t cmd; > @@ -178,11 +187,16 @@ struct dpif_windows_vport_sock { > #endif > > struct dpif_handler { > + /* per-vport dispatch mode */ > struct epoll_event *epoll_events; > int epoll_fd; /* epoll fd that includes channel socks. */ > int n_events; /* Num events returned by epoll_wait(). */ > int event_offset; /* Offset into 'epoll_events'. */ > > + /* per-cpu dispatch mode */ > + struct nl_sock *sock; /* Each handler thread holds one netlink > + socket. */ > + > #ifdef _WIN32 > /* Pool of sockets. */ > struct dpif_windows_vport_sock *vport_sock_pool; > @@ -201,6 +215,8 @@ struct dpif_netlink { > struct fat_rwlock upcall_lock; > struct dpif_handler *handlers; > uint32_t n_handlers; /* Num of upcall handlers. */ > + > + /* Per-vport dispatch mode */ > struct dpif_channel *channels; /* Array of channels for each port. */ > int uc_array_size; /* Size of 'handler->channels' and */ > /* 'handler->epoll_events'. */ > @@ -241,8 +257,12 @@ static int open_dpif(const struct dpif_netlink_dp *, > struct dpif **); > static uint32_t dpif_netlink_port_get_pid(const struct dpif *, > odp_port_t port_no); > static void dpif_netlink_handler_uninit(struct dpif_handler *handler); > -static int dpif_netlink_refresh_channels(struct dpif_netlink *, > - uint32_t n_handlers); > +static int dpif_netlink_refresh_handlers_vport_dispatch(struct dpif_netlink > *, > + uint32_t n_handlers); > +static void destroy_all_channels(struct dpif_netlink *); > +static int dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *); > +static void destroy_all_handlers(struct dpif_netlink *); > + > static void dpif_netlink_vport_to_ofpbuf(const struct dpif_netlink_vport *, > struct ofpbuf *); > static int dpif_netlink_vport_from_ofpbuf(struct dpif_netlink_vport *, > @@ -294,6 +314,11 @@ dpif_netlink_cast(const struct dpif *dpif) > return CONTAINER_OF(dpif, struct dpif_netlink, dpif); > } > > +static inline bool > +dpif_netlink_upcall_per_cpu(const struct dpif_netlink *dpif) { > + return !!((dpif)->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU); > +} > + > static int > dpif_netlink_enumerate(struct sset *all_dps, > const struct dpif_class *dpif_class OVS_UNUSED) > @@ -357,11 +382,39 @@ dpif_netlink_open(const struct dpif_class *class > OVS_UNUSED, const char *name, > dp_request.cmd = OVS_DP_CMD_SET; > } > > + /* The Open vSwitch kernel module has two modes for dispatching upcalls: > + * per-vport and per-cpu. > + * > + * When dispatching upcalls per-vport, the kernel will > + * send the upcall via a Netlink socket that has been selected based on > the > + * vport that received the packet that is causing the upcall. > + * > + * When dispatching upcall per-cpu, the kernel will send the upcall via > + * a Netlink socket that has been selected based on the cpu that received > + * the packet that is causing the upcall. > + * > + * First we test to see if the kernel module supports per-cpu dispatching > + * (the preferred method). 
If it does not support per-cpu dispatching, we > + * fall back to the per-vport dispatch mode. > + */ > dp_request.user_features |= OVS_DP_F_UNALIGNED; > - dp_request.user_features |= OVS_DP_F_VPORT_PIDS; > + dp_request.user_features &= ~OVS_DP_F_VPORT_PIDS; > + dp_request.user_features |= OVS_DP_F_DISPATCH_UPCALL_PER_CPU; > error = dpif_netlink_dp_transact(&dp_request, &dp, &buf); > if (error) { > - return error; > + dp_request.user_features &= ~OVS_DP_F_DISPATCH_UPCALL_PER_CPU; > + dp_request.user_features |= OVS_DP_F_VPORT_PIDS; > + error = dpif_netlink_dp_transact(&dp_request, &dp, &buf); > + if (error) { > + return error; > + } > + if (create) { > + VLOG_INFO("Datapath dispatch mode: per-vport"); > + } > + } else { > + if (create) { > + VLOG_INFO("Datapath dispatch mode: per-cpu"); > + } > } > > error = open_dpif(&dp, dpifp); > @@ -609,6 +662,24 @@ destroy_all_channels(struct dpif_netlink *dpif) > dpif->uc_array_size = 0; > } > > +static void > +destroy_all_handlers(struct dpif_netlink *dpif) > + OVS_REQ_WRLOCK(dpif->upcall_lock) > +{ > + int i = 0; > + > + if (!dpif->handlers) { > + return; > + } > + for (i = 0; i < dpif->n_handlers; i++) { > + struct dpif_handler *handler = &dpif->handlers[i]; > + close_nl_sock(handler->sock); > + } > + free(dpif->handlers); > + dpif->handlers = NULL; > + dpif->n_handlers = 0; > +} > + > static void > dpif_netlink_close(struct dpif *dpif_) > { > @@ -617,7 +688,11 @@ dpif_netlink_close(struct dpif *dpif_) > nl_sock_destroy(dpif->port_notifier); > > fat_rwlock_wrlock(&dpif->upcall_lock); > - destroy_all_channels(dpif); > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + destroy_all_handlers(dpif); > + } else { > + destroy_all_channels(dpif); > + } > fat_rwlock_unlock(&dpif->upcall_lock); > > fat_rwlock_destroy(&dpif->upcall_lock); > @@ -641,11 +716,14 @@ dpif_netlink_run(struct dpif *dpif_) > { > struct dpif_netlink *dpif = dpif_netlink_cast(dpif_); > > - if (dpif->refresh_channels) { > - dpif->refresh_channels = false; > - fat_rwlock_wrlock(&dpif->upcall_lock); > - dpif_netlink_refresh_channels(dpif, dpif->n_handlers); > - fat_rwlock_unlock(&dpif->upcall_lock); > + if (!dpif_netlink_upcall_per_cpu(dpif)) { > + if (dpif->refresh_channels) { > + dpif->refresh_channels = false; > + fat_rwlock_wrlock(&dpif->upcall_lock); > + dpif_netlink_refresh_handlers_vport_dispatch(dpif, > + dpif->n_handlers); > + fat_rwlock_unlock(&dpif->upcall_lock); > + } > } > return false; > } > @@ -681,6 +759,41 @@ dpif_netlink_get_stats(const struct dpif *dpif_, struct > dpif_dp_stats *stats) > return error; > } > > +static int > +dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t > *upcall_pids, > + uint32_t n_upcall_pids) > +{ > + struct dpif_netlink *dpif = dpif_netlink_cast(dpif_); > + struct dpif_netlink_dp request, reply; > + struct ofpbuf *bufp; > + int error; > + int n_cores; > + > + n_cores = count_cpu_cores(); > + ovs_assert(n_cores == n_upcall_pids); > + VLOG_DBG("Dispatch mode(per-cpu): Number of CPUs is %d", n_cores); > + > + dpif_netlink_dp_init(&request); > + request.cmd = OVS_DP_CMD_SET; > + request.name = dpif_->base_name; > + request.dp_ifindex = dpif->dp_ifindex; > + request.user_features = dpif->user_features | > + OVS_DP_F_DISPATCH_UPCALL_PER_CPU; > + > + request.upcall_pids = upcall_pids; > + request.n_upcall_pids = n_cores; > + > + error = dpif_netlink_dp_transact(&request, &reply, &bufp); > + if (!error) { > + dpif->user_features = reply.user_features; > + ofpbuf_delete(bufp); > + if (!dpif_netlink_upcall_per_cpu(dpif)) { > + return 
-EOPNOTSUPP; > + } > + } > + return error; > +} > + > static int > dpif_netlink_set_features(struct dpif *dpif_, uint32_t new_features) > { > @@ -741,7 +854,7 @@ get_vport_type(const struct dpif_netlink_vport *vport) > return "erspan"; > > case OVS_VPORT_TYPE_IP6ERSPAN: > - return "ip6erspan"; > + return "ip6erspan"; > > case OVS_VPORT_TYPE_IP6GRE: > return "ip6gre"; > @@ -807,10 +920,16 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, > const char *name, > uint32_t upcall_pids = 0; > int error = 0; > > - if (dpif->handlers) { > - error = create_nl_sock(dpif, &sock); > - if (error) { > - return error; > + /* per-cpu dispatch mode does not require a socket per vport */ > + if (!dpif_netlink_upcall_per_cpu(dpif)) { > + if (dpif->handlers) { > + error = create_nl_sock(dpif, &sock); > + if (error) { > + return error; > + } > + } > + if (sock) { > + upcall_pids = nl_sock_pid(sock); > } > } > > @@ -821,9 +940,6 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, const > char *name, > request.name = name; > > request.port_no = *port_nop; > - if (sock) { > - upcall_pids = nl_sock_pid(sock); > - } > request.n_upcall_pids = 1; > request.upcall_pids = &upcall_pids; > > @@ -845,19 +961,21 @@ dpif_netlink_port_add__(struct dpif_netlink *dpif, > const char *name, > goto exit; > } > > - error = vport_add_channel(dpif, *port_nop, sock); > - if (error) { > - VLOG_INFO("%s: could not add channel for port %s", > - dpif_name(&dpif->dpif), name); > - > - /* Delete the port. */ > - dpif_netlink_vport_init(&request); > - request.cmd = OVS_VPORT_CMD_DEL; > - request.dp_ifindex = dpif->dp_ifindex; > - request.port_no = *port_nop; > - dpif_netlink_vport_transact(&request, NULL, NULL); > - close_nl_sock(sock); > - goto exit; > + if (!dpif_netlink_upcall_per_cpu(dpif)) { > + error = vport_add_channel(dpif, *port_nop, sock); > + if (error) { > + VLOG_INFO("%s: could not add channel for port %s", > + dpif_name(&dpif->dpif), name); > + > + /* Delete the port. */ > + dpif_netlink_vport_init(&request); > + request.cmd = OVS_VPORT_CMD_DEL; > + request.dp_ifindex = dpif->dp_ifindex; > + request.port_no = *port_nop; > + dpif_netlink_vport_transact(&request, NULL, NULL); > + close_nl_sock(sock); > + goto exit; > + } > } > > exit: > @@ -1115,6 +1233,16 @@ dpif_netlink_port_get_pid(const struct dpif *dpif_, > odp_port_t port_no) > const struct dpif_netlink *dpif = dpif_netlink_cast(dpif_); > uint32_t ret; > > + /* In per-cpu dispatch mode, vports do not have an associated PID */ > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + /* In per-cpu dispatch mode, this will be ignored as kernel space > will > + * select the PID before sending to user space. We set to > + * DPIF_NETLINK_PER_CPU_PID as 0 is rejected by kernel space as an > + * invalid PID. 
> + */ > + return DPIF_NETLINK_PER_CPU_PID; > + } > + > fat_rwlock_rdlock(&dpif->upcall_lock); > ret = dpif_netlink_port_get_pid__(dpif, port_no); > fat_rwlock_unlock(&dpif->upcall_lock); > @@ -2326,12 +2454,51 @@ dpif_netlink_handler_uninit(struct dpif_handler > *handler) > } > #endif > > +static int > +dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif) > + OVS_REQ_WRLOCK(dpif->upcall_lock) > +{ > + int handler_id; > + int error = 0; > + uint32_t n_handlers; > + uint32_t *upcall_pids; > + > + n_handlers = count_cpu_cores(); > + if (dpif->n_handlers != n_handlers) { > + VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers", > + n_handlers); > + destroy_all_handlers(dpif); > + upcall_pids = xzalloc(n_handlers * sizeof *upcall_pids); > + dpif->handlers = xzalloc(n_handlers * sizeof *dpif->handlers); > + for (handler_id = 0; handler_id < n_handlers; handler_id++) { > + struct dpif_handler *handler = &dpif->handlers[handler_id]; > + error = create_nl_sock(dpif, &handler->sock); > + if (error) { > + VLOG_ERR("Dispatch mode(per-cpu): Cannot create socket for" > + "handler %d", handler_id); > + continue; > + } > + upcall_pids[handler_id] = nl_sock_pid(handler->sock); > + VLOG_DBG("Dispatch mode(per-cpu): " > + "handler %d has Netlink PID of %u", > + handler_id, upcall_pids[handler_id]); > + } > + > + dpif->n_handlers = n_handlers; > + error = dpif_netlink_set_handler_pids(&dpif->dpif, upcall_pids, > + n_handlers); > + free(upcall_pids); > + } > + return error; > +} > + > /* Synchronizes 'channels' in 'dpif->handlers' with the set of vports > * currently in 'dpif' in the kernel, by adding a new set of channels for > * any kernel vport that lacks one and deleting any channels that have no > * backing kernel vports. */ > static int > -dpif_netlink_refresh_channels(struct dpif_netlink *dpif, uint32_t n_handlers) > +dpif_netlink_refresh_handlers_vport_dispatch(struct dpif_netlink *dpif, > + uint32_t n_handlers) > OVS_REQ_WRLOCK(dpif->upcall_lock) > { > unsigned long int *keep_channels; > @@ -2458,7 +2625,7 @@ dpif_netlink_refresh_channels(struct dpif_netlink > *dpif, uint32_t n_handlers) > } > > static int > -dpif_netlink_recv_set__(struct dpif_netlink *dpif, bool enable) > +dpif_netlink_recv_set_vport_dispatch(struct dpif_netlink *dpif, bool enable) > OVS_REQ_WRLOCK(dpif->upcall_lock) > { > if ((dpif->handlers != NULL) == enable) { > @@ -2467,7 +2634,21 @@ dpif_netlink_recv_set__(struct dpif_netlink *dpif, > bool enable) > destroy_all_channels(dpif); > return 0; > } else { > - return dpif_netlink_refresh_channels(dpif, 1); > + return dpif_netlink_refresh_handlers_vport_dispatch(dpif, 1); > + } > +} > + > +static int > +dpif_netlink_recv_set_cpu_dispatch(struct dpif_netlink *dpif, bool enable) > + OVS_REQ_WRLOCK(dpif->upcall_lock) > +{ > + if ((dpif->handlers != NULL) == enable) { > + return 0; > + } else if (!enable) { > + destroy_all_handlers(dpif); > + return 0; > + } else { > + return dpif_netlink_refresh_handlers_cpu_dispatch(dpif); > } > } > > @@ -2478,7 +2659,11 @@ dpif_netlink_recv_set(struct dpif *dpif_, bool enable) > int error; > > fat_rwlock_wrlock(&dpif->upcall_lock); > - error = dpif_netlink_recv_set__(dpif, enable); > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + error = dpif_netlink_recv_set_cpu_dispatch(dpif, enable); > + } else { > + error = dpif_netlink_recv_set_vport_dispatch(dpif, enable); > + } > fat_rwlock_unlock(&dpif->upcall_lock); > > return error; > @@ -2500,13 +2685,31 @@ dpif_netlink_handlers_set(struct dpif *dpif_, > uint32_t n_handlers) > > 
fat_rwlock_wrlock(&dpif->upcall_lock); > if (dpif->handlers) { > - error = dpif_netlink_refresh_channels(dpif, n_handlers); > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + error = dpif_netlink_refresh_handlers_cpu_dispatch(dpif); > + } else { > + error = dpif_netlink_refresh_handlers_vport_dispatch(dpif, > + n_handlers); > + } > } > fat_rwlock_unlock(&dpif->upcall_lock); > > return error; > } > > +static bool > +dpif_netlink_number_handlers_required(struct dpif *dpif_, uint32_t > *n_handlers) > +{ > + struct dpif_netlink *dpif = dpif_netlink_cast(dpif_); > + > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + *n_handlers = count_cpu_cores(); > + return true; > + } > + > + return false; > +} > + > static int > dpif_netlink_queue_to_priority(const struct dpif *dpif OVS_UNUSED, > uint32_t queue_id, uint32_t *priority) > @@ -2669,8 +2872,59 @@ dpif_netlink_recv_windows(struct dpif_netlink *dpif, > uint32_t handler_id, > } > #else > static int > -dpif_netlink_recv__(struct dpif_netlink *dpif, uint32_t handler_id, > - struct dpif_upcall *upcall, struct ofpbuf *buf) > +dpif_netlink_recv_cpu_dispatch(struct dpif_netlink *dpif, uint32_t > handler_id, > + struct dpif_upcall *upcall, struct ofpbuf > *buf) > + OVS_REQ_RDLOCK(dpif->upcall_lock) > +{ > + struct dpif_handler *handler; > + int read_tries = 0; > + > + if (!dpif->handlers || handler_id >= dpif->n_handlers) { > + return EAGAIN; > + } > + > + handler = &dpif->handlers[handler_id]; > + > + for (;;) { > + int dp_ifindex; > + int error; > + > + if (++read_tries > 50) { > + return EAGAIN; > + } > + error = nl_sock_recv(handler->sock, buf, NULL, false); > + if (error == ENOBUFS) { > + /* ENOBUFS typically means that we've received so many > + * packets that the buffer overflowed. Try again > + * immediately because there's almost certainly a packet > + * waiting for us. 
*/ > + report_loss(dpif, NULL, 0, handler_id); > + continue; > + } > + > + if (error) { > + if (error == EAGAIN) { > + break; > + } > + return error; > + } > + > + error = parse_odp_packet(buf, upcall, &dp_ifindex); > + if (!error && dp_ifindex == dpif->dp_ifindex) { > + return 0; > + } else if (error) { > + return error; > + } > + } > + > + return EAGAIN; > +} > + > +static int > +dpif_netlink_recv_vport_dispatch(struct dpif_netlink *dpif, > + uint32_t handler_id, > + struct dpif_upcall *upcall, > + struct ofpbuf *buf) > OVS_REQ_RDLOCK(dpif->upcall_lock) > { > struct dpif_handler *handler; > @@ -2755,18 +3009,24 @@ dpif_netlink_recv(struct dpif *dpif_, uint32_t > handler_id, > #ifdef _WIN32 > error = dpif_netlink_recv_windows(dpif, handler_id, upcall, buf); > #else > - error = dpif_netlink_recv__(dpif, handler_id, upcall, buf); > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + error = dpif_netlink_recv_cpu_dispatch(dpif, handler_id, upcall, > buf); > + } else { > + error = dpif_netlink_recv_vport_dispatch(dpif, > + handler_id, upcall, buf); > + } > #endif > fat_rwlock_unlock(&dpif->upcall_lock); > > return error; > } > > +#ifdef _WIN32 > static void > -dpif_netlink_recv_wait__(struct dpif_netlink *dpif, uint32_t handler_id) > +dpif_netlink_recv_wait_windows(struct dpif_netlink *dpif, uint32_t > handler_id) > OVS_REQ_RDLOCK(dpif->upcall_lock) > { > -#ifdef _WIN32 > + > uint32_t i; > struct dpif_windows_vport_sock *sock_pool = > dpif->handlers[handler_id].vport_sock_pool; > @@ -2779,13 +3039,31 @@ dpif_netlink_recv_wait__(struct dpif_netlink *dpif, > uint32_t handler_id) > for (i = 0; i < VPORT_SOCK_POOL_SIZE; i++) { > nl_sock_wait(sock_pool[i].nl_sock, POLLIN); > } > -#else > +} > +#endif The two functions below are Linux specific, so they need to be conditional to the #ifdef above like you did in dpif_netlink_recv_wait(): #ifdef _WIN32 dpif_netlink_recv_wait_windows() #else dpif_netlink_recv_wait_vport_dispatch() dpif_netlink_recv_wait_cpu_dispatch() #endif Otherwise I don't see anything else. Tests are good on my end. 
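To make the suggestion above concrete, the shape I have in mind is roughly the
following (the two Linux-only bodies are exactly the ones already in your
patch; only the preprocessor guards move, so take this as a sketch rather than
a tested hunk):

#ifdef _WIN32
static void
dpif_netlink_recv_wait_windows(struct dpif_netlink *dpif, uint32_t handler_id)
    OVS_REQ_RDLOCK(dpif->upcall_lock)
{
    /* Existing Windows implementation, unchanged. */
}
#else
static void
dpif_netlink_recv_wait_vport_dispatch(struct dpif_netlink *dpif,
                                      uint32_t handler_id)
    OVS_REQ_RDLOCK(dpif->upcall_lock)
{
    if (dpif->handlers && handler_id < dpif->n_handlers) {
        struct dpif_handler *handler = &dpif->handlers[handler_id];

        /* per-vport dispatch waits on the handler's epoll fd. */
        poll_fd_wait(handler->epoll_fd, POLLIN);
    }
}

static void
dpif_netlink_recv_wait_cpu_dispatch(struct dpif_netlink *dpif,
                                    uint32_t handler_id)
    OVS_REQ_RDLOCK(dpif->upcall_lock)
{
    if (dpif->handlers && handler_id < dpif->n_handlers) {
        struct dpif_handler *handler = &dpif->handlers[handler_id];

        /* per-cpu dispatch waits on the handler's own Netlink socket. */
        poll_fd_wait(nl_sock_fd(handler->sock), POLLIN);
    }
}
#endif

That way a Windows build never compiles the two Linux-only helpers, and
dpif_netlink_recv_wait() keeps the #ifdef it already has to pick between them.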
Thanks fbl > + > +static void > +dpif_netlink_recv_wait_vport_dispatch(struct dpif_netlink *dpif, > + uint32_t handler_id) > + OVS_REQ_RDLOCK(dpif->upcall_lock) > +{ > if (dpif->handlers && handler_id < dpif->n_handlers) { > struct dpif_handler *handler = &dpif->handlers[handler_id]; > > poll_fd_wait(handler->epoll_fd, POLLIN); > } > -#endif > +} > + > +static void > +dpif_netlink_recv_wait_cpu_dispatch(struct dpif_netlink *dpif, > + uint32_t handler_id) > + OVS_REQ_RDLOCK(dpif->upcall_lock) > +{ > + if (dpif->handlers && handler_id < dpif->n_handlers) { > + struct dpif_handler *handler = &dpif->handlers[handler_id]; > + > + poll_fd_wait(nl_sock_fd(handler->sock), POLLIN); > + } > } > > static void > @@ -2794,12 +3072,20 @@ dpif_netlink_recv_wait(struct dpif *dpif_, uint32_t > handler_id) > struct dpif_netlink *dpif = dpif_netlink_cast(dpif_); > > fat_rwlock_rdlock(&dpif->upcall_lock); > - dpif_netlink_recv_wait__(dpif, handler_id); > +#ifdef _WIN32 > + dpif_netlink_recv_wait_windows(dpif, handler_id); > +#else > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + dpif_netlink_recv_wait_cpu_dispatch(dpif, handler_id); > + } else { > + dpif_netlink_recv_wait_vport_dispatch(dpif, handler_id); > + } > +#endif > fat_rwlock_unlock(&dpif->upcall_lock); > } > > static void > -dpif_netlink_recv_purge__(struct dpif_netlink *dpif) > +dpif_netlink_recv_purge_vport_dispatch(struct dpif_netlink *dpif) > OVS_REQ_WRLOCK(dpif->upcall_lock) > { > if (dpif->handlers) { > @@ -2815,13 +3101,31 @@ dpif_netlink_recv_purge__(struct dpif_netlink *dpif) > } > } > > +static void > +dpif_netlink_recv_purge_cpu_dispatch(struct dpif_netlink *dpif) > + OVS_REQ_WRLOCK(dpif->upcall_lock) > +{ > + int handler_id; > + > + if (dpif->handlers) { > + for (handler_id = 0; handler_id < dpif->n_handlers; handler_id++) { > + struct dpif_handler *handler = &dpif->handlers[handler_id]; > + nl_sock_drain(handler->sock); > + } > + } > +} > + > static void > dpif_netlink_recv_purge(struct dpif *dpif_) > { > struct dpif_netlink *dpif = dpif_netlink_cast(dpif_); > > fat_rwlock_wrlock(&dpif->upcall_lock); > - dpif_netlink_recv_purge__(dpif); > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + dpif_netlink_recv_purge_cpu_dispatch(dpif); > + } else { > + dpif_netlink_recv_purge_vport_dispatch(dpif); > + } > fat_rwlock_unlock(&dpif->upcall_lock); > } > > @@ -3978,6 +4282,7 @@ const struct dpif_class dpif_netlink_class = { > dpif_netlink_operate, > dpif_netlink_recv_set, > dpif_netlink_handlers_set, > + dpif_netlink_number_handlers_required, > NULL, /* set_config */ > dpif_netlink_queue_to_priority, > dpif_netlink_recv, > @@ -4065,6 +4370,9 @@ dpif_netlink_init(void) > > ovs_tunnels_out_of_tree = dpif_netlink_rtnl_probe_oot_tunnels(); > > + unixctl_command_register("dpif-netlink/dispatch-mode", "", 0, 0, > + dpif_netlink_unixctl_dispatch_mode, NULL); > + > ovsthread_once_done(&once); > } > > @@ -4341,6 +4649,11 @@ dpif_netlink_dp_to_ofpbuf(const struct dpif_netlink_dp > *dp, struct ofpbuf *buf) > nl_msg_put_u32(buf, OVS_DP_ATTR_USER_FEATURES, dp->user_features); > } > > + if (dp->upcall_pids) { > + nl_msg_put_unspec(buf, OVS_DP_ATTR_PER_CPU_PIDS, dp->upcall_pids, > + sizeof *dp->upcall_pids * dp->n_upcall_pids); > + } > + > /* Skip OVS_DP_ATTR_STATS since we never have a reason to serialize it. 
> */ > } > > @@ -4660,13 +4973,58 @@ report_loss(struct dpif_netlink *dpif, struct > dpif_channel *ch, uint32_t ch_idx, > return; > } > > - ds_init(&s); > - if (ch->last_poll != LLONG_MIN) { > - ds_put_format(&s, " (last polled %lld ms ago)", > - time_msec() - ch->last_poll); > + if (dpif_netlink_upcall_per_cpu(dpif)) { > + VLOG_WARN("%s: lost packet on handler %u", > + dpif_name(&dpif->dpif), handler_id); > + } else { > + ds_init(&s); > + if (ch->last_poll != LLONG_MIN) { > + ds_put_format(&s, " (last polled %lld ms ago)", > + time_msec() - ch->last_poll); > + } > + > + VLOG_WARN("%s: lost packet on port channel %u of handler %u%s", > + dpif_name(&dpif->dpif), ch_idx, handler_id, ds_cstr(&s)); > + ds_destroy(&s); > + } > +} > + > +static void > +dpif_netlink_unixctl_dispatch_mode(struct unixctl_conn *conn, > + int argc OVS_UNUSED, > + const char *argv[] OVS_UNUSED, > + void *aux OVS_UNUSED) > +{ > + struct ds reply = DS_EMPTY_INITIALIZER; > + struct nl_dump dump; > + uint64_t reply_stub[NL_DUMP_BUFSIZE / 8]; > + struct ofpbuf msg, buf; > + int error; > + > + error = dpif_netlink_init(); > + if (error) { > + return; > + } > + > + ofpbuf_use_stub(&buf, reply_stub, sizeof reply_stub); > + dpif_netlink_dp_dump_start(&dump); > + while (nl_dump_next(&dump, &msg, &buf)) { > + struct dpif_netlink_dp dp; > + if (!dpif_netlink_dp_from_ofpbuf(&dp, &msg)) { > + ds_put_format(&reply, "%s: ", dp.name); > + if (dp.user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU) { > + ds_put_format(&reply, "per-cpu dispatch mode"); > + } else { > + ds_put_format(&reply, "per-vport dispatch mode"); > + } > + ds_put_format(&reply, "\n"); > + } > + } > + ofpbuf_uninit(&buf); > + error = nl_dump_done(&dump); > + if (!error) { > + unixctl_command_reply(conn, ds_cstr(&reply)); > } > > - VLOG_WARN("%s: lost packet on port channel %u of handler %u%s", > - dpif_name(&dpif->dpif), ch_idx, handler_id, ds_cstr(&s)); > - ds_destroy(&s); > + ds_destroy(&reply); > } > diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h > index b817fceac698..21068534bf4d 100644 > --- a/lib/dpif-provider.h > +++ b/lib/dpif-provider.h > @@ -336,26 +336,28 @@ struct dpif_class { > * updating flows as necessary if it does this. */ > int (*recv_set)(struct dpif *dpif, bool enable); > > - /* Refreshes the poll loops and Netlink sockets associated to each port, > - * when the number of upcall handlers (upcall receiving thread) is > changed > - * to 'n_handlers' and receiving packets for 'dpif' is enabled by > + /* Attempts to refresh the poll loops and Netlink sockets used for > handling > + * upcalls when the number of upcall handlers (upcall receiving thread) > is > + * changed to 'n_handlers' and receiving packets for 'dpif' is enabled by > * recv_set(). > * > - * Since multiple upcall handlers can read upcalls simultaneously from > - * 'dpif', each port can have multiple Netlink sockets, one per upcall > - * handler. So, handlers_set() is responsible for the following tasks: > + * A dpif implementation may choose to ignore 'n_handlers' while > returning > + * success. > * > - * When receiving upcall is enabled, extends or creates the > - * configuration to support: > - * > - * - 'n_handlers' Netlink sockets for each port. > + * The method for distribution of upcalls between handler threads is > + * specific to the dpif implementation. > + */ > + int (*handlers_set)(struct dpif *dpif, uint32_t n_handlers); > + > + /* Queries 'dpif' to see if a certain number of handlers are required by > + * the implementation. 
> * > - * - 'n_handlers' poll loops, one for each upcall handler. > + * If a certain number of handlers are required, returns 'true' and sets > + * 'n_handlers' to that number of handler threads. > * > - * - registering the Netlink sockets for the same upcall handler > to > - * the corresponding poll loop. > - * */ > - int (*handlers_set)(struct dpif *dpif, uint32_t n_handlers); > + * If not, returns 'false'. > + */ > + bool (*number_handlers_required)(struct dpif *dpif, uint32_t > *n_handlers); > > /* Pass custom configuration options to the datapath. The implementation > * might postpone applying the changes until run() is called. */ > diff --git a/lib/dpif.c b/lib/dpif.c > index 26e8bfb7db98..511383514d5b 100644 > --- a/lib/dpif.c > +++ b/lib/dpif.c > @@ -1489,6 +1489,23 @@ dpif_handlers_set(struct dpif *dpif, uint32_t > n_handlers) > return error; > } > > +/* Checks if a certain number of handlers are required. > + * > + * If a certain number of handlers are required, returns 'true' and sets > + * 'n_handlers' to that number of handler threads. > + * > + * If not, returns 'false' > + */ > +bool > +dpif_number_handlers_required(struct dpif *dpif, uint32_t *n_handlers) { > + bool ret = false; > + > + if (dpif->dpif_class->number_handlers_required) { > + ret = dpif->dpif_class->number_handlers_required(dpif, n_handlers); > + } > + return ret; > +} > + > void > dpif_register_dp_purge_cb(struct dpif *dpif, dp_purge_callback *cb, void > *aux) > { > diff --git a/lib/dpif.h b/lib/dpif.h > index f9728e67393b..7c322d20e6c7 100644 > --- a/lib/dpif.h > +++ b/lib/dpif.h > @@ -873,6 +873,7 @@ void dpif_register_upcall_cb(struct dpif *, > upcall_callback *, void *aux); > > int dpif_recv_set(struct dpif *, bool enable); > int dpif_handlers_set(struct dpif *, uint32_t n_handlers); > +bool dpif_number_handlers_required(struct dpif *, uint32_t *n_handlers); > int dpif_set_config(struct dpif *, const struct smap *cfg); > int dpif_port_set_config(struct dpif *, odp_port_t, const struct smap *cfg); > int dpif_recv(struct dpif *, uint32_t handler_id, struct dpif_upcall *, > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index d22f7f07361f..f9145bf23185 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -636,24 +636,61 @@ udpif_set_threads(struct udpif *udpif, uint32_t > n_handlers_, > uint32_t n_revalidators_) > { > ovs_assert(udpif); > - ovs_assert(n_handlers_ && n_revalidators_); > + uint32_t n_handlers_requested; > + uint32_t n_revalidators_requested; > + bool forced = false; > + > + if (dpif_number_handlers_required(udpif->dpif, &n_handlers_requested)) { > + forced = true; > + if (!n_revalidators_) { > + n_revalidators_requested = n_handlers_requested / 4 + 1; > + } else { > + n_revalidators_requested = n_revalidators_; > + } > + } else { > + int threads = MAX(count_cpu_cores(), 2); > + > + n_revalidators_requested = MAX(n_revalidators_, 0); > + n_handlers_requested = MAX(n_handlers_, 0); > > - if (udpif->n_handlers != n_handlers_ > - || udpif->n_revalidators != n_revalidators_) { > + if (!n_revalidators_requested) { > + n_revalidators_requested = n_handlers_requested > + ? 
MAX(threads - (int) n_handlers_requested, 1) > + : threads / 4 + 1; > + } > + > + if (!n_handlers_requested) { > + n_handlers_requested = MAX(threads - > + (int) n_revalidators_requested, 1); > + } > + } > + > + if (udpif->n_handlers != n_handlers_requested > + || udpif->n_revalidators != n_revalidators_requested) { > + if (forced) { > + VLOG_INFO("Overriding n-handler-threads to %u, setting " > + "n-revalidator-threads to %u", n_handlers_requested, > + n_revalidators_requested); > + } else { > + VLOG_INFO("Setting n-handler-threads to %u, setting " > + "n-revalidator-threads to %u", n_handlers_requested, > + n_revalidators_requested); > + } > udpif_stop_threads(udpif, true); > } > > if (!udpif->handlers && !udpif->revalidators) { > + VLOG_INFO("Starting %u threads", n_handlers_requested + > + n_revalidators_requested); > int error; > - > - error = dpif_handlers_set(udpif->dpif, n_handlers_); > + error = dpif_handlers_set(udpif->dpif, n_handlers_requested); > if (error) { > VLOG_ERR("failed to configure handlers in dpif %s: %s", > dpif_name(udpif->dpif), ovs_strerror(error)); > return; > } > - > - udpif_start_threads(udpif, n_handlers_, n_revalidators_); > + udpif_start_threads(udpif, n_handlers_requested, > + n_revalidators_requested); > } > } > > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > index 53002f082b52..bd6103b1c88c 100644 > --- a/ofproto/ofproto.c > +++ b/ofproto/ofproto.c > @@ -792,20 +792,8 @@ ofproto_type_set_config(const char *datapath_type, const > struct smap *cfg) > void > ofproto_set_threads(int n_handlers_, int n_revalidators_) > { > - int threads = MAX(count_cpu_cores(), 2); > - > n_revalidators = MAX(n_revalidators_, 0); > n_handlers = MAX(n_handlers_, 0); > - > - if (!n_revalidators) { > - n_revalidators = n_handlers > - ? MAX(threads - (int) n_handlers, 1) > - : threads / 4 + 1; > - } > - > - if (!n_handlers) { > - n_handlers = MAX(threads - (int) n_revalidators, 1); > - } > } > > void > diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in > index 50dad7208e85..dbba54902083 100644 > --- a/vswitchd/ovs-vswitchd.8.in > +++ b/vswitchd/ovs-vswitchd.8.in > @@ -274,6 +274,7 @@ type). > . > .so lib/dpdk-unixctl.man > .so lib/dpif-netdev-unixctl.man > +.so lib/dpif-netlink-unixctl.man > .so lib/netdev-dpdk-unixctl.man > .so ofproto/ofproto-dpif-unixctl.man > .so ofproto/ofproto-unixctl.man > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index 3522b2497fc6..cda4da119c42 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -544,9 +544,9 @@ > <column name="other_config" key="n-handler-threads" > type='{"type": "integer", "minInteger": 1}'> > <p> > - Specifies the number of threads for software datapaths to use for > - handling new flows. The default the number of online CPU cores > minus > - the number of revalidators. > + Attempts to specify the number of threads for software datapaths to > + use for handling new flows. Some datapaths may choose to ignore > + this and it will be set to a sensible option for the datapath type. > </p> > <p> > This configuration is per datapath. If you have more than one > @@ -560,12 +560,17 @@ > <column name="other_config" key="n-revalidator-threads" > type='{"type": "integer", "minInteger": 1}'> > <p> > - Specifies the number of threads for software datapaths to use for > - revalidating flows in the datapath. Typically, there is a direct > - correlation between the number of revalidator threads, and the > number > - of flows allowed in the datapath. 
The default is the number of cpu > - cores divided by four plus one. If <code>n-handler-threads</code> > is > - set, the default changes to the number of cpu cores minus the > number > + Attempts to specify the number of threads for software datapaths to > + use for revalidating flows in the datapath. Some datapaths may > + choose to ignore this and will set to a sensible option for the > + datapath type. > + </p> > + <p> > + Typically, there is a direct correlation between the > + number of revalidator threads, and the number of flows allowed in > the > + datapath. The default is the number of cpu cores divided by four > + plus one. If <code>n-handler-threads</code> is set, the default > + changes to the number of cpu cores minus the number > of handler threads. > </p> > <p> > -- > 2.27.0 > -- fbl
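P.S. In case it helps with the NEWS/man page wording: the new unixctl command
prints one line per datapath, so from the code I would expect output along
these lines (the datapath name below is just an example):

$ ovs-appctl dpif-netlink/dispatch-mode
ovs-system: per-cpu dispatch mode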