From: Anurag Agarwal <anurag.agar...@ericsson.com>

Today dpif-netdev considers PMD threads on a non-local NUMA node for
automatic assignment of the rxqs of a port only if there are no local,
non-isolated PMDs.

On typical servers with both physical ports on one NUMA node, this often
leaves the PMDs on the other NUMA node under-utilized, wasting CPU
resources. The alternative, to manually pin the rxqs to PMDs on remote
NUMA nodes, also has drawbacks as it limits OVS' ability to auto
load-balance the rxqs.

This patch introduces a new interface configuration option to allow ports
to be automatically polled by PMDs on any NUMA node:

    ovs-vsctl set interface <Name> other_config:cross-numa-polling=true

If this option is not present or set to false, legacy behaviour applies.

Signed-off-by: Anurag Agarwal <anurag.agar...@ericsson.com>
Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com>
Signed-off-by: Rudra Surya Bhaskara Rao <rudrasury...@acldigital.com>
---
 Documentation/topics/dpdk/pmd.rst | 28 ++++++++++++++++++++++++++--
 lib/dpif-netdev.c                 | 35 +++++++++++++++++++++++++----------
 tests/pmd.at                      | 30 ++++++++++++++++++++++++++++++
 vswitchd/vswitch.xml              | 20 ++++++++++++++++++++
 4 files changed, 101 insertions(+), 12 deletions(-)
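As a usage sketch, on a dual-NUMA host the option can be toggled per port
and the effect observed with the existing pmd-rxq-show command. The port
name dpdk-p0 below is an illustrative assumption, not taken from the
patch:

    # Allow rxqs of dpdk-p0 to be polled by PMD threads on any NUMA node.
    $ ovs-vsctl set Interface dpdk-p0 other_config:cross-numa-polling=true

    # Inspect the resulting assignment; rxqs may now appear under PMD
    # threads whose numa_id differs from the port's NUMA node.
    $ ovs-appctl dpif-netdev/pmd-rxq-show

    # Clear the key to restore the default NUMA-local polling behaviour.
    $ ovs-vsctl remove Interface dpdk-p0 other_config cross-numa-polling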
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index d63750e..abe1cda 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -78,8 +78,27 @@ To show port/Rx queue assignment::
 
     $ ovs-appctl dpif-netdev/pmd-rxq-show
 
-Rx queues may be manually pinned to cores. This will change the default Rx
-queue assignment to PMD threads::
+Normally, Rx queues are assigned to PMD threads automatically. By default
+OVS only assigns Rx queues to PMD threads executing on the same NUMA
+node in order to avoid unnecessary latency for accessing packet buffers
+across the NUMA boundary. Typically this overhead is higher for vhostuser
+ports than for physical ports due to the packet copy that is done for all
+rx packets.
+
+On NUMA servers with physical ports only on one NUMA node, the NUMA-local
+polling policy can lead to an under-utilization of the PMD threads on the
+remote NUMA node. For the overall OVS performance it may in such cases be
+beneficial to utilize the spare capacity and allow polling of a physical
+port's rxqs across NUMA nodes despite the overhead involved.
+The policy can be set per port with the following configuration option::
+
+    $ ovs-vsctl set Interface <iface> \
+        other_config:cross-numa-polling=true|false
+
+The default value is false.
+
+Rx queues may also be manually pinned to cores. This will change the default
+Rx queue assignment to PMD threads::
 
     $ ovs-vsctl set Interface <iface> \
         other_config:pmd-rxq-affinity=<rxq-affinity-list>
@@ -194,6 +213,11 @@ or can be triggered by using::
     Rx queue utilization of the PMD as a percentage. Prior to this, tracking
     of stats was not available.
 
+.. versionchanged:: 2.15.0
+
+    Added the interface parameter ``other_config:cross-numa-polling`` and the
+    ``no-isol`` option for ``pmd-rxq-affinity``.
+
 Automatic assignment of Port/Rx Queue to PMD Threads (experimental)
 -------------------------------------------------------------------
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7d9078f..6b9a151 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -478,6 +478,7 @@ struct dp_netdev_port {
     bool emc_enabled;           /* If true EMC will be used. */
     char *type;                 /* Port type as requested by user. */
     char *rxq_affinity_list;    /* Requested affinity of rx queues. */
+    bool cross_numa_polling;    /* If true, cross-NUMA polling is enabled. */
 };
 
 /* Contained by struct dp_netdev_flow's 'stats' member.  */
@@ -4548,6 +4549,7 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
     int error = 0;
     const char *affinity_list = smap_get(cfg, "pmd-rxq-affinity");
     bool emc_enabled = smap_get_bool(cfg, "emc-enable", true);
+    bool cross_numa_polling = smap_get_bool(cfg, "cross-numa-polling", false);
 
     ovs_mutex_lock(&dp->port_mutex);
     error = get_port_by_number(dp, port_no, &port);
@@ -4555,6 +4557,11 @@ dpif_netdev_port_set_config(struct dpif *dpif, odp_port_t port_no,
         goto unlock;
     }
 
+    if (cross_numa_polling != port->cross_numa_polling) {
+        port->cross_numa_polling = cross_numa_polling;
+        dp_netdev_request_reconfigure(dp);
+    }
+
     if (emc_enabled != port->emc_enabled) {
         struct dp_netdev_pmd_thread *pmd;
         struct ds ds = DS_EMPTY_INITIALIZER;
@@ -5173,8 +5180,8 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
     struct dp_netdev_port *port;
     struct dp_netdev_rxq ** rxqs = NULL;
     struct rr_numa_list rr;
-    struct rr_numa *numa = NULL;
-    struct rr_numa *non_local_numa = NULL;
+    struct rr_numa *local_numa = NULL;
+    struct rr_numa *next_numa = NULL;
     int n_rxqs = 0;
     int numa_id;
     bool assign_cyc = dp->pmd_rxq_assign_cyc;
@@ -5214,12 +5221,20 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
         numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
         cycles = dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_HIST);
 
-        numa = rr_numa_list_lookup(&rr, numa_id);
-        if (!numa) {
-            /* There are no pmds on the queue's local NUMA node.
-               Round robin on the NUMA nodes that do have pmds. */
-            non_local_numa = rr_numa_list_next(&rr, non_local_numa);
-            if (!non_local_numa) {
+        if (!(rxqs[i]->port->cross_numa_polling)) {
+            /* Try to find a local pmd. */
+            local_numa = rr_numa_list_lookup(&rr, numa_id);
+        } else {
+            /* Allow polling by any pmd. */
+            local_numa = NULL;
+        }
+
+        if (!local_numa) {
+            /* Port configured for cross-NUMA polling or there are no pmds
+             * on the queue's local NUMA node.
+             * Round robin on the NUMA nodes that do have pmds. */
+            next_numa = rr_numa_list_next(&rr, next_numa);
+            if (!next_numa) {
                 if (!dry_run) {
                     VLOG_ERR("There is no available (non-isolated) pmd "
                              "thread for port \'%s\' queue %d. This queue "
@@ -5231,7 +5246,7 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
                 continue;
             }
             rxqs[i]->pmd =
-                rr_numa_assign_least_loaded_pmd(non_local_numa, cycles);
+                rr_numa_assign_least_loaded_pmd(next_numa, cycles);
             if (!dry_run) {
                 VLOG_WARN("There's no available (non-isolated) pmd thread "
                           "on numa node %d. Queue %d on port \'%s\' will "
@@ -5242,7 +5257,7 @@ rxq_scheduling(struct dp_netdev *dp, bool dry_run)
                           rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
             }
         } else {
-            rxqs[i]->pmd = rr_numa_assign_least_loaded_pmd(numa, cycles);
+            rxqs[i]->pmd = rr_numa_assign_least_loaded_pmd(local_numa, cycles);
             if (!dry_run) {
                 if (assign_cyc) {
                     VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
diff --git a/tests/pmd.at b/tests/pmd.at
index 57b5fb8..263c722 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -357,6 +357,36 @@ icmp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.0.0.2,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([PMD - Enable cross numa polling])
+OVS_VSWITCHD_START(
+  [add-port br0 p1 -- set Interface p1 type=dummy-pmd ofport_request=1 options:n_rxq=4 -- \
+   set Open_vSwitch . other_config:pmd-cpu-mask=3
+], [], [], [--dummy-numa 0,1])
+
+AT_CHECK([ovs-ofctl add-flow br0 action=controller])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 3 -d ' ' | sort | uniq], [0], [dnl
+0
+])
+
+dnl Enable cross numa polling and check numa ids
+AT_CHECK([ovs-vsctl set Interface p1 other_config:cross-numa-polling=true])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 3 -d ' ' | sort | uniq], [0], [dnl
+0
+1
+])
+
+dnl Disable cross numa polling and check numa ids
+AT_CHECK([ovs-vsctl set Interface p1 other_config:cross-numa-polling=false])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show | cut -f 3 -d ' ' | sort | uniq], [0], [dnl
+0
+])
+
+OVS_VSWITCHD_STOP(["/|WARN|/d"])
+AT_CLEANUP
+
 AT_SETUP([PMD - change numa node])
 OVS_VSWITCHD_START(
   [add-port br0 p1 -- set Interface p1 type=dummy-pmd ofport_request=1 options:n_rxq=2 -- \
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 97bbb11..7fa7146 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3252,6 +3252,26 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
+      <column name="other_config" key="cross-numa-polling"
+              type='{"type": "boolean"}'>
+        <p>
+          Specifies if the RX queues of the port can be automatically assigned
+          to PMD threads on any NUMA node or only on the local NUMA node of
+          the port.
+        </p>
+        <p>
+          Polling of physical ports from a non-local PMD thread incurs some
+          performance penalty due to the access to packet data across the NUMA
+          barrier. This option can still increase the overall performance if
+          it allows better utilization of those non-local PMD threads.
+          It is most useful together with the auto load-balancing of RX queues
+          (see other_config:auto_lb in table Open_vSwitch).
+        </p>
+        <p>
+          Defaults to false.
+        </p>
+      </column>
+
       <column name="options" key="xdp-mode"
               type='{"type": "string", "enum": ["set", ["best-effort",
                                                         "native-with-zerocopy",
-- 
2.7.4

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev