Hi Ian,

Sorry for the delay in responding. Some comments below.
> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> Sent: Tuesday, April 25, 2017 11:15 AM
> To: Stokes, Ian <ian.sto...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.
>
> Thanks Ian,
>
> I'll incorporate these suggestions in a new patch version.
>
> Cheers,
> Billy.
>
> > -----Original Message-----
> > From: Stokes, Ian
> > Sent: Monday, April 24, 2017 5:33 PM
> > To: O Mahony, Billy <billy.o.mah...@intel.com>; d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.
> >
> > > Previously if there is no available (non-isolated) pmd on the numa
> > > node for a port then the port is not polled at all. This can result
> > > in a non-operational system until such time as nics are physically
> > > repositioned. It is preferable to operate with a pmd on the 'wrong'
> > > numa node albeit with lower performance. Local pmds are still chosen
> > > when available.
> > >
> > > Signed-off-by: Billy O'Mahony <billy.o.mah...@intel.com>
> >
> > Thanks for the patch Billy, few comments below.
> >
> > > ---
> > >  Documentation/intro/install/dpdk.rst | 10 ++++++++++
> > >  lib/dpif-netdev.c                    | 36 ++++++++++++++++++++++++++++++++----
> > >  2 files changed, 42 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> > > index b947bd5..0e0b9f0 100644
> > > --- a/Documentation/intro/install/dpdk.rst
> > > +++ b/Documentation/intro/install/dpdk.rst
> > > @@ -450,6 +450,16 @@ affinitized accordingly.
> > >    pmd thread on a NUMA node is only created if there is at least one DPDK
> > >    interface from that NUMA node added to OVS.
> > >
> > > +  .. note::
> > > +    On NUMA systems PCI devices are also local to a NUMA node. Rx queues for
> > > +    PCI device will assigned to a pmd on it's local NUMA node if pmd-cpu-mask
> > > +    has created a pmd thread on that NUMA node. If not the queue will be
> > > +    assigned to a pmd on a remote NUMA node. This will result in reduced
> > > +    maximum throughput on that device. In the case such a queue assingment
> >
> > Typo in assignment.

[[BO'M]] Thanks.

> >
> > > +    is made a warning message will be logged: "There's no available
> > > +    (non isolated) pmd thread on numa node N. Queue Q on port P will be
> > > +    assigned to a pmd on numa node M. Expect reduced performance."
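[[BO'M]] Maybe useful for anyone trying this out later: a minimal sequence
that exercises the path described in the note looks roughly like the below.
The bridge/port names, the core mask and the PCI address are only
illustrative; it assumes the NIC is local to NUMA node 1 while core 2 sits on
NUMA node 0.

    # Create pmd threads on NUMA node 0 only (core 2 -> bit 2 -> mask 0x4).
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
    # dpdk0 is local to NUMA node 1, so its rx queue ends up on a pmd on
    # node 0 and the warning described in the note is logged.
    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
        options:dpdk-devargs=0000:81:00.0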
> > > +
> > >  - QEMU vCPU thread Affinity
> > >
> > >    A VM performing simple packet forwarding or running complex packet pipelines
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index a14a2eb..c6570ba 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
> > >  }
> > >
> > >  static void
> > > -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > > +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> > > +                      int *all_numa_ids, unsigned all_numa_ids_sz,
> > > +                      int *num_ids_written)
> > >  {
> > >      struct dp_netdev_pmd_thread *pmd;
> > >      struct rr_numa *numa;
> > > +    unsigned idx = 0;
> > >
> > >      hmap_init(&rr->numas);
> > >
> > > @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > >          numa->n_pmds++;
> > >          numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
> > >          numa->pmds[numa->n_pmds - 1] = pmd;
> > > +
> > > +        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> > > +        idx++;
> > >      }
> > > +    *num_ids_written = idx;
> > >  }
> > >
> > >  static struct dp_netdev_pmd_thread *
> > > @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > >  {
> > >      struct dp_netdev_port *port;
> > >      struct rr_numa_list rr;
> > > +    int all_numa_ids [64];
> > > +    int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> > > +    unsigned all_numa_ids_idx = 0;
> > > +    int all_numa_ids_max_idx = 0;
> > > +    int num_numa_ids = 0;
> > >
> > > -    rr_numa_list_populate(dp, &rr);
> > > +    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
> > > +                          &num_numa_ids);
> > > +    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
> > >
> > >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > >          struct rr_numa *numa;
> > > @@ -3234,10 +3248,24 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> > >              }
> > >          } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> > >              if (!numa) {
> > > +                if (all_numa_ids_max_idx < 0) {
> > > +                    VLOG_ERR("There are no pmd threads. "
> > > +                             "Is pmd-cpu-mask set to zero?");
> >
> > The VLOG_ERR is a bit general and misleading. A user could set a PMD
> > on a numa node, add a DPDK device local to that node, isolate the PMD
> > for that DPDK port, then add a second DPDK port that is local to the
> > same numa node. This will trigger the ERR above.
> >
> > Strictly speaking there is a PMD thread set on the numa node but it
> > has been isolated. I would change the message above to include
> > isolation. I would also add the name of the port searching for a PMD
> > to the message so it is clear which port could not find a thread.
> > Something like below:
> >
> >     if (all_numa_ids_max_idx < 0) {
> >         VLOG_ERR("There are no available pmd threads for %s. "
> >                  "Is pmd-cpu-mask set to zero or isolated?",
> >                  netdev_get_name(port->netdev));

[[BO'M]] I'll change this in v3.
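For reference, the isolation case you describe can be reproduced with
something like the following (port names, cores and PCI addresses are purely
illustrative; it assumes core 1 and both NICs sit on NUMA node 0):

    # Single pmd on core 1 (NUMA node 0).
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x2
    ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
        options:dpdk-devargs=0000:05:00.0
    # Pinning dpdk0's queue 0 to core 1 isolates that pmd.
    ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:1"
    # dpdk1 is also local to node 0 but no non-isolated pmd is left there,
    # so it hits the error path above.
    ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk \
        options:dpdk-devargs=0000:05:00.1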
> > > +                    continue;
> > > +                }
> > >                  VLOG_WARN("There's no available (non isolated) pmd thread "
> > >                            "on numa node %d. Queue %d on port \'%s\' will "
> > > -                          "not be polled.",
> > > -                          numa_id, qid, netdev_get_name(port->netdev));
> > > +                          "be assigned to a pmd on numa node %d. Expect "
> > > +                          "reduced performance.",
> > > +                          numa_id, qid, netdev_get_name(port->netdev),
> > > +                          all_numa_ids[all_numa_ids_idx]);
> >
> > The warning above will identify queue 0 will be assigned to a pmd on
> > the other numa node. However it's not just queue 0 that will be
> > assigned but all the unpinned queues associated with the port (see
> > 'q->pmd = rr_numa_get_pmd(numa)' below).
> >
> > I'm thinking the warning should reflect this so users don't think only
> > queue 0 has been assigned to the non local numa pmd.

[[BO'M]] AFAIK (though I didn't test multiqueue explicitly) pmd assignment is
done on a per-queue basis as opposed to a per-port basis, so if there are two
queues for dpdk0, say, then this warning should be written twice: once for
queue 0 and once for queue 1 (assuming that there is no PMD for dpdk0 on the
local NUMA node). I'll verify that this is the behavior of the patch when
using multiple queues and revert back to the ML; a sketch of the check I have
in mind is at the end of this mail.

> > > +                numa_id = all_numa_ids[all_numa_ids_idx];
> > > +                numa = rr_numa_list_lookup(&rr, numa_id);
> > > +                q->pmd = rr_numa_get_pmd(numa);
> > > +                all_numa_ids_idx++;
> > > +                if (all_numa_ids_idx > all_numa_ids_max_idx) {
> > > +                    all_numa_ids_idx = 0;
> > > +                }
> > >              } else {
> > >                  q->pmd = rr_numa_get_pmd(numa);
> > >              }
> > > --
> > > 2.7.4
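[[BO'M]] P.S. For the multi-queue check mentioned above, the test I have in
mind (interface name and queue count purely illustrative) is simply to give
the remote-numa port a second rx queue and confirm that the warning is
logged once per queue:

    # Two rx queues on dpdk0; with no usable local pmd, each queue should be
    # assigned to a remote-numa pmd and log its own warning in the
    # ovs-vswitchd log.
    ovs-vsctl set Interface dpdk0 options:n_rxq=2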