> > -----Original Message-----
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Kevin Traynor
> > Sent: Friday, May 5, 2017 5:34 PM
> > To: d...@openvswitch.org
> > Subject: [ovs-dev] [RFC PATCH 6/6] dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
> >
> > Previously rxqs were distributed to pmds by round robin in port/queue
> > order.
> >
> > Now that we have the processing cycles used for existing rxqs, use that
> > information to try and produce a better balanced distribution of rxqs
> > across pmds, i.e. given multiple pmds, the rxqs which have consumed the
> > largest amount of processing cycles will be placed on different pmds.
> >
> > The rxqs are sorted by their processing cycles and distributed (in sorted
> > order) round robin across pmds.
> >
> > Signed-off-by: Kevin Traynor <ktray...@redhat.com>
> > ---
> >  lib/dpif-netdev.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 53 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 3131255..89f7d44 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3259,8 +3259,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
> >  }
> >
> > +/* Sort Rx Queues by the processing cycles they are consuming. */
> > +static int
> > +rxq_cycle_sort(const void *a, const void *b)
> > +{
> > +    const struct dp_netdev_rxq *qa;
> > +    const struct dp_netdev_rxq *qb;
> > +
> > +    qa = *(struct dp_netdev_rxq **) a;
> > +    qb = *(struct dp_netdev_rxq **) b;
> > +
> > +    if (dp_netdev_rxq_get_cyc_last(qa) >= dp_netdev_rxq_get_cyc_last(qb)) {
> > +        return -1;
> > +    }
> > +
> > +    return 1;
> > +}
> > +
> >  /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
> >   * queues and marks the pmds as isolated.  Otherwise, assign non isolated
> >   * pmds to unpinned queues.
> >   *
> > + * If 'pinned' is false queues will be sorted by processing cycles they are
> > + * consuming and then assigned to pmds in round robin order.
> > + *
> >   * The function doesn't touch the pmd threads, it just stores the assignment
> >   * in the 'pmd' member of each rxq. */
> > @@ -3270,18 +3291,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >      struct dp_netdev_port *port;
> >      struct rr_numa_list rr;
> > -
> > -    rr_numa_list_populate(dp, &rr);
> > +    struct dp_netdev_rxq **rxqs = NULL;
> > +    int i, n_rxqs = 0;
> > +    struct rr_numa *numa = NULL;
> > +    int numa_id;
> >
> >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > -        struct rr_numa *numa;
> > -        int numa_id;
> > -
> >          if (!netdev_is_pmd(port->netdev)) {
> >              continue;
> >          }
> >
> > -        numa_id = netdev_get_numa_id(port->netdev);
> > -        numa = rr_numa_list_lookup(&rr, numa_id);
> > -
> >          for (int qid = 0; qid < port->n_rxq; qid++) {
> >              struct dp_netdev_rxq *q = &port->rxqs[qid];
> > @@ -3301,17 +3318,39 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
> >                  }
> >              } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
> > -                if (!numa) {
> > -                    VLOG_WARN("There's no available (non isolated) pmd thread "
> > -                              "on numa node %d. Queue %d on port \'%s\' will "
> > -                              "not be polled.",
> > -                              numa_id, qid, netdev_get_name(port->netdev));
> > +                if (n_rxqs == 0) {
> > +                    rxqs = xmalloc(sizeof *rxqs);
> >                  } else {
> > -                    q->pmd = rr_numa_get_pmd(numa);
> > +                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
> >                  }
> > +                /* Store the queue. */
> > +                rxqs[n_rxqs++] = q;
> >              }
> >          }
> >      }
> >
> > +    if (n_rxqs > 1) {
> > +        /* Sort the queues in order of the processing cycles
> > +         * they consumed during their last pmd interval. */
> > +        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
> > +    }
> > +
> > +    rr_numa_list_populate(dp, &rr);
> > +    /* Assign the sorted queues to pmds in round robin.
> > +     */
> > +    for (i = 0; i < n_rxqs; i++) {
> > +        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
> > +        numa = rr_numa_list_lookup(&rr, numa_id);
> > +        if (!numa) {
> > +            VLOG_WARN("There's no available (non isolated) pmd thread "
> > +                      "on numa node %d. Queue %d on port \'%s\' will "
> > +                      "not be polled.",
> > +                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
> > +                      netdev_get_name(rxqs[i]->port->netdev));
> > +            continue;
> > +        }
> > +        rxqs[i]->pmd = rr_numa_get_pmd(numa);
> > +    }
> > +
>
> I found this worked as expected during testing. One issue I noticed,
> though, was the wraparound when assigning pmds to queues.
>
> For example, assume you have 4 queues (q0,q1,q2,q3) and 3 pmds (pmd1,
> pmd2, pmd3).
>
> Assuming the queues are sorted in terms of the highest processing cycles
> as (q0,q1,q2,q3), they would be assigned pmds as follows:
>
> q0 -> pmd1
> q1 -> pmd2
> q2 -> pmd3
> q3 -> pmd1
>
> At this stage a wraparound occurs and pmd1 is assigned q3. But now your
> highest processing cycles queue is on the same pmd as a queue with lower
> processing cycles.
>
> I think it would be good to try and avoid this situation, although it
> would need a change to the pmd assignment logic.
>
> How about when you have reached the last pmd assignment, instead of
> starting at the beginning again, you start at the last assigned pmd and
> work backwards?
>
> So the assignments in this case would look like:
>
> q0 -> pmd1
> q1 -> pmd2
> q2 -> pmd3
> q3 -> pmd3
>
> This would help to avoid/minimize the chances of assigning multiple
> queues to a pmd that already has a queue with a high processing cycle.
>
> >      rr_numa_list_destroy(&rr);
> > +    free(rxqs);
> >  }
> >
> > --
> > 1.8.3.1
> >
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev