On 12/18/17, 2:41 PM, "Jan Scheurich" <jan.scheur...@ericsson.com> wrote:
Hi Aaron, Thanks for the review. Answers in-line. Regards, Jan > -----Original Message----- > From: Aaron Conole [mailto:acon...@redhat.com] > Sent: Monday, 18 December, 2017 19:41 > To: Jan Scheurich <jan.scheur...@ericsson.com> > Cc: d...@openvswitch.org > Subject: Re: [ovs-dev] [PATCH v4 1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf > > Hi Jan, > > Jan Scheurich <jan.scheur...@ericsson.com> writes: > > > Add module dpif-netdev-perf to host all PMD performance-related > > data structures and functions in dpif-netdev. Refactor the PMD > > stats handling in dpif-netdev and delegate whatever possible into > > the new module, using clean interfaces to shield dpif-netdev from > > the implementation details. Accordingly, all PMD statistics > > members are moved from the main struct dp_netdev_pmd_thread into > > a dedicated member of type struct pmd_perf_stats. > > > > Include Darrell's prior refactoring of PMD stats contained in > > [PATCH v5,2/3] dpif-netdev: Refactor some pmd stats: > > > > 1. The cycles per packet counts are now based on packets > > received rather than packet passes through the datapath. > > > > 2. Packet counters are now kept for packets received and > > packets recirculated. These are kept as separate counters for > > maintainability reasons. The cost of incrementing these counters > > is negligible. These new counters are also displayed to the user. > > > > 3. A display statistic is added for the average number of > > datapath passes per packet. This should be useful for user > > debugging and understanding of packet processing. > > > > 4. The user-visible 'miss' counter is used for successful upcalls, > > rather than the sum of successful and unsuccessful upcalls. Hence, > > this becomes what the user historically understands by OVS 'miss upcall'. > > The user display is annotated to make this clear as well. > > > > 5. The user-visible 'lost' counter remains as failed upcalls, but > > is annotated to make it clear what the meaning is. > > > > 6. The enum pmd_stat_type is annotated to make the usage of the > > stats counters clear. > > > > 7. The subtable lookup stat is renamed to make it clear that it > > relates to masked lookups. > > > > 8. The PMD stats test is updated to handle the new user stats of > > packets received, packets recirculated and average number of datapath > > passes per packet. > > > > On top of that, introduce a "-pmd <core>" option to the PMD info > > commands to filter the output for a single PMD. > > > > Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com> > > Co-authored-by: Darrell Ball <dlu...@gmail.com> > > Signed-off-by: Darrell Ball <dlu...@gmail.com> > > --- > > I'm really happy to see this series making progress. I think it's > extremely useful. > > Some comments inline.
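As background for point 3 of the commit message above: the 'avg. datapath passes per packet' figure is derived from the two new packet counters. A minimal sketch of the computation in C (the helper name is illustrative; in the patch the formula is computed inline in pmd_info_show_stats()):

    #include <stdint.h>

    /* Each recirculation sends a packet through the datapath pipeline one
     * more time, so total pipeline passes = packets received + packets
     * recirculated. */
    static double
    avg_datapath_passes(uint64_t recv, uint64_t recirc)
    {
        return recv > 0 ? (double) (recv + recirc) / recv : 0.0;
    }

For example, 20 packets received with 5 recirculations amount to 25 pipeline passes, i.e. on average 1.25 passes per packet.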
> > > lib/automake.mk | 2 + > > lib/dpif-netdev-perf.c | 66 ++++++++++ > > lib/dpif-netdev-perf.h | 123 ++++++++++++++++++ > > lib/dpif-netdev.c | 330 ++++++++++++++++++++----------------------- > > tests/pmd.at | 22 ++-- > > 5 files changed, 339 insertions(+), 204 deletions(-) > > create mode 100644 lib/dpif-netdev-perf.c > > create mode 100644 lib/dpif-netdev-perf.h > > > > diff --git a/lib/automake.mk b/lib/automake.mk > > index effe5b5..0b8df0e 100644 > > --- a/lib/automake.mk > > +++ b/lib/automake.mk > > @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \ > > lib/dpdk.h \ > > lib/dpif-netdev.c \ > > lib/dpif-netdev.h \ > > + lib/dpif-netdev-perf.c \ > > + lib/dpif-netdev-perf.h \ > > lib/dpif-provider.h \ > > lib/dpif.c \ > > lib/dpif.h \ > > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c > > new file mode 100644 > > index 0000000..b198fd3 > > --- /dev/null > > +++ b/lib/dpif-netdev-perf.c > > @@ -0,0 +1,66 @@ > > +/* > > + * Copyright (c) 2017 Ericsson AB. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > + * See the License for the specific language governing permissions and > > + * limitations under the License. > > + */ > > + > > +#include <config.h> > > + > > +#ifdef DPDK_NETDEV > > +#include <rte_cycles.h> > > +#endif > > + > > +#include "openvswitch/dynamic-string.h" > > +#include "openvswitch/vlog.h" > > +#include "dpif-netdev-perf.h" > > +#include "timeval.h" > > + > > +VLOG_DEFINE_THIS_MODULE(pmd_perf); > > + > > +void > > +pmd_perf_stats_init(struct pmd_perf_stats *s) { > > + memset(s, 0, sizeof(*s)); > > + s->start_ms = time_msec(); > > +} > > + > > +void > > +pmd_perf_read_counters(struct pmd_perf_stats *s, > > + uint64_t stats[PMD_N_STATS]) > > +{ > > + uint64_t val; > > + > > + /* This loop subtracts reference values ('*_zero') from the counters. > > + * Since loads and stores are relaxed, it might be possible for a '*_zero' > > Kind of a nit, but I think that should read .zero[*] or ->zero[*]. No > big deal one way or the other. Yes, the comment is a leftover. Will fix. > > > + * value to be more recent than the current value we're reading from the > > + * counter. This is not a big problem, since these numbers are not > > + * supposed to be too accurate, but we should at least make sure that > > + * the result is not negative. */ > > + for (int i = 0; i < PMD_N_STATS; i++) { > > + atomic_read_relaxed(&s->counters.n[i], &val); > > + if (val > s->counters.zero[i]) { > > + stats[i] = val - s->counters.zero[i]; > > + } else { > > + stats[i] = 0; > > + } > > + } > > +} > > + > > +void > > +pmd_perf_stats_clear(struct pmd_perf_stats *s) > > +{ > > + s->start_ms = 0; /* Marks start of clearing. */ > > + for (int i = 0; i < PMD_N_STATS; i++) { > > + atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); > > + } > > + s->start_ms = time_msec(); /* Clearing finished. 
*/ > > +} > > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h > > new file mode 100644 > > index 0000000..f55f7a2 > > --- /dev/null > > +++ b/lib/dpif-netdev-perf.h > > @@ -0,0 +1,123 @@ > > +/* > > + * Copyright (c) 2017 Ericsson AB. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > + * See the License for the specific language governing permissions and > > + * limitations under the License. > > + */ > > + > > +#ifndef PMD_PERF_H > > +#define PMD_PERF_H 1 > > + > > +#include <stdbool.h> > > +#include <stddef.h> > > +#include <stdint.h> > > +#include <string.h> > > +#include <math.h> > > + > > +#include "openvswitch/vlog.h" > > +#include "ovs-atomic.h" > > +#include "timeval.h" > > +#include "unixctl.h" > > +#include "util.h" > > + > > +#ifdef __cplusplus > > +extern "C" { > > +#endif > > + > > +enum pmd_stat_type { > > + PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ > > + PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ > > + PMD_STAT_MISS, /* Packets that did not match and upcall was ok. */ > > + PMD_STAT_LOST, /* Packets that did not match and upcall failed. */ > > + /* The above statistics account for the total > > + * number of packet passes through the datapath > > + * pipeline and should not be overlapping with each > > + * other. */ > > + PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table > > + hits. Each MASKED_HIT will have >= 1 > > + MASKED_LOOKUP(s). */ > > + PMD_STAT_RECV, /* Packets entering the datapath pipeline from an > > + * interface. */ > > + PMD_STAT_RECIRC, /* Packets reentering the datapath pipeline due to > > + * recirculation. */ > > + PMD_CYCLES_POLL_IDLE, /* Cycles spent on unsuccessful polling. */ > > + PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and > > + * processing polled packets. */ > > + PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */ > > + PMD_CYCLES_UPCALL, /* Cycles spent in upcalls. Included in > > + PMD_CYCLES_POLL_BUSY. */ > > + PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ > > + PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ > > + PMD_N_STATS > > +}; > > + > > +struct pmd_counters { > > + atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. 
*/ > > + uint64_t zero[PMD_N_STATS]; > > +}; > > + > > +struct pmd_perf_stats { > > + uint64_t start_ms; > > + uint64_t last_tsc; > > + struct pmd_counters counters; > > +}; > > + > > +enum pmd_info_type; > > +struct dp_netdev_pmd_thread; > > + > > +struct pmd_perf_params { > > + int command_type; > > + bool histograms; > > + size_t iter_hist_len; > > + size_t ms_hist_len; > > +}; > > + > > +void pmd_perf_stats_init(struct pmd_perf_stats *s); > > +void pmd_perf_stats_clear(struct pmd_perf_stats *s); > > +void pmd_perf_read_counters(struct pmd_perf_stats *s, > > + uint64_t stats[PMD_N_STATS]); > > + > > +static inline void > > +pmd_perf_update_counter(struct pmd_perf_stats *s, > > + enum pmd_stat_type counter, int delta) > > +{ > > + uint64_t tmp; > > + atomic_read_relaxed(&s->counters.n[counter], &tmp); > > + tmp += delta; > > + atomic_store_relaxed(&s->counters.n[counter], tmp); > > +} > > + > > +static inline void > > +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) > > +{ > > + if (OVS_UNLIKELY(s->start_ms == 0)) { > > + /* Stats not yet fully initialised. */ > > + return; > > + } > > + s->last_tsc = now_tsc; > > +} > > + > > +static inline void > > +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, > > + int packets) > > +{ > > + uint64_t cycles = now_tsc - s->last_tsc; > > + > > + /* No need for atomic updates as only invoked within PMD thread. */ > > + if (packets > 0) { > > + s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles; > > + } else { > > + s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; > > + } > > +} > > + > > This is missing a closing block for the cplusplus block above. Right, it ended up in patch 2/3 by mistake. > > > +#endif /* pmd-perf.h */ > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index 55be632..6ae1a3f 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -43,6 +43,7 @@ > > #include "csum.h" > > #include "dp-packet.h" > > #include "dpif.h" > > +#include "dpif-netdev-perf.h" > > #include "dpif-provider.h" > > #include "dummy.h" > > #include "fat-rwlock.h" > > @@ -330,23 +331,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp, > > odp_port_t) > > OVS_REQUIRES(dp->port_mutex); > > > > -enum dp_stat_type { > > - DP_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ > > - DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ > > - DP_STAT_MISS, /* Packets that did not match. */ > > - DP_STAT_LOST, /* Packets not passed up to the client. */ > > - DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table > > - hits */ > > - DP_N_STATS > > -}; > > - > > -enum pmd_cycles_counter_type { > > - PMD_CYCLES_IDLE, /* Cycles spent idle or unsuccessful polling */ > > - PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and > > - * processing polled packets */ > > - PMD_N_CYCLES > > -}; > > - > > enum rxq_cycles_counter_type { > > RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and > > processing packets during the current > > @@ -496,18 +480,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions( > > const struct dp_netdev_flow *); > > static void dp_netdev_actions_free(struct dp_netdev_actions *); > > > > -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */ > > -struct dp_netdev_pmd_stats { > > - /* Indexed by DP_STAT_*. */ > > - atomic_ullong n[DP_N_STATS]; > > -}; > > - > > -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */ > > -struct dp_netdev_pmd_cycles { > > - /* Indexed by PMD_CYCLES_*. 
*/ > > - atomic_ullong n[PMD_N_CYCLES]; > > -}; > > - > > struct polled_queue { > > struct dp_netdev_rxq *rxq; > > odp_port_t port_no; > > @@ -577,12 +549,6 @@ struct dp_netdev_pmd_thread { > > are stored for each polled rxq. */ > > long long int rxq_next_cycle_store; > > > > - /* Statistics. */ > > - struct dp_netdev_pmd_stats stats; > > - > > - /* Cycles counters */ > > - struct dp_netdev_pmd_cycles cycles; > > - > > /* Used to count cicles. See 'cycles_counter_end()' */ > > unsigned long long last_cycles; > > > > @@ -620,12 +586,8 @@ struct dp_netdev_pmd_thread { > > struct hmap tnl_port_cache; > > struct hmap send_port_cache; > > > > - /* Only a pmd thread can write on its own 'cycles' and 'stats'. > > - * The main thread keeps 'stats_zero' and 'cycles_zero' as base > > - * values and subtracts them from 'stats' and 'cycles' before > > - * reporting to the user */ > > - unsigned long long stats_zero[DP_N_STATS]; > > - uint64_t cycles_zero[PMD_N_CYCLES]; > > + /* Keep track of detailed PMD performance statistics. */ > > + struct pmd_perf_stats perf_stats; > > > > /* Set to true if the pmd thread needs to be reloaded. */ > > bool need_reload; > > @@ -791,46 +753,11 @@ enum pmd_info_type { > > }; > > > > static void > > -pmd_info_show_stats(struct ds *reply, > > - struct dp_netdev_pmd_thread *pmd, > > - unsigned long long stats[DP_N_STATS], > > - uint64_t cycles[PMD_N_CYCLES]) > > +format_pmd_thread(struct ds *reply, > > + struct dp_netdev_pmd_thread *pmd) > > Why not put this on one line? Just a nit. Leftover from earlier version. Will join the lines. > > > { > > - unsigned long long total_packets; > > - uint64_t total_cycles = 0; > > - int i; > > - > > - /* These loops subtracts reference values ('*_zero') from the counters. > > - * Since loads and stores are relaxed, it might be possible for a '*_zero' > > - * value to be more recent than the current value we're reading from the > > - * counter. This is not a big problem, since these numbers are not > > - * supposed to be too accurate, but we should at least make sure that > > - * the result is not negative. */ > > - for (i = 0; i < DP_N_STATS; i++) { > > - if (stats[i] > pmd->stats_zero[i]) { > > - stats[i] -= pmd->stats_zero[i]; > > - } else { > > - stats[i] = 0; > > - } > > - } > > - > > - /* Sum of all the matched and not matched packets gives the total. */ > > - total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT] > > - + stats[DP_STAT_MISS]; > > - > > - for (i = 0; i < PMD_N_CYCLES; i++) { > > - if (cycles[i] > pmd->cycles_zero[i]) { > > - cycles[i] -= pmd->cycles_zero[i]; > > - } else { > > - cycles[i] = 0; > > - } > > - > > - total_cycles += cycles[i]; > > - } > > - > > ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID) > > ? "main thread" : "pmd thread"); > > - > > Why was this line trimmed? Probably just my personal aesthetics. There is no particular reason why there should be a blank line here but not 6 lines further down in the same function. I have factored this code segment out into a dedicated function, and in this context I believe it may be justified to align white-space usage. 
> > > if (pmd->numa_id != OVS_NUMA_UNSPEC) { > > ds_put_format(reply, " numa_id %d", pmd->numa_id); > > } > > @@ -838,16 +765,39 @@ pmd_info_show_stats(struct ds *reply, > > ds_put_format(reply, " core_id %u", pmd->core_id); > > } > > ds_put_cstr(reply, ":\n"); > > +} > > + > > +static void > > +pmd_info_show_stats(struct ds *reply, > > + struct dp_netdev_pmd_thread *pmd) > > +{ > > + uint64_t stats[PMD_N_STATS]; > > + uint64_t total_cycles = 0; > > + > > + pmd_perf_read_counters(&pmd->perf_stats, stats); > > + total_cycles = stats[PMD_CYCLES_ITER_IDLE] > > + + stats[PMD_CYCLES_ITER_BUSY]; > > + > > + format_pmd_thread(reply, pmd); > > > > ds_put_format(reply, > > - "\temc hits:%llu\n\tmegaflow hits:%llu\n" > > - "\tavg. subtable lookups per hit:%.2f\n" > > - "\tmiss:%llu\n\tlost:%llu\n", > > - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT], > > - stats[DP_STAT_MASKED_HIT] > 0 > > - ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT] > > - : 0, > > - stats[DP_STAT_MISS], stats[DP_STAT_LOST]); > > + "\tpackets received:%"PRIu64"\n" > > + "\tpacket recirculations:%"PRIu64"\n" > > + "\tavg. datapath passes per packet:%.2f\n" > > + "\temc hits:%"PRIu64"\n" > > + "\tmegaflow hits:%"PRIu64"\n" > > + "\tavg. subtable lookups per megaflow hit:%.2f\n" > > + "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n" > > + "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n", > > + stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC], > > + stats[PMD_STAT_RECV] > 0 > > + ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC])) > > + / stats[PMD_STAT_RECV] : 0, > > + stats[PMD_STAT_EXACT_HIT], stats[PMD_STAT_MASKED_HIT], > > + stats[PMD_STAT_MASKED_HIT] > 0 > > + ? (1.0 * stats[PMD_STAT_MASKED_LOOKUP]) > > + / stats[PMD_STAT_MASKED_HIT] > > + : 0, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST]); > > > > if (total_cycles == 0) { > > return; > > @@ -856,46 +806,26 @@ pmd_info_show_stats(struct ds *reply, > > ds_put_format(reply, > > "\tidle cycles:%"PRIu64" (%.02f%%)\n" > > "\tprocessing cycles:%"PRIu64" (%.02f%%)\n", > > - cycles[PMD_CYCLES_IDLE], > > - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, > > - cycles[PMD_CYCLES_PROCESSING], > > - cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100); > > + stats[PMD_CYCLES_ITER_IDLE], > > + stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100, > > + stats[PMD_CYCLES_ITER_BUSY], > > + stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles * 100); > > > > + uint64_t total_packets = stats[PMD_STAT_RECV]; > > if (total_packets == 0) { > > return; > > } > > > > ds_put_format(reply, > > - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n", > > + "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n", > > total_cycles / (double)total_packets, > > total_cycles, total_packets); > > > > ds_put_format(reply, > > "\tavg processing cycles per packet: " > > - "%.02f (%"PRIu64"/%llu)\n", > > - cycles[PMD_CYCLES_PROCESSING] / (double)total_packets, > > - cycles[PMD_CYCLES_PROCESSING], total_packets); > > -} > > - > > -static void > > -pmd_info_clear_stats(struct ds *reply OVS_UNUSED, > > - struct dp_netdev_pmd_thread *pmd, > > - unsigned long long stats[DP_N_STATS], > > - uint64_t cycles[PMD_N_CYCLES]) > > -{ > > - int i; > > - > > - /* We cannot write 'stats' and 'cycles' (because they're written by other > > - * threads) and we shouldn't change 'stats' (because they're used to count > > - * datapath stats, which must not be cleared here). 
Instead, we save the > > - * current values and subtract them from the values to be displayed in the > > - * future */ > > - for (i = 0; i < DP_N_STATS; i++) { > > - pmd->stats_zero[i] = stats[i]; > > - } > > - for (i = 0; i < PMD_N_CYCLES; i++) { > > - pmd->cycles_zero[i] = cycles[i]; > > - } > > + "%.02f (%"PRIu64"/%"PRIu64")\n", > > + stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets, > > + stats[PMD_CYCLES_POLL_BUSY], total_packets); > > } > > > > static int > > @@ -1055,23 +985,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], > > struct ds reply = DS_EMPTY_INITIALIZER; > > struct dp_netdev_pmd_thread **pmd_list; > > struct dp_netdev *dp = NULL; > > - size_t n; > > enum pmd_info_type type = *(enum pmd_info_type *) aux; > > + int core_id = -1; > > + size_t n; > > > > ovs_mutex_lock(&dp_netdev_mutex); > > > > - if (argc == 2) { > > - dp = shash_find_data(&dp_netdevs, argv[1]); > > - } else if (shash_count(&dp_netdevs) == 1) { > > - /* There's only one datapath */ > > - dp = shash_first(&dp_netdevs)->data; > > + while (argc > 1) { > > + if (!strcmp(argv[1], "-pmd")) { > > + core_id = strtol(argv[2], NULL, 10); > > + argc -= 2; > > + argv += 2; > > + } else { > > + dp = shash_find_data(&dp_netdevs, argv[1]); > > + argc -= 1; > > + argv += 1; > > + } > > } > > > > if (!dp) { > > - ovs_mutex_unlock(&dp_netdev_mutex); > > - unixctl_command_reply_error(conn, > > - "please specify an existing datapath"); > > - return; > > + if (shash_count(&dp_netdevs) == 1) { > > + /* There's only one datapath */ > > + dp = shash_first(&dp_netdevs)->data; > > + } else { > > + ovs_mutex_unlock(&dp_netdev_mutex); > > + unixctl_command_reply_error(conn, > > + "please specify an existing datapath"); > > + return; > > + } > > } > > > > sorted_poll_thread_list(dp, &pmd_list, &n); > > @@ -1080,26 +1021,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], > > if (!pmd) { > > break; > > } > > - > > + if (core_id != -1 && pmd->core_id != core_id) { > > + continue; > > + } > > Did I read this right? If I omit the core_id it will skip displaying > stats now? I didn't get a chance to test - maybe I misread and just > need coffee. If you don't specify "-pmd <core_id>", core_id stays -1 and all PMDs are displayed, as before. If you do specify "-pmd <core_id>", every PMD whose core id does not match <core_id> is skipped. 
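For illustration, with the registrations updated below the commands can be invoked as follows (core 4 is just an example value; the datapath argument remains optional as before):

    # Stats for all PMD threads, unchanged behaviour:
    ovs-appctl dpif-netdev/pmd-stats-show
    # Stats only for the PMD thread running on core 4:
    ovs-appctl dpif-netdev/pmd-stats-show -pmd 4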
> > > if (type == PMD_INFO_SHOW_RXQ) { > > pmd_info_show_rxq(&reply, pmd); > > - } else { > > - unsigned long long stats[DP_N_STATS]; > > - uint64_t cycles[PMD_N_CYCLES]; > > - > > - /* Read current stats and cycle counters */ > > - for (size_t j = 0; j < ARRAY_SIZE(stats); j++) { > > - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]); > > - } > > - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) { > > - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]); > > - } > > - > > - if (type == PMD_INFO_CLEAR_STATS) { > > - pmd_info_clear_stats(&reply, pmd, stats, cycles); > > - } else if (type == PMD_INFO_SHOW_STATS) { > > - pmd_info_show_stats(&reply, pmd, stats, cycles); > > - } > > + } else if (type == PMD_INFO_CLEAR_STATS) { > > + pmd_perf_stats_clear(&pmd->perf_stats); > > + } else if (type == PMD_INFO_SHOW_STATS) { > > + pmd_info_show_stats(&reply, pmd); > > } > > } > > free(pmd_list); > > @@ -1117,14 +1047,14 @@ dpif_netdev_init(void) > > clear_aux = PMD_INFO_CLEAR_STATS, > > poll_aux = PMD_INFO_SHOW_RXQ; > > > > - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]", > > - 0, 1, dpif_netdev_pmd_info, > > + unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core | dp]", > > + 0, 2, dpif_netdev_pmd_info, > > (void *)&show_aux); > > - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", > > - 0, 1, dpif_netdev_pmd_info, > > + unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core | dp]", > > + 0, 2, dpif_netdev_pmd_info, > > (void *)&clear_aux); > > - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", > > - 0, 1, dpif_netdev_pmd_info, > > + unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core | dp]", > > + 0, 2, dpif_netdev_pmd_info, > > (void *)&poll_aux); > > unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", > > 0, 1, dpif_netdev_pmd_rebalance, > > @@ -1460,23 +1390,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) > > { > > struct dp_netdev *dp = get_dp_netdev(dpif); > > struct dp_netdev_pmd_thread *pmd; > > + uint64_t pmd_stats[PMD_N_STATS]; > > > > - stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0; > > + stats->n_flows = stats->n_hit = > > + stats->n_mask_hit = stats->n_missed = stats->n_lost = 0; > > CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { > > - unsigned long long n; > > stats->n_flows += cmap_count(&pmd->flow_table); > > - > > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n); > > - stats->n_hit += n; > > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n); > > - stats->n_hit += n; > > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n); > > - stats->n_missed += n; > > - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n); > > - stats->n_lost += n; > > + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); > > + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; > > + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; > > + stats->n_missed += pmd_stats[PMD_STAT_MISS]; > > + stats->n_lost += pmd_stats[PMD_STAT_LOST]; > > } > > stats->n_masks = UINT32_MAX; > > - stats->n_mask_hit = UINT64_MAX; > > > > return 0; > > } > > @@ -3155,28 +3081,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd) > > /* Stop counting cycles and add them to the counter 'type' */ > > static inline void > > cycles_count_end(struct dp_netdev_pmd_thread *pmd, > > - enum pmd_cycles_counter_type type) > > + enum pmd_stat_type type) > > OVS_RELEASES(&cycles_counter_fake_mutex) > > OVS_NO_THREAD_SAFETY_ANALYSIS > > { > > unsigned long long interval = 
cycles_counter() - pmd->last_cycles; > > > > - non_atomic_ullong_add(&pmd->cycles.n[type], interval); > > + pmd_perf_update_counter(&pmd->perf_stats, type, interval); > > } > > > > /* Calculate the intermediate cycle result and add to the counter 'type' */ > > static inline void > > cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, > > struct dp_netdev_rxq *rxq, > > - enum pmd_cycles_counter_type type) > > + enum pmd_stat_type type) > > OVS_NO_THREAD_SAFETY_ANALYSIS > > { > > unsigned long long new_cycles = cycles_counter(); > > unsigned long long interval = new_cycles - pmd->last_cycles; > > pmd->last_cycles = new_cycles; > > > > - non_atomic_ullong_add(&pmd->cycles.n[type], interval); > > - if (rxq && (type == PMD_CYCLES_PROCESSING)) { > > + pmd_perf_update_counter(&pmd->perf_stats, type, interval); > > + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { > > /* Add to the amount of current processing cycles. */ > > non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); > > } > > @@ -3879,12 +3805,12 @@ dpif_netdev_run(struct dpif *dpif) > > port->port_no); > > cycles_count_intermediate(non_pmd, NULL, > > process_packets > > - ? PMD_CYCLES_PROCESSING > > - : PMD_CYCLES_IDLE); > > + ? PMD_CYCLES_POLL_BUSY > > + : PMD_CYCLES_POLL_IDLE); > > } > > } > > } > > - cycles_count_end(non_pmd, PMD_CYCLES_IDLE); > > + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); > > dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false); > > ovs_mutex_unlock(&dp->non_pmd_mutex); > > > > @@ -4028,6 +3954,7 @@ static void * > > pmd_thread_main(void *f_) > > { > > struct dp_netdev_pmd_thread *pmd = f_; > > + struct pmd_perf_stats *s = &pmd->perf_stats; > > unsigned int lc = 0; > > struct polled_queue *poll_list; > > bool exiting; > > @@ -4063,13 +3990,17 @@ reload: > > > > cycles_count_start(pmd); > > for (;;) { > > + uint64_t iter_packets = 0; > > + pmd_perf_start_iteration(s, pmd->last_cycles); > > for (i = 0; i < poll_cnt; i++) { > > process_packets = > > dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, > > poll_list[i].port_no); > > cycles_count_intermediate(pmd, poll_list[i].rxq, > > - process_packets ? PMD_CYCLES_PROCESSING > > - : PMD_CYCLES_IDLE); > > + process_packets > > + ? 
PMD_CYCLES_POLL_BUSY > > + : PMD_CYCLES_POLL_IDLE); > > + iter_packets += process_packets; > > } > > > > if (lc++ > 1024) { > > @@ -4087,10 +4018,12 @@ reload: > > if (reload) { > > break; > > } > > + cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); > > } > > + pmd_perf_end_iteration(s, pmd->last_cycles, iter_packets); > > } > > > > - cycles_count_end(pmd, PMD_CYCLES_IDLE); > > + cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); > > > > poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); > > exiting = latch_is_set(&pmd->exit_latch); > > @@ -4543,6 +4476,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, > > } > > cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node), > > hash_int(core_id, 0)); > > + pmd_perf_stats_init(&pmd->perf_stats); > > } > > > > static void > > @@ -4740,13 +4674,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size, > > atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags); > > } > > > > -static void > > -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd, > > - enum dp_stat_type type, int cnt) > > -{ > > - non_atomic_ullong_add(&pmd->stats.n[type], cnt); > > -} > > - > > static int > > dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, > > struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, > > @@ -4920,6 +4847,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, > > int i; > > > > atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); > > + pmd_perf_update_counter(&pmd->perf_stats, > > + md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV, > > + cnt); > > > > DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { > > struct dp_netdev_flow *flow; > > @@ -4968,24 +4898,24 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, > > } > > } > > > > - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, > > - cnt - n_dropped - n_missed); > > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, > > + cnt - n_dropped - n_missed); > > > > return dp_packet_batch_size(packets_); > > } > > > > -static inline void > > +static inline int > > handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > > struct dp_packet *packet, > > const struct netdev_flow_key *key, > > struct ofpbuf *actions, struct ofpbuf *put_actions, > > - int *lost_cnt, long long now) > > + long long now) > > { > > struct ofpbuf *add_actions; > > struct dp_packet_batch b; > > struct match match; > > ovs_u128 ufid; > > - int error; > > + int error = 0; > > This shouldn't be needed - error is unconditionally assigned. You are right. Will fix. > > > match.tun_md.valid = false; > > miniflow_expand(&key->mf, &match.flow); > > @@ -4999,8 +4929,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > > put_actions); > > if (OVS_UNLIKELY(error && error != ENOSPC)) { > > dp_packet_delete(packet); > > - (*lost_cnt)++; > > - return; > > + return error; > > } > > > > /* The Netlink encoding of datapath flow keys cannot express > > @@ -5040,6 +4969,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > > ovs_mutex_unlock(&pmd->flow_mutex); > > emc_probabilistic_insert(pmd, key, netdev_flow); > > } > > Should this have some additional cover for the ENOSPC case? Meaning, we > will now increment a failure condition, but I'm not sure it should be > considered an 'upcall' failure? After all, we'll still perform the actions. Good question. I inherited this change from Darrell's original patch set. 
Looking at ofproto, there are a number of different conditions that return an ENOSPC error from the upcall: reached ukey limit, recirculation without matching recirc_ref, ukey installation failed, possibly more. I guess the common denominator is that we execute the dp actions for the packet, just without installing a datapath flow. So we can consider the upcall as successful. I suggest returning zero when we get to the end of the function, as the only error possible at that stage is ENOSPC. @Darrell. OK with you? [Darrell] No, I don’t think we can just translate an upcall that fails to install a flow due to lack of space as ‘success/ok’. One of the two main purposes of a miss upcall is to install a flow in the fast path, and I think that is also the common expectation of success. The problem may also be persistent or even a bug. Two possible options are to keep the upcall_fail_cnt++ increment, or to differentiate the two cases with something like upcall_fail_to_install_flow and plain upcall_fail_cnt, and refactor accordingly. > > > + return error; > > } > > > > static inline void > > @@ -5061,7 +4991,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > struct dpcls *cls; > > struct dpcls_rule *rules[PKT_ARRAY_SIZE]; > > struct dp_netdev *dp = pmd->dp; > > - int miss_cnt = 0, lost_cnt = 0; > > + int upcall_ok_cnt = 0, upcall_fail_cnt = 0; > > int lookup_cnt = 0, add_lookup_cnt; > > bool any_miss; > > size_t i; > > @@ -5103,9 +5033,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > continue; > > } > > > > - miss_cnt++; > > - handle_packet_upcall(pmd, packet, &keys[i], &actions, > > - &put_actions, &lost_cnt, now); > > + int error = handle_packet_upcall(pmd, packet, &keys[i], > > + &actions, &put_actions, now); > > + > > + if (OVS_UNLIKELY(error)) { > > + upcall_fail_cnt++; > > + } else { > > + upcall_ok_cnt++; > > + } > > } > > > > ofpbuf_uninit(&actions); > > @@ -5115,8 +5050,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > DP_PACKET_BATCH_FOR_EACH (packet, packets_) { > > if (OVS_UNLIKELY(!rules[i])) { > > dp_packet_delete(packet); > > - lost_cnt++; > > - miss_cnt++; > > + upcall_fail_cnt++; > > } > > } > > } > > @@ -5134,10 +5068,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); > > } > > > > - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); > > - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); > > - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); > > - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); > > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, > > + cnt - upcall_ok_cnt - upcall_fail_cnt); > > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, > > + lookup_cnt); > > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS, > > + upcall_ok_cnt); > > + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST, > > + upcall_fail_cnt); > > } > > > > /* Packets enter the datapath from a port (or from recirculation) here. 
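Picking up Darrell's comment above: for illustration only, the second option he describes (differentiating the two failure cases) might be sketched as follows. The struct and counter names are hypothetical and not part of this patch:

    #include <errno.h>
    #include <stdint.h>

    /* Hypothetical three-way classification of an upcall result. */
    struct upcall_counters {
        uint64_t ok;              /* Upcall succeeded, flow installed. */
        uint64_t no_flow_install; /* ENOSPC: actions were executed, but no
                                   * datapath flow could be installed. */
        uint64_t failed;          /* Upcall failed, packet was dropped. */
    };

    static void
    count_upcall_result(struct upcall_counters *c, int error)
    {
        if (!error) {
            c->ok++;
        } else if (error == ENOSPC) {
            c->no_flow_install++;
        } else {
            c->failed++;
        }
    }

Whether the ENOSPC bucket would then be reported under 'miss', under 'lost', or separately is exactly the open question discussed above.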
> > diff --git a/tests/pmd.at b/tests/pmd.at > > index e39a23a..0356f87 100644 > > --- a/tests/pmd.at > > +++ b/tests/pmd.at > > @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0 > > p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, > requested_tx_queues=<cleared>) > > ]) > > > > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl > > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl > > pmd thread numa_id <cleared> core_id <cleared>: > > + packets received:0 > > + packet recirculations:0 > > + avg. datapath passes per packet:0.00 > > emc hits:0 > > megaflow hits:0 > > - avg. subtable lookups per hit:0.00 > > - miss:0 > > - lost:0 > > + avg. subtable lookups per megaflow hit:0.00 > > + miss(i.e. lookup miss with success upcall):0 > > + lost(i.e. lookup miss with failed upcall):0 > > ]) > > > > ovs-appctl time/stop > > @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl > > recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), > actions: <del> > > ]) > > > > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl > > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl > > pmd thread numa_id <cleared> core_id <cleared>: > > + packets received:20 > > + packet recirculations:0 > > + avg. datapath passes per packet:1.00 > > emc hits:19 > > megaflow hits:0 > > - avg. subtable lookups per hit:0.00 > > - miss:1 > > - lost:0 > > + avg. subtable lookups per megaflow hit:0.00 > > + miss(i.e. lookup miss with success upcall):1 > > + lost(i.e. lookup miss with failed upcall):0 > > ]) > > > > OVS_VSWITCHD_STOP