Thanks for the review. Find my answers below. /Jan > -----Original Message----- > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] > Sent: Friday, 19 January, 2018 12:31 > To: Jan Scheurich <jan.scheur...@ericsson.com>; d...@openvswitch.org > Cc: ktray...@redhat.com; Stokes, Ian <ian.sto...@intel.com>; > i.maxim...@samsung.com > Subject: RE: [PATCH v7 2/3] dpif-netdev: Detailed performance stats for PMDs > > A few things I didn't come across until reading 3/3 but are not related to > the other atomic/volatile/mutex discussion.. > > All these are suggestions so use only if felt they offer improvements in > clarity. > > > -----Original Message----- > > From: Jan Scheurich [mailto:jan.scheur...@ericsson.com] > > Sent: Tuesday, January 16, 2018 1:51 AM > > To: d...@openvswitch.org > > Cc: ktray...@redhat.com; Stokes, Ian <ian.sto...@intel.com>; > > i.maxim...@samsung.com; O Mahony, Billy <billy.o.mah...@intel.com>; > > Jan Scheurich <jan.scheur...@ericsson.com> > > Subject: [PATCH v7 2/3] dpif-netdev: Detailed performance stats for PMDs > > > > This patch instruments the dpif-netdev datapath to record detailed > > statistics of what is happening in every iteration of a PMD thread. > > > > The collection of detailed statistics can be controlled by a new > > Open_vSwitch configuration parameter "other_config:pmd-perf-metrics". > > By default it is disabled. The run-time overhead, when enabled, is > > in the order of 1%. > > > > The covered metrics per iteration are: > > - cycles > > - packets > > - (rx) batches > > - packets/batch > > - max. vhostuser qlen > > - upcalls > > - cycles spent in upcalls > > > > This raw recorded data is used threefold: > > > > 1. In histograms for each of the following metrics: > > - cycles/iteration (log.) > > - packets/iteration (log.) > > - cycles/packet > > - packets/batch > > - max. vhostuser qlen (log.) > > - upcalls > > - cycles/upcall (log) > > The histograms bins are divided linear or logarithmic. > > > > 2. 
A cyclic history of the above statistics for 999 iterations > > > > 3. A cyclic history of the cummulative/average values per millisecond > > wall clock for the last 1000 milliseconds: > > - number of iterations > > - avg. cycles/iteration > > - packets (Kpps) > > - avg. packets/batch > > - avg. max vhost qlen > > - upcalls > > - avg. cycles/upcall > > > > The gathered performance metrics can be printed at any time with the > > new CLI command > > > > ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len] > > [-pmd core | dp] > > > > The options are > > > > -nh: Suppress the histograms > > -it iter_len: Display the last iter_len iteration stats > > -ms ms_len: Display the last ms_len millisecond stats > > -pmd core: Display only > > > > The performance statistics are reset with the existing > > dpif-netdev/pmd-stats-clear command. > > > > The output always contains the following global PMD statistics, > > similar to the pmd-stats-show command: > > > > Time: 15:24:55.270 > > Measurement duration: 1.008 s > > > > pmd thread numa_id 0 core_id 1: > > > > Cycles: 2419034712 (2.40 GHz) > > Iterations: 572817 (1.76 us/it) > > - idle: 486808 (15.9 % cycles) > > - busy: 86009 (84.1 % cycles) > > Packets: 2399607 (2381 Kpps, 848 cycles/pkt) > > Datapath passes: 3599415 (1.50 passes/pkt) > > - EMC hits: 336472 ( 9.3 %) > > - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) > > - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) > > - Lost upcalls: 0 ( 0.0 %) > > > > Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com> > > --- > > NEWS | 3 + > > lib/automake.mk | 1 + > > lib/dp-packet.h | 1 + > > lib/dpif-netdev-perf.c | 333 > > +++++++++++++++++++++++++++++++++++++++++++- > > lib/dpif-netdev-perf.h | 239 +++++++++++++++++++++++++++++-- > > lib/dpif-netdev.c | 177 +++++++++++++++++++++-- > > lib/netdev-dpdk.c | 13 +- > > lib/netdev-dpdk.h | 14 ++ > > lib/netdev-dpif-unixctl.man | 113 +++++++++++++++ > > manpages.mk | 2 + > > vswitchd/ovs-vswitchd.8.in | 27 +--- > 
> vswitchd/vswitch.xml | 12 ++ > > 12 files changed, 881 insertions(+), 54 deletions(-) > > create mode 100644 lib/netdev-dpif-unixctl.man > > > > diff --git a/NEWS b/NEWS > > index 2c28456..743528e 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -44,6 +44,9 @@ Post-v2.8.0 > > if available (for OpenFlow 1.4+). > > - Userspace datapath: > > * Output packet batching support. > > + * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a > > single PMD > > + * Detailed PMD performance metrics available with new command > > + ovs-appctl dpif-netdev/pmd-perf-show > > - vswitchd: > > * Datapath IDs may now be specified as 0x1 (etc.) instead of 16 > > digits. > > * Configuring a controller, or unconfiguring all controllers, now > > deletes > > diff --git a/lib/automake.mk b/lib/automake.mk > > index 159319f..d07cbe9 100644 > > --- a/lib/automake.mk > > +++ b/lib/automake.mk > > @@ -468,6 +468,7 @@ MAN_FRAGMENTS += \ > > lib/dpctl.man \ > > lib/memory-unixctl.man \ > > lib/netdev-dpdk-unixctl.man \ > > + lib/netdev-dpif-unixctl.man \ > > lib/ofp-version.man \ > > lib/ovs.tmac \ > > lib/service.man \ > > diff --git a/lib/dp-packet.h b/lib/dp-packet.h > > index b4b721c..3d65088 100644 > > --- a/lib/dp-packet.h > > +++ b/lib/dp-packet.h > > @@ -697,6 +697,7 @@ struct dp_packet_batch { > > size_t count; > > bool trunc; /* true if the batch needs truncate. 
*/ > > struct dp_packet *packets[NETDEV_MAX_BURST]; > > + > > }; > > > > static inline void > > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c > > index f06991a..e0ef15d 100644 > > --- a/lib/dpif-netdev-perf.c > > +++ b/lib/dpif-netdev-perf.c > > @@ -15,6 +15,7 @@ > > */ > > > > #include <config.h> > > +#include <stdint.h> > > > > #include "openvswitch/dynamic-string.h" > > #include "openvswitch/vlog.h" > > @@ -23,10 +24,299 @@ > > > > VLOG_DEFINE_THIS_MODULE(pmd_perf); > > > > +#ifdef DPDK_NETDEV > > +static uint64_t > > +get_tsc_hz(void) > > +{ > > + return rte_get_tsc_hz(); > > +} > > +#else > > +/* This function is only invoked from PMD threads which depend on DPDK. > > + * A dummy function is sufficient when building without DPDK_NETDEV. */ > > +static uint64_t > > +get_tsc_hz(void) > > +{ > > + return 1; > > +} > > +#endif > > + > > +/* Histogram functions. */ > > + > > +static void > > +histogram_walls_set_lin(struct histogram *hist, uint32_t min, uint32_t max) > > +{ > > + int i; > > + > > + ovs_assert(min < max); > > + for (i = 0; i < NUM_BINS-1; i++) { > > + hist->wall[i] = min + (i * (max - min)) / (NUM_BINS - 2); > > + } > > + hist->wall[NUM_BINS-1] = UINT32_MAX; > > +} > > + > > +static void > > +histogram_walls_set_log(struct histogram *hist, uint32_t min, uint32_t > > max) > > +{ > > + int i, start, bins, wall; > > + double log_min, log_max; > > + > > + ovs_assert(min < max); > > + if (min > 0) { > > + log_min = log(min); > > + log_max = log(max); > > + start = 0; > > + bins = NUM_BINS - 1; > > + } else { > > + hist->wall[0] = 0; > > + log_min = log(1); > > + log_max = log(max); > > + start = 1; > > + bins = NUM_BINS - 2; > > + } > > + wall = start; > > + for (i = 0; i < bins; i++) { > > + /* Make sure each wall is monotonically increasing. 
*/ > > + wall = MAX(wall, exp(log_min + (i * (log_max - log_min)) / > > (bins-1))); > > + hist->wall[start + i] = wall++; > > + } > > + if (hist->wall[NUM_BINS-2] < max) { > > + hist->wall[NUM_BINS-2] = max; > > + } > > + hist->wall[NUM_BINS-1] = UINT32_MAX; > > +} > > + > > +uint64_t > > +histogram_samples(const struct histogram *hist) > > +{ > > + uint64_t samples = 0; > > + > > + for (int i = 0; i < NUM_BINS; i++) { > > + samples += hist->bin[i]; > > + } > > + return samples; > > +} > > + > > +static void > > +histogram_clear(struct histogram *hist) > > +{ > > + int i; > > + > > + for (i = 0; i < NUM_BINS; i++) { > > + hist->bin[i] = 0; > > + } > > +} > > + > > +static void > > +history_init(struct history *h) > > +{ > > + memset(h, 0, sizeof(*h)); > > +} > > + > > void > > pmd_perf_stats_init(struct pmd_perf_stats *s) > > { > > - memset(s, 0 , sizeof(*s)); > > + memset(s, 0, sizeof(*s)); > > + histogram_walls_set_log(&s->cycles, 500, 24000000); > > + histogram_walls_set_log(&s->pkts, 0, 1000); > > + histogram_walls_set_lin(&s->cycles_per_pkt, 100, 30000); > > + histogram_walls_set_lin(&s->pkts_per_batch, 0, 32); > > + histogram_walls_set_lin(&s->upcalls, 0, 30); > > + histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); > > + histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); > > + s->start_ms = time_msec(); > > +} > > + > > +void > > +pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, > > + double duration) > > +{ > > + uint64_t stats[PMD_N_STATS]; > > + double us_per_cycle = 1000000.0 / get_tsc_hz(); > > + > > + if (duration == 0) { > > + return; > > + } > > + > > + pmd_perf_read_counters(s, stats); > > + uint64_t tot_cycles = stats[PMD_CYCLES_ITER_IDLE] + > > + stats[PMD_CYCLES_ITER_BUSY]; > > + uint64_t packets = stats[PMD_STAT_RECV]; > > + uint64_t passes = stats[PMD_STAT_RECV] + > > + stats[PMD_STAT_RECIRC]; > > + uint64_t upcalls = stats[PMD_STAT_MISS]; > > + uint64_t upcall_cycles = stats[PMD_CYCLES_UPCALL]; > > + 
uint64_t tot_iter = histogram_samples(&s->pkts); > > + uint64_t idle_iter = s->pkts.bin[0]; > > + uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0; > > + > > + ds_put_format(str, > > + " Cycles: %12"PRIu64" (%.2f GHz)\n" > > + " Iterations: %12"PRIu64" (%.2f us/it)\n" > > + " - idle: %12"PRIu64" (%4.1f %% cycles)\n" > > + " - busy: %12"PRIu64" (%4.1f %% cycles)\n", > > + tot_cycles, (tot_cycles / duration) / 1E9, > > + tot_iter, tot_cycles * us_per_cycle / tot_iter, > > + idle_iter, > > + 100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles, > > + busy_iter, > > + 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles); > > + if (packets > 0) { > > + ds_put_format(str, > > + " Packets: %12"PRIu64" (%.0f Kpps, %.0f > > cycles/pkt)\n" > > + " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n" > > + " - EMC hits: %12"PRIu64" (%4.1f %%)\n" > > + " - Megaflow hits: %12"PRIu64" (%4.1f %%, %.2f subtbl > > lookups/" > > + > > "hit)\n" > > + " - Upcalls: %12"PRIu64" (%4.1f %%, %.1f us/upcall)\n" > > + " - Lost upcalls: %12"PRIu64" (%4.1f %%)\n" > > + "\n", > > + packets, (packets / duration) / 1000, > > + 1.0 * stats[PMD_CYCLES_ITER_BUSY] / packets, > > + passes, packets ? 1.0 * passes / packets : 0, > > + stats[PMD_STAT_EXACT_HIT], > > + 100.0 * stats[PMD_STAT_EXACT_HIT] / passes, > > + stats[PMD_STAT_MASKED_HIT], > > + 100.0 * stats[PMD_STAT_MASKED_HIT] / passes, > > + stats[PMD_STAT_MASKED_HIT] > > + ? 1.0 * stats[PMD_STAT_MASKED_LOOKUP] / > > stats[PMD_STAT_MASKED_HIT] > > + : 0, > > + upcalls, 100.0 * upcalls / passes, > > + upcalls ? 
(upcall_cycles * us_per_cycle) / upcalls : 0, > > + stats[PMD_STAT_LOST], > > + 100.0 * stats[PMD_STAT_LOST] / passes); > > + } else { > > + ds_put_format(str, > > + " Packets: %12"PRIu64"\n" > > + "\n", > > + 0UL); > > + } > > +} > > + > > +void > > +pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s) > > +{ > > + int i; > > + > > + ds_put_cstr(str, "Histograms\n"); > > + ds_put_format(str, > > + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", > > + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", > > + "max vhost qlen", "upcalls/it", "cycles/upcall"); > > + for (i = 0; i < NUM_BINS-1; i++) { > > + ds_put_format(str, > > + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 > > + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 > > + " %-9d %-11"PRIu64"\n", > > + s->cycles.wall[i], s->cycles.bin[i], > > + s->pkts.wall[i],s->pkts.bin[i], > > + s->cycles_per_pkt.wall[i], s->cycles_per_pkt.bin[i], > > + s->pkts_per_batch.wall[i], s->pkts_per_batch.bin[i], > > + s->max_vhost_qfill.wall[i], s->max_vhost_qfill.bin[i], > > + s->upcalls.wall[i], s->upcalls.bin[i], > > + s->cycles_per_upcall.wall[i], s->cycles_per_upcall.bin[i]); > > + } > > + ds_put_format(str, > > + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s > > %-11"PRIu64 > > + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 > > + " %-9s %-11"PRIu64"\n", > > + ">", s->cycles.bin[i], > > + ">", s->pkts.bin[i], > > + ">", s->cycles_per_pkt.bin[i], > > + ">", s->pkts_per_batch.bin[i], > > + ">", s->max_vhost_qfill.bin[i], > > + ">", s->upcalls.bin[i], > > + ">", s->cycles_per_upcall.bin[i]); > > + if (s->totals.iterations > 0) { > > + ds_put_cstr(str, > > + "-----------------------------------------------------" > > + "-----------------------------------------------------" > > + "------------------------------------------------\n"); > > + ds_put_format(str, > > + " %-21s %-21s %-21s %-21s %-21s %-21s > > %-21s\n", > > + "cycles/it", "packets/it", "cycles/pkt", > > "pkts/batch", > > + 
"vhost qlen", "upcalls/it", "cycles/upcall"); > > + ds_put_format(str, > > + " %-21"PRIu64" %-21.5f %-21"PRIu64 > > + " %-21.5f %-21.5f %-21.5f %-21"PRIu32"\n", > > + s->totals.cycles / s->totals.iterations, > > + 1.0 * s->totals.pkts / s->totals.iterations, > > + s->totals.pkts > > + ? s->totals.busy_cycles / s->totals.pkts : 0, > > + s->totals.batches > > + ? 1.0 * s->totals.pkts / s->totals.batches : 0, > > + 1.0 * s->totals.max_vhost_qfill / > > s->totals.iterations, > > + 1.0 * s->totals.upcalls / s->totals.iterations, > > + s->totals.upcalls > > + ? s->totals.upcall_cycles / s->totals.upcalls : > > 0); > > + } > > +} > > + > > +void > > +pmd_perf_format_iteration_history(struct ds *str, struct pmd_perf_stats > > *s, > > + int n_iter) > > +{ > > + struct iter_stats *is; > > + size_t index; > > + int i; > > + > > + if (n_iter == 0) { > > + return; > > + } > > + ds_put_format(str, " %-17s %-10s %-10s %-10s %-10s " > > + "%-10s %-10s %-10s\n", > > + "tsc", "cycles", "packets", "cycles/pkt", "pkts/batch", > > + "vhost qlen", "upcalls", "cycles/upcall"); > > + for (i = 1; i <= n_iter; i++) { > > + index = (s->iterations.idx + HISTORY_LEN - i) % HISTORY_LEN; > > + is = &s->iterations.sample[index]; > > + ds_put_format(str, > > + " %-17"PRIu64" %-11"PRIu64" %-11"PRIu32 > > + " %-11"PRIu64" %-11"PRIu32" %-11"PRIu32 > > + " %-11"PRIu32" %-11"PRIu32"\n", > > + is->timestamp, > > + is->cycles, > > + is->pkts, > > + is->pkts ? is->cycles / is->pkts : 0, > > + is->batches ? is->pkts / is->batches : 0, > > + is->max_vhost_qfill, > > + is->upcalls, > > + is->upcalls ? 
is->upcall_cycles / is->upcalls : 0); > > + } > > +} > > + > > +void > > +pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int > > n_ms) > > +{ > > + struct iter_stats *is; > > + size_t index; > > + int i; > > + > > + if (n_ms == 0) { > > + return; > > + } > > + ds_put_format(str, > > + " %-12s %-10s %-10s %-10s %-10s" > > + " %-10s %-10s %-10s %-10s\n", > > + "ms", "iterations", "cycles/it", "Kpps", "cycles/pkt", > > + "pkts/batch", "vhost qlen", "upcalls", "cycles/upcall"); > > + for (i = 1; i <= n_ms; i++) { > > + index = (s->milliseconds.idx + HISTORY_LEN - i) % HISTORY_LEN; > > + is = &s->milliseconds.sample[index]; > > + ds_put_format(str, > > + " %-12"PRIu64" %-11"PRIu32" %-11"PRIu64 > > + " %-11"PRIu32" %-11"PRIu64" %-11"PRIu32 > > + " %-11"PRIu32" %-11"PRIu32" %-11"PRIu32"\n", > > + is->timestamp, > > + is->iterations, > > + is->iterations ? is->cycles / is->iterations : 0, > > + is->pkts, > > + is->pkts ? is->busy_cycles / is->pkts : 0, > > + is->batches ? is->pkts / is->batches : 0, > > + is->iterations > > + ? is->max_vhost_qfill / is->iterations : 0, > > + is->upcalls, > > + is->upcalls ? is->upcall_cycles / is->upcalls : 0); > > + } > > } > > > > void > > @@ -51,10 +341,49 @@ pmd_perf_read_counters(struct pmd_perf_stats > > *s, > > } > > } > > > > +/* This function is executed in the context of the PMD at the start of > > + * a new iteration when requested through pmd_perf_stats_clear(). 
*/ > > void > > -pmd_perf_stats_clear(struct pmd_perf_stats *s) > > +pmd_perf_stats_clear__(struct pmd_perf_stats *s) > > { > > for (int i = 0; i < PMD_N_STATS; i++) { > > atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); > > } > > + memset(&s->current, 0 , sizeof(struct iter_stats)); > > + memset(&s->totals, 0 , sizeof(struct iter_stats)); > > + histogram_clear(&s->cycles); > > + histogram_clear(&s->pkts); > > + histogram_clear(&s->cycles_per_pkt); > > + histogram_clear(&s->upcalls); > > + histogram_clear(&s->cycles_per_upcall); > > + histogram_clear(&s->pkts_per_batch); > > + histogram_clear(&s->max_vhost_qfill); > > + history_init(&s->iterations); > > + history_init(&s->milliseconds); > > + s->start_ms = time_msec(); > > + s->milliseconds.sample[0].timestamp = s->start_ms; > > + /* Clearing finished. */ > > + s->clear = false; > > +} > > + > > +/* This function must be called from outside the PMD thread to safely > > + * clear the PMD stats at the start of the next iteration. It blocks the > > + * caller until the stats are cleared. */ > > +void > > +pmd_perf_stats_clear(struct pmd_perf_stats *s) > > +{ > > + /* Request the PMD to clear its stats in pmd_perf_start_iteration(). */ > > + s->clear = true; > > + /* Wait a number of milliseconds for the stats to be cleared. */ > > + while (s->clear) { > > + xnanosleep(1000 * 1000); > > + } > > +} > > + > > +/* This function can be called from the anywhere to clear the stats > > + * of the non-pmd thread. */ > > +void > > +non_pmd_perf_stats_clear(struct pmd_perf_stats *s) > > +{ > > + pmd_perf_stats_clear__(s); > > } > > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h > > index 5993c25..7a89c40 100644 > > --- a/lib/dpif-netdev-perf.h > > +++ b/lib/dpif-netdev-perf.h > > @@ -38,10 +38,18 @@ > > extern "C" { > > #endif > > > > -/* This module encapsulates data structures and functions to maintain PMD > > - * performance metrics such as packet counters, execution cycles. 
It > > - * provides a clean API for dpif-netdev to initialize, update and read and > > +/* This module encapsulates data structures and functions to maintain basic > > PMD > > + * performance metrics such as packet counters, execution cycles as well as > > + * histograms and time series recording for more detailed PMD metrics. > > + * > > + * It provides a clean API for dpif-netdev to initialize, update and read > > and > > * reset these metrics. > > + * > > + * The basic set of PMD counters is implemented as atomic_uint64_t > > variables > > + * to guarantee correct read also in 32-bit systems. > > + * > > + * The detailed PMD performance metrics are only supported on 64-bit > > systems > > + * with atomic 64-bit read and store semantics for plain uint64_t counters. > > */ > > > > /* Set of counter types maintained in pmd_perf_stats. */ > > @@ -66,6 +74,7 @@ enum pmd_stat_type { > > PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ > > PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ > > PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ > > + PMD_CYCLES_UPCALL, /* Cycles spent processing upcalls. */ > > PMD_N_STATS > > }; > > > > @@ -81,18 +90,78 @@ struct pmd_counters { > > uint64_t zero[PMD_N_STATS]; /* Value at last _clear(). */ > > }; > > > > +/* Data structure to collect statistical distribution of an integer > > measurement > > + * type in form of a histogram. The wall[] array contains the inclusive > > + * upper boundaries of the bins, while the bin[] array contains the actual > > + * counters per bin. The histogram walls are typically set automatically > > + * using the functions provided below.*/ > > + > > +#define NUM_BINS 32 /* Number of histogram bins. */ > > + > > +struct histogram { > > + uint32_t wall[NUM_BINS]; > > + uint64_t bin[NUM_BINS]; > > +}; > > + > > +/* Data structure to record details PMD execution metrics per iteration for > > + * a history period of up to HISTORY_LEN iterations in circular buffer. 
> > + * Also used to record up to HISTORY_LEN millisecond averages/totals of > > these > > + * metrics.*/ > > + > > +struct iter_stats { > > + uint64_t timestamp; /* TSC or millisecond. */ > > + uint64_t cycles; /* Number of TSC cycles spent in it/ms. */ > > + uint64_t busy_cycles; /* Cycles spent in busy iterations in ms. > > */ > > + uint32_t iterations; /* Iterations in ms. */ > > + uint32_t pkts; /* Packets processed in iteration/ms. */ > > + uint32_t upcalls; /* Number of upcalls in iteration/ms. */ > > + uint32_t upcall_cycles; /* Cycles spent in upcalls in > > iteration/ms. */ > > + uint32_t batches; /* Number of rx batches in iteration/ms. */ > > + uint32_t max_vhost_qfill; /* Maximum fill level encountered in > > it/ms. */ > > +}; > > + > > +#define HISTORY_LEN 1000 /* Length of recorded history > > + (iterations and ms). */ > > +#define DEF_HIST_SHOW 20 /* Default number of history samples to > > + display. */ > > + > > +struct history { > > + uint64_t idx; /* Next slot in history. */ > [[BO'M]] Suggest "Slot where next call to history_store() will write" is more > educational.
OK. > > + struct iter_stats sample[HISTORY_LEN]; > [[BO'M]] sample -> samples. The plural reads more naturally and iirc is in > the coding style. You are right about the coding style, but it would affect many other array names also and I don't particularly think it increases readability in many cases. I would prefer to keep names as they are. > > +}; > > + > > /* Container for all performance metrics of a PMD. > > * Part of the struct dp_netdev_pmd_thread. */ > > > > struct pmd_perf_stats { > > - /* Start of the current PMD iteration in TSC cycles.*/ > > - uint64_t start_it_tsc; > > + /* Set by CLI thread to order clearing of PMD stats. */ > > + volatile atomic_bool clear; > > + /* Start of the current performance measurement period. */ > > + uint64_t start_ms; > > /* Latest TSC time stamp taken in PMD. */ > > uint64_t last_tsc; > > + /* Used to space certain checks in time. */ > > + uint64_t next_check_tsc; > > /* If non-NULL, outermost cycle timer currently running in PMD. */ > > struct cycle_timer *cur_timer; > > /* Set of PMD counters with their zero offsets. */ > > struct pmd_counters counters; > > + /* Statistics of the current iteration. */ > > + struct iter_stats current; > > + /* Totals for the current millisecond. */ > > + struct iter_stats totals; > > + /* Histograms for the PMD metrics. */ > > + struct histogram cycles; > > + struct histogram pkts; > > + struct histogram cycles_per_pkt; > > + struct histogram upcalls; > > + struct histogram cycles_per_upcall; > > + struct histogram pkts_per_batch; > > + struct histogram max_vhost_qfill; > > + /* Iteration history buffer. */ > > + struct history iterations; > > + /* Millisecond history buffer. */ > > + struct history milliseconds; > [[BO'M]] Would suggest iter_history or iteration_history and ms_history or > millisec(ond)_history would help when reading the code. Don't > shorten history to hist however as that clashes with the shortening used for > histogram. I prefer to keep current names. 
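[Editor's aside: the role of the `clear` flag declared above, together with pmd_perf_stats_clear() earlier in the patch, is a small handshake: the control thread never touches the stats itself, it only raises the flag and waits, while the PMD thread performs the actual clear at a safe point at the start of its next iteration. A reduced model with C11 atomics and pthreads; the names are hypothetical, and the PMD loop here exits after servicing one clear request purely to keep the example finite:]

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

struct stats {
    atomic_bool clear;   /* Set by the control (CLI) thread. */
    uint64_t counter;    /* Written only by the PMD thread. */
};

/* Control-thread side: request a clear and block until the PMD thread
 * acknowledges by resetting the flag, polling every 1 ms as the patch
 * does with xnanosleep(). */
static void
stats_clear_request(struct stats *s)
{
    atomic_store(&s->clear, true);
    while (atomic_load(&s->clear)) {
        usleep(1000);
    }
}

/* PMD-thread side: check the flag at the top of each iteration; clearing
 * is safe here because only this thread ever writes the stats. */
static void *
pmd_loop(void *arg)
{
    struct stats *s = arg;

    for (;;) {
        if (atomic_load(&s->clear)) {
            s->counter = 0;
            atomic_store(&s->clear, false);
            return NULL;   /* Stop after one clear (example only). */
        }
        s->counter++;      /* Simulated per-iteration work. */
    }
}
```

[The design point this models: moving the clear into the PMD's own loop avoids locking the hot per-iteration stats, at the cost of making the control-side clear blocking.]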
> > }; > > > > /* Support for accurate timing of PMD execution on TSC clock cycle level. > > @@ -175,8 +244,15 @@ cycle_timer_stop(struct pmd_perf_stats *s, > > return now - timer->start; > > } > > > > +/* Functions to initialize and reset the PMD performance metrics. */ > > + > > void pmd_perf_stats_init(struct pmd_perf_stats *s); > > void pmd_perf_stats_clear(struct pmd_perf_stats *s); > > +void non_pmd_perf_stats_clear(struct pmd_perf_stats *s); > > +void pmd_perf_stats_clear__(struct pmd_perf_stats *s); > > + > > +/* Functions to read and update PMD counters. */ > > + > > void pmd_perf_read_counters(struct pmd_perf_stats *s, > > uint64_t stats[PMD_N_STATS]); > > > > @@ -199,32 +275,175 @@ pmd_perf_update_counter(struct > > pmd_perf_stats *s, > > atomic_store_relaxed(&s->counters.n[counter], tmp); > > } > > > > +/* Functions to manipulate a sample history. */ > > + > > +static inline void > > +histogram_add_sample(struct histogram *hist, uint32_t val) > > +{ > > + /* TODO: Can do better with binary search? 
*/ > > + for (int i = 0; i < NUM_BINS-1; i++) { > > + if (val <= hist->wall[i]) { > > + hist->bin[i]++; > > + return; > > + } > > + } > > + hist->bin[NUM_BINS-1]++; > > +} > > + > > +uint64_t histogram_samples(const struct histogram *hist); > > + > > +static inline struct iter_stats * > > +history_current(struct history *h) > > +{ > > + return &h->sample[h->idx]; > > +} > > + > > +static inline struct iter_stats * > > +history_next(struct history *h) > > +{ > > + struct iter_stats *next; > > + > > + h->idx++; > > + if (h->idx == HISTORY_LEN) { > > + h->idx = 0; > > + } > > + next = &h->sample[h->idx]; > > + memset(next, 0, sizeof(*next)); > > + return next; > > +} > > + > > +static inline struct iter_stats * > > +history_store(struct history *h, struct iter_stats *is) > > +{ > > + if (is) { > > + h->sample[h->idx] = *is; > > + } > > + /* Advance the history pointer */ > > + return history_next(h); > > +} > > + > > +/* Functions recording PMD metrics per iteration. */ > > + > > static inline void > > pmd_perf_start_iteration(struct pmd_perf_stats *s) > > { > > + if (s->clear) { > > + /* Clear the PMD stats before starting next iteration. */ > > + pmd_perf_stats_clear__(s); > > + } > > + /* Initialize the current interval stats. */ > > + memset(&s->current, 0, sizeof(struct iter_stats)); > > if (OVS_LIKELY(s->last_tsc)) { > > /* We assume here that last_tsc was updated immediately prior at > > * the end of the previous iteration, or just before the first > > * iteration. */ > > - s->start_it_tsc = s->last_tsc; > > + s->current.timestamp = s->last_tsc; > > } else { > > /* In case last_tsc has never been set before. 
*/ > > - s->start_it_tsc = cycles_counter_update(s); > > + s->current.timestamp = cycles_counter_update(s); > > } > > } > > > > static inline void > > -pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets) > > +pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets, > > + int tx_packets, bool full_metrics) > > { > > - uint64_t cycles = cycles_counter_update(s) - s->start_it_tsc; > > + uint64_t now_tsc = cycles_counter_update(s); > > + struct iter_stats *cum_ms; > > + uint64_t cycles, cycles_per_pkt = 0; > > > > - if (rx_packets > 0) { > > + if (OVS_UNLIKELY(s->current.timestamp == 0)) { > > + /* Stats were cleared during the ongoing iteration. */ > > + return; > > + } > > + > > + cycles = now_tsc - s->current.timestamp; > > + s->current.cycles = cycles; > > + s->current.pkts = rx_packets; > > + > > + if (rx_packets + tx_packets > 0) { > > pmd_perf_update_counter(s, PMD_CYCLES_ITER_BUSY, cycles); > > } else { > > pmd_perf_update_counter(s, PMD_CYCLES_ITER_IDLE, cycles); > > } > > + /* Add iteration samples to histograms. */ > > + histogram_add_sample(&s->cycles, cycles); > > + histogram_add_sample(&s->pkts, rx_packets); > > + > > + if (!full_metrics) { > > + return; > > + } > > + > > + s->counters.n[PMD_CYCLES_UPCALL] += s->current.upcall_cycles; > > + > > + if (rx_packets > 0) { > > + cycles_per_pkt = cycles / rx_packets; > > + histogram_add_sample(&s->cycles_per_pkt, cycles_per_pkt); > > + } > > + if (s->current.batches > 0) { > > + histogram_add_sample(&s->pkts_per_batch, > > + rx_packets / s->current.batches); > > + } > > + histogram_add_sample(&s->upcalls, s->current.upcalls); > > + if (s->current.upcalls > 0) { > > + histogram_add_sample(&s->cycles_per_upcall, > > + s->current.upcall_cycles / > > s->current.upcalls); > > + } > > + histogram_add_sample(&s->max_vhost_qfill, s- > > >current.max_vhost_qfill); > > + > > + /* Add iteration samples to millisecond stats. 
*/ > > + cum_ms = history_current(&s->milliseconds); > > + cum_ms->iterations++; > > + cum_ms->cycles += cycles; > > + if (rx_packets > 0) { > > + cum_ms->busy_cycles += cycles; > > + } > > + cum_ms->pkts += s->current.pkts; > > + cum_ms->upcalls += s->current.upcalls; > > + cum_ms->upcall_cycles += s->current.upcall_cycles; > > + cum_ms->batches += s->current.batches; > > + cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; > > + > > + /* Store in iteration history. */ > > + history_store(&s->iterations, &s->current); > > + if (now_tsc > s->next_check_tsc) { > > + /* Check if ms is completed and store in milliseconds history. */ > > + uint64_t now = time_msec(); > > + if (now != cum_ms->timestamp) { > > + /* Add ms stats to totals. */ > > + s->totals.iterations += cum_ms->iterations; > > + s->totals.cycles += cum_ms->cycles; > > + s->totals.busy_cycles += cum_ms->busy_cycles; > > + s->totals.pkts += cum_ms->pkts; > > + s->totals.upcalls += cum_ms->upcalls; > > + s->totals.upcall_cycles += cum_ms->upcall_cycles; > > + s->totals.batches += cum_ms->batches; > > + s->totals.max_vhost_qfill += cum_ms->max_vhost_qfill; > > + cum_ms = history_next(&s->milliseconds); > > + cum_ms->timestamp = now; > > + } > > + s->next_check_tsc = now_tsc + 10000; > > + } > > } > > > > +/* Functions for formatting the output of commands. 
*/ > > + > > +struct pmd_perf_params { > > + int command_type; > > + bool histograms; > > + size_t iter_hist_len; > > + size_t ms_hist_len; > > +}; > > + > > +void pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats > > *s, > > + double duration); > > +void pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats > > *s); > > +void pmd_perf_format_iteration_history(struct ds *str, > > + struct pmd_perf_stats *s, > > + int n_iter); > > +void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats > > *s, > > + int n_ms); > > + > > #ifdef __cplusplus > > } > > #endif > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index 48a8ebb..f5931bf 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -49,6 +49,7 @@ > > #include "id-pool.h" > > #include "latch.h" > > #include "netdev.h" > > +#include "netdev-provider.h" > > #include "netdev-vport.h" > > #include "netlink.h" > > #include "odp-execute.h" > > @@ -281,6 +282,8 @@ struct dp_netdev { > > > > /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ > > OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min; > > + /* Enable collection of PMD performance metrics. */ > > + ATOMIC(bool) pmd_perf_metrics; > > > > /* Protects access to ofproto-dpif-upcall interface during revalidator > > * thread synchronization. */ > > @@ -712,6 +715,8 @@ static inline bool emc_entry_alive(struct emc_entry > > *ce); > > static void emc_clear_entry(struct emc_entry *ce); > > > > static void dp_netdev_request_reconfigure(struct dp_netdev *dp); > > +static inline bool > > +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd); > > > > static void > > emc_cache_init(struct emc_cache *flow_cache) > > @@ -795,7 +800,8 @@ get_dp_netdev(const struct dpif *dpif) > > enum pmd_info_type { > > PMD_INFO_SHOW_STATS, /* Show how cpu cycles are spent. */ > > PMD_INFO_CLEAR_STATS, /* Set the cycles count to 0. 
*/ > > - PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */ > > + PMD_INFO_SHOW_RXQ, /* Show poll lists of pmd threads. */ > > + PMD_INFO_PERF_SHOW, /* Show pmd performance details. */ > > }; > > > > static void > > @@ -886,6 +892,44 @@ pmd_info_show_stats(struct ds *reply, > > stats[PMD_CYCLES_ITER_BUSY], total_packets); > > } > > > > +static void > > +pmd_info_show_perf(struct ds *reply, > > + struct dp_netdev_pmd_thread *pmd, > > + struct pmd_perf_params *par) > > +{ > > + if (pmd->core_id != NON_PMD_CORE_ID) { > > + char *time_str = > > + xastrftime_msec("%H:%M:%S.###", time_wall_msec(), true); > > + long long now = time_msec(); > > + double duration = (now - pmd->perf_stats.start_ms) / 1000.0; > > + > > + ds_put_cstr(reply, "\n"); > > + ds_put_format(reply, "Time: %s\n", time_str); > > + ds_put_format(reply, "Measurement duration: %.3f s\n", duration); > > + ds_put_cstr(reply, "\n"); > > + format_pmd_thread(reply, pmd); > > + ds_put_cstr(reply, "\n"); > > + pmd_perf_format_overall_stats(reply, &pmd->perf_stats, duration); > > + if (pmd_perf_metrics_enabled(pmd)) { > > + if (par->histograms) { > > + ds_put_cstr(reply, "\n"); > > + pmd_perf_format_histograms(reply, &pmd->perf_stats); > > + } > > + if (par->iter_hist_len > 0) { > > + ds_put_cstr(reply, "\n"); > > + pmd_perf_format_iteration_history(reply, &pmd->perf_stats, > > + par->iter_hist_len); > > + } > > + if (par->ms_hist_len > 0) { > > + ds_put_cstr(reply, "\n"); > > + pmd_perf_format_ms_history(reply, &pmd->perf_stats, > > + par->ms_hist_len); > > + } > > + } > > + free(time_str); > > + } > > +} > > + > > static int > > compare_poll_list(const void *a_, const void *b_) > > { > > @@ -1088,9 +1132,15 @@ dpif_netdev_pmd_info(struct unixctl_conn > > *conn, int argc, const char *argv[], > > if (type == PMD_INFO_SHOW_RXQ) { > > pmd_info_show_rxq(&reply, pmd); > > } else if (type == PMD_INFO_CLEAR_STATS) { > > - pmd_perf_stats_clear(&pmd->perf_stats); > > + if (pmd->core_id == NON_PMD_CORE_ID) { > > + 
non_pmd_perf_stats_clear(&pmd->perf_stats); > > + } else { > > + pmd_perf_stats_clear(&pmd->perf_stats); > > + } > > } else if (type == PMD_INFO_SHOW_STATS) { > > pmd_info_show_stats(&reply, pmd); > > + } else if (type == PMD_INFO_PERF_SHOW) { > > + pmd_info_show_perf(&reply, pmd, (struct pmd_perf_params > > *)aux); > > } > > } > > free(pmd_list); > > @@ -1100,6 +1150,48 @@ dpif_netdev_pmd_info(struct unixctl_conn > > *conn, int argc, const char *argv[], > > unixctl_command_reply(conn, ds_cstr(&reply)); > > ds_destroy(&reply); > > } > > + > > +static void > > +pmd_perf_show_cmd(struct unixctl_conn *conn, int argc, > > + const char *argv[], > > + void *aux OVS_UNUSED) > > +{ > > + struct pmd_perf_params par; > > + long int it_hist = 0, ms_hist = 0; > > + par.histograms = true; > > + > > + while (argc > 1) { > > + if (!strcmp(argv[1], "-nh")) { > > + par.histograms = false; > > + argc -= 1; > > + argv += 1; > > + } else if (!strcmp(argv[1], "-it") && argc > 2) { > > + it_hist = strtol(argv[2], NULL, 10); > > + if (it_hist < 0) { > > + it_hist = 0; > > + } else if (it_hist > HISTORY_LEN) { > > + it_hist = HISTORY_LEN; > > + } > > + argc -= 2; > > + argv += 2; > > + } else if (!strcmp(argv[1], "-ms") && argc > 2) { > > + ms_hist = strtol(argv[2], NULL, 10); > > + if (ms_hist < 0) { > > + ms_hist = 0; > > + } else if (ms_hist > HISTORY_LEN) { > > + ms_hist = HISTORY_LEN; > > + } > > + argc -= 2; > > + argv += 2; > > + } else { > > + break; > > + } > > + } > > + par.iter_hist_len = it_hist; > > + par.ms_hist_len = ms_hist; > > + par.command_type = PMD_INFO_PERF_SHOW; > > + dpif_netdev_pmd_info(conn, argc, argv, &par); > > +} > > > > > static int > > dpif_netdev_init(void) > > @@ -1117,6 +1209,12 @@ dpif_netdev_init(void) > > unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] > > [dp]", > > 0, 3, dpif_netdev_pmd_info, > > (void *)&poll_aux); > > + unixctl_command_register("dpif-netdev/pmd-perf-show", > > + "[-nh] [-it iter-history-len]" > > + " [-ms 
ms-history-len]" > > + " [-pmd core | dp]", > > + 0, 7, pmd_perf_show_cmd, > > + NULL); > > unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", > > 0, 1, dpif_netdev_pmd_rebalance, > > NULL); > > @@ -3003,6 +3101,18 @@ dpif_netdev_set_config(struct dpif *dpif, const > > struct smap *other_config) > > } > > } > > > > + bool perf_enabled = smap_get_bool(other_config, "pmd-perf-metrics", > > false); > > + bool cur_perf_enabled; > > + atomic_read_relaxed(&dp->pmd_perf_metrics, &cur_perf_enabled); > > + if (perf_enabled != cur_perf_enabled) { > > + atomic_store_relaxed(&dp->pmd_perf_metrics, perf_enabled); > > + if (perf_enabled) { > > + VLOG_INFO("PMD performance metrics collection enabled"); > > + } else { > > + VLOG_INFO("PMD performance metrics collection disabled"); > > + } > > + } > > + > > return 0; > > } > > > > @@ -3139,6 +3249,20 @@ dp_netdev_rxq_set_cycles(struct dp_netdev_rxq > > *rx, > > atomic_store_relaxed(&rx->cycles[type], cycles); > > } > > > > +static inline bool > > +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd) > > +{ > > + /* If stores and reads of 64-bit integers are not atomic, the > > + * full PMD performance metrics are not available as locked > > + * access to 64 bit integers would be prohibitively expensive. 
*/ > > + if (sizeof(uint64_t) > sizeof(void *)) { > > + return false; > > + } > > + bool pmd_perf_enabled; > > + atomic_read_relaxed(&pmd->dp->pmd_perf_metrics, > > &pmd_perf_enabled); > > + return pmd_perf_enabled; > > +} > > + > > static void > > dp_netdev_rxq_add_cycles(struct dp_netdev_rxq *rx, > > enum rxq_cycles_counter_type type, > > @@ -3247,10 +3371,11 @@ dp_netdev_process_rxq_port(struct > > dp_netdev_pmd_thread *pmd, > > struct dp_netdev_rxq *rxq, > > odp_port_t port_no) > > { > > + struct pmd_perf_stats *s = &pmd->perf_stats; > > struct dp_packet_batch batch; > > struct cycle_timer timer; > > int error; > > - int batch_cnt = 0, output_cnt = 0; > > + int batch_cnt = 0; > > uint64_t cycles; > > > > /* Measure duration for polling and processing rx burst. */ > > @@ -3264,15 +3389,34 @@ dp_netdev_process_rxq_port(struct > > dp_netdev_pmd_thread *pmd, > > /* At least one packet received. */ > > *recirc_depth_get() = 0; > > pmd_thread_ctx_time_update(pmd); > > - > > batch_cnt = batch.count; > > + if (pmd_perf_metrics_enabled(pmd)) { > > + /* Update batch histogram. */ > > + s->current.batches++; > > + histogram_add_sample(&s->pkts_per_batch, batch_cnt); > > + /* Update the maximum Rx queue fill level. */ > > + int dev_type = netdev_dpdk_get_type( > > + netdev_rxq_get_netdev(rxq->rx)); > > + if (dev_type == DPDK_DEV_VHOST) { > > + /* Check queue fill level for vhostuser ports. */ > > + uint32_t qfill = batch_cnt; > > + if (OVS_UNLIKELY(batch_cnt == NETDEV_MAX_BURST)) { > > + /* Likely more packets in rxq. */ > > + qfill += netdev_rxq_length(rxq->rx); > > + } > > + if (qfill > s->current.max_vhost_qfill) { > > + s->current.max_vhost_qfill = qfill; > > + } > > + } > > + } > > + /* Process packet batch. */ > > dp_netdev_input(pmd, &batch, port_no); > > > > /* Assign processing cycles to rx queue. 
*/ > > cycles = cycle_timer_stop(&pmd->perf_stats, &timer); > > dp_netdev_rxq_add_cycles(rxq, RXQ_CYCLES_PROC_CURR, cycles); > > > > - output_cnt = dp_netdev_pmd_flush_output_packets(pmd, false); > > + dp_netdev_pmd_flush_output_packets(pmd, false); > > } else { > > /* Discard cycles. */ > > cycle_timer_stop(&pmd->perf_stats, &timer); > > @@ -3286,7 +3430,7 @@ dp_netdev_process_rxq_port(struct > > dp_netdev_pmd_thread *pmd, > > > > pmd->ctx.last_rxq = NULL; > > > > - return batch_cnt + output_cnt; > > + return batch_cnt; > > } > > > > static struct tx_port * > > @@ -4119,22 +4263,23 @@ reload: > > > > cycles_counter_update(s); > > for (;;) { > > - uint64_t iter_packets = 0; > > + uint64_t rx_packets = 0, tx_packets = 0; > > > > pmd_perf_start_iteration(s); > > + > > for (i = 0; i < poll_cnt; i++) { > > process_packets = > > dp_netdev_process_rxq_port(pmd, poll_list[i].rxq, > > poll_list[i].port_no); > > - iter_packets += process_packets; > > + rx_packets += process_packets; > > } > > > > - if (!iter_packets) { > > + if (!rx_packets) { > > /* We didn't receive anything in the process loop. > > * Check if we need to send something. > > * There was no time updates on current iteration. 
*/ > > pmd_thread_ctx_time_update(pmd); > > - iter_packets += dp_netdev_pmd_flush_output_packets(pmd, false); > > + tx_packets = dp_netdev_pmd_flush_output_packets(pmd, false); > > } > > > > if (lc++ > 1024) { > > @@ -4153,7 +4298,8 @@ reload: > > break; > > } > > } > > - pmd_perf_end_iteration(s, iter_packets); > > + pmd_perf_end_iteration(s, rx_packets, tx_packets, > > + pmd_perf_metrics_enabled(pmd)); > > } > > > > poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); > > @@ -5050,6 +5196,7 @@ handle_packet_upcall(struct > > dp_netdev_pmd_thread *pmd, > > struct match match; > > ovs_u128 ufid; > > int error; > > + uint64_t cycles = cycles_counter_update(&pmd->perf_stats); > > > > match.tun_md.valid = false; > > miniflow_expand(&key->mf, &match.flow); > > @@ -5103,6 +5250,14 @@ handle_packet_upcall(struct > > dp_netdev_pmd_thread *pmd, > > ovs_mutex_unlock(&pmd->flow_mutex); > > emc_probabilistic_insert(pmd, key, netdev_flow); > > } > > + if (pmd_perf_metrics_enabled(pmd)) { > > + /* Update upcall stats. */ > > + cycles = cycles_counter_update(&pmd->perf_stats) - cycles; > > + struct pmd_perf_stats *s = &pmd->perf_stats; > > + s->current.upcalls++; > > + s->current.upcall_cycles += cycles; > > + histogram_add_sample(&s->cycles_per_upcall, cycles); > > + } > > return error; > > } > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index 4200556..7a8fdc2 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -36,6 +36,7 @@ > > #include <rte_mbuf.h> > > #include <rte_meter.h> > > #include <rte_pci.h> > > +#include <rte_version.h> > > #include <rte_vhost.h> > > #include <rte_version.h> > > > > @@ -188,11 +189,6 @@ enum { DPDK_RING_SIZE = 256 }; > > BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE)); > > enum { DRAIN_TSC = 200000ULL }; > > > > -enum dpdk_dev_type { > > - DPDK_DEV_ETH = 0, > > - DPDK_DEV_VHOST = 1, > > -}; > > - > > /* Quality of Service */ > > > > /* An instance of a QoS configuration. 
Always associated with a particular > > @@ -843,6 +839,13 @@ netdev_dpdk_cast(const struct netdev *netdev) > > return CONTAINER_OF(netdev, struct netdev_dpdk, up); > > } > > > > +enum dpdk_dev_type > > +netdev_dpdk_get_type(const struct netdev *netdev) > > +{ > > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > + return dev->type; > > +} > > + > > static struct netdev * > > netdev_dpdk_alloc(void) > > { > > diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h > > index b7d02a7..2b357db 100644 > > --- a/lib/netdev-dpdk.h > > +++ b/lib/netdev-dpdk.h > > @@ -22,11 +22,18 @@ > > #include "openvswitch/compiler.h" > > > > struct dp_packet; > > +struct netdev; > > + > > +enum dpdk_dev_type { > > + DPDK_DEV_ETH = 0, > > + DPDK_DEV_VHOST = 1, > > +}; > > > > #ifdef DPDK_NETDEV > > > > void netdev_dpdk_register(void); > > void free_dpdk_buf(struct dp_packet *); > > +enum dpdk_dev_type netdev_dpdk_get_type(const struct netdev > > *netdev); > > > > #else > > > > @@ -41,6 +48,13 @@ free_dpdk_buf(struct dp_packet *buf OVS_UNUSED) > > /* Nothing */ > > } > > > > +static inline enum dpdk_dev_type > > +netdev_dpdk_get_type(const struct netdev *netdev OVS_UNUSED) > > +{ > > + /* Nothing to do. Return value zero to make compiler happy. */ > > + return DPDK_DEV_ETH; > > +} > > + > > #endif > > > > #endif /* netdev-dpdk.h */ > > diff --git a/lib/netdev-dpif-unixctl.man b/lib/netdev-dpif-unixctl.man > > new file mode 100644 > > index 0000000..53f4c51 > > --- /dev/null > > +++ b/lib/netdev-dpif-unixctl.man > > @@ -0,0 +1,113 @@ > > +.SS "DPIF-NETDEV COMMANDS" > > +These commands are used to expose internal information (mostly statistics) > > +about the "dpif-netdev" userspace datapath. If there is only one datapath > > +(as is often the case, unless \fBdpctl/\fR commands are used), the \fIdp\fR > > +argument can be omitted. By default the commands present data for all > > pmd > > +threads in the datapath. 
By specifying the "-pmd core" option one can > > filter > > +the output for a single pmd in the datapath. > > +. > > +.IP "\fBdpif-netdev/pmd-stats-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]" > > +Shows performance statistics for one or all pmd threads of the datapath > > +\fIdp\fR. The special thread "main" sums up the statistics of every non pmd > > +thread. > > + > > +The sum of "emc hits", "masked hits" and "miss" is the number of > > +packet lookups performed by the datapath. Beware that a recirculated > > packet > > +experiences one additional lookup per recirculation, so there may be > > +more lookups than forwarded packets in the datapath. > > + > > +Cycles are counted using the TSC or similar facilities (when available on > > +the platform). The duration of one cycle depends on the processing > > platform. > > + > > +"idle cycles" refers to cycles spent in PMD iterations not forwarding > > +any packets. "processing cycles" refers to cycles spent in PMD iterations > > +forwarding at least one packet, including the cost for polling, processing > > and > > +transmitting said packets. > > + > > +To reset these counters use \fBdpif-netdev/pmd-stats-clear\fR. > > +. > > +.IP "\fBdpif-netdev/pmd-perf-show\fR [\fB-nh\fR] [\fB-it\fR \fIiter_len\fR] > > \ > > +[\fB-ms\fR \fIms_len\fR] [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]" > > +Shows detailed performance metrics for one or all pmd threads of the > > +userspace datapath. > > + > > +The collection of detailed statistics can be controlled by a new > > +configuration parameter "other_config:pmd-perf-metrics". By default it > > +is disabled. The run-time overhead, when enabled, is in the order of 1%. > > + > > +The covered metrics per iteration are: > > + - used cycles > > + - forwarded packets > > + - number of rx batches > > + - packets/rx batch > > + - max. vhostuser queue fill level > > + - number of upcalls > > + - cycles spent in upcalls > > + > > +This raw recorded data is used threefold: > > + > > +1.
In histograms for each of the following metrics: > > + - cycles/iteration (logarithmic) > > + - packets/iteration (logarithmic) > > + - cycles/packet > > + - packets/batch > > + - max. vhostuser qlen (logarithmic) > > + - upcalls > > + - cycles/upcall (logarithmic) > > + The histogram bins are divided linearly or logarithmically. > > + > > +2. A cyclic history of the above metrics for 1024 iterations > > + > > +3. A cyclic history of the cumulative/average values per millisecond > > + wall clock for the last 1024 milliseconds: > > + - number of iterations > > + - avg. cycles/iteration > > + - packets (Kpps) > > + - avg. packets/batch > > + - avg. max vhost qlen > > + - upcalls > > + - avg. cycles/upcall > > + > > +The command options are > > + > > + \fB-nh\fR: Suppress the histograms > > + \fB-it\fR \fIiter_len\fR: Display the last iter_len iteration stats > > + \fB-ms\fR \fIms_len\fR: Display the last ms_len millisecond stats > > + > > +The output always contains the following global PMD statistics: > > + > > +Time: 15:24:55.270 .br > > +Measurement duration: 1.008 s > > + > > +pmd thread numa_id 0 core_id 1: > > + > > + Cycles: 2419034712 (2.40 GHz) > > + Iterations: 572817 (1.76 us/it) > > + - idle: 486808 (15.9 % cycles) > > + - busy: 86009 (84.1 % cycles) > > + Packets: 2399607 (2381 Kpps, 848 cycles/pkt) > > + Datapath passes: 3599415 (1.50 passes/pkt) > > + - EMC hits: 336472 ( 9.3 %) > > + - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) > > + - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) > > + - Lost upcalls: 0 ( 0.0 %) > > + > > +Here "Packets" actually reflects the number of packets forwarded by the > > +datapath. "Datapath passes" matches the number of packet lookups as > > +reported by the \fBdpif-netdev/pmd-stats-show\fR command. > > + > > +To reset the counters and start a new measurement use > > +\fBdpif-netdev/pmd-stats-clear\fR. > > +.
> > +.IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]" > > +Resets to zero the per pmd thread performance numbers shown by the > > +\fBdpif-netdev/pmd-stats-show\fR and \fBdpif-netdev/pmd-perf-show\fR > > commands. > > +It will NOT reset datapath or bridge statistics, only the values shown by > > +the above commands. > > +. > > +.IP "\fBdpif-netdev/pmd-rxq-show\fR [\fB-pmd\fR \fIcore\fR] [\fIdp\fR]" > > +For one or all pmd threads of the datapath \fIdp\fR show the list of queue- > > ids > > +with port names, which this thread polls. > > +. > > +.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]" > > +Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current > > usage. > > diff --git a/manpages.mk b/manpages.mk > > index 351155f..9af2fa8 100644 > > --- a/manpages.mk > > +++ b/manpages.mk > > @@ -256,6 +256,7 @@ vswitchd/ovs-vswitchd.8: \ > > lib/dpctl.man \ > > lib/memory-unixctl.man \ > > lib/netdev-dpdk-unixctl.man \ > > + lib/netdev-dpif-unixctl.man \ > > lib/service.man \ > > lib/ssl-bootstrap.man \ > > lib/ssl.man \ > > @@ -272,6 +273,7 @@ lib/daemon.man: > > lib/dpctl.man: > > lib/memory-unixctl.man: > > lib/netdev-dpdk-unixctl.man: > > +lib/netdev-dpif-unixctl.man: > > lib/service.man: > > lib/ssl-bootstrap.man: > > lib/ssl.man: > > diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in > > index 80e5f53..7e4714a 100644 > > --- a/vswitchd/ovs-vswitchd.8.in > > +++ b/vswitchd/ovs-vswitchd.8.in > > @@ -256,32 +256,7 @@ type). > > .. > > .so lib/dpctl.man > > . > > -.SS "DPIF-NETDEV COMMANDS" > > -These commands are used to expose internal information (mostly statistics) > > -about the ``dpif-netdev'' userspace datapath. If there is only one datapath > > -(as is often the case, unless \fBdpctl/\fR commands are used), the \fIdp\fR > > -argument can be omitted. > > -.IP "\fBdpif-netdev/pmd-stats-show\fR [\fIdp\fR]" > > -Shows performance statistics for each pmd thread of the datapath \fIdp\fR.
> > -The special thread ``main'' sums up the statistics of every non pmd thread. > > -The sum of ``emc hits'', ``masked hits'' and ``miss'' is the number of > > -packets received by the datapath. Cycles are counted using the TSC or > > similar > > -facilities (when available on the platform). To reset these counters use > > -\fBdpif-netdev/pmd-stats-clear\fR. The duration of one cycle depends on > > the > > -measuring infrastructure. ``idle cycles'' refers to cycles spent polling > > -devices but not receiving any packets. ``processing cycles'' refers to > > cycles > > -spent polling devices and successfully receiving packets, plus the cycles > > -spent processing said packets. > > -.IP "\fBdpif-netdev/pmd-stats-clear\fR [\fIdp\fR]" > > -Resets to zero the per pmd thread performance numbers shown by the > > -\fBdpif-netdev/pmd-stats-show\fR command. It will NOT reset datapath or > > -bridge statistics, only the values shown by the above command. > > -.IP "\fBdpif-netdev/pmd-rxq-show\fR [\fIdp\fR]" > > -For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with > > -port names, which this thread polls. > > -.IP "\fBdpif-netdev/pmd-rxq-rebalance\fR [\fIdp\fR]" > > -Reassigns rxqs to pmds in the datapath \fIdp\fR based on their current > > usage. > > -. > > +.so lib/netdev-dpif-unixctl.man > > .so lib/netdev-dpdk-unixctl.man > > .so ofproto/ofproto-dpif-unixctl.man > > .so ofproto/ofproto-unixctl.man > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > > index 61fb7b1..3aa8e8e 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -375,6 +375,18 @@ > > </p> > > </column> > > > > + <column name="other_config" key="pmd-perf-metrics" > > + type='{"type": "boolean"}'> > > + <p> > > + Enables recording of detailed PMD performance metrics for > > analysis > > + and trouble-shooting. This can have a performance impact in the > > + order of 1%. > > + </p> > > + <p> > > + Defaults to false but can be changed at any time. 
> > + </p> > > + </column> > > + > > <column name="other_config" key="n-handler-threads" > > type='{"type": "integer", "minInteger": 1}'> > > <p> > > -- > > 1.9.1 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev