Hi Jan,

I added some comments prefixed with [Antonio F].

Thanks,
Antonio
> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jan Scheurich
> Sent: Friday, July 15, 2016 5:35 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v2] dpif-netdev: dpcls per in_port with sorted
> subtables
>
> This turns the previous RFC PATCH dpif-netdev: dpcls per in_port with
> sorted subtables into a non-RFC patch v2.
>
> The user-space datapath (dpif-netdev) consists of a first level "exact
> match cache" (EMC) matching on 5-tuples and the normal megaflow
> classifier. With many parallel packet flows (e.g. TCP connections) the
> EMC becomes inefficient and the OVS forwarding performance is
> determined by the megaflow classifier.
>
> The megaflow classifier (dpcls) consists of a variable number of hash
> tables (aka subtables), each containing megaflow entries with the same
> mask of packet header and metadata fields to match upon. A dpcls
> lookup matches a given packet against all subtables in sequence until
> it hits a match. As megaflow cache entries are by construction
> non-overlapping, the first match is the only match.
>
> Today the order of the subtables in the dpcls is essentially random,
> so that on average a dpcls lookup has to visit N/2 subtables for a
> hit, where N is the total number of subtables. Even though every
> single hash-table lookup is fast, the performance of the current dpcls
> degrades when there are many subtables.
>
> How does the patch address this issue:
>
> In reality there is often a strong correlation between the ingress
> port and a small subset of subtables that have hits. The entire
> megaflow cache typically decomposes nicely into partitions that are
> hit only by packets entering from a range of similar ports (e.g.
> traffic from Phy -> VM vs. traffic from VM -> Phy).
> Therefore, maintaining a separate dpcls instance per ingress port with
> its subtable vector sorted by frequency of hits reduces the average
> number of subtable lookups in the dpcls to a minimum, even if the
> total number of subtables gets large. This is possible because
> megaflows always have an exact match on in_port, so every megaflow
> belongs to a unique dpcls instance.
>
> For thread safety, the PMD thread needs to block out revalidators
> during the periodic optimization. We use ovs_mutex_trylock() to avoid
> blocking the PMD.
>
> To monitor the effectiveness of the patch we have enhanced the
> ovs-appctl dpif-netdev/pmd-stats-show command with an extra line "avg.
> subtable lookups per hit" to report the average number of subtable
> lookups needed for a megaflow match. Ideally, this should be close to
> 1 and in almost all cases much smaller than N/2.
>
> I have benchmarked a cloud L3 overlay pipeline with a VXLAN overlay
> mesh. With pure L3 tenant traffic between VMs on different nodes the
> resulting netdev dpcls contains N=4 subtables.
>
> Disabling the EMC, I have measured a baseline performance (in+out) of
> ~1.32 Mpps (64 bytes, 1000 L4 flows). The average number of subtable
> lookups per dpcls match is 2.5.
>
> With the patch the average number of subtable lookups per dpcls match
> is reduced to 1 and the forwarding performance grows by ~30% to 1.72
> Mpps.
>
> As the actual number of subtables will often be higher in reality, we
> can assume that this is at the lower end of the speed-up one can
> expect from this optimization. Just running a parallel ping between
> the VXLAN tunnel endpoints increases the number of subtables and hence
> the average number of subtable lookups from 2.5 to 3.5, with a
> corresponding decrease of throughput to 1.14 Mpps. With the patch the
> parallel ping has no impact on the average number of subtable lookups
> and performance. The performance gain is then ~50%.
> The main change to the previous patch is that instead of having a
> subtable vector per in_port in a single dpcls instance, we now have
> one dpcls instance, with a single subtable vector, per ingress port.
> This is better aligned with the design of the base code and also
> improves the number of subtable lookups in a miss case.
>
> The PMD tests have been adjusted to the additional line in
> pmd-stats-show.
>
> Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com>
>
>
> Changes in v2:
> - Rebased to master (commit 3041e1fc9638)
> - Take the pmd->flow_mutex during optimization to block out
>   revalidators. Use trylock in order to not block the PMD thread
> - Made in_port an explicit input parameter to fast_path_processing()
> - Fixed coding style issues
>
>
> ---
>
>  lib/dpif-netdev.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
>  tests/pmd.at      |   6 +++--
>  2 files changed, 173 insertions(+), 28 deletions(-)
>
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e0107b7..e7eaab0 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -149,7 +149,11 @@ struct emc_cache {
> ^L
>  /* Simple non-wildcarding single-priority classifier. */
>
> +#define DPCLS_OPTIMIZATION_INTERVAL 1000

[Antonio F] I'd prefer to add a comment to specify if these are sec, msec,..
> +
>  struct dpcls {
> +    struct cmap_node node;
> +    odp_port_t in_port;
>      struct cmap subtables_map;
>      struct pvector subtables;
>  };
> @@ -164,12 +168,14 @@ struct dpcls_rule {
>
>  static void dpcls_init(struct dpcls *);
>  static void dpcls_destroy(struct dpcls *);
> +static void dpcls_optimize(struct dpcls *);
>  static void dpcls_insert(struct dpcls *, struct dpcls_rule *,
>                           const struct netdev_flow_key *mask);
>  static void dpcls_remove(struct dpcls *, struct dpcls_rule *);
> -static bool dpcls_lookup(const struct dpcls *cls,
> +static bool dpcls_lookup(struct dpcls *cls,
>                           const struct netdev_flow_key keys[],
> -                         struct dpcls_rule **rules, size_t cnt);
> +                         struct dpcls_rule **rules, size_t cnt,
> +                         int *num_lookups_p);
> ^L
>  /* Datapath based on the network device interface from netdev.h.
>   *
> @@ -239,6 +245,7 @@ enum dp_stat_type {
>      DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
>      DP_STAT_MISS,               /* Packets that did not match. */
>      DP_STAT_LOST,               /* Packets not passed up to the client. */
> +    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table hits */

[Antonio F] Line longer than 80 chars.

>      DP_N_STATS
>  };
>
> @@ -419,15 +426,19 @@ struct dp_netdev_pmd_thread {
>       * will only be accessed by its own pmd thread. */
>      struct emc_cache flow_cache;
>
> -    /* Classifier and Flow-Table.
> +    /* Flow-Table and classifiers
>       *
>       * Writers of 'flow_table' must take the 'flow_mutex'. Corresponding
> -     * changes to 'cls' must be made while still holding the 'flow_mutex'.
> +     * changes to 'classifiesr' must be made while still holding the 'flow_mutex'.

[Antonio F] Line longer than 80 chars.
[Antonio F] Trivial typo: 'classifiers'

>       */
>      struct ovs_mutex flow_mutex;
> -    struct dpcls cls;
>      struct cmap flow_table OVS_GUARDED; /* Flow table. */
>
> +    /* One classifier per in_port polled by the pmd */
> +    struct cmap classifiers;
> +    /* Periodically sort subtable vectors according to hit frequencies */
> +    long long int next_optimization;
> +
>      /* Statistics. */
>      struct dp_netdev_pmd_stats stats;
>
> @@ -540,6 +551,8 @@ static void dp_netdev_pmd_unref(struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_pmd_flow_flush(struct dp_netdev_pmd_thread *pmd);
>  static void pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
>      OVS_REQUIRES(pmd->port_mutex);
> +static inline void
> +dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd);
>
>  static inline bool emc_entry_alive(struct emc_entry *ce);
>  static void emc_clear_entry(struct emc_entry *ce);
> @@ -659,8 +672,10 @@ pmd_info_show_stats(struct ds *reply,
>
>      ds_put_format(reply,
>                    "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> +                  "\tavg. subtable lookups per hit:%.2f\n"
>                    "\tmiss:%llu\n\tlost:%llu\n",
>                    stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> +                  stats[DP_STAT_MASKED_HIT] > 0 ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT] : 0,

[Antonio F] Line longer than 80 chars.
>                    stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
>
>      if (total_cycles == 0) {
> @@ -1503,14 +1518,51 @@ dp_netdev_flow_hash(const ovs_u128 *ufid)
>      return ufid->u32[0];
>  }
>
> +static inline struct dpcls *
> +dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd,
> +                           odp_port_t in_port)
> +{
> +    struct dpcls *cls;
> +    uint32_t hash = hash_port_no(in_port);
> +    CMAP_FOR_EACH_WITH_HASH (cls, node, hash, &pmd->classifiers) {
> +        if (cls->in_port == in_port) {
> +            /* Port classifier exists already */
> +            return cls;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static inline struct dpcls *
> +dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd,
> +                         odp_port_t in_port)
> +{
> +    struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> +    uint32_t hash = hash_port_no(in_port);
> +
> +    if (!cls) {
> +        /* Create new classifier for in_port */
> +        cls = xmalloc(sizeof(*cls));
> +        dpcls_init(cls);
> +        cls->in_port = in_port;
> +        cmap_insert(&pmd->classifiers, &cls->node, hash);
> +        VLOG_DBG("Creating dpcls %p for in_port %d", cls, in_port);
> +    }
> +    return cls;
> +}
> +
>  static void
>  dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>                            struct dp_netdev_flow *flow)
>      OVS_REQUIRES(pmd->flow_mutex)
>  {
>      struct cmap_node *node = CONST_CAST(struct cmap_node *, &flow->node);
> +    struct dpcls *cls;
> +    odp_port_t in_port = flow->flow.in_port.odp_port;
>
> -    dpcls_remove(&pmd->cls, &flow->cr);
> +    cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> +    ovs_assert(cls != NULL);
> +    dpcls_remove(cls, &flow->cr);
>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
>      flow->dead = true;
>
> @@ -1857,15 +1909,20 @@ emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key)
>  }
>
>  static struct dp_netdev_flow *
> -dp_netdev_pmd_lookup_flow(const struct dp_netdev_pmd_thread *pmd,
> -                          const struct netdev_flow_key *key)
> +dp_netdev_pmd_lookup_flow(struct dp_netdev_pmd_thread *pmd,
> +                          const struct netdev_flow_key *key,
> +                          int *lookup_num_p)
>  {
> -    struct dp_netdev_flow *netdev_flow;
> +    struct dpcls *cls;
>      struct dpcls_rule *rule;
> +    odp_port_t in_port = MINIFLOW_GET_U32(&key->mf, in_port);
> +    struct dp_netdev_flow *netdev_flow = NULL;
>
> -    dpcls_lookup(&pmd->cls, key, &rule, 1);
> -    netdev_flow = dp_netdev_flow_cast(rule);
> -
> +    cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> +    if (OVS_LIKELY(cls)) {
> +        dpcls_lookup(cls, key, &rule, 1, lookup_num_p);
> +        netdev_flow = dp_netdev_flow_cast(rule);
> +    }
>      return netdev_flow;
>  }
>
> @@ -2098,6 +2155,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>  {
>      struct dp_netdev_flow *flow;
>      struct netdev_flow_key mask;
> +    struct dpcls *cls;
> +    odp_port_t in_port = match->flow.in_port.odp_port;
>
>      netdev_flow_mask_init(&mask, match);
>      /* Make sure wc does not have metadata. */
> @@ -2116,7 +2175,11 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>      ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions, actions_len));
>
>      netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask);
> +
> +    /* Select dpcls for in_port. Relies on in_port to be exact match */
> +    ovs_assert(match->wc.masks.in_port.odp_port == UINT32_MAX);
> +    cls = dp_netdev_pmd_find_dpcls(pmd, in_port);
> +    dpcls_insert(cls, &flow->cr, &mask);
>
>      cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow->node),
>                  dp_netdev_flow_hash(&flow->ufid));
> @@ -2197,7 +2260,7 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct dpif_flow_put *put)
>      }
>
>      ovs_mutex_lock(&pmd->flow_mutex);
> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &key);
> +    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &key, NULL);
>      if (!netdev_flow) {
>          if (put->flags & DPIF_FP_CREATE) {
>              if (cmap_count(&pmd->flow_table) < MAX_FLOWS) {
> @@ -2872,6 +2935,7 @@ reload:
>          lc = 0;
>
>          coverage_try_clear();
> +        dp_netdev_pmd_try_optimize(pmd);
>          if (!ovsrcu_try_quiesce()) {
>              emc_cache_slow_sweep(&pmd->flow_cache);
>          }
> @@ -3033,8 +3097,9 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>      ovs_mutex_init(&pmd->cond_mutex);
>      ovs_mutex_init(&pmd->flow_mutex);
>      ovs_mutex_init(&pmd->port_mutex);
> -    dpcls_init(&pmd->cls);
>      cmap_init(&pmd->flow_table);
> +    cmap_init(&pmd->classifiers);
> +    pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
>      ovs_list_init(&pmd->poll_list);
>      hmap_init(&pmd->tx_ports);
>      hmap_init(&pmd->port_cache);
> @@ -3050,10 +3115,16 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>  static void
>  dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
>  {
> +    struct dpcls *cls;
> +
>      dp_netdev_pmd_flow_flush(pmd);
> -    dpcls_destroy(&pmd->cls);
>      hmap_destroy(&pmd->port_cache);
>      hmap_destroy(&pmd->tx_ports);
> +    /* All flows (including their dpcls_rules) have been deleted already */
> +    CMAP_FOR_EACH(cls, node, &pmd->classifiers) {
> +        dpcls_destroy(cls);
> +    }
> +    cmap_destroy(&pmd->classifiers);
>      cmap_destroy(&pmd->flow_table);
>      ovs_mutex_destroy(&pmd->flow_mutex);
>      latch_destroy(&pmd->exit_latch);
> @@ -3798,7 +3869,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet,
>       * to be locking everyone out of making flow installs. If we
>       * move to a per-core classifier, it would be reasonable. */
>      ovs_mutex_lock(&pmd->flow_mutex);
> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key);
> +    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>      if (OVS_LIKELY(!netdev_flow)) {
>          netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
>                                           add_actions->data,
> @@ -3814,7 +3885,8 @@ static inline void
>  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>                       struct dp_packet_batch *packets_,
>                       struct netdev_flow_key *keys,
> -                     struct packet_batch_per_flow batches[], size_t *n_batches)
> +                     struct packet_batch_per_flow batches[], size_t *n_batches,
> +                     odp_port_t in_port)
>  {
>      int cnt = packets_->count;
>  #if !defined(__CHECKER__) && !defined(_WIN32)
> @@ -3824,10 +3896,12 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>      enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST };
>  #endif
>      struct dp_packet **packets = packets_->packets;
> +    struct dpcls *cls;
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
>      struct emc_cache *flow_cache = &pmd->flow_cache;
>      int miss_cnt = 0, lost_cnt = 0;
> +    int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
>      size_t i;
>
> @@ -3835,7 +3909,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>          /* Key length is needed in all the cases, hash computed on demand. */
>          keys[i].len = netdev_flow_key_size(miniflow_n_values(&keys[i].mf));
>      }
> -    any_miss = !dpcls_lookup(&pmd->cls, keys, rules, cnt);
> +    /* Get the classifier for the in_port */
> +    cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port);
> +    if (OVS_LIKELY(cls)) {
> +        any_miss = !dpcls_lookup(cls, keys, rules, cnt, &lookup_cnt);
> +    } else {
> +        any_miss = true;
> +        memset(rules, 0, sizeof(rules));
> +    }
>      if (OVS_UNLIKELY(any_miss) && !fat_rwlock_tryrdlock(&dp->upcall_rwlock)) {
>          uint64_t actions_stub[512 / 8], slow_stub[512 / 8];
>          struct ofpbuf actions, put_actions;
> @@ -3853,8 +3934,9 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>              /* It's possible that an earlier slow path execution installed
>               * a rule covering this flow. In this case, it's a lot cheaper
>               * to catch it here than execute a miss. */
> -            netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &keys[i]);
> +            netdev_flow = dp_netdev_pmd_lookup_flow(pmd, &keys[i], &add_lookup_cnt);

[Antonio F] Line longer than 80 chars.

>              if (netdev_flow) {
> +                lookup_cnt += add_lookup_cnt;
>                  rules[i] = &netdev_flow->cr;
>                  continue;
>              }
> @@ -3893,6 +3975,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>      }
>
>      dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> +    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);

[Antonio F] Should we divide lookup_cnt by the processed packets, I mean
dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt/cnt); ?
>      dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
>      dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
>  }
> @@ -3919,13 +4002,16 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
>      struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
>      long long now = time_msec();
>      size_t newcnt, n_batches, i;
> +    odp_port_t in_port;
>
>      n_batches = 0;
>      newcnt = emc_processing(pmd, packets, keys, batches, &n_batches,
>                              md_is_valid, port_no);
>      if (OVS_UNLIKELY(newcnt)) {
>          packets->count = newcnt;
> -        fast_path_processing(pmd, packets, keys, batches, &n_batches);
> +        /* Get ingress port from first packet's metadata. */
> +        in_port = packets->packets[0]->md.in_port.odp_port;
> +        fast_path_processing(pmd, packets, keys, batches, &n_batches, in_port);
>      }
>
>      for (i = 0; i < n_batches; i++) {
> @@ -3942,14 +4028,14 @@ dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
>                  struct dp_packet_batch *packets,
>                  odp_port_t port_no)
>  {
> -    dp_netdev_input__(pmd, packets, false, port_no);
> +    dp_netdev_input__(pmd, packets, false, port_no);
>  }
>
>  static void
>  dp_netdev_recirculate(struct dp_netdev_pmd_thread *pmd,
>                        struct dp_packet_batch *packets)
>  {
> -    dp_netdev_input__(pmd, packets, true, 0);
> +    dp_netdev_input__(pmd, packets, true, 0);
>  }
>
>  struct dp_netdev_execute_aux {
> @@ -4374,6 +4460,8 @@ struct dpcls_subtable {
>
>      /* These fields are accessed by readers. */
>      struct cmap rules;           /* Contains "struct dpcls_rule"s. */
> +    uint32_t hit_cnt;            /* Number of match hits in subtable in current
> +                                    optimization interval. */
>      struct netdev_flow_key mask; /* Wildcards for fields (const). */
>      /* 'mask' must be the last field, additional space is allocated here. */
>  };
> @@ -4390,6 +4478,7 @@ dpcls_init(struct dpcls *cls)
>  static void
>  dpcls_destroy_subtable(struct dpcls *cls, struct dpcls_subtable *subtable)
>  {
> +    VLOG_DBG("Destroying subtable %p for in_port %d", subtable, cls->in_port);
>      pvector_remove(&cls->subtables, subtable);
>      cmap_remove(&cls->subtables_map, &subtable->cmap_node,
>                  subtable->mask.hash);
> @@ -4424,10 +4513,13 @@ dpcls_create_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
>      subtable = xmalloc(sizeof *subtable
>                         - sizeof subtable->mask.mf + mask->len);
>      cmap_init(&subtable->rules);
> +    subtable->hit_cnt = 0;
>      netdev_flow_key_clone(&subtable->mask, mask);
>      cmap_insert(&cls->subtables_map, &subtable->cmap_node, mask->hash);
> +    /* Insert new subtable with priority zero (no hits yet) */
>      pvector_insert(&cls->subtables, subtable, 0);
>      pvector_publish(&cls->subtables);
> +    VLOG_DBG("Creating %lu. subtable %p for in_port %d", cmap_count(&cls->subtables_map), subtable, cls->in_port);

[Antonio F] Line too long.

>
>      return subtable;
>  }
> @@ -4446,6 +4538,43 @@ dpcls_find_subtable(struct dpcls *cls, const struct netdev_flow_key *mask)
>      return dpcls_create_subtable(cls, mask);
>  }
>
> +
> +/* Periodically sort the dpcls subtable vectors according to hit counts */
> +static inline void
> +dpcls_optimize(struct dpcls *cls)

[Antonio F] Why not calling this function 'dpcls_sort_subtables' or
'dpcls_upd_subtbl_priority'?
> +    OVS_REQUIRES(pmd->flow_mutex)
> +{
> +    struct dpcls_subtable *subtable;
> +
> +    struct pvector *pvec = &cls->subtables;
> +    PVECTOR_FOR_EACH (subtable, pvec) {
> +        pvector_change_priority(pvec, subtable, subtable->hit_cnt);
> +        subtable->hit_cnt = 0;
> +    }
> +    pvector_publish(pvec);
> +}
> +
> +static inline void
> +dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd)
> +{
> +    struct dpcls *cls;
> +    long long int now = time_msec();
> +
> +    if (now > pmd->next_optimization) {
> +        /* Try to obtain the flow lock to block out revalidator threads.
> +         * If not possible, just try next time. */
> +        if (ovs_mutex_trylock(&pmd->flow_mutex)) {
> +            /* Optimize each classifier */
> +            CMAP_FOR_EACH(cls, node, &pmd->classifiers) {
> +                dpcls_optimize(cls);
> +            }
> +            ovs_mutex_unlock(&pmd->flow_mutex);
> +            /* Start new measuring interval */
> +            pmd->next_optimization = now + DPCLS_OPTIMIZATION_INTERVAL;
> +        }
> +    }
> +}
> +
>  /* Insert 'rule' into 'cls'. */
>  static void
>  dpcls_insert(struct dpcls *cls, struct dpcls_rule *rule,
> @@ -4453,6 +4582,7 @@ dpcls_insert(struct dpcls *cls, struct dpcls_rule *rule,
>  {
>      struct dpcls_subtable *subtable = dpcls_find_subtable(cls, mask);
>
> +    /* Refer to subtable's mask, also for later removal. */
>      rule->mask = &subtable->mask;
>      cmap_insert(&subtable->rules, &rule->cmap_node, rule->flow.hash);
>  }
> @@ -4465,10 +4595,11 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule)
>
>      ovs_assert(rule->mask);
>
> +    /* Get subtable from reference in rule->mask. */
>      INIT_CONTAINER(subtable, rule->mask, mask);
> -
>      if (cmap_remove(&subtable->rules, &rule->cmap_node, rule->flow.hash)
>          == 0) {
> +        /* Delete empty subtable. */
>          dpcls_destroy_subtable(cls, subtable);
>          pvector_publish(&cls->subtables);
>      }
> @@ -4503,8 +4634,9 @@ dpcls_rule_matches_key(const struct dpcls_rule *rule,
>   *
>   * Returns true if all flows found a corresponding rule. */
>  static bool
> -dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
> -             struct dpcls_rule **rules, const size_t cnt)
> +dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
> +             struct dpcls_rule **rules, const size_t cnt,
> +             int *num_lookups_p)
>  {
>      /* The batch size 16 was experimentally found faster than 8 or 32. */
>      typedef uint16_t map_type;
> @@ -4524,6 +4656,8 @@ dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
>      }
>      memset(rules, 0, cnt * sizeof *rules);
>
> +    int lookups_match = 0, subtable_pos = 1;
> +
>      PVECTOR_FOR_EACH (subtable, &cls->subtables) {
>          const struct netdev_flow_key *mkeys = keys;
>          struct dpcls_rule **mrules = rules;
> @@ -4556,6 +4690,8 @@ dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
>              CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
>                  if (OVS_LIKELY(dpcls_rule_matches_key(rule, &mkeys[i]))) {
>                      mrules[i] = rule;
> +                    subtable->hit_cnt++;

[Antonio F] I think we can be confident that hit_cnt does not overflow. It's a
uint32_t var and the ranking refresh is every 1000 msec. Even if a port hits
the same subtable at line rate - say 14 Mpps - that is well below 2^32-1, so we
don't need to check if hit_cnt overflows, or am I missing something? If that's
the case, maybe a comment here would help?

> +                    lookups_match += subtable_pos;
>                      goto next;
>                  }
>              }
> @@ -4567,8 +4703,15 @@ dpcls_lookup(const struct dpcls *cls, const struct netdev_flow_key keys[],
>              remains |= maps[m];
>          }
>          if (!remains) {
> +            if (num_lookups_p) {
> +                *num_lookups_p = lookups_match;
> +            }
>              return true;              /* All found. */
>          }
> +        subtable_pos++;
>      }
> +    if (num_lookups_p) {
> +        *num_lookups_p = lookups_match;
> +    }
>      return false;                     /* Some misses. */
>  }
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 3216762..c033047 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -162,10 +162,11 @@ dummy@ovs-dummy: hit:0 missed:0
>    p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
> ])
>
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> pmd thread numa_id <cleared> core_id <cleared>:
> 	emc hits:0
> 	megaflow hits:0
> +	avg. subtable lookups per hit:0.00
> 	miss:0
> 	lost:0
> ])
> @@ -188,10 +189,11 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
> recirc_id(0),in_port(1),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
> ])
>
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 4], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> pmd thread numa_id <cleared> core_id <cleared>:
> 	emc hits:19
> 	megaflow hits:0
> +	avg. subtable lookups per hit:0.00
> 	miss:1
> 	lost:0
> ])
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev