On 02/14/2017 11:46 AM, Ciara Loftus wrote: > Unconditional insertion of EMC entries results in EMC thrashing at high > numbers of parallel flows. When this occurs, the performance of the EMC > often falls below that of the dpcls classifier, rendering the EMC > practically useless. > > Instead of unconditionally inserting entries into the EMC when a miss > occurs, use a 1% probability of insertion. This ensures that the most > frequent flows have the highest chance of creating an entry in the EMC, > and the probability of thrashing the EMC is also greatly reduced. > > The probability of insertion is configurable, via the > other_config:emc-insert-inv-prob option. This value sets the average > probability of insertion to 1/emc-insert-inv-prob. > > For example the following command changes the insertion proability to
typo > (on average) 1 in every 20 packets ie. 1/20 ie. 5%. > > ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=20 > Overall LGTM. Couple of minor comments below. I see around 10% throughput increase for 1M flows. Acked-by: Kevin Traynor <ktray...@redhat.com> > Signed-off-by: Ciara Loftus <ciara.lof...@intel.com> > Signed-off-by: Georg Schmuecking <georg.schmueck...@ericsson.com> > Co-authored-by: Georg Schmuecking <georg.schmueck...@ericsson.com> > --- > v9: > - Revert back to original 1/N formula for configuring the probability > (don't use percentiles). > - Rename option to reflect inverse probability & update documentation to > make the wording clearer. > v8: > - Floating point precision percentiles > - Moved NEWS entry to post-2.7 section and is no longer in the DPDK > specific section. > v7: > - Remove further code duplication > v6: > - Refactor the code to remove duplication around calls to > emc_probabilistic_insert() > v5: > - Use percentiles for emc-insert-prob (0-100%) > - Update docs to reflect the option not exclusive to the DPDK datapath. > v4: > - Added Georg to Authors file > - Set emc-insert-prob=1 for 'PMD - stats' unit test > - Use read_relaxed on atomic var > - Correctly allow for 0 and 100% probababilites > - Cache align new element in dp_netdev struct > - Revert to default probability if invalid emc-insert-prob set > - Allow configurability for non-DPDK case > v3: > - Use new dpif other_config infrastructure to tidy up how the > emc-insert-prob value is passed to dpif-netdev. > v2: > - Enable probability configurability via other_config:emc-insert-prob > option. > > AUTHORS.rst | 1 + > Documentation/howto/dpdk.rst | 20 ++++++++++++++++++ > NEWS | 2 ++ > lib/dpif-netdev.c | 49 > ++++++++++++++++++++++++++++++++++++++++---- > tests/pmd.at | 1 + > vswitchd/vswitch.xml | 17 +++++++++++++++ > 6 files changed, 86 insertions(+), 4 deletions(-) > > diff --git a/AUTHORS.rst b/AUTHORS.rst > index f247df5..9a37423 100644 > --- a/AUTHORS.rst > +++ b/AUTHORS.rst > @@ -385,6 +385,7 @@ Eric Lopez elo...@nicira.com > Frido Roose fr.ro...@gmail.com > Gaetano Catalli gaetano.cata...@gmail.com > Gavin Remaley gavin_rema...@selinc.com > +Georg Schmuecking georg.schmueck...@ericsson.com > George Shuklin ama...@desunote.ru > Gerald Rogers gerald.rog...@intel.com > Ghanem Bahri bahri.gha...@gmail.com > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst > index d1e6e89..52cb3fc 100644 > --- a/Documentation/howto/dpdk.rst > +++ b/Documentation/howto/dpdk.rst > @@ -354,6 +354,26 @@ the `DPDK documentation > > Note: Not all DPDK virtual PMD drivers have been tested and verified to work. > > +EMC Insertion Probability > +------------------------- > +By default 1 in every 100 flows are inserted into the Exact Match Cache > (EMC). > +It is possible to change this insertion probability by setting the > +``emc-insert-inv-prob`` option:: > + > + $ ovs-vsctl --no-wait set Open_vSwitch . > other_config:emc-insert-inv-prob=N > + > +where: > + > +``N`` > + is a positive integer representing the inverse probability of insertion ie. > + on average 1 in every N packets with a unique flow will generate an EMC > + insertion. > + > +If ``N`` is set to 1, an insertion will be performed for every flow. If set > to > +0, no insertions will be performed and the EMC will effectively be disabled. > + > +For more information on the EMC refer to :doc:`/intro/install/dpdk` . > + > .. _dpdk-ovs-in-guest: > > OVS with DPDK Inside VMs > diff --git a/NEWS b/NEWS > index aebd99c..b14c76d 100644 > --- a/NEWS > +++ b/NEWS > @@ -3,6 +3,8 @@ Post-v2.7.0 > - Tunnels: > * Added support to set packet mark for tunnel endpoint using > `egress_pkt_mark` OVSDB option. > + - EMC insertion probability is reduced to 1% and is configurable via > + the new 'other_config:emc-insert-inv-prob' option. > > v2.7.0 - xx xxx xxxx > --------------------- > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 0be5db5..54af4ac 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -144,6 +144,11 @@ struct netdev_flow_key { > #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) > #define EM_FLOW_HASH_SEGS 2 > > +/* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ > +#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 > +#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ > + DEFAULT_EM_FLOW_INSERT_INV_PROB) > + > struct emc_entry { > struct dp_netdev_flow *flow; > struct netdev_flow_key key; /* key.hash used for emc hash value. */ > @@ -254,6 +259,9 @@ struct dp_netdev { > uint64_t last_tnl_conf_seq; > > struct conntrack conntrack; > + > + /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ > + OVS_ALIGNED_VAR(CACHE_LINE_SIZE)atomic_uint32_t emc_insert_min; > }; > > static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev > *dp, > @@ -1066,6 +1074,8 @@ create_dp_netdev(const char *name, const struct > dpif_class *class, > > conntrack_init(&dp->conntrack); > > + atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN); > + > cmap_init(&dp->poll_threads); > ovs_mutex_init_recursive(&dp->non_pmd_mutex); > ovsthread_key_create(&dp->per_pmd_key, NULL); > @@ -1943,6 +1953,27 @@ emc_insert(struct emc_cache *cache, const struct > netdev_flow_key *key, > emc_change_entry(to_be_replaced, flow, key); > } > > +static inline void > +emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, > + const struct netdev_flow_key *key, > + struct dp_netdev_flow *flow) > +{ > + /* Insert an entry into the EMC based on probability value 'min'. By > + * default the value is UINT32_MAX / 100 which yields an insertion > + * probability of 1/100 ie. 1% */ > + > + uint32_t min; > + atomic_read_relaxed(&pmd->dp->emc_insert_min, &min); > + > +#ifdef DPDK_NETDEV > + if (min && (key->hash ^ (uint32_t)pmd->last_cycles) <= min) { > +#else > + if (min && (key->hash ^ random_uint32()) <= min) { > +#endif > + emc_insert(&pmd->flow_cache, key, flow); > + } > +} > + > static inline struct dp_netdev_flow * > emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) > { > @@ -2731,6 +2762,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct > smap *other_config) > { > struct dp_netdev *dp = get_dp_netdev(dpif); > const char *cmask = smap_get(other_config, "pmd-cpu-mask"); > + int insert_prob = smap_get_int(other_config, "emc-insert-inv-prob", -1); > > if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) { > free(dp->pmd_cmask); > @@ -2738,6 +2770,17 @@ dpif_netdev_set_config(struct dpif *dpif, const struct > smap *other_config) > dp_netdev_request_reconfigure(dp); > } > > + if (insert_prob >= 0 && insert_prob <= UINT32_MAX) { > + uint32_t insert_min = insert_prob == 0 ? 0 : UINT32_MAX / > insert_prob; > + uint32_t cur_min; > + atomic_read_relaxed(&dp->emc_insert_min, &cur_min); > + if (insert_min != cur_min) { > + atomic_store_relaxed(&dp->emc_insert_min, insert_min); > + } > + } else { > + atomic_store_relaxed(&dp->emc_insert_min, > DEFAULT_EM_FLOW_INSERT_MIN); > + } > + You could 'if' on just setting the insert_min, and combine 'read, if !=cur_min, store'. That would avoid storing emc_insert_min every time another parameter is changed and emc-inv-prob is not in the db (the default). You would still have to do a read instead of the store, so I'm not sure if it's much/any better. A VLOG_INFO of a new setting might be useful also, especially where someone sets it =0. > return 0; > } > > @@ -4193,8 +4236,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, > struct dp_packet *packet, > add_actions->size); > } > ovs_mutex_unlock(&pmd->flow_mutex); > - > - emc_insert(&pmd->flow_cache, key, netdev_flow); > + emc_probabilistic_insert(pmd, key, netdev_flow); > } > } > > @@ -4217,7 +4259,6 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > struct dpcls *cls; > struct dpcls_rule *rules[PKT_ARRAY_SIZE]; > struct dp_netdev *dp = pmd->dp; > - struct emc_cache *flow_cache = &pmd->flow_cache; > int miss_cnt = 0, lost_cnt = 0; > int lookup_cnt = 0, add_lookup_cnt; > bool any_miss; > @@ -4288,7 +4329,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, > > flow = dp_netdev_flow_cast(rules[i]); > > - emc_insert(flow_cache, &keys[i], flow); > + emc_probabilistic_insert(pmd, &keys[i], flow); > dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, > n_batches); > } > > diff --git a/tests/pmd.at b/tests/pmd.at > index 35dbcfe..5686bed 100644 > --- a/tests/pmd.at > +++ b/tests/pmd.at > @@ -158,6 +158,7 @@ CHECK_PMD_THREADS_CREATED() > > AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg]) > AT_CHECK([ovs-ofctl add-flow br0 action=normal]) > +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=1]) > > sleep 1 > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index 146a816..782417f 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -335,6 +335,23 @@ > datapaths. > </p> > </column> > + > + <column name="other_config" key="emc-insert-inv-prob" > + type='{"type": "integer", "minInteger": 0, "maxInteger": > 4294967295}'> > + <p> > + Specifies the inverse probability (1/emc-insert-inv-prob) of a flow > + being inserted into the Exact Match Cache (EMC). On average one in > + every <code>emc-insert-inv-prob</code> packets that generate a > unique > + flow will cause an insertion into the EMC. > + > + A value of 1 will result in an insertion for every flow (1/1 = > 100%) > + whereas a value of zero will result in no insertions and > essentially > + disable the EMC. > + </p> > + <p> > + Defaults to 100 ie. there is (1/100 =) 1% chance of EMC insertion. > + </p> > + </column> > </group> > > <group title="Status"> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev