Unconditional insertion of EMC entries results in EMC thrashing at high numbers of parallel flows. When this occurs, the performance of the EMC often falls below that of the dpcls classifier, rendering the EMC practically useless.
Instead of unconditionally inserting entries into the EMC when a miss occurs, use a 1% probability of insertion. This ensures that the most frequent flows have the highest chance of creating an entry in the EMC, and the probability of thrashing the EMC is also greatly reduced. The probability of insertion is configurable, via the other_config:emc-insert-inv-prob option. This value sets the average probability of insertion to 1/emc-insert-inv-prob. For example the following command changes the insertion proability to (on average) 1 in every 20 packets ie. 1/20 ie. 5%. ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=20 Signed-off-by: Ciara Loftus <ciara.lof...@intel.com> Signed-off-by: Georg Schmuecking <georg.schmueck...@ericsson.com> Co-authored-by: Georg Schmuecking <georg.schmueck...@ericsson.com> --- v9: - Revert back to original 1/N formula for configuring the probability (don't use percentiles). - Rename option to reflect inverse probability & update documentation to make the wording clearer. v8: - Floating point precision percentiles - Moved NEWS entry to post-2.7 section and is no longer in the DPDK specific section. v7: - Remove further code duplication v6: - Refactor the code to remove duplication around calls to emc_probabilistic_insert() v5: - Use percentiles for emc-insert-prob (0-100%) - Update docs to reflect the option not exclusive to the DPDK datapath. v4: - Added Georg to Authors file - Set emc-insert-prob=1 for 'PMD - stats' unit test - Use read_relaxed on atomic var - Correctly allow for 0 and 100% probababilites - Cache align new element in dp_netdev struct - Revert to default probability if invalid emc-insert-prob set - Allow configurability for non-DPDK case v3: - Use new dpif other_config infrastructure to tidy up how the emc-insert-prob value is passed to dpif-netdev. v2: - Enable probability configurability via other_config:emc-insert-prob option. AUTHORS.rst | 1 + Documentation/howto/dpdk.rst | 20 ++++++++++++++++++ NEWS | 2 ++ lib/dpif-netdev.c | 49 ++++++++++++++++++++++++++++++++++++++++---- tests/pmd.at | 1 + vswitchd/vswitch.xml | 17 +++++++++++++++ 6 files changed, 86 insertions(+), 4 deletions(-) diff --git a/AUTHORS.rst b/AUTHORS.rst index f247df5..9a37423 100644 --- a/AUTHORS.rst +++ b/AUTHORS.rst @@ -385,6 +385,7 @@ Eric Lopez elo...@nicira.com Frido Roose fr.ro...@gmail.com Gaetano Catalli gaetano.cata...@gmail.com Gavin Remaley gavin_rema...@selinc.com +Georg Schmuecking georg.schmueck...@ericsson.com George Shuklin ama...@desunote.ru Gerald Rogers gerald.rog...@intel.com Ghanem Bahri bahri.gha...@gmail.com diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst index d1e6e89..52cb3fc 100644 --- a/Documentation/howto/dpdk.rst +++ b/Documentation/howto/dpdk.rst @@ -354,6 +354,26 @@ the `DPDK documentation Note: Not all DPDK virtual PMD drivers have been tested and verified to work. +EMC Insertion Probability +------------------------- +By default 1 in every 100 flows are inserted into the Exact Match Cache (EMC). +It is possible to change this insertion probability by setting the +``emc-insert-inv-prob`` option:: + + $ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N + +where: + +``N`` + is a positive integer representing the inverse probability of insertion ie. + on average 1 in every N packets with a unique flow will generate an EMC + insertion. + +If ``N`` is set to 1, an insertion will be performed for every flow. If set to +0, no insertions will be performed and the EMC will effectively be disabled. + +For more information on the EMC refer to :doc:`/intro/install/dpdk` . + .. _dpdk-ovs-in-guest: OVS with DPDK Inside VMs diff --git a/NEWS b/NEWS index aebd99c..b14c76d 100644 --- a/NEWS +++ b/NEWS @@ -3,6 +3,8 @@ Post-v2.7.0 - Tunnels: * Added support to set packet mark for tunnel endpoint using `egress_pkt_mark` OVSDB option. + - EMC insertion probability is reduced to 1% and is configurable via + the new 'other_config:emc-insert-inv-prob' option. v2.7.0 - xx xxx xxxx --------------------- diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0be5db5..54af4ac 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -144,6 +144,11 @@ struct netdev_flow_key { #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) #define EM_FLOW_HASH_SEGS 2 +/* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ +#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 +#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ + DEFAULT_EM_FLOW_INSERT_INV_PROB) + struct emc_entry { struct dp_netdev_flow *flow; struct netdev_flow_key key; /* key.hash used for emc hash value. */ @@ -254,6 +259,9 @@ struct dp_netdev { uint64_t last_tnl_conf_seq; struct conntrack conntrack; + + /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ + OVS_ALIGNED_VAR(CACHE_LINE_SIZE)atomic_uint32_t emc_insert_min; }; static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp, @@ -1066,6 +1074,8 @@ create_dp_netdev(const char *name, const struct dpif_class *class, conntrack_init(&dp->conntrack); + atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN); + cmap_init(&dp->poll_threads); ovs_mutex_init_recursive(&dp->non_pmd_mutex); ovsthread_key_create(&dp->per_pmd_key, NULL); @@ -1943,6 +1953,27 @@ emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key, emc_change_entry(to_be_replaced, flow, key); } +static inline void +emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, + const struct netdev_flow_key *key, + struct dp_netdev_flow *flow) +{ + /* Insert an entry into the EMC based on probability value 'min'. By + * default the value is UINT32_MAX / 100 which yields an insertion + * probability of 1/100 ie. 1% */ + + uint32_t min; + atomic_read_relaxed(&pmd->dp->emc_insert_min, &min); + +#ifdef DPDK_NETDEV + if (min && (key->hash ^ (uint32_t)pmd->last_cycles) <= min) { +#else + if (min && (key->hash ^ random_uint32()) <= min) { +#endif + emc_insert(&pmd->flow_cache, key, flow); + } +} + static inline struct dp_netdev_flow * emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key) { @@ -2731,6 +2762,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) { struct dp_netdev *dp = get_dp_netdev(dpif); const char *cmask = smap_get(other_config, "pmd-cpu-mask"); + int insert_prob = smap_get_int(other_config, "emc-insert-inv-prob", -1); if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) { free(dp->pmd_cmask); @@ -2738,6 +2770,17 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) dp_netdev_request_reconfigure(dp); } + if (insert_prob >= 0 && insert_prob <= UINT32_MAX) { + uint32_t insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob; + uint32_t cur_min; + atomic_read_relaxed(&dp->emc_insert_min, &cur_min); + if (insert_min != cur_min) { + atomic_store_relaxed(&dp->emc_insert_min, insert_min); + } + } else { + atomic_store_relaxed(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN); + } + return 0; } @@ -4193,8 +4236,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet, add_actions->size); } ovs_mutex_unlock(&pmd->flow_mutex); - - emc_insert(&pmd->flow_cache, key, netdev_flow); + emc_probabilistic_insert(pmd, key, netdev_flow); } } @@ -4217,7 +4259,6 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, struct dpcls *cls; struct dpcls_rule *rules[PKT_ARRAY_SIZE]; struct dp_netdev *dp = pmd->dp; - struct emc_cache *flow_cache = &pmd->flow_cache; int miss_cnt = 0, lost_cnt = 0; int lookup_cnt = 0, add_lookup_cnt; bool any_miss; @@ -4288,7 +4329,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, flow = dp_netdev_flow_cast(rules[i]); - emc_insert(flow_cache, &keys[i], flow); + emc_probabilistic_insert(pmd, &keys[i], flow); dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); } diff --git a/tests/pmd.at b/tests/pmd.at index 35dbcfe..5686bed 100644 --- a/tests/pmd.at +++ b/tests/pmd.at @@ -158,6 +158,7 @@ CHECK_PMD_THREADS_CREATED() AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg]) AT_CHECK([ovs-ofctl add-flow br0 action=normal]) +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=1]) sleep 1 diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 146a816..782417f 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -335,6 +335,23 @@ datapaths. </p> </column> + + <column name="other_config" key="emc-insert-inv-prob" + type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'> + <p> + Specifies the inverse probability (1/emc-insert-inv-prob) of a flow + being inserted into the Exact Match Cache (EMC). On average one in + every <code>emc-insert-inv-prob</code> packets that generate a unique + flow will cause an insertion into the EMC. + + A value of 1 will result in an insertion for every flow (1/1 = 100%) + whereas a value of zero will result in no insertions and essentially + disable the EMC. + </p> + <p> + Defaults to 100 ie. there is (1/100 =) 1% chance of EMC insertion. + </p> + </column> </group> <group title="Status"> -- 2.4.11 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev