Currently, real insertion probability is higher than configured for the maximum case because of wrong usage of the random value.
i.e. if 'emc-invert-inv-prob' == UINT32_MAX, then 'emc_insert_min' equals to 1. In this case we're allowing insert if random vailue is less or equal to 1. So, two of 2**32 values (0 and 1) are allowed and real probability is 2 times higher than configured. This happens because 'random_uint32()' returns value in range [0; UINT32_MAX], but for the checking to be correct we should generate random value in range [0; UINT32_MAX - 1]. To fix this we have 4 possible solutions: 1. need to use uint64_t for 'emc-insert-min' and calculate it as '(UINT32_MAX + 1) / inverse_prob' to fairly check the full range [0; UINT32_MAX]. This may decrease performance becaue of 64 bit atomic ops. 2. Forbid the '2**32 - 1' as the value for 'emc-insert-min' because it's the only value we have issues with. This will require additional explanations and not very friendly for users. 3. Generate random value in range [0; UINT32_MAX - 1]. This will require heavy division operation. 4. Decrease the range of available values for 'emc-insert-inv-prob'. Actually, we don't need to have so much different values for that option. I beleve that values higher than 1M are completely useless. Choosing the upper limit as a power of 2 like 2**20 we will be able to mask the generated random value in a fast way and also avoid range issue, because same uint32_t can be used to store 2**20. This patch implements solution #4. CC: Ciara Loftus <ciara.lof...@intel.com> Fixes: 4c30b24602c3 ("dpif-netdev: Conditional EMC insert") Signed-off-by: Ilya Maximets <i.maxim...@samsung.com> Acked-by: Antonio Fischetti <antonio.fische...@intel.com> --- Infrastructure and logic introduced here will be used for fixing emc replacement policy. lib/dpif-netdev.c | 14 ++++++++++---- vswitchd/vswitch.xml | 3 ++- 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8ecc9d1..e23c674 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -153,9 +153,14 @@ struct netdev_flow_key { #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1) #define EM_FLOW_HASH_SEGS 2 +/* Set up maximum inverse EMC insertion probability to 2^20 - 1. + * Higher values considered useless in practice. */ +#define EM_FLOW_INSERT_INV_PROB_SHIFT 20 +#define EM_FLOW_INSERT_INV_PROB_MAX (1 << EM_FLOW_INSERT_INV_PROB_SHIFT) +#define EM_FLOW_INSERT_INV_PROB_MASK (EM_FLOW_INSERT_INV_PROB_MAX - 1) /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */ #define DEFAULT_EM_FLOW_INSERT_INV_PROB 100 -#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / \ +#define DEFAULT_EM_FLOW_INSERT_MIN (EM_FLOW_INSERT_INV_PROB_MAX / \ DEFAULT_EM_FLOW_INSERT_INV_PROB) struct emc_entry { @@ -2092,7 +2097,7 @@ emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd, uint32_t min; atomic_read_relaxed(&pmd->dp->emc_insert_min, &min); - if (min && random_uint32() <= min) { + if (min && (random_uint32() & EM_FLOW_INSERT_INV_PROB_MASK) < min) { emc_insert(&pmd->flow_cache, key, flow); } } @@ -2909,8 +2914,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) } atomic_read_relaxed(&dp->emc_insert_min, &cur_min); - if (insert_prob <= UINT32_MAX) { - insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob; + if (insert_prob < EM_FLOW_INSERT_INV_PROB_MAX) { + insert_min = insert_prob == 0 + ? 0 : EM_FLOW_INSERT_INV_PROB_MAX / insert_prob; } else { insert_min = DEFAULT_EM_FLOW_INSERT_MIN; insert_prob = DEFAULT_EM_FLOW_INSERT_INV_PROB; diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 074535b..61f252e 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -381,7 +381,8 @@ </column> <column name="other_config" key="emc-insert-inv-prob" - type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'> + type='{"type": "integer", + "minInteger": 0, "maxInteger": 1048575}'> <p> Specifies the inverse probability (1/emc-insert-inv-prob) of a flow being inserted into the Exact Match Cache (EMC). On average one in -- 2.7.4 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev