Hi All, This proposal is an attempt to make a more general solution to the same issue of EMC thrashing addressed by https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335940.html. That patch proposes that when the EMC is overloaded that recirculated packets are neither inserted into the EMC nor are they looked up in the EMC.
However, while it is a cheap decision, picking 'recirculated' as the category of packet to drop from the EMC is not very granular - dropping them from the EMC may result in an EMC that is now under-utilized or still overloaded - it all depends on what proportion of packets were in the recirculated category. Once there are too many entries contending for any cache it's going to become a liability as the lookup_cost + (miss_rate * cost_of_miss) grows to be greater than cost_of_miss. And in that overloaded case any scheme where a very cheap decision can be made to not insert a certain category of packets and to also not check the EMC for that same category will reduce the cache miss rate. Looking at a packet's RSS hash can also provide a "very cheap decision that can be made to not insert a certain category of packets and to also not check the EMC for that same category" I don't want to propose an actual patch right now but if the general principle is agreed then I would be happy to write at least a first iteration of the patch or maybe the authors of the patch above would like to do so? So I suggest that when the EMC becomes "overloaded" load is shed based on RSS /5-tuple hash. If the hash is above a certain threshold it becomes eligible to be inserted into the EMC. Packets with a hash below the threshold are non inserted nor are they looked up in the EMC. The threshold can be changed in an ongoing and granular way to adapt to current traffic conditions and maintain an optimum EMC utilization. So in simple, non-optimized (and non-compiling) terms: diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 47a9fa0..df5b9db 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -4599,11 +4599,12 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, miniflow_extract(packet, &key->mf); key->len = 0; /* Not computed yet. */ key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf); /* If EMC is disabled skip emc_lookup */ - flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key); + if (key->hash > flow_cache.shed_threshold) { + flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key); + } if (OVS_LIKELY(flow)) { dp_netdev_queue_batches(packet, flow, &key->mf, batches, n_batches); } else { /* Exact match cache missed. Group missed packets together at @@ -4777,11 +4778,12 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, continue; } flow = dp_netdev_flow_cast(rules[i]); - emc_probabilistic_insert(pmd, &keys[i], flow); + if (keys[i].hash > pmd->flow_cache->shed_threshold) { + emc_probabilistic_insert(pmd, &keys[i], flow); + } dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); } dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); The other question is setting the shed_threshold value. Generally: sched_threshold = some_function_of(emc_load) Some suggestions for calculating emc_load: * the num_alive_entries/num_entries * the num_evictions/num_insertions (where num_evictions is the number of insertions that overwrote and existing alive entry). * something more adaptable: emc_load = some_function_of (cost_of_emc_miss, cost_of_emc_lookup, probability_of_emc_miss) I'd be interested to know what people think of doing something like this and any more details that can be fleshed out, corner cases and so on. Regards, Billy _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev