Hi Jan,

This is a very interesting patch with compelling results.
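Before my questions, let me check my own reading of the patch. Below is the
new lookup order as I understand it, written as a stand-alone toy C sketch.
All toy_* names, the hash-only matching helpers and the use of
random()/RAND_MAX are simplifications of my own (the real code compares full
flow keys and uses random_uint32()), so please treat it as an illustration of
the control flow rather than a restatement of the implementation:

/* Toy model of the DFC+EMC lookup path in this patch. The helper stubs are
 * deliberately left undefined; only the control flow is meant to match the
 * diff below. */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_DFC_ENTRIES (1u << 20)   /* 1M direct-mapped megaflow pointers. */
#define TOY_EMC_ENTRIES (1u << 14)   /* 16K exact-match entries. */

struct toy_flow;                     /* Stands in for struct dp_netdev_flow. */
bool toy_emc_key_equal(const struct toy_flow *, uint32_t hash);
bool toy_megaflow_matches(const struct toy_flow *, uint32_t hash);
struct toy_flow *toy_dpcls_lookup(uint32_t hash);

struct toy_cache {
    struct toy_flow *emc[TOY_EMC_ENTRIES];
    struct toy_flow *dfc[TOY_DFC_ENTRIES];
};

struct toy_flow *
toy_lookup(struct toy_cache *c, uint32_t rss_hash)
{
    /* 1. EMC: one slot per hash, exact miniflow comparison. */
    struct toy_flow *flow = c->emc[rss_hash & (TOY_EMC_ENTRIES - 1)];
    if (flow && toy_emc_key_equal(flow, rss_hash)) {
        return flow;
    }

    /* 2. DFC: 20 hash bits map directly to the last-hit megaflow; a masked
     * comparison against the megaflow key confirms the hit. */
    flow = c->dfc[rss_hash & (TOY_DFC_ENTRIES - 1)];
    if (flow && toy_megaflow_matches(flow, rss_hash)) {
        /* Promote elephant flows to the EMC with probability 1/1000. */
        if (random() < RAND_MAX / 1000) {
            c->emc[rss_hash & (TOY_EMC_ENTRIES - 1)] = flow;
        }
        return flow;
    }

    /* 3. Miss: fall back to the megaflow classifier (DPCLS); any hit is
     * inserted straight into the DFC. */
    flow = toy_dpcls_lookup(rss_hash);
    if (flow) {
        c->dfc[rss_hash & (TOY_DFC_ENTRIES - 1)] = flow;
    }
    return flow;
}

If I have misread the lookup order anywhere above, please correct me.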
I had a few questions after reading the commit message for this patch:

How did you decide on 1M as the proposed size for the DFC? Will the size of
this DFC be configurable (i.e. map a different number of hash bits)?

I was mostly interested in how the performance of the very basic phy2phy
testcase would be affected by this change. Below are my results from 1-10000
flows, 64B packets, unidirectional with 1 PMD. One OpenFlow rule is installed
for each stream of traffic being sent:

Flows    master   DFC+EMC   Gain
         [Mpps]   [Mpps]
------------------------------
1        12.34    13.12     6.3%
10        7.83     8.19     4.5%
100       4.71     4.78     1.3%
1000      3.77     3.83     1.5%
10000     3.27     3.51     7.1%

This shows a performance improvement for this testcase too.

Tested-by: Cian Ferriter <cian.ferri...@intel.com>

Let me know your thoughts on the above questions.

Thanks,
Cian

> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org
> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Jan Scheurich
> Sent: 20 November 2017 17:33
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH] dpif-netdev: Refactor datapath flow cache
>
> So far the netdev datapath uses an 8K EMC to speed up the lookup of
> frequently used flows by comparing the parsed packet headers against the
> miniflow of a cached flow, using 13 bits of the packet RSS hash as index.
> The EMC is too small for many applications with 100K or more parallel
> packet flows, so that EMC thrashing actually degrades performance.
> Furthermore, the size of struct miniflow and the flow copying cost prevent
> us from making it much larger.
>
> At the same time the lookup cost of the megaflow classifier (DPCLS) is
> increasing as the number of frequently hit subtables grows with the
> complexity of the pipeline and the number of recirculations.
>
> To close the performance gap for many parallel flows, this patch
> introduces the datapath flow cache (DFC) with 1M entries as a lookup stage
> between EMC and DPCLS. It directly maps 20 bits of the RSS hash to a
> pointer to the last hit megaflow entry and performs a masked comparison of
> the packet flow with the megaflow key to confirm the hit. This avoids the
> costly DPCLS lookup even for a very large number of parallel flows, with a
> small memory overhead.
>
> Due to the large size of the DFC and the low risk of DFC thrashing, any
> DPCLS hit immediately inserts an entry in the DFC so that subsequent
> packets are sped up. The DFC thus also accelerates short-lived flows.
>
> To further accelerate the lookup of a few elephant flows, every DFC hit
> triggers a probabilistic EMC insertion of the flow. As the DFC entry is
> already in place, the default EMC insertion probability can be reduced to
> 1/1000 to minimize EMC thrashing should there still be many fat flows.
> The inverse EMC insertion probability remains configurable.
>
> The EMC implementation is simplified by removing the possibility to store
> a flow in two slots, as there is no particular reason why two flows should
> systematically collide (the RSS hash is not symmetric).
> The maximum size of the EMC flow key is limited to 256 bytes to reduce the
> memory footprint. This should be sufficient to hold most real-life packet
> flow keys. Larger flows are not installed in the EMC.
>
> The pmd-stats-show command is enhanced to show both EMC and DFC hits
> separately.
>
> The sweep speed for cleaning up obsolete EMC and DFC flow entries and
> freeing dead megaflow entries is increased. With a typical PMD cycle
> duration of 100us under load and checking one DFC entry per cycle, the DFC
> sweep should normally complete within 100s.
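For what it's worth, the sweep arithmetic checks out for me, assuming one
dfc_slow_sweep() call per PMD iteration:

    2^20 DFC entries * ~100 us/cycle ~= 105 s for a full DFC pass

and since dfc_slow_sweep() only advances the EMC sweep once every
DFC_ENTRIES/EMC_ENTRIES = 64 iterations, the 16K-entry EMC needs
2^14 * 64 = 2^20 cycles as well, i.e. both caches complete a sweep in
roughly the same time, as the comment in dfc_slow_sweep() says.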
> In PVP performance tests with an L3 pipeline over VXLAN we determined the
> optimal EMC size to be 16K entries to obtain a uniform speedup compared to
> the master branch over the full range of parallel flows. The measurement
> below is for 64 byte packets and the average number of subtable lookups
> per DPCLS hit in this pipeline is 1.0, i.e. the acceleration already
> starts for a single busy mask. Tests with many visited subtables should
> show a strong increase of the gain through DFC.
>
> Flows    master   DFC+EMC   Gain
>          [Mpps]   [Mpps]
> ------------------------------
> 8         4.45     4.62     3.8%
> 100       4.17     4.47     7.2%
> 1000      3.88     4.34    12.0%
> 2000      3.54     4.17    17.8%
> 5000      3.01     3.82    27.0%
> 10000     2.75     3.63    31.9%
> 20000     2.64     3.50    32.8%
> 50000     2.60     3.33    28.1%
> 100000    2.59     3.23    24.7%
> 500000    2.59     3.16    21.9%
>
>
> Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com>
> ---
>  lib/dpif-netdev.c | 349 ++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 235 insertions(+), 114 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index db78318..efcf2e9 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -127,19 +127,19 @@ struct netdev_flow_key {
>      uint64_t buf[FLOW_MAX_PACKET_U64S];
>  };
>
> -/* Exact match cache for frequently used flows
> +/* Datapath flow cache (DFC) for frequently used flows
>   *
> - * The cache uses a 32-bit hash of the packet (which can be the RSS hash) to
> - * search its entries for a miniflow that matches exactly the miniflow of the
> - * packet. It stores the 'dpcls_rule' (rule) that matches the miniflow.
> + * The cache uses the 32-bit hash of the packet (which can be the RSS hash) to
> + * directly look up a pointer to the matching megaflow. To check for a match
> + * the packet's flow key is compared against the key and mask of the megaflow.
>   *
> - * A cache entry holds a reference to its 'dp_netdev_flow'.
> - *
> - * A miniflow with a given hash can be in one of EM_FLOW_HASH_SEGS different
> - * entries. The 32-bit hash is split into EM_FLOW_HASH_SEGS values (each of
> - * them is EM_FLOW_HASH_SHIFT bits wide and the remainder is thrown away). Each
> - * value is the index of a cache entry where the miniflow could be.
> + * For even faster lookup, the most frequently used packet flows are also
> + * inserted into a small exact match cache (EMC). The EMC uses a part of the
> + * packet hash to look up a miniflow that matches exactly the miniflow of the
> + * packet. The matching EMC also returns a reference to the megaflow.
>   *
> + * Flows are promoted from the DFC to the EMC through probabilistic insertion
> + * after successful DFC lookup with minor probability to favor elephant flows.
>   *
>   * Thread-safety
>   * =============
> @@ -148,33 +148,38 @@ struct netdev_flow_key {
>   * If dp_netdev_input is not called from a pmd thread, a mutex is used.
>   */
>
> -#define EM_FLOW_HASH_SHIFT 13
> -#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT)
> -#define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
> -#define EM_FLOW_HASH_SEGS 2
> +#define DFC_MASK_LEN 20
> +#define DFC_ENTRIES (1u << DFC_MASK_LEN)
> +#define DFC_MASK (DFC_ENTRIES - 1)
> +#define EMC_MASK_LEN 14
> +#define EMC_ENTRIES (1u << EMC_MASK_LEN)
> +#define EMC_MASK (EMC_ENTRIES - 1)
>
>  /* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_INV_PROB */
> -#define DEFAULT_EM_FLOW_INSERT_INV_PROB 100
> +#define DEFAULT_EM_FLOW_INSERT_INV_PROB 1000
>  #define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX /                     \
>                                      DEFAULT_EM_FLOW_INSERT_INV_PROB)
>
>  struct emc_entry {
>      struct dp_netdev_flow *flow;
> -    struct netdev_flow_key key;   /* key.hash used for emc hash value. */
> +    char key[248];   /* Holds struct netdev_flow_key of limited size. */
>  };
>
>  struct emc_cache {
> -    struct emc_entry entries[EM_FLOW_HASH_ENTRIES];
> -    int sweep_idx;                /* For emc_cache_slow_sweep(). */
> +    struct emc_entry entries[EMC_ENTRIES];
> +    int sweep_idx;
> +};
> +
> +struct dfc_entry {
> +    struct dp_netdev_flow *flow;
> +};
> +
> +struct dfc_cache {
> +    struct emc_cache emc_cache;
> +    struct dfc_entry entries[DFC_ENTRIES];
> +    int sweep_idx;
>  };
>
> -/* Iterate in the exact match cache through every entry that might contain a
> - * miniflow with hash 'HASH'. */
> -#define EMC_FOR_EACH_POS_WITH_HASH(EMC, CURRENT_ENTRY, HASH)                 \
> -    for (uint32_t i__ = 0, srch_hash__ = (HASH);                             \
> -         (CURRENT_ENTRY) = &(EMC)->entries[srch_hash__ & EM_FLOW_HASH_MASK], \
> -         i__ < EM_FLOW_HASH_SEGS;                                            \
> -         i__++, srch_hash__ >>= EM_FLOW_HASH_SHIFT)
>
>  /* Simple non-wildcarding single-priority classifier. */
>
> @@ -214,6 +219,8 @@ static bool dpcls_lookup(struct dpcls *cls,
>                           const struct netdev_flow_key keys[],
>                           struct dpcls_rule **rules, size_t cnt,
>                           int *num_lookups_p);
> +static bool dpcls_rule_matches_key(const struct dpcls_rule *rule,
> +                                   const struct netdev_flow_key *target);
>
>  /* Set of supported meter flags */
>  #define DP_SUPPORTED_METER_FLAGS_MASK \
> @@ -332,6 +339,7 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
>
>  enum dp_stat_type {
>      DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
> +    DP_STAT_DFC_HIT,            /* Packets that had a flow cache hit (dfc). */
>      DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
>      DP_STAT_MISS,               /* Packets that did not match. */
>      DP_STAT_LOST,               /* Packets not passed up to the client. */
> @@ -565,7 +573,7 @@ struct dp_netdev_pmd_thread {
>       * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
>       * need to be protected by 'non_pmd_mutex'. Every other instance
>       * will only be accessed by its own pmd thread. */
> -    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
> +    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct dfc_cache flow_cache;
>      struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
>
>      /* Queue id used by this pmd thread to send packets on all netdevs if
> @@ -744,46 +752,59 @@ dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
>  static int
>  dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
>                             struct tx_port *tx, long long now);
>
> -static inline bool emc_entry_alive(struct emc_entry *ce);
> +static inline bool dfc_entry_alive(struct dfc_entry *ce);
>  static void emc_clear_entry(struct emc_entry *ce);
> +static void dfc_clear_entry(struct dfc_entry *ce);
>
>  static void dp_netdev_request_reconfigure(struct dp_netdev *dp);
>
>  static void
> -emc_cache_init(struct emc_cache *flow_cache)
> +emc_cache_init(struct emc_cache *emc)
>  {
>      int i;
>
> -    flow_cache->sweep_idx = 0;
> +    for (i = 0; i < ARRAY_SIZE(emc->entries); i++) {
> +        emc->entries[i].flow = NULL;
> +        struct netdev_flow_key *key =
> +            (struct netdev_flow_key *) emc->entries[i].key;
> +        key->hash = 0;
> +        key->len = sizeof(struct miniflow);
> +        flowmap_init(&key->mf.map);
> +    }
> +    emc->sweep_idx = 0;
> +}
> +
> +static void
> +dfc_cache_init(struct dfc_cache *flow_cache)
> +{
> +    int i;
> +
> +    emc_cache_init(&flow_cache->emc_cache);
>      for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) {
>          flow_cache->entries[i].flow = NULL;
> -        flow_cache->entries[i].key.hash = 0;
> -        flow_cache->entries[i].key.len = sizeof(struct miniflow);
> -        flowmap_init(&flow_cache->entries[i].key.mf.map);
>      }
> +    flow_cache->sweep_idx = 0;
>  }
>
>  static void
> -emc_cache_uninit(struct emc_cache *flow_cache)
> +emc_cache_uninit(struct emc_cache *emc)
>  {
>      int i;
>
> -    for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) {
> -        emc_clear_entry(&flow_cache->entries[i]);
> +    for (i = 0; i < ARRAY_SIZE(emc->entries); i++) {
> +        emc_clear_entry(&emc->entries[i]);
>      }
>  }
>
> -/* Check and clear dead flow references slowly (one entry at each
> - * invocation). */
>  static void
> -emc_cache_slow_sweep(struct emc_cache *flow_cache)
> +dfc_cache_uninit(struct dfc_cache *flow_cache)
>  {
> -    struct emc_entry *entry = &flow_cache->entries[flow_cache->sweep_idx];
> +    int i;
>
> -    if (!emc_entry_alive(entry)) {
> -        emc_clear_entry(entry);
> +    for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) {
> +        dfc_clear_entry(&flow_cache->entries[i]);
>      }
> -    flow_cache->sweep_idx = (flow_cache->sweep_idx + 1) & EM_FLOW_HASH_MASK;
> +    emc_cache_uninit(&flow_cache->emc_cache);
>  }
>
>  /* Returns true if 'dpif' is a netdev or dummy dpif, false otherwise. */
> @@ -837,8 +858,8 @@ pmd_info_show_stats(struct ds *reply,
>      }
>
>      /* Sum of all the matched and not matched packets gives the total. */
> -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> -                    + stats[DP_STAT_MISS];
> +    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_DFC_HIT]
> +                    + stats[DP_STAT_MASKED_HIT] + stats[DP_STAT_MISS];
>
>      for (i = 0; i < PMD_N_CYCLES; i++) {
>          if (cycles[i] > pmd->cycles_zero[i]) {
> @@ -862,10 +883,13 @@ pmd_info_show_stats(struct ds *reply,
>      ds_put_cstr(reply, ":\n");
>
>      ds_put_format(reply,
> -                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> +                  "\temc hits:%llu\n\tdfc hits:%llu\n"
> +                  "\tmegaflow hits:%llu\n"
>                    "\tavg. subtable lookups per hit:%.2f\n"
>                    "\tmiss:%llu\n\tlost:%llu\n",
> -                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> +                  stats[DP_STAT_EXACT_HIT],
> +                  stats[DP_STAT_DFC_HIT],
> +                  stats[DP_STAT_MASKED_HIT],
>                    stats[DP_STAT_MASKED_HIT] > 0
>                    ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
>                    : 0,
> @@ -1492,6 +1516,8 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
>          stats->n_hit += n;
>          atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
>          stats->n_hit += n;
> +        atomic_read_relaxed(&pmd->stats.n[DP_STAT_DFC_HIT], &n);
> +        stats->n_hit += n;
>          atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
>          stats->n_missed += n;
>          atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
> @@ -2111,6 +2137,16 @@ netdev_flow_key_hash_in_mask(const struct netdev_flow_key *key,
>      return hash_finish(hash, (p - miniflow_get_values(&mask->mf)) * 8);
>  }
>
> +/*
> + * Datapath Flow Cache and EMC implementation
> + */
> +
> +static inline struct emc_entry *
> +emc_entry_get(struct emc_cache *emc, const uint32_t hash)
> +{
> +    return &emc->entries[hash & EMC_MASK];
> +}
> +
>  static inline bool
>  emc_entry_alive(struct emc_entry *ce)
>  {
> @@ -2127,9 +2163,15 @@ emc_clear_entry(struct emc_entry *ce)
>  }
>
>  static inline void
> -emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
> -                 const struct netdev_flow_key *key)
> +emc_change_entry(struct emc_entry *ce, const struct netdev_flow_key *key,
> +                 struct dp_netdev_flow *flow)
>  {
> +    /* We only store small enough flows in the EMC. */
> +    size_t key_size = offsetof(struct netdev_flow_key, mf) + key->len;
> +    if (key_size > sizeof(ce->key)) {
> +        return;
> +    }
> +
>      if (ce->flow != flow) {
>          if (ce->flow) {
>              dp_netdev_flow_unref(ce->flow);
> @@ -2141,73 +2183,148 @@ emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
>              ce->flow = NULL;
>          }
>      }
> -    if (key) {
> -        netdev_flow_key_clone(&ce->key, key);
> -    }
> +    netdev_flow_key_clone((struct netdev_flow_key *) ce->key, key);
>  }
>
>  static inline void
> -emc_insert(struct emc_cache *cache, const struct netdev_flow_key *key,
> -           struct dp_netdev_flow *flow)
> +emc_probabilistic_insert(struct emc_cache *emc,
> +                         const struct netdev_flow_key *key,
> +                         struct dp_netdev_flow *flow)
>  {
> -    struct emc_entry *to_be_replaced = NULL;
> -    struct emc_entry *current_entry;
> +    const uint32_t threshold = UINT32_MAX/1000;
>
> -    EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) {
> -        if (netdev_flow_key_equal(&current_entry->key, key)) {
> -            /* We found the entry with the 'mf' miniflow */
> -            emc_change_entry(current_entry, flow, NULL);
> -            return;
> +    if (random_uint32() <= threshold) {
> +
> +        struct emc_entry *current_entry = emc_entry_get(emc, key->hash);
> +        emc_change_entry(current_entry, key, flow);
> +    }
> +}
> +
> +static inline struct dp_netdev_flow *
> +emc_lookup(struct emc_cache *emc, const struct netdev_flow_key *key)
> +{
> +    struct emc_entry *current_entry = emc_entry_get(emc, key->hash);
> +    struct netdev_flow_key *current_key =
> +        (struct netdev_flow_key *) current_entry->key;
> +
> +    if (current_key->hash == key->hash
> +        && emc_entry_alive(current_entry)
> +        && netdev_flow_key_equal_mf(current_key, &key->mf)) {
> +
> +        /* We found the entry with the 'key->mf' miniflow */
> +        return current_entry->flow;
> +    }
> +    return NULL;
> +}
> +
> +static inline struct dfc_entry *
> +dfc_entry_get(struct dfc_cache *cache, const uint32_t hash)
> +{
> +    return &cache->entries[hash & DFC_MASK];
> +}
> +
> +static inline bool
> +dfc_entry_alive(struct dfc_entry *ce)
> +{
> +    return ce->flow && !ce->flow->dead;
> +}
> +
> +static void
> +dfc_clear_entry(struct dfc_entry *ce)
> +{
> +    if (ce->flow) {
> +        dp_netdev_flow_unref(ce->flow);
> +        ce->flow = NULL;
> +    }
> +}
> +
> +static inline void
> +dfc_change_entry(struct dfc_entry *ce, struct dp_netdev_flow *flow)
> +{
> +    if (ce->flow != flow) {
> +        if (ce->flow) {
> +            dp_netdev_flow_unref(ce->flow);
>          }
>
> -        /* Replacement policy: put the flow in an empty (not alive) entry, or
> -         * in the first entry where it can be */
> -        if (!to_be_replaced
> -            || (emc_entry_alive(to_be_replaced)
> -                && !emc_entry_alive(current_entry))
> -            || current_entry->key.hash < to_be_replaced->key.hash) {
> -            to_be_replaced = current_entry;
> +        if (dp_netdev_flow_ref(flow)) {
> +            ce->flow = flow;
> +        } else {
> +            ce->flow = NULL;
>          }
>      }
> -    /* We didn't find the miniflow in the cache.
> -     * The 'to_be_replaced' entry is where the new flow will be stored */
> -
> -    emc_change_entry(to_be_replaced, flow, key);
>  }
>
>  static inline void
> -emc_probabilistic_insert(struct dp_netdev_pmd_thread *pmd,
> -                         const struct netdev_flow_key *key,
> -                         struct dp_netdev_flow *flow)
> +dfc_insert(struct dp_netdev_pmd_thread *pmd,
> +           const struct netdev_flow_key *key,
> +           struct dp_netdev_flow *flow)
>  {
> -    /* Insert an entry into the EMC based on probability value 'min'. By
> -     * default the value is UINT32_MAX / 100 which yields an insertion
> -     * probability of 1/100 ie. 1% */
> +    struct dfc_cache *cache = &pmd->flow_cache;
> +    struct dfc_entry *current_entry;
> +
> +    current_entry = dfc_entry_get(cache, key->hash);
> +    dfc_change_entry(current_entry, flow);
> +}
> +
> +static inline struct dp_netdev_flow *
> +dfc_lookup(struct dfc_cache *cache, const struct netdev_flow_key *key,
> +           bool *exact_match)
> +{
> +    struct dp_netdev_flow *flow;
>
> -    uint32_t min;
> -    atomic_read_relaxed(&pmd->dp->emc_insert_min, &min);
> +    /* Try an EMC lookup first. */
> +    flow = emc_lookup(&cache->emc_cache, key);
> +    if (flow) {
> +        *exact_match = true;
> +        return flow;
> +    }
> +
> +    /* EMC lookup not successful: try DFC lookup. */
> +    struct dfc_entry *current_entry = dfc_entry_get(cache, key->hash);
> +    flow = current_entry->flow;
>
> -    if (min && random_uint32() <= min) {
> -        emc_insert(&pmd->flow_cache, key, flow);
> +    if (dfc_entry_alive(current_entry) &&
> +        dpcls_rule_matches_key(&flow->cr, key)) {
> +
> +        /* Found a match in DFC. Insert into EMC for subsequent lookups.
> +         * We use probabilistic insertion here so that mainly elephant
> +         * flows enter EMC. */
> +        emc_probabilistic_insert(&cache->emc_cache, key, flow);
> +        *exact_match = false;
> +        return flow;
> +    } else {
> +
> +        /* No match. Need to go to DPCLS lookup. */
> +        return NULL;
>      }
>  }
>
> -static inline struct dp_netdev_flow *
> -emc_lookup(struct emc_cache *cache, const struct netdev_flow_key *key)
> +/* Check and clear dead flow references slowly (one entry at each
> + * invocation). */
> +static void
> +emc_slow_sweep(struct emc_cache *emc)
>  {
> -    struct emc_entry *current_entry;
> +    struct emc_entry *entry = &emc->entries[emc->sweep_idx];
>
> -    EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, key->hash) {
> -        if (current_entry->key.hash == key->hash
> -            && emc_entry_alive(current_entry)
> -            && netdev_flow_key_equal_mf(&current_entry->key, &key->mf)) {
> +    if (!emc_entry_alive(entry)) {
> +        emc_clear_entry(entry);
> +    }
> +    emc->sweep_idx = (emc->sweep_idx + 1) & EMC_MASK;
> +}
>
> -            /* We found the entry with the 'key->mf' miniflow */
> -            return current_entry->flow;
> -        }
> +static void
> +dfc_slow_sweep(struct dfc_cache *cache)
> +{
> +    /* Sweep the EMC so that both finish in the same time. */
> +    if ((cache->sweep_idx & (DFC_ENTRIES/EMC_ENTRIES - 1)) == 0) {
> +        emc_slow_sweep(&cache->emc_cache);
>      }
>
> -    return NULL;
> +    struct dfc_entry *entry = &cache->entries[cache->sweep_idx];
> +    if (!dfc_entry_alive(entry)) {
> +        dfc_clear_entry(entry);
> +    }
> +    cache->sweep_idx = (cache->sweep_idx + 1) & DFC_MASK;
>  }
>
>  static struct dp_netdev_flow *
> @@ -4048,7 +4165,7 @@ pmd_thread_main(void *f_)
>      ovs_numa_thread_setaffinity_core(pmd->core_id);
>      dpdk_set_lcore_id(pmd->core_id);
>      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> -    emc_cache_init(&pmd->flow_cache);
> +    dfc_cache_init(&pmd->flow_cache);
>  reload:
>      pmd_alloc_static_tx_qid(pmd);
>
> @@ -4078,17 +4195,16 @@ reload:
>                                     : PMD_CYCLES_IDLE);
>          }
>
> -        if (lc++ > 1024) {
> -            bool reload;
> +        dfc_slow_sweep(&pmd->flow_cache);
>
> +        if (lc++ > 1024) {
>              lc = 0;
>
>              coverage_try_clear();
>              dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
> -            if (!ovsrcu_try_quiesce()) {
> -                emc_cache_slow_sweep(&pmd->flow_cache);
> -            }
> +            ovsrcu_try_quiesce();
>
> +            bool reload;
>              atomic_read_relaxed(&pmd->reload, &reload);
>              if (reload) {
>                  break;
> @@ -4110,7 +4226,7 @@ reload:
>          goto reload;
>      }
>
> -    emc_cache_uninit(&pmd->flow_cache);
> +    dfc_cache_uninit(&pmd->flow_cache);
>      free(poll_list);
>      pmd_free_cached_ports(pmd);
>      return NULL;
> @@ -4544,7 +4660,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>      /* init the 'flow_cache' since there is no
>       * actual thread created for NON_PMD_CORE_ID. */
>      if (core_id == NON_PMD_CORE_ID) {
> -        emc_cache_init(&pmd->flow_cache);
> +        dfc_cache_init(&pmd->flow_cache);
>          pmd_alloc_static_tx_qid(pmd);
>      }
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
> @@ -4586,7 +4702,7 @@ dp_netdev_del_pmd(struct dp_netdev *dp, struct dp_netdev_pmd_thread *pmd)
>       * but extra cleanup is necessary */
>      if (pmd->core_id == NON_PMD_CORE_ID) {
>          ovs_mutex_lock(&dp->non_pmd_mutex);
> -        emc_cache_uninit(&pmd->flow_cache);
> +        dfc_cache_uninit(&pmd->flow_cache);
>          pmd_free_cached_ports(pmd);
>          pmd_free_static_tx_qid(pmd);
>          ovs_mutex_unlock(&dp->non_pmd_mutex);
> @@ -4896,7 +5012,7 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>      packet_batch_per_flow_update(batch, pkt, mf);
>  }
>
> -/* Try to process all ('cnt') the 'packets' using only the exact match cache
> +/* Try to process all ('cnt') the 'packets' using only the PMD flow cache
>   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>   * miniflow is copied into 'keys' and the packet pointer is moved at the
>   * beginning of the 'packets' array.
> @@ -4911,19 +5027,20 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>   * will be ignored.
>   */
>  static inline size_t
> -emc_processing(struct dp_netdev_pmd_thread *pmd,
> +dfc_processing(struct dp_netdev_pmd_thread *pmd,
>                 struct dp_packet_batch *packets_,
>                 struct netdev_flow_key *keys,
>                 struct packet_batch_per_flow batches[], size_t *n_batches,
>                 bool md_is_valid, odp_port_t port_no)
>  {
> -    struct emc_cache *flow_cache = &pmd->flow_cache;
> +    struct dfc_cache *flow_cache = &pmd->flow_cache;
>      struct netdev_flow_key *key = &keys[0];
> -    size_t n_missed = 0, n_dropped = 0;
> +    size_t n_missed = 0, n_dfc_hit = 0, n_emc_hit = 0;
>      struct dp_packet *packet;
>      const size_t cnt = dp_packet_batch_size(packets_);
>      uint32_t cur_min;
>      int i;
> +    bool exact_match;
>
>      atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
>
> @@ -4932,7 +5049,6 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>
>          if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
>              dp_packet_delete(packet);
> -            n_dropped++;
>              continue;
>          }
>
> @@ -4948,7 +5064,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>          }
>          miniflow_extract(packet, &key->mf);
>          key->len = 0; /* Not computed yet. */
> -        /* If EMC is disabled skip hash computation and emc_lookup */
> +        /* If DFC is disabled skip hash computation and DFC lookup */
>          if (cur_min) {
>              if (!md_is_valid) {
>                  key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
> @@ -4956,11 +5072,16 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>              } else {
>                  key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
>              }
> -            flow = emc_lookup(flow_cache, key);
> +            flow = dfc_lookup(flow_cache, key, &exact_match);
>          } else {
>              flow = NULL;
>          }
>          if (OVS_LIKELY(flow)) {
> +            if (exact_match) {
> +                n_emc_hit++;
> +            } else {
> +                n_dfc_hit++;
> +            }
>              dp_netdev_queue_batches(packet, flow, &key->mf, batches,
>                                      n_batches);
>          } else {
> @@ -4974,8 +5095,8 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>          }
>      }
>
> -    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> -                           cnt - n_dropped - n_missed);
> +    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, n_emc_hit);
> +    dp_netdev_count_packet(pmd, DP_STAT_DFC_HIT, n_dfc_hit);
>
>      return dp_packet_batch_size(packets_);
>  }
> @@ -5044,7 +5165,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                                               add_actions->size);
>          }
>          ovs_mutex_unlock(&pmd->flow_mutex);
> -        emc_probabilistic_insert(pmd, key, netdev_flow);
> +        dfc_insert(pmd, key, netdev_flow);
>      }
>  }
>
> @@ -5136,7 +5257,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>
>          flow = dp_netdev_flow_cast(rules[i]);
>
> -        emc_probabilistic_insert(pmd, &keys[i], flow);
> +        dfc_insert(pmd, &keys[i], flow);
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches,
>                                  n_batches);
>      }
>
> @@ -5169,7 +5290,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
>      odp_port_t in_port;
>
>      n_batches = 0;
> -    emc_processing(pmd, packets, keys, batches, &n_batches,
> +    dfc_processing(pmd, packets, keys, batches, &n_batches,
>                     md_is_valid, port_no);
>      if (!dp_packet_batch_is_empty(packets)) {
>          /* Get ingress port from first packet's metadata. */
> @@ -6063,7 +6184,7 @@ dpcls_remove(struct dpcls *cls, struct dpcls_rule *rule)
>
>  /* Returns true if 'target' satisfies 'key' in 'mask', that is, if each 1-bit
>   * in 'mask' the values in 'key' and 'target' are the same. */
> -static inline bool
> +static bool
>  dpcls_rule_matches_key(const struct dpcls_rule *rule,
>                         const struct netdev_flow_key *target)
>  {
> --
> 1.9.1
>
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev