Hi Vladimir,

> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medved...@intel.com>
> Sent: Wednesday, June 5, 2019 18:50
> To: Ruifeng Wang (Arm Technology China) <ruifeng.w...@arm.com>;
> bruce.richard...@intel.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> Gavin Hu (Arm Technology China) <gavin...@arm.com>; nd <n...@arm.com>
> Subject: Re: [PATCH v1 1/2] lib/lpm: memory orderings to avoid race
> conditions for v1604
>
> Hi Wang,
>
> On 05/06/2019 06:54, Ruifeng Wang wrote:
> > When a tbl8 group is getting attached to a tbl24 entry, lookup might
> > fail even though the entry is configured in the table.
> >
> > For example, consider an LPM table configured with 10.10.10.1/24.
> > When a new entry 10.10.10.32/28 is being added, a new tbl8 group is
> > allocated and the tbl24 entry is changed to point to the tbl8 group.
> > If the tbl24 entry is written before the tbl8 group entries are
> > updated, a lookup on 10.10.10.9 will return failure.
> >
> > Correct memory orderings are required to ensure that the store to
> > tbl24 does not happen before the stores to the tbl8 group entries
> > complete.
> >
> > The orderings have an impact on the LPM performance tests.
> > On the Arm A72 platform, the delete operation has 2.7% degradation,
> > while add/lookup shows no notable performance change.
> > On the x86 E5 platform, the add operation has 4.3% degradation, the
> > delete operation has 2.2% - 10.2% degradation, and lookup shows no
> > performance change.
> I think it is possible to avoid the add/del performance degradation:
> 1. Explicitly mark struct rte_lpm_tbl_entry 4-byte aligned
> 2. Cast value to uint32_t (uint16_t for 2.0 version) on memory write
> 3. Use rte_wmb() after memory write
>

Thanks for your suggestions. Points 1 and 2 make sense. For point 3, are
you suggesting using rte_wmb() instead of __atomic_store()? rte_wmb()
belongs to DPDK's own memory model. Maybe we can use __atomic_store()
when 'RTE_USE_C11_MEM_MODEL=y' is set, and use rte_wmb() otherwise?
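
Something along these lines is what I have in mind. It is only a rough
sketch under that assumption; the helper name lpm_entry_store_release()
is made up for illustration and is not part of this patch:

#include <rte_atomic.h>		/* rte_wmb() */
#include "rte_lpm.h"		/* struct rte_lpm_tbl_entry */

/* Sketch only: select the store flavour at build time. */
static inline void
lpm_entry_store_release(struct rte_lpm_tbl_entry *dst,
		struct rte_lpm_tbl_entry *src)
{
#ifdef RTE_USE_C11_MEM_MODEL
	/* C11 path: a release store makes the earlier tbl8 stores
	 * visible before the tbl24 entry is published.
	 */
	__atomic_store(dst, src, __ATOMIC_RELEASE);
#else
	/* Legacy path: order the earlier tbl8 stores first, then do
	 * one plain 4-byte store. This relies on the entry being
	 * 4-byte aligned (your point 1) and written as a whole word
	 * (your point 2).
	 */
	rte_wmb();
	*dst = *src;
#endif
}

This would be similar to how rte_ring selects between
rte_ring_c11_mem.h and rte_ring_generic.h.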

> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > ---
> >   lib/librte_lpm/rte_lpm.c | 32 +++++++++++++++++++++++++-------
> >   lib/librte_lpm/rte_lpm.h |  4 ++++
> >   2 files changed, 29 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > index 6b7b28a2e..6ec450a08 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -806,7 +806,8 @@ add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
> >  			/* Setting tbl24 entry in one go to avoid race
> >  			 * conditions
> >  			 */
> > -			lpm->tbl24[i] = new_tbl24_entry;
> > +			__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> > +					__ATOMIC_RELEASE);
> >
> >  			continue;
> >  		}
> > @@ -1017,7 +1018,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >  			.depth = 0,
> >  		};
> >
> > -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> > +		/* The tbl24 entry must be written only after the
> > +		 * tbl8 entries are written.
> > +		 */
> > +		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> > +				__ATOMIC_RELEASE);
> >
> >  	} /* If valid entry but not extended calculate the index into Table8. */
> >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> > @@ -1063,7 +1068,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >  			.depth = 0,
> >  		};
> >
> > -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> > +		/* The tbl24 entry must be written only after the
> > +		 * tbl8 entries are written.
> > +		 */
> > +		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> > +				__ATOMIC_RELEASE);
> >
> >  	} else { /*
> >  		* If it is valid, extended entry calculate the index into tbl8.
> > @@ -1391,6 +1400,7 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  	/* Calculate the range and index into Table24. */
> >  	tbl24_range = depth_to_range(depth);
> >  	tbl24_index = (ip_masked >> 8);
> > +	struct rte_lpm_tbl_entry zero_tbl24_entry = {0};
> >
> >  	/*
> >  	 * Firstly check the sub_rule_index. A -1 indicates no replacement rule
> > @@ -1405,7 +1415,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >
> >  			if (lpm->tbl24[i].valid_group == 0 &&
> >  					lpm->tbl24[i].depth <= depth) {
> > -				lpm->tbl24[i].valid = INVALID;
> > +				__atomic_store(&lpm->tbl24[i],
> > +					&zero_tbl24_entry, __ATOMIC_RELEASE);
> >  			} else if (lpm->tbl24[i].valid_group == 1) {
> >  				/*
> >  				 * If TBL24 entry is extended, then there has
> > @@ -1450,7 +1461,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >
> >  			if (lpm->tbl24[i].valid_group == 0 &&
> >  					lpm->tbl24[i].depth <= depth) {
> > -				lpm->tbl24[i] = new_tbl24_entry;
> > +				__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> > +						__ATOMIC_RELEASE);
> >  			} else if (lpm->tbl24[i].valid_group == 1) {
> >  				/*
> >  				 * If TBL24 entry is extended, then there has
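
A side note on the delete path above, in case it helps review: the
reason for publishing a whole zeroed entry instead of writing
lpm->tbl24[i].valid = INVALID is that a bitfield assignment compiles to
a read-modify-write of the containing 32-bit word, which is neither
atomic with respect to concurrent updates nor ordered against the
earlier tbl8 stores. A simplified sketch (illustration only, not
library code; the field layout follows rte_lpm.h for v1604):

#include <stdint.h>

struct entry {			/* as in rte_lpm.h (v1604) */
	uint32_t next_hop    :24;
	uint32_t valid       :1;
	uint32_t valid_group :1;
	uint32_t depth       :6;
};

static void clear_racy(struct entry *e)
{
	e->valid = 0;	/* load word, clear one bit, store word back */
}

static void clear_safe(struct entry *e)
{
	struct entry zero = {0};	/* mirrors zero_tbl24_entry */
	/* Single atomic 4-byte store; requires the entry to be
	 * 4-byte aligned.
	 */
	__atomic_store(e, &zero, __ATOMIC_RELEASE);
}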

> > @@ -1713,8 +1725,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  	tbl8_recycle_index = tbl8_recycle_check_v1604(lpm->tbl8, tbl8_group_start);
> >
> >  	if (tbl8_recycle_index == -EINVAL) {
> > -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
> > +		/* Set tbl24 before freeing tbl8 to avoid race condition.
> > +		 * Prevent the free of the tbl8 group from hoisting.
> > +		 */
> >  		lpm->tbl24[tbl24_index].valid = 0;
> > +		__atomic_thread_fence(__ATOMIC_RELEASE);
> >  		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> >  	} else if (tbl8_recycle_index > -1) {
> >  		/* Update tbl24 entry. */
> > @@ -1725,8 +1740,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  			.depth = lpm->tbl8[tbl8_recycle_index].depth,
> >  		};
> >
> > -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
> > +		/* Set tbl24 before freeing tbl8 to avoid race condition.
> > +		 * Prevent the free of the tbl8 group from hoisting.
> > +		 */
> >  		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> > +		__atomic_thread_fence(__ATOMIC_RELEASE);
> >  		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> >  	}
> >  #undef group_idx
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > index b886f54b4..6f5704c5c 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -354,6 +354,10 @@ rte_lpm_lookup(struct rte_lpm *lpm, uint32_t ip, uint32_t *next_hop)
> >  	ptbl = (const uint32_t *)(&lpm->tbl24[tbl24_index]);
> >  	tbl_entry = *ptbl;
> >
> > +	/* Memory ordering is not required in lookup. Because dataflow
> > +	 * dependency exists, compiler or HW won't be able to re-order
> > +	 * the operations.
> > +	 */
> >  	/* Copy tbl8 entry (only if needed) */
> >  	if (unlikely((tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
> >  			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
> > --
> Regards,
> Vladimir
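
One more note on the new lookup comment, to spell out why no barrier is
needed there: the tbl8 index is computed from the tbl24 entry that was
just loaded, so the second load carries an address dependency on the
first, and in practice neither the compiler nor the CPU will reorder
the two loads. A simplified sketch (illustration only; VALID_EXT_MASK
stands in for RTE_LPM_VALID_EXT_ENTRY_BITMASK and the masks are
abbreviated):

#include <stdint.h>

#define VALID_EXT_MASK	0x03000000	/* stand-in for the real mask */
#define GROUP_ENTRIES	256		/* entries per tbl8 group */

static uint32_t lookup_sketch(const uint32_t *tbl24,
		const uint32_t *tbl8, uint32_t ip)
{
	uint32_t e = tbl24[ip >> 8];	/* first load */

	if ((e & VALID_EXT_MASK) == VALID_EXT_MASK) {
		/* The tbl8 address is computed from 'e', so this load
		 * cannot be issued before the load of 'e' completes.
		 */
		e = tbl8[(e & 0x00FFFFFF) * GROUP_ENTRIES + (ip & 0xFF)];
	}
	return e;
}

Regards,
/Ruifeng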