Hi Vladimir,

> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medved...@intel.com>
> Sent: Wednesday, June 5, 2019 18:50
> To: Ruifeng Wang (Arm Technology China) <ruifeng.w...@arm.com>;
> bruce.richard...@intel.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> Gavin Hu (Arm Technology China) <gavin...@arm.com>; nd <n...@arm.com>
> Subject: Re: [PATCH v1 1/2] lib/lpm: memory orderings to avoid race
> conditions for v1604
>
> Hi Wang,
>
> On 05/06/2019 06:54, Ruifeng Wang wrote:
> > When a tbl8 group is getting attached to a tbl24 entry, lookup might
> > fail even though the entry is configured in the table.
> >
> > For example, consider an LPM table configured with 10.10.10.1/24.
> > When a new entry 10.10.10.32/28 is being added, a new tbl8 group is
> > allocated and the tbl24 entry is changed to point to the tbl8 group.
> > If the tbl24 entry is written before the tbl8 group entries are
> > updated, a lookup on 10.10.10.9 will return failure.
> >
> > Correct memory orderings are required to ensure that the store to
> > tbl24 does not happen before the stores to the tbl8 group entries
> > complete.
> >
> > The orderings have an impact on the LPM performance tests.
> > On the Arm A72 platform, the delete operation has 2.7% degradation,
> > while add/lookup shows no notable performance change.
> > On the x86 E5 platform, the add operation has 4.3% degradation, the
> > delete operation has 2.2% - 10.2% degradation, and lookup shows no
> > performance change.
> I think it is possible to avoid the add/del performance degradation:
> 1. Explicitly mark struct rte_lpm_tbl_entry 4-byte aligned
> 2. Cast value to uint32_t (uint16_t for 2.0 version) on memory write
> 3. Use rte_wmb() after memory write
>

Thanks for your suggestions. Points 1 and 2 make sense. For point 3, are
you suggesting using rte_wmb() instead of __atomic_store()? rte_wmb()
belongs to DPDK's own memory model. Maybe we can use __atomic_store()
when 'RTE_USE_C11_MEM_MODEL=y' is set, and use rte_wmb() otherwise?
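
Something along these lines is what I have in mind. It is only a rough
sketch under that assumption; the helper name lpm_entry_store_release()
is made up for illustration and is not part of this patch:

#include <rte_atomic.h>		/* rte_wmb() */
#include "rte_lpm.h"		/* struct rte_lpm_tbl_entry */

/* Sketch only: select the store flavour at build time. */
static inline void
lpm_entry_store_release(struct rte_lpm_tbl_entry *dst,
		struct rte_lpm_tbl_entry *src)
{
#ifdef RTE_USE_C11_MEM_MODEL
	/* C11 path: a release store makes the earlier tbl8 stores
	 * visible before the tbl24 entry is published.
	 */
	__atomic_store(dst, src, __ATOMIC_RELEASE);
#else
	/* Legacy path: order the earlier tbl8 stores first, then do
	 * one plain 4-byte store. This relies on the entry being
	 * 4-byte aligned (your point 1) and written as a whole word
	 * (your point 2).
	 */
	rte_wmb();
	*dst = *src;
#endif
}

This would be similar to how rte_ring selects between
rte_ring_c11_mem.h and rte_ring_generic.h.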

> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > ---
> >   lib/librte_lpm/rte_lpm.c | 32 +++++++++++++++++++++++++-------
> >   lib/librte_lpm/rte_lpm.h |  4 ++++
> >   2 files changed, 29 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > index 6b7b28a2e..6ec450a08 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -806,7 +806,8 @@ add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
> >  			/* Setting tbl24 entry in one go to avoid race
> >  			 * conditions
> >  			 */
> > -			lpm->tbl24[i] = new_tbl24_entry;
> > +			__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> > +					__ATOMIC_RELEASE);
> >
> >  			continue;
> >  		}
> > @@ -1017,7 +1018,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >  			.depth = 0,
> >  		};
> >
> > -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> > +		/* The tbl24 entry must be written only after the
> > +		 * tbl8 entries are written.
> > +		 */
> > +		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> > +				__ATOMIC_RELEASE);
> >
> >  	} /* If valid entry but not extended calculate the index into Table8. */
> >  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
> > @@ -1063,7 +1068,11 @@ add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
> >  			.depth = 0,
> >  		};
> >
> > -		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> > +		/* The tbl24 entry must be written only after the
> > +		 * tbl8 entries are written.
> > +		 */
> > +		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
> > +				__ATOMIC_RELEASE);
> >
> >  	} else { /*
> >  		* If it is valid, extended entry calculate the index into tbl8.
> > @@ -1391,6 +1400,7 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  	/* Calculate the range and index into Table24. */
> >  	tbl24_range = depth_to_range(depth);
> >  	tbl24_index = (ip_masked >> 8);
> > +	struct rte_lpm_tbl_entry zero_tbl24_entry = {0};
> >
> >  	/*
> >  	 * Firstly check the sub_rule_index. A -1 indicates no replacement rule
> > @@ -1405,7 +1415,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >
> >  			if (lpm->tbl24[i].valid_group == 0 &&
> >  					lpm->tbl24[i].depth <= depth) {
> > -				lpm->tbl24[i].valid = INVALID;
> > +				__atomic_store(&lpm->tbl24[i],
> > +					&zero_tbl24_entry, __ATOMIC_RELEASE);
> >  			} else if (lpm->tbl24[i].valid_group == 1) {
> >  				/*
> >  				 * If TBL24 entry is extended, then there has
> > @@ -1450,7 +1461,8 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >
> >  			if (lpm->tbl24[i].valid_group == 0 &&
> >  					lpm->tbl24[i].depth <= depth) {
> > -				lpm->tbl24[i] = new_tbl24_entry;
> > +				__atomic_store(&lpm->tbl24[i], &new_tbl24_entry,
> > +						__ATOMIC_RELEASE);
> >  			} else if (lpm->tbl24[i].valid_group == 1) {
> >  				/*
> >  				 * If TBL24 entry is extended, then there has
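
A side note on the delete path above, in case it helps review: the
reason for publishing a whole zeroed entry instead of writing
lpm->tbl24[i].valid = INVALID is that a bitfield assignment compiles to
a read-modify-write of the containing 32-bit word, which is neither
atomic with respect to concurrent updates nor ordered against the
earlier tbl8 stores. A simplified sketch (illustration only, not
library code; the field layout follows rte_lpm.h for v1604):

#include <stdint.h>

struct entry {			/* as in rte_lpm.h (v1604) */
	uint32_t next_hop    :24;
	uint32_t valid       :1;
	uint32_t valid_group :1;
	uint32_t depth       :6;
};

static void clear_racy(struct entry *e)
{
	e->valid = 0;	/* load word, clear one bit, store word back */
}

static void clear_safe(struct entry *e)
{
	struct entry zero = {0};	/* mirrors zero_tbl24_entry */
	/* Single atomic 4-byte store; requires the entry to be
	 * 4-byte aligned.
	 */
	__atomic_store(e, &zero, __ATOMIC_RELEASE);
}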

> > @@ -1713,8 +1725,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  	tbl8_recycle_index = tbl8_recycle_check_v1604(lpm->tbl8, tbl8_group_start);
> >
> >  	if (tbl8_recycle_index == -EINVAL) {
> > -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
> > +		/* Set tbl24 before freeing tbl8 to avoid race condition.
> > +		 * Prevent the free of the tbl8 group from hoisting.
> > +		 */
> >  		lpm->tbl24[tbl24_index].valid = 0;
> > +		__atomic_thread_fence(__ATOMIC_RELEASE);
> >  		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> >  	} else if (tbl8_recycle_index > -1) {
> >  		/* Update tbl24 entry. */
> > @@ -1725,8 +1740,11 @@ delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
> >  			.depth = lpm->tbl8[tbl8_recycle_index].depth,
> >  		};
> >
> > -		/* Set tbl24 before freeing tbl8 to avoid race condition. */
> > +		/* Set tbl24 before freeing tbl8 to avoid race condition.
> > +		 * Prevent the free of the tbl8 group from hoisting.
> > +		 */
> >  		lpm->tbl24[tbl24_index] = new_tbl24_entry;
> > +		__atomic_thread_fence(__ATOMIC_RELEASE);
> >  		tbl8_free_v1604(lpm->tbl8, tbl8_group_start);
> >  	}
> >  #undef group_idx
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> > index b886f54b4..6f5704c5c 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -354,6 +354,10 @@ rte_lpm_lookup(struct rte_lpm *lpm, uint32_t ip, uint32_t *next_hop)
> >  	ptbl = (const uint32_t *)(&lpm->tbl24[tbl24_index]);
> >  	tbl_entry = *ptbl;
> >
> > +	/* Memory ordering is not required in lookup. Because dataflow
> > +	 * dependency exists, compiler or HW won't be able to re-order
> > +	 * the operations.
> > +	 */
> >  	/* Copy tbl8 entry (only if needed) */
> >  	if (unlikely((tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
> >  			RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
> > --
> Regards,
> Vladimir
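
One more note on the new lookup comment, to spell out why no barrier is
needed there: the tbl8 index is computed from the tbl24 entry that was
just loaded, so the second load carries an address dependency on the
first, and in practice neither the compiler nor the CPU will reorder
the two loads. A simplified sketch (illustration only; VALID_EXT_MASK
stands in for RTE_LPM_VALID_EXT_ENTRY_BITMASK and the masks are
abbreviated):

#include <stdint.h>

#define VALID_EXT_MASK	0x03000000	/* stand-in for the real mask */
#define GROUP_ENTRIES	256		/* entries per tbl8 group */

static uint32_t lookup_sketch(const uint32_t *tbl24,
		const uint32_t *tbl8, uint32_t ip)
{
	uint32_t e = tbl24[ip >> 8];	/* first load */

	if ((e & VALID_EXT_MASK) == VALID_EXT_MASK) {
		/* The tbl8 address is computed from 'e', so this load
		 * cannot be issued before the load of 'e' completes.
		 */
		e = tbl8[(e & 0x00FFFFFF) * GROUP_ENTRIES + (ip & 0xFF)];
	}
	return e;
}

Regards,
/Ruifeng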