On Thu, Oct 26, 2023 at 04:24:54AM +0000, Ruifeng Wang wrote: > > -----Original Message----- > > From: Tyler Retzlaff <[email protected]> > > Sent: Thursday, October 26, 2023 6:38 AM > > To: Ruifeng Wang <[email protected]> > > Cc: [email protected]; Akhil Goyal <[email protected]>; Anatoly Burakov > > <[email protected]>; Andrew Rybchenko > > <[email protected]>; Bruce > > Richardson <[email protected]>; Chenbo Xia <[email protected]>; > > Ciara Power > > <[email protected]>; David Christensen <[email protected]>; David > > Hunt > > <[email protected]>; Dmitry Kozlyuk <[email protected]>; Dmitry > > Malloy > > <[email protected]>; Elena Agostini <[email protected]>; Erik > > Gabriel Carrillo > > <[email protected]>; Fan Zhang <[email protected]>; Ferruh > > Yigit > > <[email protected]>; Harman Kalra <[email protected]>; Harry van Haaren > > <[email protected]>; Honnappa Nagarahalli > > <[email protected]>; > > [email protected]; Konstantin Ananyev <[email protected]>; > > Matan Azrad > > <[email protected]>; Maxime Coquelin <[email protected]>; Narcisa > > Ana Maria Vasile > > <[email protected]>; Nicolas Chautru > > <[email protected]>; Olivier Matz > > <[email protected]>; Ori Kam <[email protected]>; Pallavi Kadam > > <[email protected]>; Pavan Nikhilesh <[email protected]>; > > Reshma Pattan > > <[email protected]>; Sameh Gobriel <[email protected]>; Shijith > > Thotton > > <[email protected]>; Sivaprasad Tummala <[email protected]>; > > Stephen Hemminger > > <[email protected]>; Suanming Mou <[email protected]>; Sunil > > Kumar Kori > > <[email protected]>; [email protected]; Viacheslav Ovsiienko > > <[email protected]>; > > Vladimir Medvedkin <[email protected]>; Yipeng Wang > > <[email protected]>; > > nd <[email protected]> > > Subject: Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API > > > > On Wed, Oct 25, 2023 at 09:41:22AM +0000, Ruifeng Wang wrote: > > > > -----Original Message----- > > > > From: Tyler Retzlaff <[email protected]> > > > > Sent: Wednesday, October 18, 2023 4:31 AM > > > > To: [email protected] > > > > Cc: Akhil Goyal <[email protected]>; Anatoly Burakov > > > > <[email protected]>; Andrew Rybchenko > > > > <[email protected]>; Bruce Richardson > > > > <[email protected]>; Chenbo Xia <[email protected]>; > > > > Ciara Power <[email protected]>; David Christensen > > > > <[email protected]>; David Hunt <[email protected]>; Dmitry > > > > Kozlyuk <[email protected]>; Dmitry Malloy > > > > <[email protected]>; Elena Agostini <[email protected]>; Erik > > > > Gabriel Carrillo <[email protected]>; Fan Zhang > > > > <[email protected]>; Ferruh Yigit <[email protected]>; > > > > Harman Kalra <[email protected]>; Harry van Haaren > > > > <[email protected]>; Honnappa Nagarahalli > > > > <[email protected]>; [email protected]; Konstantin > > > > Ananyev <[email protected]>; Matan Azrad > > > > <[email protected]>; Maxime Coquelin <[email protected]>; > > > > Narcisa Ana Maria Vasile <[email protected]>; Nicolas > > > > Chautru <[email protected]>; Olivier Matz > > > > <[email protected]>; Ori Kam <[email protected]>; Pallavi Kadam > > > > <[email protected]>; Pavan Nikhilesh > > > > <[email protected]>; Reshma Pattan <[email protected]>; > > > > Sameh Gobriel <[email protected]>; Shijith Thotton > > > > <[email protected]>; Sivaprasad Tummala > > > > <[email protected]>; Stephen Hemminger > > > > <[email protected]>; Suanming Mou <[email protected]>; > > > > Sunil Kumar Kori <[email protected]>; [email protected]; > > > > Viacheslav Ovsiienko <[email protected]>; Vladimir Medvedkin > > > > <[email protected]>; Yipeng Wang > > > > <[email protected]>; Tyler Retzlaff > > > > <[email protected]> > > > > Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API > > > > > > > > Replace the use of gcc builtin __atomic_xxx intrinsics with > > > > corresponding rte_atomic_xxx optional stdatomic API > > > > > > > > Signed-off-by: Tyler Retzlaff <[email protected]> > > > > --- > > > > lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------ > > > > lib/rcu/rte_rcu_qsbr.h | 68 > > > > +++++++++++++++++++++++++------------------------- > > > > 2 files changed, 58 insertions(+), 58 deletions(-) > > > > > > > > diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c index > > > > 17be93e..4dc7714 100644 > > > > --- a/lib/rcu/rte_rcu_qsbr.c > > > > +++ b/lib/rcu/rte_rcu_qsbr.c > > > > @@ -102,21 +102,21 @@ > > > > * go out of sync. Hence, additional checks are required. > > > > */ > > > > /* Check if the thread is already registered */ > > > > - old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > - __ATOMIC_RELAXED); > > > > + old_bmap = > > > > rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > + rte_memory_order_relaxed); > > > > if (old_bmap & 1UL << id) > > > > return 0; > > > > > > > > do { > > > > new_bmap = old_bmap | (1UL << id); > > > > - success = __atomic_compare_exchange( > > > > + success = rte_atomic_compare_exchange_strong_explicit( > > > > __RTE_QSBR_THRID_ARRAY_ELM(v, > > > > i), > > > > - &old_bmap, &new_bmap, 0, > > > > - __ATOMIC_RELEASE, > > > > __ATOMIC_RELAXED); > > > > + &old_bmap, new_bmap, > > > > + rte_memory_order_release, > > > > rte_memory_order_relaxed); > > > > > > > > if (success) > > > > - __atomic_fetch_add(&v->num_threads, > > > > - 1, __ATOMIC_RELAXED); > > > > + rte_atomic_fetch_add_explicit(&v->num_threads, > > > > + 1, > > > > rte_memory_order_relaxed); > > > > else if (old_bmap & (1UL << id)) > > > > /* Someone else registered this thread. > > > > * Counter should not be incremented. > > > > @@ -154,8 +154,8 @@ > > > > * go out of sync. Hence, additional checks are required. > > > > */ > > > > /* Check if the thread is already unregistered */ > > > > - old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > - __ATOMIC_RELAXED); > > > > + old_bmap = > > > > rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > + rte_memory_order_relaxed); > > > > if (!(old_bmap & (1UL << id))) > > > > return 0; > > > > > > > > @@ -165,14 +165,14 @@ > > > > * completed before removal of the thread from the list > > > > of > > > > * reporting threads. > > > > */ > > > > - success = __atomic_compare_exchange( > > > > + success = rte_atomic_compare_exchange_strong_explicit( > > > > __RTE_QSBR_THRID_ARRAY_ELM(v, > > > > i), > > > > - &old_bmap, &new_bmap, 0, > > > > - __ATOMIC_RELEASE, > > > > __ATOMIC_RELAXED); > > > > + &old_bmap, new_bmap, > > > > + rte_memory_order_release, > > > > rte_memory_order_relaxed); > > > > > > > > if (success) > > > > - __atomic_fetch_sub(&v->num_threads, > > > > - 1, __ATOMIC_RELAXED); > > > > + rte_atomic_fetch_sub_explicit(&v->num_threads, > > > > + 1, > > > > rte_memory_order_relaxed); > > > > else if (!(old_bmap & (1UL << id))) > > > > /* Someone else unregistered this thread. > > > > * Counter should not be incremented. > > > > @@ -227,8 +227,8 @@ > > > > > > > > fprintf(f, " Registered thread IDs = "); > > > > for (i = 0; i < v->num_elems; i++) { > > > > - bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > - __ATOMIC_ACQUIRE); > > > > + bmap = > > > > rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > + rte_memory_order_acquire); > > > > id = i << __RTE_QSBR_THRID_INDEX_SHIFT; > > > > while (bmap) { > > > > t = __builtin_ctzl(bmap); > > > > @@ -241,26 +241,26 @@ > > > > fprintf(f, "\n"); > > > > > > > > fprintf(f, " Token = %" PRIu64 "\n", > > > > - __atomic_load_n(&v->token, __ATOMIC_ACQUIRE)); > > > > + rte_atomic_load_explicit(&v->token, > > > > rte_memory_order_acquire)); > > > > > > > > fprintf(f, " Least Acknowledged Token = %" PRIu64 "\n", > > > > - __atomic_load_n(&v->acked_token, > > > > __ATOMIC_ACQUIRE)); > > > > + rte_atomic_load_explicit(&v->acked_token, > > > > +rte_memory_order_acquire)); > > > > > > > > fprintf(f, "Quiescent State Counts for readers:\n"); > > > > for (i = 0; i < v->num_elems; i++) { > > > > - bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > - __ATOMIC_ACQUIRE); > > > > + bmap = > > > > rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i), > > > > + rte_memory_order_acquire); > > > > id = i << __RTE_QSBR_THRID_INDEX_SHIFT; > > > > while (bmap) { > > > > t = __builtin_ctzl(bmap); > > > > fprintf(f, "thread ID = %u, count = %" PRIu64 > > > > ", lock count = %u\n", > > > > id + t, > > > > - __atomic_load_n( > > > > + rte_atomic_load_explicit( > > > > &v->qsbr_cnt[id + t].cnt, > > > > - __ATOMIC_RELAXED), > > > > - __atomic_load_n( > > > > + rte_memory_order_relaxed), > > > > + rte_atomic_load_explicit( > > > > &v->qsbr_cnt[id + t].lock_cnt, > > > > - __ATOMIC_RELAXED)); > > > > + rte_memory_order_relaxed)); > > > > bmap &= ~(1UL << t); > > > > } > > > > } > > > > diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h index > > > > 87e1b55..9f4aed2 100644 > > > > --- a/lib/rcu/rte_rcu_qsbr.h > > > > +++ b/lib/rcu/rte_rcu_qsbr.h > > > > @@ -63,11 +63,11 @@ > > > > * Given thread id needs to be converted to index into the array and > > > > * the id within the array element. > > > > */ > > > > -#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8) > > > > +#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE > > > > +(sizeof(RTE_ATOMIC(uint64_t)) * > > > > +8) > > > > #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \ > > > > RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \ > > > > __RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, > > > > RTE_CACHE_LINE_SIZE) > > > > -#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \ > > > > +#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *) > > > > +\ > > > > > > Is it equivalent to ((RTE_ATOMIC(uint64_t) *)? > > > > i'm not sure if you're asking about the resultant type of the expression or > > not? > > I see other places are using specifier hence the question. > > > > > in this context we aren't specifying an atomic type but rather adding the > > atomic qualifier > > to what should already be a variable that has an atomic specified type with > > a cast which > > is why we use __rte_atomic. > > I read from document [1] that atomic qualified type may have a different size > from the original type. > If that is the case, the size difference could cause issue in bitmap array > accessing. > Did I misunderstand? > > [1] https://en.cppreference.com/w/c/language/atomic >
you do not misunderstand, the standard allows atomic specified type sizes to differ from their ordinary native type sizes. though i have some issues with how cppreference is wording things here as compared with the actual standard. one of the reasons is it allows all standard atomic functions to be 'generic' which means they can be used on objects of arbitrary size instead of just integer and pointer types. i.e. you can use it on struct, union and array types. it's implementation defined how the operations are made atomic and is obviously target processor dependent, but in cases when the processor has no intrinsic support to perform the operation atomically the toolchain may generate the code that uses a hidden lock. you can test whether this is the case for arbitrary objects using standard specified atomic_is_lock_free. https://en.cppreference.com/w/c/atomic/atomic_is_lock_free so that's the long answer form of why they *may* be different size, alignment etc.. but the real question is in this instance will it be? probably not. mainly because it wouldn't make a lot of sense for clang/gcc to suddenly decide that sizeof(uint64_t) != sizeof(_Atomic(uint64_t)) or that they should need to use a lock on amd64 processor to load/store atomically (assuming native alignment) etc.. a lot of the above is why we had a lot of discussion about how and when we could enable the use of standard C11 atomics in dpdk. as you've probably noticed for existing platforms, toolchains and targets it is actually defaulted off, but it does allow binary packagers or users to build with it on. for compatibility only the strictest of guarantees can be made when dpdk and the application are both built consistently to use or not use standard atomics. it is strongly cautioned that applications should not attempt to use an unmatched atomic operation on a dpdk atomic object. i.e. if you enabled standard atomics, don't use __atomic_load_n directly on a field from a public dpdk structure, instead use rte_atomic_load_explicit and make sure your application defines RTE_ENABLE_STDATOMIC. hope this explanation helps. > > > > > > > > > ((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i) > > > > #define __RTE_QSBR_THRID_INDEX_SHIFT 6 #define > > > > __RTE_QSBR_THRID_MASK 0x3f @@ -75,13 +75,13 @@ > > > > > > > > > > <snip>

