Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API

Tyler Retzlaff Thu, 26 Oct 2023 09:37:01 -0700

On Thu, Oct 26, 2023 at 04:24:54AM +0000, Ruifeng Wang wrote:
> > -----Original Message-----
> > From: Tyler Retzlaff <[email protected]>
> > Sent: Thursday, October 26, 2023 6:38 AM
> > To: Ruifeng Wang <[email protected]>
> > Cc: [email protected]; Akhil Goyal <[email protected]>; Anatoly Burakov
> > <[email protected]>; Andrew Rybchenko <[email protected]>; Bruce
> > Richardson <[email protected]>; Chenbo Xia <[email protected]>; Ciara Power
> > <[email protected]>; David Christensen <[email protected]>; David Hunt
> > <[email protected]>; Dmitry Kozlyuk <[email protected]>; Dmitry Malloy
> > <[email protected]>; Elena Agostini <[email protected]>; Erik Gabriel Carrillo
> > <[email protected]>; Fan Zhang <[email protected]>; Ferruh Yigit
> > <[email protected]>; Harman Kalra <[email protected]>; Harry van Haaren
> > <[email protected]>; Honnappa Nagarahalli <[email protected]>;
> > [email protected]; Konstantin Ananyev <[email protected]>; Matan Azrad
> > <[email protected]>; Maxime Coquelin <[email protected]>; Narcisa Ana Maria Vasile
> > <[email protected]>; Nicolas Chautru <[email protected]>; Olivier Matz
> > <[email protected]>; Ori Kam <[email protected]>; Pallavi Kadam
> > <[email protected]>; Pavan Nikhilesh <[email protected]>; Reshma Pattan
> > <[email protected]>; Sameh Gobriel <[email protected]>; Shijith Thotton
> > <[email protected]>; Sivaprasad Tummala <[email protected]>; Stephen Hemminger
> > <[email protected]>; Suanming Mou <[email protected]>; Sunil Kumar Kori
> > <[email protected]>; [email protected]; Viacheslav Ovsiienko <[email protected]>;
> > Vladimir Medvedkin <[email protected]>; Yipeng Wang <[email protected]>;
> > nd <[email protected]>
> > Subject: Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > 
> > On Wed, Oct 25, 2023 at 09:41:22AM +0000, Ruifeng Wang wrote:
> > > > -----Original Message-----
> > > > From: Tyler Retzlaff <[email protected]>
> > > > Sent: Wednesday, October 18, 2023 4:31 AM
> > > > To: [email protected]
> > > > Cc: Akhil Goyal <[email protected]>; Anatoly Burakov
> > > > <[email protected]>; Andrew Rybchenko
> > > > <[email protected]>; Bruce Richardson
> > > > <[email protected]>; Chenbo Xia <[email protected]>;
> > > > Ciara Power <[email protected]>; David Christensen
> > > > <[email protected]>; David Hunt <[email protected]>; Dmitry
> > > > Kozlyuk <[email protected]>; Dmitry Malloy
> > > > <[email protected]>; Elena Agostini <[email protected]>; Erik
> > > > Gabriel Carrillo <[email protected]>; Fan Zhang
> > > > <[email protected]>; Ferruh Yigit <[email protected]>;
> > > > Harman Kalra <[email protected]>; Harry van Haaren
> > > > <[email protected]>; Honnappa Nagarahalli
> > > > <[email protected]>; [email protected]; Konstantin
> > > > Ananyev <[email protected]>; Matan Azrad
> > > > <[email protected]>; Maxime Coquelin <[email protected]>;
> > > > Narcisa Ana Maria Vasile <[email protected]>; Nicolas
> > > > Chautru <[email protected]>; Olivier Matz
> > > > <[email protected]>; Ori Kam <[email protected]>; Pallavi Kadam
> > > > <[email protected]>; Pavan Nikhilesh
> > > > <[email protected]>; Reshma Pattan <[email protected]>;
> > > > Sameh Gobriel <[email protected]>; Shijith Thotton
> > > > <[email protected]>; Sivaprasad Tummala
> > > > <[email protected]>; Stephen Hemminger
> > > > <[email protected]>; Suanming Mou <[email protected]>;
> > > > Sunil Kumar Kori <[email protected]>; [email protected];
> > > > Viacheslav Ovsiienko <[email protected]>; Vladimir Medvedkin
> > > > <[email protected]>; Yipeng Wang
> > > > <[email protected]>; Tyler Retzlaff
> > > > <[email protected]>
> > > > Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > > >
> > > > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > > corresponding rte_atomic_xxx optional stdatomic API
> > > >
> > > > Signed-off-by: Tyler Retzlaff <[email protected]>
> > > > ---
> > > >  lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
> > > >  lib/rcu/rte_rcu_qsbr.h | 68 +++++++++++++++++++++++++-------------------------
> > > >  2 files changed, 58 insertions(+), 58 deletions(-)
> > > >
> > > > diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c
> > > > index 17be93e..4dc7714 100644
> > > > --- a/lib/rcu/rte_rcu_qsbr.c
> > > > +++ b/lib/rcu/rte_rcu_qsbr.c
> > > > @@ -102,21 +102,21 @@
> > > >          * go out of sync. Hence, additional checks are required.
> > > >          */
> > > >         /* Check if the thread is already registered */
> > > > -       old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -                                       __ATOMIC_RELAXED);
> > > > +       old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +                                       rte_memory_order_relaxed);
> > > >         if (old_bmap & 1UL << id)
> > > >                 return 0;
> > > >
> > > >         do {
> > > >                 new_bmap = old_bmap | (1UL << id);
> > > > -               success = __atomic_compare_exchange(
> > > > +               success = rte_atomic_compare_exchange_strong_explicit(
> > > >                                         __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -                                       &old_bmap, &new_bmap, 0,
> > > > -                                       __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > > +                                       &old_bmap, new_bmap,
> > > > +                                       rte_memory_order_release, rte_memory_order_relaxed);
> > > >
> > > >                 if (success)
> > > > -                       __atomic_fetch_add(&v->num_threads,
> > > > -                                               1, __ATOMIC_RELAXED);
> > > > +                       rte_atomic_fetch_add_explicit(&v->num_threads,
> > > > +                                               1, rte_memory_order_relaxed);
> > > >                 else if (old_bmap & (1UL << id))
> > > >                         /* Someone else registered this thread.
> > > >                          * Counter should not be incremented.
> > > > @@ -154,8 +154,8 @@
> > > >          * go out of sync. Hence, additional checks are required.
> > > >          */
> > > >         /* Check if the thread is already unregistered */
> > > > -       old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -                                       __ATOMIC_RELAXED);
> > > > +       old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +                                       rte_memory_order_relaxed);
> > > >         if (!(old_bmap & (1UL << id)))
> > > >                 return 0;
> > > >
> > > > @@ -165,14 +165,14 @@
> > > >                  * completed before removal of the thread from the list of
> > > >                  * reporting threads.
> > > >                  */
> > > > -               success = __atomic_compare_exchange(
> > > > +               success = rte_atomic_compare_exchange_strong_explicit(
> > > >                                         __RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -                                       &old_bmap, &new_bmap, 0,
> > > > -                                       __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > > +                                       &old_bmap, new_bmap,
> > > > +                                       rte_memory_order_release, rte_memory_order_relaxed);
> > > >
> > > >                 if (success)
> > > > -                       __atomic_fetch_sub(&v->num_threads,
> > > > -                                               1, __ATOMIC_RELAXED);
> > > > +                       rte_atomic_fetch_sub_explicit(&v->num_threads,
> > > > +                                               1, rte_memory_order_relaxed);
> > > >                 else if (!(old_bmap & (1UL << id)))
> > > >                         /* Someone else unregistered this thread.
> > > >                          * Counter should not be incremented.
> > > > @@ -227,8 +227,8 @@
> > > >
> > > >         fprintf(f, "  Registered thread IDs = ");
> > > >         for (i = 0; i < v->num_elems; i++) {
> > > > -               bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -                                       __ATOMIC_ACQUIRE);
> > > > +               bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +                                       rte_memory_order_acquire);
> > > >                 id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > > >                 while (bmap) {
> > > >                         t = __builtin_ctzl(bmap);
> > > > @@ -241,26 +241,26 @@
> > > >         fprintf(f, "\n");
> > > >
> > > >         fprintf(f, "  Token = %" PRIu64 "\n",
> > > > -                       __atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
> > > > +                       rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
> > > >
> > > >         fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
> > > > -                       __atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
> > > > +                       rte_atomic_load_explicit(&v->acked_token,
> > > > +                                       rte_memory_order_acquire));
> > > >
> > > >         fprintf(f, "Quiescent State Counts for readers:\n");
> > > >         for (i = 0; i < v->num_elems; i++) {
> > > > -               bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -                                       __ATOMIC_ACQUIRE);
> > > > +               bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +                                       rte_memory_order_acquire);
> > > >                 id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > > >                 while (bmap) {
> > > >                         t = __builtin_ctzl(bmap);
> > > >                         fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
> > > >                                 id + t,
> > > > -                               __atomic_load_n(
> > > > +                               rte_atomic_load_explicit(
> > > >                                         &v->qsbr_cnt[id + t].cnt,
> > > > -                                       __ATOMIC_RELAXED),
> > > > -                               __atomic_load_n(
> > > > +                                       rte_memory_order_relaxed),
> > > > +                               rte_atomic_load_explicit(
> > > >                                         &v->qsbr_cnt[id + t].lock_cnt,
> > > > -                                       __ATOMIC_RELAXED));
> > > > +                                       rte_memory_order_relaxed));
> > > >                         bmap &= ~(1UL << t);
> > > >                 }
> > > >         }
> > > > diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
> > > > index 87e1b55..9f4aed2 100644
> > > > --- a/lib/rcu/rte_rcu_qsbr.h
> > > > +++ b/lib/rcu/rte_rcu_qsbr.h
> > > > @@ -63,11 +63,11 @@
> > > >   * Given thread id needs to be converted to index into the array and
> > > >   * the id within the array element.
> > > >   */
> > > > -#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> > > > +#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(RTE_ATOMIC(uint64_t)) * 8)
> > > >  #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
> > > >         RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
> > > >                 __RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
> > > > -#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> > > > +#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *) \
> > >
> > > Is it equivalent to ((RTE_ATOMIC(uint64_t) *)?
> > 
> > i'm not sure if you're asking about the resultant type of the expression or not?
> 
> I see other places are using specifier hence the question.
> 
> > 
> > in this context we aren't specifying an atomic type but rather adding the atomic
> > qualifier to what should already be a variable that has an atomic specified type
> > with a cast which is why we use __rte_atomic.
> 
> I read from document [1] that atomic qualified type may have a different size from the original type.
> If that is the case, the size difference could cause issue in bitmap array accessing.
> Did I misunderstand?
> 
> [1] https://en.cppreference.com/w/c/language/atomic
>


you do not misunderstand: the standard allows an atomic type to differ in
size (and alignment) from the corresponding ordinary native type. that
said, i have some issues with how cppreference words things here compared
with the actual standard.
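on your original question, separately from size: assuming that with
RTE_ENABLE_STDATOMIC defined __rte_atomic expands to the _Atomic qualifier
and RTE_ATOMIC(uint64_t) to the _Atomic(uint64_t) specifier, the two casts
should name the same type, so any size difference would affect both forms
equally. a minimal compile-time check, written with plain C11 spellings
instead of the dpdk macros (u64_qualified, u64_specified and SAME_TYPE are
hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Qualifier spelling, what a cast like (uint64_t __rte_atomic *) is
 * assumed to become when standard atomics are enabled. */
typedef uint64_t _Atomic u64_qualified;

/* Specifier spelling, what RTE_ATOMIC(uint64_t) is assumed to become in
 * the same build mode. */
typedef _Atomic(uint64_t) u64_specified;

/* _Generic selects the association whose type is compatible with the
 * controlling expression, so this yields 1 only when T1 * and T2 * name
 * the same pointer type. */
#define SAME_TYPE(T1, T2) _Generic((T1 *)0, T2 *: 1, default: 0)

/* Both spellings designate the atomic version of uint64_t. */
static_assert(SAME_TYPE(u64_qualified, u64_specified),
	      "qualifier and specifier forms name different types");
```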

one reason is that it allows the generic standard atomic functions
(atomic_load, atomic_store, atomic_exchange, atomic_compare_exchange_*)
to be used on objects of arbitrary size instead of just integer and
pointer types, i.e. you can use them on atomic struct and union types
(the standard does not allow _Atomic on array types).
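a small sketch of the generic functions applied to an atomic struct.
struct pair and the helpers are hypothetical; an 8-byte struct is used
because gcc and clang typically handle that size without a hidden lock on
common 64-bit targets, whereas larger objects may go through libatomic:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 8-byte aggregate, used only for illustration. */
struct pair {
	uint32_t lo;
	uint32_t hi;
};

/* atomic_exchange and atomic_load are generic: they accept a pointer to
 * any atomic type, including an _Atomic struct, and copy the whole object
 * in one atomic step. */
struct pair pair_exchange(_Atomic(struct pair) *obj, struct pair desired)
{
	return atomic_exchange(obj, desired);
}

struct pair pair_load(_Atomic(struct pair) *obj)
{
	return atomic_load(obj);
}

/* Usage: replace both halves at once. Plain variables are passed to
 * atomic_init because it may be a macro, and commas inside a compound
 * literal would split its arguments. */
void pair_demo(void)
{
	_Atomic(struct pair) p;
	struct pair first = { .lo = 1, .hi = 2 };
	struct pair second = { .lo = 3, .hi = 4 };

	atomic_init(&p, first);
	struct pair old = pair_exchange(&p, second);
	assert(old.lo == 1 && old.hi == 2);

	struct pair now = pair_load(&p);
	assert(now.lo == 3 && now.hi == 4);
}
```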

how the operations are made atomic is implementation-defined and
obviously target-processor dependent. when the processor has no native
support for performing an operation atomically, the toolchain may
generate code that uses a hidden lock. you can test whether this is the
case for a given object using the standard atomic_is_lock_free function.
https://en.cppreference.com/w/c/atomic/atomic_is_lock_free
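a minimal sketch of the two standard ways to ask: the per-object
atomic_is_lock_free query and the compile-time ATOMIC_*_LOCK_FREE macros
(0 = never, 1 = sometimes, 2 = always lock-free). the function names are
hypothetical. note that with gcc, atomics that are not lock-free are
usually provided by libatomic, so such code may also need -latomic to link:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-object query: atomic_is_lock_free is generic and accepts a pointer
 * to any atomic object. The answer depends on toolchain and target. */
bool u64_is_lock_free(const _Atomic(uint64_t) *obj)
{
	return atomic_is_lock_free(obj);
}

/* Compile-time query: ATOMIC_LLONG_LOCK_FREE describes _Atomic long long.
 * When it reports 2 (always lock-free), the per-object query for such an
 * object must agree. */
void llong_lock_free_consistency_check(void)
{
	_Atomic(long long) x = 0;

	if (ATOMIC_LLONG_LOCK_FREE == 2)
		assert(atomic_is_lock_free(&x));
}
```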

so that's the long-form answer to why they *may* differ in size,
alignment, etc. but the real question is: in this instance, will they?

probably not.

mainly because it wouldn't make much sense for clang/gcc to suddenly
decide that sizeof(uint64_t) != sizeof(_Atomic(uint64_t)), or that they
need a lock to load/store a 64-bit value atomically on an amd64 processor
(assuming natural alignment).
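for the bitmap concern specifically, one way to turn "probably not" into a
build-time guarantee is a static assertion next to the code that relies on
it. a macro like __RTE_QSBR_THRID_ARRAY_ELM_SIZE computes bits per element
as sizeof(...) * 8, which only equals 64 if the atomic type has no padding.
a sketch, not part of the patch; THRID_INDEX_SHIFT, THRID_MASK and the
helpers are hypothetical stand-ins for the __RTE_QSBR_THRID_* macros:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Fail the build if a toolchain ever pads _Atomic(uint64_t). */
static_assert(sizeof(_Atomic(uint64_t)) == sizeof(uint64_t),
	      "_Atomic(uint64_t) and uint64_t differ in size");

/* Same split as the quoted patch: 64 thread ids per bitmap element. */
#define THRID_INDEX_SHIFT 6
#define THRID_MASK 0x3f

static inline unsigned int thrid_index(unsigned int id)
{
	return id >> THRID_INDEX_SHIFT;
}

static inline uint64_t thrid_bit(unsigned int id)
{
	return UINT64_C(1) << (id & THRID_MASK);
}

/* Relaxed load of one bitmap element, like the "already registered"
 * check in the patch. Indexing through a pointer to the atomic type keeps
 * element stride and element type consistent. */
static inline int thrid_is_set(_Atomic(uint64_t) *bmap, unsigned int id)
{
	uint64_t word = atomic_load_explicit(&bmap[thrid_index(id)],
					     memory_order_relaxed);
	return (word & thrid_bit(id)) != 0;
}
```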

much of the above is why there was a lot of discussion about how and when
we could enable the use of standard C11 atomics in dpdk. as you've
probably noticed, for existing platforms, toolchains and targets it is
off by default, but binary packagers or users can build with it enabled.

for compatibility, the strongest guarantees can only be made when dpdk
and the application are built consistently, i.e. both using or both not
using standard atomics. applications are strongly cautioned against using
a mismatched atomic operation on a dpdk atomic object. for example, if you
enabled standard atomics, don't use __atomic_load_n directly on a field of
a public dpdk structure; instead, use rte_atomic_load_explicit and make
sure your application defines RTE_ENABLE_STDATOMIC.
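to illustrate the matched-API advice, a minimal sketch of a wrapper layer
that switches between C11 stdatomic and the gcc __atomic builtins at build
time. the MY_* / my_* names are hypothetical and only loosely modeled on
the RTE_ATOMIC()/rte_atomic_*_explicit macros used in the patch; dpdk's
actual definitions may differ:

```c
#include <assert.h>
#include <stdint.h>

#ifdef MY_ENABLE_STDATOMIC
#include <stdatomic.h>
#define MY_ATOMIC(type) _Atomic(type)
#define my_memory_order_relaxed memory_order_relaxed
#define my_memory_order_acquire memory_order_acquire
#define my_atomic_load_explicit(ptr, mo) atomic_load_explicit(ptr, mo)
#define my_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#else
#define MY_ATOMIC(type) type
#define my_memory_order_relaxed __ATOMIC_RELAXED
#define my_memory_order_acquire __ATOMIC_ACQUIRE
#define my_atomic_load_explicit(ptr, mo) __atomic_load_n(ptr, mo)
#define my_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#endif

/* A "library" structure whose field type depends on the build mode. */
struct my_counter {
	MY_ATOMIC(uint64_t) value;
};

/* Application code goes through the wrappers, so the operation always
 * matches the field's type. Calling __atomic_load_n(&c->value, ...)
 * directly would bypass this. */
uint64_t my_counter_read(struct my_counter *c)
{
	return my_atomic_load_explicit(&c->value, my_memory_order_acquire);
}

void my_counter_inc(struct my_counter *c)
{
	my_atomic_fetch_add_explicit(&c->value, 1, my_memory_order_relaxed);
}

/* Usage: gives the same result with or without MY_ENABLE_STDATOMIC. */
void my_counter_demo(void)
{
	struct my_counter c = { 0 };

	my_counter_inc(&c);
	my_counter_inc(&c);
	assert(my_counter_read(&c) == 2);
}
```

building the library and the application both with, or both without,
-DMY_ENABLE_STDATOMIC keeps the field type and the operations matched;
mixing the two modes is the situation cautioned against above.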

hope this explanation helps.

> > 
> > >
> > > >         ((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
> > > >  #define __RTE_QSBR_THRID_INDEX_SHIFT 6
> > > >  #define __RTE_QSBR_THRID_MASK 0x3f
> > > > @@ -75,13 +75,13 @@
> > > >
> > >
> > > <snip>
