On 2024-04-25 20:05, Tyler Retzlaff wrote:
On Thu, Apr 25, 2024 at 10:58:47AM +0200, Mattias Rönnblom wrote:
This patch set represent an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.



The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign]() which provides no memory ordering or
atomicity guarantees and no read-once or write-once semantics (e.g.,
no use of volatile), but does provide the best performance. The
performance degradation resulting from the use of volatile (e.g.,
forcing loads and stores to actually occur and in the number
specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
significant.

rte_bit_once_*() which guarantees program-level load and stores
actually occurring (i.e., prevents certain optimizations). The primary
use of these functions are in the context of memory mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility to specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops). Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and also prevent the user from both, in
some situations, doing atomic operations against a bit set, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

understood. i think the only downside is that if you do have an
_Atomic-specified type you'll have to cast the qualification away
to use the function like macro.


This is tricky, and I can't say I've really converged on an opinion, but it seems to me at this point you shouldn't mark anything _Atomic.

as a suggestion the _Generic legs could include both _Atomic-specified
and non-_Atomic-specified types where an intermediate inline function
could strip the qualification to use your core inline implementations.

_Generic((v), int *: __foo32, RTE_ATOMIC(int) *: __foo32_unqual)(v))

static inline void
__foo32(int *a) { ... }

static inline void
__foo32_unqual(RTE_ATOMIC(int) *a) { __foo32((int *)(uintptr_t)(a)); }

there is some similar prior art in newer ISO C23 with typeof_unqual.

https://en.cppreference.com/w/c/language/typeof


This is an interesting solution, but I'm not sure it's a problem that needs to be solved.


Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions (or more correctly, generic selection macros)
operate on both 32 and 64-bit words, with type checking.

_Generic allow the user code to be a little more impact. Have a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

ack, can you confirm _Generic is usable from a C++ TU? i may be making a
mistake locally but using g++ version 11.4.0 -std=c++20 it wasn't
accepting it.

i think using _Generic is ideal, it eliminates mistakes when handling
the different integer sizes so if it turns out C++ doesn't want to
cooperate the function like macro can conditionally expand to a C++
template this will need to be done for MSVC since i can confirm
_Generic does not work with MSVC C++.


That's unfortunate.

No, I didn't try it with C++. I just assumed _Generic was C++ as well.

The naive solution would be to include two overloaded functions per function-like macro.

#ifdef __cplusplus

#undef rte_bit_set

static inline void
rte_bit_set(uint32_t *addr, unsigned int nr)
{
    __rte_bit_set32(addr, nr);
}

static inline void
rte_bit_set(uint64_t *addr, unsigned int nr)
{
    __rte_bit_set64(addr, nr);
}
#endif

Did you have something more clever/less verbose in mind? The best would if one could have a completely generic C++-compatible replacement of _Generic, but it's not obvious how that would work.

What's the minimum C++ version required by DPDK? C++11?


The _Generic versions avoids having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_set_bit()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.

Things that are left out of this patch set, that may be included
in future versions:

  * Have all functions returning a bit number have the same return type
    (i.e., unsigned int).
  * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
  * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
    for useful/used bit-level GCC builtins.
  * Eliminate the MSVC #ifdef-induced documentation duplication.
  * _Generic versions of things like rte_popcount32(). (?)

it would be nice to see them all converted, at the time i added them we
still hadn't adopted C11 so was limited. but certainly not asking for it
as a part of this series.


Mattias Rönnblom (6):
   eal: extend bit manipulation functionality
   eal: add unit tests for bit operations
   eal: add exactly-once bit access functions
   eal: add unit tests for exactly-once bit access functions
   eal: add atomic bit operations
   eal: add unit tests for atomic bit access functions

  app/test/test_bitops.c       | 319 +++++++++++++++++-
  lib/eal/include/rte_bitops.h | 624 ++++++++++++++++++++++++++++++++++-
  2 files changed, 925 insertions(+), 18 deletions(-)

--

Series-acked-by: Tyler Retzlaff <roret...@linux.microsoft.com>

2.34.1

Reply via email to