Introduce DPDK per-lcore id variables, or lcore variables for short.
An lcore variable has one value for every current and future lcore
id-equipped thread.
The primary <rte_lcore_var.h> use case is for statically allocating
small, frequently-accessed data structures, for which one instance
should exist for each lcore.
Lcore variables are similar to thread-local storage (TLS, e.g., C11
_Thread_local), but decouple the values' lifetime from that of the
threads.
Lcore variables also provide functionality similar to that of the
FreeBSD kernel's DPCPU_*() family of macros and the associated
build-time machinery. DPCPU uses linker scripts, which effectively
prevents the reuse of its otherwise seemingly viable approach.
The currently prevailing way to solve the same problem as lcore
variables is to keep a module's per-lcore data as an
RTE_MAX_LCORE-sized array of cache-aligned, RTE_CACHE_GUARDed
structs. The benefit of lcore variables over this approach is that
data related to the same lcore is now close (spatially, in memory),
rather than data used by the same module. This in turn avoids
excessive use of padding, which pollutes caches with unused data.
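
The layout difference described above can be illustrated with a
minimal, DPDK-free sketch. It is not the code from this patch:
MAX_LCORE, CACHE_LINE and MAX_LCORE_VAR are hypothetical stand-ins for
RTE_MAX_LCORE, RTE_CACHE_LINE_SIZE and RTE_MAX_LCORE_VAR with
illustrative values, and the sketch_*() helpers are invented for this
example. Unlike the patch, the sketch uses calloc() and never
allocates a second buffer when the first one is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the values are illustrative only. */
#define MAX_LCORE 4
#define CACHE_LINE 64
#define MAX_LCORE_VAR 256 /* bytes reserved per lcore (the stride) */

/* Conventional layout: each module keeps an array with one
 * cache-line-aligned (and thus padded) slot per lcore. */
struct padded_counter {
	_Alignas(CACHE_LINE) uint64_t count;
};

/* Lcore-variable-style layout: MAX_LCORE regions of MAX_LCORE_VAR
 * bytes each, laid out back to back. */
static unsigned char *buffer;
static size_t offset;

/* Allocate the zeroed backing buffer; returns 0 on success. */
static int
sketch_init(void)
{
	buffer = calloc(MAX_LCORE, MAX_LCORE_VAR);
	offset = 0;
	return buffer != NULL ? 0 : -1;
}

/* Reserve 'size' bytes at the same offset in every lcore's region.
 * The returned handle addresses the value of lcore 0. 'align' is
 * assumed to be a power of two. */
static void *
sketch_alloc(size_t size, size_t align)
{
	void *handle;

	offset = (offset + align - 1) & ~(align - 1);
	assert(offset + size <= MAX_LCORE_VAR);

	handle = buffer + offset;
	offset += size;

	return handle;
}

/* Translate a handle into the value belonging to 'lcore_id'. */
static void *
sketch_value(void *handle, unsigned int lcore_id)
{
	return (unsigned char *)handle + (size_t)lcore_id * MAX_LCORE_VAR;
}
```

In this sketch, two variables allocated one after the other end up
adjacent within each lcore's region, whereas an array of struct
padded_counter spends a full cache line per lcore on a single 8-byte
counter.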
Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
Acked-by: Morten Brørup <m...@smartsharesystems.com>
--
PATCH v2:
* Add Windows support. (Morten Brørup)
* Fix lcore variables API index reference. (Morten Brørup)
* Various improvements of the API documentation. (Morten Brørup)
* Elimination of unused symbol in version.map. (Morten Brørup)
PATCH:
* Update MAINTAINERS and release notes.
* Stop covering included files in extern "C" {}.
RFC v6:
* Include <stdlib.h> to get aligned_alloc().
* Tweak documentation (grammar).
* Provide API-level guarantees that lcore variable values take on an
initial value of zero.
* Fix misplaced __rte_cache_aligned in the API doc example.
RFC v5:
* In Doxygen, consistently use @<cmd> (and not \<cmd>).
* The RTE_LCORE_VAR_GET() and SET() convenience access macros
covered an uncommon use case, where the lcore value is of a
primitive type, rather than a struct, and are thus eliminated
from the API. (Morten Brørup)
* In the wake of the GET()/SET() removal, rename RTE_LCORE_VAR_PTR()
to RTE_LCORE_VAR_VALUE().
* The underscores are removed from __rte_lcore_var_lcore_ptr() to
signal that this function is a part of the public API.
* Macro arguments are documented.
RFC v4:
* Replace large static array with libc heap-allocated memory. One
implication of this change is there no longer exists a fixed upper
bound for the total amount of memory used by lcore variables.
RTE_MAX_LCORE_VAR has changed meaning, and now represents the
maximum size of any individual lcore variable value.
* Fix issues in example. (Morten Brørup)
* Improve access macro type checking. (Morten Brørup)
* Refer to the lcore variable handle as "handle" and not "name" in
various macros.
* Document lack of thread safety in rte_lcore_var_alloc().
* Provide API-level assurance the lcore variable handle is
always non-NULL, to allow applications to use NULL to mean
"not yet allocated".
* Note zero-sized allocations are not allowed.
* Give API-level guarantee the lcore variable values are zeroed.
RFC v3:
* Replace use of GCC-specific alignof(<expression>) with alignof(<type>).
* Update example to reflect FOREACH macro name change (in RFC v2).
RFC v2:
* Use alignof to derive alignment requirements. (Morten Brørup)
* Change name of FOREACH to make it distinct from <rte_lcore.h>'s
*per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup)
* Allow user-specified alignment, but limit max to cache line size.
---
MAINTAINERS | 6 +
config/rte_config.h | 1 +
doc/api/doxy-api-index.md | 1 +
doc/guides/rel_notes/release_24_11.rst | 14 +
lib/eal/common/eal_common_lcore_var.c | 78 +++++
lib/eal/common/meson.build | 1 +
lib/eal/include/meson.build | 1 +
lib/eal/include/rte_lcore_var.h | 385 +++++++++++++++++++++++++
lib/eal/version.map | 2 +
9 files changed, 489 insertions(+)
create mode 100644 lib/eal/common/eal_common_lcore_var.c
create mode 100644 lib/eal/include/rte_lcore_var.h
diff --git a/MAINTAINERS b/MAINTAINERS
index c5a703b5c0..362d9a3f28 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -282,6 +282,12 @@ F: lib/eal/include/rte_random.h
F: lib/eal/common/rte_random.c
F: app/test/test_rand_perf.c
+Lcore Variables
+M: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
+F: lib/eal/include/rte_lcore_var.h
+F: lib/eal/common/eal_common_lcore_var.c
+F: app/test/test_lcore_var.c
+
ARM v7
M: Wathsala Vithanage <wathsala.vithan...@arm.com>
F: config/arm/
diff --git a/config/rte_config.h b/config/rte_config.h
index dd7bb0d35b..311692e498 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -41,6 +41,7 @@
/* EAL defines */
#define RTE_CACHE_GUARD_LINES 1
#define RTE_MAX_HEAPS 32
+#define RTE_MAX_LCORE_VAR 1048576
#define RTE_MAX_MEMSEG_LISTS 128
#define RTE_MAX_MEMSEG_PER_LIST 8192
#define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9f0300126..ed577f14ee 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -99,6 +99,7 @@ The public API headers are grouped by topics:
[interrupts](@ref rte_interrupts.h),
[launch](@ref rte_launch.h),
[lcore](@ref rte_lcore.h),
+ [lcore variables](@ref rte_lcore_var.h),
[per-lcore](@ref rte_per_lcore.h),
[service cores](@ref rte_service.h),
[keepalive](@ref rte_keepalive.h),
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..a3884f7491 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,20 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Added EAL per-lcore static memory allocation facility.**
+
+ Added EAL API <rte_lcore_var.h> for statically allocating small,
+ frequently-accessed data structures, for which one instance should
+ exist for each EAL thread and registered non-EAL thread.
+
+ With lcore variables, data is organized spatially on a per-lcore id
+ basis, rather than per library or PMD, avoiding the need for cache
+ aligning (or RTE_CACHE_GUARDing) data structures, which in turn
+ reduces CPU cache internal fragmentation, improving performance.
+
+ Lcore variables are similar to thread-local storage (TLS, e.g.,
+ C11 _Thread_local), but decouple the values' lifetime from that
+ of the threads.
Removed Items
-------------
diff --git a/lib/eal/common/eal_common_lcore_var.c b/lib/eal/common/eal_common_lcore_var.c
new file mode 100644
index 0000000000..309822039b
--- /dev/null
+++ b/lib/eal/common/eal_common_lcore_var.c
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Ericsson AB
+ */
+
+#include <inttypes.h>
+#include <stdlib.h>
+
+#ifdef RTE_EXEC_ENV_WINDOWS
+#include <malloc.h>
+#endif
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include <rte_lcore_var.h>
+
+#include "eal_private.h"
+
+#define LCORE_BUFFER_SIZE (RTE_MAX_LCORE_VAR * RTE_MAX_LCORE)
+
+static void *lcore_buffer;
+static size_t offset = RTE_MAX_LCORE_VAR;
+
+static void *
+lcore_var_alloc(size_t size, size_t align)
+{
+	void *handle;
+	void *value;
+
+	offset = RTE_ALIGN_CEIL(offset, align);
+
+	if (offset + size > RTE_MAX_LCORE_VAR) {
+#ifdef RTE_EXEC_ENV_WINDOWS
+		lcore_buffer = _aligned_malloc(LCORE_BUFFER_SIZE,
+					       RTE_CACHE_LINE_SIZE);
+#else
+		lcore_buffer = aligned_alloc(RTE_CACHE_LINE_SIZE,
+					     LCORE_BUFFER_SIZE);
+#endif
+		RTE_VERIFY(lcore_buffer != NULL);
+
+		offset = 0;
+	}
+
+	handle = RTE_PTR_ADD(lcore_buffer, offset);
+
+	offset += size;
+
+	RTE_LCORE_VAR_FOREACH_VALUE(value, handle)
+		memset(value, 0, size);
+
+	EAL_LOG(DEBUG, "Allocated %"PRIuPTR" bytes of per-lcore data with a "
+		"%"PRIuPTR"-byte alignment", size, align);