This RFC introduces fastmem, a general-purpose small-object allocator for DPDK. It is intended to replace per-type mempools with a single allocator that handles arbitrary sizes, grows on demand, and matches mempool-level performance on the hot path.

Motivation
----------

DPDK applications commonly maintain many mempools, one per object type
(connections, sessions, timers, work items). Each must be sized up
front, wastes memory when over-provisioned, and cannot serve objects of
a different size.

Fastmem eliminates this by accepting arbitrary sizes at runtime, backed
by a slab allocator that repurposes memory across size classes as
demand shifts.

Design
------

Three-layer architecture:

1. Backing memory: 128 MiB IOVA-contiguous memzones from EAL, reserved
   lazily (or pre-reserved for deterministic latency).

2. Slabs: 2 MiB, 2 MiB-aligned regions carved from memzones. The
   alignment enables O(1) slab lookup from any object pointer via
   bitmask, with no radix tree or index structure. Slabs move freely
   between 18 power-of-2 size classes (8 B to 1 MiB).

3. Per-lcore caches: bounded LIFO stacks (no locks on the hot path).
   Cache misses trigger bulk transfers to/from the shared bin under a
   spinlock.

Key properties:

- Zero per-object metadata in the production build.
- NUMA-aware, with per-socket bins and free-slab pools.
- DMA-usable memory with O(1) virt-to-IOVA translation.
- Bulk alloc/free with all-or-nothing semantics.
- Backing memory never returned during lifetime (slabs recycled).
- Non-EAL threads supported (bypass cache, take bin lock).

API surface
-----------

  rte_fastmem_init / deinit
  rte_fastmem_reserve
  rte_fastmem_set_limit / get_limit
  rte_fastmem_alloc / alloc_socket
  rte_fastmem_alloc_bulk / alloc_bulk_socket
  rte_fastmem_free / free_bulk
  rte_fastmem_virt2iova
  rte_fastmem_cache_flush
  rte_fastmem_max_size / classes
  rte_fastmem_stats / stats_class / stats_lcore / stats_lcore_class
  rte_fastmem_stats_reset

All APIs are marked __rte_experimental.

Performance
-----------

The single-object hot path is roughly 2-3x the cost of mempool and an
order of magnitude faster than rte_malloc. Under multi-lcore
contention, fastmem scales similarly to mempool, while rte_malloc
collapses.
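
To illustrate the two O(1) mappings described in the Design section,
from an object pointer to its slab and from a request size to its size
class, here is a minimal standalone C sketch. It is not taken from the
patch: the helper names (slab_base, size_class_index, class_size) and
the macros are hypothetical, and only the constants (2 MiB aligned
slabs, 18 power-of-2 classes from 8 B to 1 MiB) come from the text
above. Mapping a zero-byte request to the smallest class is likewise an
assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the design description; the names are hypothetical. */
#define SLAB_SIZE      ((uintptr_t)2 << 20) /* 2 MiB, also the alignment */
#define MIN_CLASS_LOG2 3                    /* smallest class: 8 B */
#define MAX_CLASS_LOG2 20                   /* largest class: 1 MiB */
#define NUM_CLASSES    (MAX_CLASS_LOG2 - MIN_CLASS_LOG2 + 1) /* 18 */

/*
 * Start address of the slab containing obj. Because every slab is
 * SLAB_SIZE-aligned, clearing the low 21 bits is sufficient.
 */
static inline uintptr_t
slab_base(const void *obj)
{
	return (uintptr_t)obj & ~(SLAB_SIZE - 1);
}

/*
 * Index of the smallest power-of-2 class that fits size, or -1 if the
 * request exceeds the largest class. Sizes below 8 B (including 0, by
 * assumption) map to class 0.
 */
static inline int
size_class_index(size_t size)
{
	unsigned int log2 = MIN_CLASS_LOG2;

	if (size > ((size_t)1 << MAX_CLASS_LOG2))
		return -1;
	while (((size_t)1 << log2) < size)
		log2++;
	return (int)(log2 - MIN_CLASS_LOG2);
}

/* Object size, in bytes, served by class idx. */
static inline size_t
class_size(int idx)
{
	assert(idx >= 0 && idx < NUM_CLASSES);
	return (size_t)1 << (idx + MIN_CLASS_LOG2);
}
```

With these helpers, a 1025-byte request lands in the 2048-byte class,
which is the roughly 50% worst-case internal fragmentation that
power-of-2 classes imply. A real implementation would more likely
derive the class from a count-leading-zeros instruction than from a
loop; the loop is used here only to keep the sketch portable.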

Limitations
-----------

- Maximum allocation: 1 MiB. Larger requests should use rte_malloc.
- Power-of-2 classes only; worst-case internal fragmentation ~50%.
- Backing memory not reclaimable short of deinit.

Future work
-----------

- Lcore-affine allocations (false-sharing-free by construction).
- Mempool ops driver for transparent drop-in use.
- Pre-resolved allocator handle binding size class and socket,
  eliminating per-call class lookup and enabling an inline cache-hit
  fast path.
- Debug mode (cookies, double-free detection, poison-on-free).
- Telemetry integration.
- EAL integration, allowing EAL-internal subsystems to use fastmem for
  their small-object allocations.

Mattias Rönnblom (3):
  doc: add fastmem programming guide
  lib: add fastmem library
  app/test: add fastmem test suite

 app/test/meson.build                  |    3 +
 app/test/test_fastmem.c               | 1682 +++++++++++++++++++++++++
 app/test/test_fastmem_perf.c          |  997 +++++++++++++++
 app/test/test_fastmem_profile.c       |  157 +++
 doc/api/doxy-api-index.md             |    1 +
 doc/api/doxy-api.conf.in              |    1 +
 doc/guides/prog_guide/fastmem_lib.rst |  301 +++++
 doc/guides/prog_guide/index.rst       |    1 +
 lib/fastmem/meson.build               |    6 +
 lib/fastmem/rte_fastmem.c             | 1486 ++++++++++++++++++++++
 lib/fastmem/rte_fastmem.h             |  644 ++++++++++
 lib/meson.build                       |    1 +
 12 files changed, 5280 insertions(+)
 create mode 100644 app/test/test_fastmem.c
 create mode 100644 app/test/test_fastmem_perf.c
 create mode 100644 app/test/test_fastmem_profile.c
 create mode 100644 doc/guides/prog_guide/fastmem_lib.rst
 create mode 100644 lib/fastmem/meson.build
 create mode 100644 lib/fastmem/rte_fastmem.c
 create mode 100644 lib/fastmem/rte_fastmem.h

-- 
2.43.0

