On 5/25/26 16:30, Stephen Hemminger wrote:
> On Mon, 25 May 2026 12:36:39 +0200
> Mattias Rönnblom <[email protected]> wrote:
>> This RFC introduces fastmem, a general-purpose small-object allocator
>> for DPDK. It is intended to replace per-type mempools with a single
>> allocator that handles arbitrary sizes, grows on demand, and matches
>> mempool-level performance on the hot path.
> Makes sense, what about a simple inline wrapper to allow full replacement
> testing/performance A/B comparison?
Do you mean a mempool or a heap wrapper? Or both?
I haven't looked into what options there are with mempools. A mempool
driver should be possible, but then I guess one might attempt a
wholesale mempool-compatible API as well.
The role(s) fastmem could serve are:
a) An lcore/fast path small-object allocator when you don't know the
object size and/or count beforehand (i.e., what the cover letter says).
b) Do what mempools do, in addition to option a.
c) Do what the rte_malloc heap does, but lcore/fast path-friendly. In
other words, option a but with larger objects too.
d) Something that covers both b and c.
I haven't really formed an opinion yet, other than that option a seems
like a natural first step.
Fastmem is significantly slower than mempools at the moment. Claude
will tell you to inline, but that doesn't help (at least not in the
micro benchmarks). Then it will tell you to go remove the statistics,
which also doesn't help. (Latency is data dependency-driven, so stats
load/store/compute runs on resources that otherwise would have been idle.)
What does help, however, is to pre-compute the socket- and bin-related
info and put it into a handle, which the application may optionally use
to quickly retrieve objects of a certain size from a certain socket.
That is still slower than mempool, though.
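
To illustrate the handle idea, here is a minimal sketch in plain C. It is
not the fastmem API: every toy_* name is hypothetical, the backing store
is libc malloc(), and per-lcore caching and socket (NUMA) selection are
left out. It only shows the size-to-bin lookup being done once, at handle
creation, instead of on every allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All toy_* names are hypothetical; this is not the fastmem API. */
#define TOY_NUM_BINS 10   /* power-of-two size classes, 8 B .. 4096 B */
#define TOY_MIN_SHIFT 3   /* smallest class is 1 << 3 = 8 bytes */

/* A free object stores the link to the next free object in its own memory. */
struct toy_obj { struct toy_obj *next; };

struct toy_bin {
	struct toy_obj *free_list; /* LIFO stack of free objects */
	size_t obj_size;           /* size class served by this bin */
};

struct toy_allocator { struct toy_bin bins[TOY_NUM_BINS]; };

/* The handle caches the result of the size-to-bin lookup. */
struct toy_handle { struct toy_bin *bin; };

static void toy_init(struct toy_allocator *a)
{
	for (int i = 0; i < TOY_NUM_BINS; i++) {
		a->bins[i].free_list = NULL;
		a->bins[i].obj_size = (size_t)1 << (i + TOY_MIN_SHIFT);
	}
}

/* Map a size to the smallest class that fits it; -1 if it is too large. */
static int toy_bin_index(size_t size)
{
	for (int i = 0; i < TOY_NUM_BINS; i++)
		if (size <= ((size_t)1 << (i + TOY_MIN_SHIFT)))
			return i;
	return -1;
}

static void *toy_bin_alloc(struct toy_bin *bin)
{
	struct toy_obj *obj = bin->free_list;

	if (obj != NULL) {
		bin->free_list = obj->next; /* fast path: pop a free object */
		return obj;
	}
	return malloc(bin->obj_size);   /* slow path: grow from libc */
}

static void toy_bin_free(struct toy_bin *bin, void *p)
{
	struct toy_obj *obj = p;

	obj->next = bin->free_list;     /* push back onto the free list */
	bin->free_list = obj;
}

/* Generic path: the bin lookup happens on every call. */
static void *toy_alloc(struct toy_allocator *a, size_t size)
{
	int idx = toy_bin_index(size);

	return idx < 0 ? NULL : toy_bin_alloc(&a->bins[idx]);
}

static void toy_free(struct toy_allocator *a, void *p, size_t size)
{
	int idx = toy_bin_index(size);

	if (idx >= 0)
		toy_bin_free(&a->bins[idx], p);
}

/* Handle path: the lookup is done once, here. */
static int toy_handle_init(struct toy_allocator *a, size_t size,
			   struct toy_handle *h)
{
	int idx = toy_bin_index(size);

	if (idx < 0)
		return -1;
	h->bin = &a->bins[idx];
	return 0;
}

static inline void *toy_handle_alloc(const struct toy_handle *h)
{
	return toy_bin_alloc(h->bin);
}

static inline void toy_handle_free(const struct toy_handle *h, void *p)
{
	toy_bin_free(h->bin, p);
}
```

An application that repeatedly allocates objects of one size would call
toy_handle_init() once at setup and then use toy_handle_alloc() and
toy_handle_free() on the fast path, while code that does not know the
size up front keeps using toy_alloc() and toy_free().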
=== Scenario 1: Single-object hot path, cycles per (alloc + free) ===
allocator        8 B    64 B   256 B  1024 B  4096 B
fastmem         16.9    16.7    17.7    17.6    17.9
fastmem_h        9.5     9.4     9.5     9.5     9.4
mempool          6.9     6.9     6.9     7.0     6.6
rte_malloc      93.7    93.8    94.8   100.1   130.0
libc           118.8   119.2    20.4    20.4   111.0
=== Scenario 2: Batch alloc, FIFO free, cycles per alloc ===
allocator        8 B    64 B   256 B  1024 B  4096 B
fastmem         10.1    10.2    10.8    12.7    14.7
fastmem_h        6.8     6.7     7.4     9.3    11.4
mempool          4.2     4.1     4.1     4.1     4.1
rte_malloc      58.6    58.5    62.1    67.5    68.5
libc           104.8   104.6    73.7   203.9  1254.0
Intel(R) Xeon(R) Gold 6421N / Ubuntu 24.04 / clang
Best regards,
Mattias