On 5/25/26 16:30, Stephen Hemminger wrote:
> On Mon, 25 May 2026 12:36:39 +0200
> Mattias Rönnblom <[email protected]> wrote:
>
>> This RFC introduces fastmem, a general-purpose small-object allocator
>> for DPDK. It is intended to replace per-type mempools with a single
>> allocator that handles arbitrary sizes, grows on demand, and matches
>> mempool-level performance on the hot path.
>
> Makes sense, what about a simple inline wrapper to allow full replacement
> testing/performance A/B comparison?

Do you mean a mempool or a heap wrapper? Or both?

I haven't looked into what options there are with mempools. A mempool driver should be possible, but then I guess one might also attempt a wholesale mempool-compatible API.

The role(s) fastmem could serve are:
a) An lcore/fast path small-object allocator when you don't know the object size and/or count beforehand (i.e., what the cover letter says).
b) Do what mempools do, in addition to a.
c) Do what the rte_malloc heap does, but lcore/fast path-friendly. In other words, option a but with larger objects too.
d) Something that's both b and c.

I haven't really formed an opinion yet, other than that option a seems like a natural first step.

Fastmem is significantly slower than mempools for the moment. Claude will tell you to inline, but that doesn't help (at least not in the micro benchmarks). Then it will tell you to go remove the statistics, which also doesn't help. (Latency is driven by data dependencies, so the stats loads, stores, and arithmetic run on resources that would otherwise have been idle.)

What does help, however, is to pre-compute the socket- and bin-related info and put it into a handle, which the application may optionally use to quickly retrieve objects of a certain size from a certain socket. Still slower than mempool though.
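
To make the handle idea concrete, here is a minimal sketch. It is not the fastmem code or API: every toy_* name, the power-of-two bins, and the malloc() backing store are assumptions made up for illustration. The generic call maps (size, socket) to a free list on every allocation, while the handle does that lookup once and caches the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not the fastmem API. */
#define TOY_MIN_SHIFT   3   /* smallest bin holds 8-byte objects */
#define TOY_NUM_BINS    10  /* power-of-two bins, 8 B .. 4096 B */
#define TOY_NUM_SOCKETS 2

struct toy_obj {
	struct toy_obj *next;
};

/* One free list per (socket, bin) pair. */
static struct toy_obj *toy_free_list[TOY_NUM_SOCKETS][TOY_NUM_BINS];

/* The handle caches the result of the (size, socket) lookup. */
struct toy_handle {
	struct toy_obj **list; /* head of the matching free list */
	size_t obj_size;       /* size of the objects in that bin */
};

/* Map a size to a bin index; this is the work the handle avoids. */
static unsigned int toy_size_to_bin(size_t size)
{
	unsigned int bin = 0;
	size_t cap = (size_t)1 << TOY_MIN_SHIFT;

	while (cap < size) {
		cap <<= 1;
		bin++;
	}
	return bin;
}

/* Pop from the free list, falling back to malloc() when it is empty. */
static void *toy_alloc_from(struct toy_obj **list, size_t obj_size)
{
	struct toy_obj *obj = *list;

	if (obj != NULL) {
		*list = obj->next;
		return obj;
	}
	return malloc(obj_size);
}

/* Generic path: repeats the bin and socket lookup on every call. */
void *toy_alloc(size_t size, unsigned int socket)
{
	unsigned int bin = toy_size_to_bin(size);

	assert(socket < TOY_NUM_SOCKETS && bin < TOY_NUM_BINS);
	return toy_alloc_from(&toy_free_list[socket][bin],
			      (size_t)1 << (bin + TOY_MIN_SHIFT));
}

/* Handle path: the lookup is done once, here. */
void toy_handle_init(struct toy_handle *h, size_t size, unsigned int socket)
{
	unsigned int bin = toy_size_to_bin(size);

	assert(socket < TOY_NUM_SOCKETS && bin < TOY_NUM_BINS);
	h->list = &toy_free_list[socket][bin];
	h->obj_size = (size_t)1 << (bin + TOY_MIN_SHIFT);
}

void *toy_handle_alloc(const struct toy_handle *h)
{
	return toy_alloc_from(h->list, h->obj_size);
}

/* Push the object back onto the free list the handle points at. */
void toy_handle_free(const struct toy_handle *h, void *ptr)
{
	struct toy_obj *obj = ptr;

	obj->next = *h->list;
	*h->list = obj;
}
```

An application would call toy_handle_init() once per object type at setup and then use toy_handle_alloc()/toy_handle_free() on the fast path. The sketch is single-threaded and has no per-lcore caches, so it says nothing about the cycle counts below.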

=== Scenario 1: Single-object hot path — cycles per (alloc + free) ===
allocator             8 B         64 B        256 B       1024 B       4096 B
fastmem              16.9         16.7         17.7         17.6         17.9
fastmem_h             9.5          9.4          9.5          9.5          9.4
mempool               6.9          6.9          6.9          7.0          6.6
rte_malloc           93.7         93.8         94.8        100.1        130.0
libc                118.8        119.2         20.4         20.4        111.0

=== Scenario 2: Batch alloc, FIFO free — cycles per alloc ===
allocator             8 B         64 B        256 B       1024 B       4096 B
fastmem              10.1         10.2         10.8         12.7         14.7
fastmem_h             6.8          6.7          7.4          9.3         11.4
mempool               4.2          4.1          4.1          4.1          4.1
rte_malloc           58.6         58.5         62.1         67.5         68.5
libc                104.8        104.6         73.7        203.9       1254.0

Intel(R) Xeon(R) Gold 6421N / Ubuntu 24.04 / clang
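
For reference, the two scenario shapes look roughly like the loops below. This is only a sketch, not the benchmark that produced the tables: it uses malloc()/free() as stand-ins, the bench_* names and BENCH_BATCH are made up, and it reports nanoseconds of CPU time from clock() rather than cycles (DPDK code would more typically read a cycle counter such as rte_rdtsc()). Timing only the allocation phase in the batch case is also an assumption.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define BENCH_BATCH 256 /* hypothetical batch size */

/* Volatile sink so the compiler cannot elide the malloc/free pairs. */
static void *volatile bench_sink;

static double bench_ticks_to_ns(clock_t ticks)
{
	return (double)ticks * 1e9 / CLOCKS_PER_SEC;
}

/* Scenario 1 shape: allocate one object, free it, repeat.
 * Returns nanoseconds per (alloc + free) pair. */
double bench_single(size_t size, unsigned long iters)
{
	clock_t start = clock();
	unsigned long i;

	for (i = 0; i < iters; i++) {
		void *obj = malloc(size);

		bench_sink = obj;
		free(obj); /* free(NULL) is a no-op if malloc failed */
	}
	return bench_ticks_to_ns(clock() - start) / (double)iters;
}

/* Scenario 2 shape: allocate a batch, then free in allocation (FIFO)
 * order. Only the alloc phase is timed here, which is an assumption.
 * Returns nanoseconds per alloc. */
double bench_batch_fifo(size_t size, unsigned long rounds)
{
	void *objs[BENCH_BATCH];
	clock_t total = 0;
	unsigned long r;
	size_t i;

	for (r = 0; r < rounds; r++) {
		clock_t start = clock();

		for (i = 0; i < BENCH_BATCH; i++) {
			objs[i] = malloc(size);
			bench_sink = objs[i];
		}
		total += clock() - start;

		for (i = 0; i < BENCH_BATCH; i++)
			free(objs[i]);
	}
	return bench_ticks_to_ns(total) / ((double)rounds * BENCH_BATCH);
}
```

With clock() the resolution is coarse, so the iteration counts need to be large for the per-operation figure to mean anything.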

Best regards,
        Mattias
