On Fri, 7 Nov 2025 16:57:37 +0000
Rami Neiman <[email protected]> wrote:

> Hi all,
> I have a design question regarding rte_ring that I didn’t find a historical 
> rationale for in the archives.
> Most modern high-perf ring buffers (e.g. some NIC drivers, some DB queue 
> implementations, etc.) eliminate wrap-around branches by taking the ring’s 
> element array and mapping two consecutive VA ranges to the same physical 
> backing pages.
> i.e. you allocate N elements, commit enough pages to cover N, then call mmap 
> (or equivalent) again immediately following it, pointing to the same physical 
> pages. So from the CPU’s POV the element array is logically [0 .. N*2) but 
> physically it’s the same backing. Therefore a batch read/write can always be 
> done as a single contiguous memcpy/CLD/STOS without conditionals, even if 
> (head+bulk) exceeds N.
> Pseudo illustration:
> 
> [phys buffer of N slots]
> VA: [0 .. N) -> phys
> VA: [N .. 2N) -> same phys
> 
> 
> For multi-element enqueue/dequeue it eliminates the “if wrap → split” case 
> entirely — you can always memcpy in one contiguous op.
> Question:
> Is there an explicit reason DPDK doesn’t use this technique for rte_ring?
> e.g.

Manipulating process mappings in user space is often non-portable. It is
possible on Linux to use mmap to do this, but it would require deep changes
to the API.
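
For reference, the Linux-only variant could look roughly like the sketch
below. It is not DPDK code: double_map() and ring_write() are hypothetical
names made up for illustration, and it assumes memfd_create() is available
(glibc 2.27 or later) and that the ring size is a multiple of the page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for memfd_create() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a DPDK API: map `len` bytes of shared memory
 * twice, back to back, so that base[i] and base[i + len] alias the same
 * byte. `len` must be a non-zero multiple of the page size.
 * Returns the base of the 2 * len window, or NULL on failure.
 */
static void *double_map(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0 || len % (size_t)page != 0)
        return NULL;

    /* Linux-specific: an anonymous in-memory file backs both views. */
    int fd = memfd_create("ring-sketch", 0);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    /* Reserve 2 * len of contiguous virtual address space. */
    unsigned char *base = mmap(NULL, 2 * len, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    /* Overlay both halves of the reservation with the same file pages. */
    void *lo = mmap(base, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + len, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);                  /* the mappings keep the pages alive */
    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, 2 * len);
        return NULL;
    }
    return base;
}

/*
 * Copy `n` bytes to ring offset `off` with a single memcpy, even when
 * off + n runs past `len`. Requires off < len and n <= len.
 */
static void ring_write(unsigned char *base, size_t len, size_t off,
                       const void *src, size_t n)
{
    assert(off < len && n <= len);
    memcpy(base + off, src, n);
}
```

With len equal to one page, ring_write(ring, len, len - 2, "abcd", 4) puts
'c' and 'd' at ring[0] and ring[1] through the second mapping. The sketch
relies on memfd_create() and MAP_FIXED and needs 2x the virtual address
range, and it says nothing about how such a mapping would fit the memory
that rte_ring uses today.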
