> > Hi all,
> >
> > I have a design question regarding rte_ring that I didn’t find a
> > historical rationale for in the archives.
> >
> > Most modern high-perf ring buffers (e.g. some NIC drivers, some DB
> > queue implementations, etc.) eliminate wrap-around branches by taking
> > the ring’s element array and mapping two consecutive VA ranges to the
> > same physical backing pages.
> >
> > I.e. you allocate N elements, commit enough pages to cover N, then
> > call mmap (or equivalent) again immediately following it, pointing to
> > the same physical pages. So from the CPU’s POV the element array is
> > logically [0 .. N*2) but physically it’s the same backing. Therefore a
> > batch read/write can always be done as a single contiguous
> > memcpy/CLD/STOS without conditionals, even if (head + bulk) exceeds N.
> >
> > Pseudo illustration:
> >
> >   [phys buffer of N slots]
> >   VA: [0 .. N)  -> phys
> >   VA: [N .. 2N) -> same phys
> >
> > For multi-element enqueue/dequeue it eliminates the “if wrap → split”
> > case entirely — you can always memcpy in one contiguous op.
> >
> > Question:
> > Is there an explicit reason DPDK doesn’t use this technique for
> > rte_ring?
>
> E.g. manipulating process mappings in user space is often non-portable;
> it is possible on Linux to use mmap to do this, but it would require
> deep changes to the API.
TBH, I didn't even consider such an opportunity... But yes, as Stephen
already pointed out, there is one potential problem: the DPDK memory
allocation framework (rte_malloc) doesn't support such double mappings
for now. Whether it would be a good thing to add, I suppose, depends on
what performance boost we would get in return and how complex the
required code changes would be.
Konstantin
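
For reference, below is a minimal Linux-only sketch of the mirrored
mapping being discussed. It assumes memfd_create(2) is available
(glibc >= 2.27, kernel >= 3.17); the mirror_ring names are hypothetical,
purely for illustration, and are not existing DPDK or rte_ring API. The
ring size must be a page-size multiple and (for the index masking) a
power of two.

  /* Sketch of the "mirrored" ring trick: one set of physical pages,
   * mapped twice at consecutive virtual addresses.  Illustrative only,
   * not DPDK API. */
  #define _GNU_SOURCE
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/mman.h>

  struct mirror_ring {
          uint8_t *base;  /* 2 * size bytes of VA, same phys pages twice */
          size_t size;    /* capacity in bytes: page multiple, power of 2 */
          size_t head;
          size_t tail;
  };

  static int mirror_ring_init(struct mirror_ring *r, size_t size)
  {
          int fd = memfd_create("ring", 0);

          if (fd < 0)
                  return -1;
          if (ftruncate(fd, size) < 0)
                  goto fail;

          /* Reserve 2 * size of contiguous VA, then map the same fd
           * (i.e. the same physical pages) into both halves. */
          uint8_t *va = mmap(NULL, 2 * size, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (va == MAP_FAILED)
                  goto fail;
          if (mmap(va, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
              mmap(va + size, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
                  munmap(va, 2 * size);
                  goto fail;
          }
          close(fd);  /* mappings keep the pages alive */
          r->base = va;
          r->size = size;
          r->head = r->tail = 0;
          return 0;
  fail:
          close(fd);
          return -1;
  }

  /* Branch-free bulk enqueue: always a single memcpy, even when the
   * write crosses the ring end, because [size .. 2*size) aliases
   * [0 .. size).  Caller must have checked there is room for n bytes. */
  static void mirror_ring_put(struct mirror_ring *r, const void *src,
                              size_t n)
  {
          memcpy(r->base + (r->head & (r->size - 1)), src, n);
          r->head += n;
  }

  int main(void)
  {
          struct mirror_ring r;

          if (mirror_ring_init(&r, 4096) < 0) {
                  perror("mirror_ring_init");
                  return 1;
          }
          /* Write through the second VA range, read through the first:
           * both hit the same physical page. */
          r.base[r.size] = 42;
          printf("%d\n", r.base[0]);  /* prints 42 */
          return 0;
  }

Running it should print 42: a store through the second mapping is
visible through the first, which is exactly what lets an enqueue that
wraps past the ring end be done as one contiguous copy.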

