On Wed, 28 Jan 2026 09:30:20 -0800 Stephen Hemminger <[email protected]> wrote:
> Implement the single/dual/quad loop design pattern from FD.IO VPP to
> improve cache efficiency in the af_packet PMD receive path.
>
> The original implementation processes packets one at a time in a simple
> loop, which can result in cache misses when accessing frame headers and
> packet data. The new implementation:
>
> - Processes packets in batches of 4 (quad), 2 (dual), and 1 (single)
> - Prefetches next batch of frame headers while processing current batch
> - Prefetches packet data before memcpy to hide memory latency
> - Reduces loop overhead through partial unrolling
>
> Two helper functions are introduced:
> - af_packet_get_frame(): Returns frame pointer at index with wraparound
> - af_packet_rx_one(): Common per-packet processing (mbuf alloc, memcpy,
>   VLAN handling, timestamp offload)
>
> The quad loop checks availability of all 4 frames before processing,
> falling through to dual/single loops when fewer frames are ready. Early
> exit paths (out_advance1/2/3) ensure correct frame index tracking when
> mbuf allocation fails mid-batch.
>
> Prefetch strategy:
> - Frame headers: prefetch N+4..N+7 while processing N..N+3
> - Packet data: prefetch at tp_mac offset before memcpy
>
> This pattern is well-established in high-performance packet processing
> and should improve throughput by better utilizing CPU cache hierarchy,
> particularly beneficial when processing bursts of packets.
>
> Signed-off-by: Stephen Hemminger <[email protected]>

This and the previous proposal to prefetch have no impact on performance.
Rolled a simple perf test and all three versions come out the same.
The bottleneck is not here; it is probably in the system call and copies now.

          Original     Prefetch     Quad/Dual
TX        1.427 Mpps   1.426 Mpps   1.426 Mpps
RX        0.529 Mpps   0.530 Mpps   0.533 Mpps
loss      87.93%       87.98%       88.0%

Will put the test in the next version of this series, and drop this patch.
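
For reference, a minimal sketch of the quad/dual/single loop shape described
in the quoted commit message. This is not the patch code: struct sketch_frame,
sketch_get_frame(), sketch_rx_one() and sketch_rx_burst() are hypothetical
stand-ins, the frame layout is not the real tpacket header, and the prefetch
macro assumes the GCC/Clang __builtin_prefetch builtin (it compiles to a no-op
elsewhere). It only illustrates the control flow: check all frames of a batch,
prefetch the next batch of headers, then fall through to smaller batches.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang builtin; a no-op on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH(p) __builtin_prefetch(p)
#else
#define SKETCH_PREFETCH(p) ((void)(p))
#endif

/* Hypothetical stand-in for a ring frame header (not the tpacket layout). */
struct sketch_frame {
	uint32_t ready;	/* nonzero when the frame holds a packet */
	uint32_t len;	/* payload length in bytes */
};

/* Frame at idx + off, wrapping around the ring. */
static struct sketch_frame *
sketch_get_frame(struct sketch_frame *ring, unsigned int ring_size,
		 unsigned int idx, unsigned int off)
{
	return &ring[(idx + off) % ring_size];
}

/* Consume one frame: account its bytes and hand it back to the producer. */
static void
sketch_rx_one(struct sketch_frame *f, uint64_t *bytes)
{
	*bytes += f->len;
	f->ready = 0;
}

/* Receive up to max_pkts frames starting at *idx; returns the count taken. */
static unsigned int
sketch_rx_burst(struct sketch_frame *ring, unsigned int ring_size,
		unsigned int *idx, unsigned int max_pkts, uint64_t *bytes)
{
	struct sketch_frame *f[4];
	unsigned int n = 0, i;

	/* Never take more than one lap, so batch frames are distinct. */
	if (max_pkts > ring_size)
		max_pkts = ring_size;

	/* Quad loop: requires all four frames N..N+3 to be ready. */
	while (max_pkts - n >= 4) {
		for (i = 0; i < 4; i++)
			f[i] = sketch_get_frame(ring, ring_size, *idx, i);
		if (!(f[0]->ready && f[1]->ready && f[2]->ready && f[3]->ready))
			break;
		/* Prefetch headers N+4..N+7 while processing N..N+3. */
		for (i = 4; i < 8; i++)
			SKETCH_PREFETCH(sketch_get_frame(ring, ring_size, *idx, i));
		for (i = 0; i < 4; i++)
			sketch_rx_one(f[i], bytes);
		*idx = (*idx + 4) % ring_size;
		n += 4;
	}

	/* Dual loop: same idea with two frames. */
	while (max_pkts - n >= 2) {
		f[0] = sketch_get_frame(ring, ring_size, *idx, 0);
		f[1] = sketch_get_frame(ring, ring_size, *idx, 1);
		if (!(f[0]->ready && f[1]->ready))
			break;
		sketch_rx_one(f[0], bytes);
		sketch_rx_one(f[1], bytes);
		*idx = (*idx + 2) % ring_size;
		n += 2;
	}

	/* Single loop: drain whatever is left, one frame at a time. */
	while (n < max_pkts) {
		f[0] = sketch_get_frame(ring, ring_size, *idx, 0);
		if (!f[0]->ready)
			break;
		sketch_rx_one(f[0], bytes);
		*idx = (*idx + 1) % ring_size;
		n++;
	}

	return n;
}
```

With an 8-entry ring where only the first 7 frames are ready, this takes one
quad batch, one dual batch and one single frame before stopping at the first
frame that is not ready.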

