On Tue, 30 Jun 2026 14:49:45 +0200
Maxime Leroy <[email protected]> wrote:

> posts to the store buffer while the line fills in the background, and
> the rewritten fields are forwarded straight from there. buf_addr has
> nothing to forward, so it must be read from the line, whose fill is
> still in flight, and the read stalls. The ethertype read that follows,
> on the cold payload line, stalls again. Read later by the application,
> when the fill has completed, the same read hits. The offload just
> performs it at the worst possible moment.
> 
> Measured on a single-core port-to-port forwarding test over two 10G
> ports (one core at 2 GHz, 64-byte untagged frames):
> 
>   - throughput 4.22 -> 5.00 Mpps (+18 percent)
>   - IPC 0.93 -> 1.25: the cost was memory stall, not compute
>   - L3/DRAM-bound L2 refills 319M -> 200M over 10s (-37 percent)
> 
> perf confirms it: with the offload, the buf_addr load (the cold mbuf
> field) and the payload load account for about 84 percent of the Rx
> burst's L2 refills; removing it, those vanish and only the inherent DQRR
> dequeue misses remain.
> 
> Stop advertising VLAN_STRIP and remove the rte_vlan_strip() calls from
> every Rx path. This is a behavioural change: the tag is left in the
> frame, so an application must strip it itself, on the L2 header it
> already reads.
> 
> Signed-off-by: Maxime Leroy <[email protected]>
> Acked-by: Morten Brørup <[email protected]>
> Acked-by: Stephen Hemminger <[email protected]>
> Acked-by: Hemant Agrawal <[email protected]>

Applied to next-net

Reply via email to