On 2024-04-24 21:13, Stephen Hemminger wrote:
> On Wed, 24 Apr 2024 18:50:50 +0100
> Ferruh Yigit <ferruh.yi...@amd.com> wrote:

>>> I don't know how slow af_packet is, but if you care about performance,
>>> you don't want to use atomic add for statistics.

>> There are a few soft drivers already using atomic adds for updating stats.
>> If we document the expectations of 'rte_eth_stats_reset()', we can update
>> those usages.

> Using atomic add incurs a lot of extra overhead. The statistics are not
> guaranteed to be perfect. If nothing else, the bytes and packets can be
> skewed.


The sad thing here is that if the counters are reset within the load-modify-store cycle of an lcore's counter update, the reset may end up being a nop. So it's not that you miss a packet or two, or suffer some transient inconsistency: the reset request is completely and permanently ignored.
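
To make the hazard concrete, here is a minimal sketch (hypothetical names, not the actual PMD code) of how a plain, non-atomic counter update can swallow a concurrent reset:

#include <stdint.h>

/* Illustrative lcore-updated counter; in a PMD this would live in
 * the per-RX-queue struct. */
static volatile uint64_t rx_pkts;

/* RX path: a plain load-modify-store, no atomics. */
static inline void
count_pkts(uint16_t n)
{
	uint64_t tmp = rx_pkts; /* load; reads, say, 1000000 */

	/*
	 * If rte_eth_stats_reset() zeroes rx_pkts at this point, the
	 * store below overwrites the zero with 1000000 + n, so the
	 * reset is not just delayed or inconsistent; it is undone.
	 */
	rx_pkts = tmp + n; /* store */
}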

> The performance of the soft drivers af_xdp, af_packet, and tun is dominated
> by the overhead of the kernel system calls and copies. Yes, alignment is
> good, but it won't be noticeable.

There aren't any syscalls in the RX path in the af_packet PMD.

I added the same statistics updates as the af_packet PMD uses into a benchmark app that consumes ~1000 cc between stats updates.

If the equivalent of the RX queue struct was cache-aligned, the statistics overhead was so small it was difficult to measure: less than 3-4 cc per update. This was with volatile, but without atomics.
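
For reference, the measured pattern looks roughly like this (a sketch with made-up struct and field names, using DPDK's __rte_cache_aligned; the real af_packet PMD layout differs):

#include <stdint.h>
#include <rte_common.h>

/* Stand-in for per-RX-queue state; aligned so no other queue's
 * data shares its cache line. */
struct rxq {
	/* ... ring pointers, fd, etc. ... */
	volatile uint64_t rx_pkts;
	volatile uint64_t rx_bytes;
} __rte_cache_aligned;

/* Called once per burst: plain (non-atomic) read-modify-writes.
 * With the struct cache-aligned, this cost <3-4 cc per update in
 * the benchmark. */
static inline void
rxq_stats_update(struct rxq *q, uint16_t nb_pkts, uint32_t nb_bytes)
{
	q->rx_pkts += nb_pkts;
	q->rx_bytes += nb_bytes;
}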

If the RX queue struct wasn't cache-aligned, and was sized such that a cache line was generally shared by two (neighboring) cores, the stats incurred a cost of ~55 cc per update due to false sharing.
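
The slow case corresponds to a layout like the following (again illustrative), where the counters of two neighboring queues, each polled by a different lcore, can end up on the same cache line:

#include <stdint.h>

/* No alignment: sizeof(struct rxq_packed) == 16, so four of these
 * fit in one 64-byte cache line. */
struct rxq_packed {
	volatile uint64_t rx_pkts;
	volatile uint64_t rx_bytes;
};

/* queues[0] and queues[1] can occupy the same 64-byte line. Two
 * lcores, each updating only "its own" queue's counters, still
 * bounce the line between their caches (false sharing): ~55 cc per
 * update in the benchmark. */
static struct rxq_packed queues[2];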

Shaving off those 55 cc should translate into a couple of hundred percent higher performance for an empty af_packet poll. If your lcore has some primary source of work other than the af_packet RX queue, and the RX queue is polled often, this may well be a noticeable gain.

The benchmark was run on 16 Gracemont cores, which in my experience seem to have somewhat shorter core-to-core latency than many other systems, provided the remote core (the cache line owner) is located in the same cluster.
