On 2024-04-24 21:13, Stephen Hemminger wrote:
> On Wed, 24 Apr 2024 18:50:50 +0100
> Ferruh Yigit <ferruh.yi...@amd.com> wrote:

>>> I don't know how slow af_packet is, but if you care about performance,
>>> you don't want to use atomic add for statistics.

>> There are a few soft drivers already using atomic adds for updating stats.
>> If we document the expectations of 'rte_eth_stats_reset()', we can update
>> those usages.

> Using atomic add incurs a lot of extra overhead. The statistics are not
> guaranteed to be perfect. If nothing else, the bytes and packets can be
> skewed.


The sad thing here is that if the counters are reset within the load-modify-store cycle of an lcore's counter update, the reset may end up being a nop. So it's not that you miss a packet or two, or suffer some transient inconsistency: the reset request is completely and permanently ignored.
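
To make the hazard concrete, here is a minimal sketch (hypothetical names, not the actual PMD code) of how a plain, non-atomic counter update can swallow a concurrent reset:

#include <stdint.h>

/* Illustrative lcore-updated counter; in a PMD this would live in
 * the per-RX-queue struct. */
static volatile uint64_t rx_pkts;

/* RX path: a plain load-modify-store, no atomics. */
static inline void
count_pkts(uint16_t n)
{
	uint64_t tmp = rx_pkts; /* load; reads, say, 1000000 */

	/*
	 * If rte_eth_stats_reset() zeroes rx_pkts at this point, the
	 * store below overwrites the zero with 1000000 + n, so the
	 * reset is not just delayed or inconsistent; it is undone.
	 */
	rx_pkts = tmp + n; /* store */
}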

> The performance of the soft drivers af_xdp, af_packet, and tun is dominated
> by the overhead of the kernel system calls and copies. Yes, alignment is
> good, but it won't be noticeable.

There aren't any syscalls in the RX path in the af_packet PMD.

I added the same statistics updates as the af_packet PMD uses into a benchmark app that consumes ~1000 cc between stats updates.

If the equivalent of the RX queue struct was cache-aligned, the statistics overhead was so small it was difficult to measure: less than 3-4 cc per update. This was with volatile, but without atomics.
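
For reference, the measured pattern looks roughly like this (a sketch with made-up struct and field names, using DPDK's __rte_cache_aligned; the real af_packet PMD layout differs):

#include <stdint.h>
#include <rte_common.h>

/* Stand-in for per-RX-queue state; aligned so no other queue's
 * data shares its cache line. */
struct rxq {
	/* ... ring pointers, fd, etc. ... */
	volatile uint64_t rx_pkts;
	volatile uint64_t rx_bytes;
} __rte_cache_aligned;

/* Called once per burst: plain (non-atomic) read-modify-writes.
 * With the struct cache-aligned, this cost <3-4 cc per update in
 * the benchmark. */
static inline void
rxq_stats_update(struct rxq *q, uint16_t nb_pkts, uint32_t nb_bytes)
{
	q->rx_pkts += nb_pkts;
	q->rx_bytes += nb_bytes;
}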

If the RX queue struct wasn't cache-aligned, and was sized such that a cache line was generally shared by two (neighboring) cores, the stats incurred a cost of ~55 cc per update due to false sharing.
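
The slow case corresponds to a layout like the following (again illustrative), where the counters of two neighboring queues, each polled by a different lcore, can end up on the same cache line:

#include <stdint.h>

/* No alignment: sizeof(struct rxq_packed) == 16, so four of these
 * fit in one 64-byte cache line. */
struct rxq_packed {
	volatile uint64_t rx_pkts;
	volatile uint64_t rx_bytes;
};

/* queues[0] and queues[1] can occupy the same 64-byte line. Two
 * lcores, each updating only "its own" queue's counters, still
 * bounce the line between their caches (false sharing): ~55 cc per
 * update in the benchmark. */
static struct rxq_packed queues[2];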

Shaving off those 55 cc should translate into a couple of hundred percent higher performance for an empty af_packet poll. If your lcore has some primary source of work other than the af_packet RX queue, and the RX queue is polled often, this may well be a noticeable gain.

The benchmark was run on 16 Gracemont cores, which in my experience seem to have somewhat shorter core-to-core latency than many other systems, provided the remote core (the cache line owner) is located in the same cluster.
