On Sun, 19 May 2024 19:10:30 +0200 Morten Brørup <m...@smartsharesystems.com> wrote:
> Absolutely; whenever possible, local counters should be maintained inside the
> loop, and added to the public counters at the end of a loop.
>
> Please note that application counters might be spread all over memory.
> E.g. per-flow in a flow structure, per QoS class in a QoS class structure,
> per subscriber in a subscriber structure, etc. And a burst of packets might
> touch multiple of these. My point is: Atomic read-modify-write of counters
> will cause serious stalling, waiting for memory access

If an application needs to keep up at the speeds DPDK makes possible, then
it needs to worry about its cache access patterns. Last time I checked,
handling 10G 64-byte packets at line rate without loss means a maximum of
2 cache misses per packet. That is very hard to do with any non-trivial
application.

Also, SW QoS above 1G is very hard to do with modern CPUs, especially with
multiple flows. It may be possible with something like FQ-CoDel, which only
keeps a small amount of state.
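
For anyone who wants to redo the arithmetic behind that budget, here is a rough sketch in C. It assumes the standard 20 bytes of per-frame Ethernet wire overhead (preamble, start-of-frame delimiter, inter-frame gap). The function names are made up for illustration, and the miss latency is an input you supply rather than a measured figure; real latency depends heavily on the platform and on which cache level misses.

```c
#include <assert.h>

/* Per-frame overhead on the wire: 7 B preamble + 1 B SFD + 12 B inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20u

/*
 * Time available per packet at line rate, in nanoseconds.
 * bits / (Gbit/s) comes out directly in ns.
 * (Hypothetical helper, not a DPDK API.)
 */
static double ns_per_packet(double link_gbps, unsigned frame_bytes)
{
    assert(link_gbps > 0.0);
    double bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0;
    return bits_on_wire / link_gbps;
}

/*
 * Number of whole, fully serialized cache misses that fit in the budget.
 * miss_ns is an ASSUMED latency chosen by the caller, not a measurement.
 * (Hypothetical helper, not a DPDK API.)
 */
static unsigned misses_in_budget(double budget_ns, double miss_ns)
{
    assert(miss_ns > 0.0);
    return (unsigned)(budget_ns / miss_ns);
}
```

With ns_per_packet(10.0, 64) the budget is about 67.2 ns per packet, i.e. roughly 14.88 Mpps. If you assume something like 30 ns per miss, misses_in_budget() gives 2; with a slower assumed miss latency the budget shrinks further. This ignores prefetching and overlapping misses, so treat it as a back-of-the-envelope bound only.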