> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Monday, 20 May 2024 00.49
>
> On Sun, 19 May 2024 19:10:30 +0200
> Morten Brørup <m...@smartsharesystems.com> wrote:
>
> > Absolutely; whenever possible, local counters should be maintained
> > inside the loop, and added to the public counters at the end of a loop.
> >
> > Please note that application counters might be spread all over memory.
> > E.g. per-flow in a flow structure, per QoS class in a QoS class
> > structure, per subscriber in a subscriber structure, etc. And a burst of
> > packets might touch multiple of these. My point is: Atomic
> > read-modify-write of counters will cause serious stalling, waiting for
> > memory access
>
> If an application needs to keep up at DPDK possible speeds, then it
> needs to worry about its cache access patterns. Last time I checked
> handling 10G 64 byte packets at line rate without loss means a maximum
> of 2 cache misses. Very hard to do with any non trivial application.

Yes, very hard. Which is why I insist that counters must have the absolutely highest possible performance in the fast path.

Non-trivial applications are likely to maintain many instances of application specific counters. I consider this patch in the series as the EAL library generic 64 bit counters, and for application use too. So I am reviewing with a much broader perspective than just SW drivers.

The SW drivers' use of these counters is not only an improvement of those drivers, it's also an excellent reference use case showing how to use this new EAL 64 bit counters library.

> Also, SW QoS above 1G is very hard to do with modern CPU's. Especially
> with multiple flows.
> It maybe possible with something like FQ Codel which only keeps small
> amount of state.

Yep. Designing software for high performance is all about using optimal algorithms! :-)
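
To illustrate the point about local counters versus atomic read-modify-write, here is a minimal sketch in plain C11. It is not the API from this patch series: struct pkt, struct flow_stats, NUM_FLOWS and both function names are hypothetical, and the local variant assumes each counter has exactly one writer (e.g. per-lcore counters), so a relaxed load plus a relaxed store is enough to publish the new value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_FLOWS 4 /* hypothetical, kept tiny so local totals fit on the stack */

/* Hypothetical packet descriptor: only what the counters need. */
struct pkt {
	uint32_t flow_id;
	uint32_t len;
};

/* Hypothetical per-flow counters, possibly spread all over memory. */
struct flow_stats {
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

static struct flow_stats flow_stats[NUM_FLOWS];

/* Variant 1: atomic read-modify-write per packet.
 * Two atomic adds per packet, each needing exclusive access to the
 * counter's cache line before it can complete. */
static void count_burst_atomic(const struct pkt *pkts, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		assert(pkts[i].flow_id < NUM_FLOWS);
		struct flow_stats *fs = &flow_stats[pkts[i].flow_id];

		atomic_fetch_add_explicit(&fs->packets, 1, memory_order_relaxed);
		atomic_fetch_add_explicit(&fs->bytes, pkts[i].len, memory_order_relaxed);
	}
}

/* Variant 2: accumulate locally inside the loop, publish once per burst.
 * Assumes a single writer per counter, so no atomic RMW is required. */
static void count_burst_local(const struct pkt *pkts, unsigned int n)
{
	uint64_t packets[NUM_FLOWS] = {0};
	uint64_t bytes[NUM_FLOWS] = {0};

	for (unsigned int i = 0; i < n; i++) {
		assert(pkts[i].flow_id < NUM_FLOWS);
		packets[pkts[i].flow_id]++;
		bytes[pkts[i].flow_id] += pkts[i].len;
	}

	for (unsigned int f = 0; f < NUM_FLOWS; f++) {
		struct flow_stats *fs = &flow_stats[f];

		if (packets[f] == 0)
			continue; /* do not touch counters this burst did not hit */

		atomic_store_explicit(&fs->packets,
			atomic_load_explicit(&fs->packets, memory_order_relaxed) + packets[f],
			memory_order_relaxed);
		atomic_store_explicit(&fs->bytes,
			atomic_load_explicit(&fs->bytes, memory_order_relaxed) + bytes[f],
			memory_order_relaxed);
	}
}
```

Both variants produce the same totals; the second touches each affected counter once per burst instead of once per packet. With thousands of flows a dense local array would not be practical, so a real application would need some other way of folding updates, which this sketch does not attempt.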