On Thu, 2 May 2024 15:12:43 +0100 Ferruh Yigit <ferruh.yi...@amd.com> wrote:
> I am not referring multiple core sharing a queue, this is wrong for DPDK. > > For single core case, if a variable doesn't have 'volatile' qualifier > and load/stores are not atomic, as compiler is not aware that there may > be other threads will access this value, what prevents compiler to hold > stats value in a register and store it memory at later stage, let say at > the end of the application forwarding loop? No other driver does this; it is not a problem since once rx/tx burst function returns, there is no state the compiler is aware of. > For this case keeping 'volatile' qualifier works. But Mattias has a > suggestion to use "normal load + normal add + atomic store" in datapath. > > > For getting stats, which will be in different thread, you are already > casting it to 'volatile' to enforce compiler read from memory each time. > > > Additionally when we take stats reset into account, which can happen in > different thread, 'volatile' is not enough. > For reliable stats reset, either we need to use full atomics, or offset > logic which I am trying to in other patch. If you want to attack the problem in the general case, there should be standard set of functions for handling per-core stats. If you look at Linux kernel you will see macros around this function: netdev_core_stats_inc((struct net_device *dev, u32 offset) { struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); unsigned long __percpu *field; ... field = (__force unsigned long __percpu *)((__force void *)p + offset); this_cpu_inc(*field); } The expansion of this_cpu_inc is nested but ends up being: *field += 1; No volatile is needed. The conclusion I reach from this is: - atomic is not needed for per-cpu data. See "volatile considered harmful" using a form of compiler barrier is useful but should be hidden in generic code. - what is the state of the better per-lcore patches? that should be used here. - need a more generic percpu stats infrastructure - ethedev needs to have a generic percpu infra and not reinvented in xdp, tap, packet, etc.