On Thu, 2 May 2024 15:12:43 +0100
Ferruh Yigit <ferruh.yi...@amd.com> wrote:

> I am not referring multiple core sharing a queue, this is wrong for DPDK.
> 
> For single core case, if a variable doesn't have 'volatile' qualifier
> and load/stores are not atomic, as compiler is not aware that there may
> be other threads will access this value, what prevents compiler to hold
> stats value in a register and store it memory at later stage, let say at
> the end of the application forwarding loop?

No other driver does this; it is not a problem since once rx/tx burst
function returns, there is no state the compiler is aware of.


> For this case keeping 'volatile' qualifier works. But Mattias has a
> suggestion to use "normal load + normal add + atomic store" in datapath.
> 
> 
> For getting stats, which will be in different thread, you are already
> casting it to 'volatile' to enforce compiler read from memory each time.
> 
> 
> Additionally when we take stats reset into account, which can happen in
> different thread, 'volatile' is not enough.
> For reliable stats reset, either we need to use full atomics, or offset
> logic which I am trying to in other patch.

If you want to attack the problem in the general case, there should be
standard set of functions for handling per-core stats.  If you look at
Linux kernel you will see macros around this function:

netdev_core_stats_inc((struct net_device *dev, u32 offset) {
        struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
        unsigned long __percpu *field;
...
        field = (__force unsigned long __percpu *)((__force void *)p + offset);
        this_cpu_inc(*field);
}

The expansion of this_cpu_inc is nested but ends up being:
        *field += 1;

No volatile is needed.

The conclusion I reach from this is:
        - atomic is not needed for per-cpu data. See "volatile considered 
harmful"
          using a form of compiler barrier is useful but should be hidden in 
generic code.
        - what is the state of the better per-lcore patches?
          that should be used here.
        - need a more generic percpu stats infrastructure
        - ethedev needs to have a generic percpu infra and not reinvented in 
xdp, tap, packet, etc.


Reply via email to