On Wed, Apr 24, 2024 at 11:21:52AM +0100, Ferruh Yigit wrote: > On 4/23/2024 9:56 PM, Mattias Rönnblom wrote: > > On 2024-04-23 13:15, Ferruh Yigit wrote: > >> On 4/23/2024 10:08 AM, Mattias Rönnblom wrote: > >>> Cache align Rx and Tx queue struct to avoid false sharing. > >>> > >>> RX struct happens to be 64 bytes on x86_64 already, so cache > >>> alignment makes no change there, but it does on 32-bit ISAs. > >>> > >>> TX struct is 56 bytes on x86_64. > >>> > >> > >> Hi Mattias, > >> > >> No objection to the patch. Is the improvement theoretical or do you > >> measure any improvement practically, if so how much is the > >> improvement? > >> > > > > I didn't run any benchmarks. > > > > Two cores storing to a (falsely) shared cache line on a per-packet > > basis is going to be very expensive, at least for "light touch" > > applications. > > > > ack I expect for af_packet bottleneck is the kernel side, so I don't > expect any visible improvement practically, but OK to fix this > theoretical issue. > > >>> Both structs keep counters, and in the RX case they are updated even > >>> for empty polls. > >>> > >> > >> Do you think does it help if move 'rx_pkts' & 'rx_bytes' update within > >> the loop? > >> > > > > No, why? Wouldn't that be worse? Especially since rx_pkts and rx_bytes > > are declared volatile, so you are forcing a load-modify-store cycle for > > every increment. > > > > My intention was to prevent updating stats in empty polls, I thought > stats will be hot in the cache but won't work with volatile. > Yes, it will. Volatile only prevents caching in registers, it does not affect the storing of data within the cache hierarchy. Reads/writes of stats on empty polls should indeed hit the L1 as expected. However, that is still less efficient than just doing a register increment which could theoretically be the result without the volatile.
/Bruce