On Wed, Apr 24, 2024 at 11:21:52AM +0100, Ferruh Yigit wrote:
> On 4/23/2024 9:56 PM, Mattias Rönnblom wrote:
> > On 2024-04-23 13:15, Ferruh Yigit wrote:
> >> On 4/23/2024 10:08 AM, Mattias Rönnblom wrote:
> >>> Cache align Rx and Tx queue struct to avoid false sharing.
> >>>
> >>> RX struct happens to be 64 bytes on x86_64 already, so cache
> >>> alignment makes no change there, but it does on 32-bit ISAs.
> >>>
> >>> TX struct is 56 bytes on x86_64.
> >>>
> >>
> >> Hi Mattias,
> >>
> >> No objection to the patch. Is the improvement theoretical or do you
> >> measure any improvement practically, if so how much is the
> >> improvement?
> >>
> > 
> > I didn't run any benchmarks.
> > 
> > Two cores storing to a (falsely) shared cache line on a per-packet
> > basis is going to be very expensive, at least for "light touch"
> > applications.
> > 
> 
> ack I expect for af_packet bottleneck is the kernel side, so I don't
> expect any visible improvement practically, but OK to fix this
> theoretical issue.
> 
> >>> Both structs keep counters, and in the RX case they are updated even
> >>> for empty polls.
> >>>
> >>
> >> Do you think does it help if move 'rx_pkts' & 'rx_bytes' update within
> >> the loop?
> >>
> > 
> > No, why? Wouldn't that be worse? Especially since rx_pkts and rx_bytes
> > are declared volatile, so you are forcing a load-modify-store cycle for
> > every increment.
> > 
> 
> My intention was to prevent updating stats in empty polls, I thought
> stats will be hot in the cache but won't work with volatile.
> 
Yes, it will. Volatile only prevents caching in registers, it does not
affect the storing of data within the cache hierarchy. Reads/writes of
stats on empty polls should indeed hit the L1 as expected. However, that is
still less efficient than just doing a register increment which could
theoretically be the result without the volatile.

/Bruce

Reply via email to