> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:[email protected]]
> Sent: Monday, April 1, 2019 12:41 PM
> To: Eads, Gage <[email protected]>; [email protected]
> Cc: [email protected]; [email protected]; Richardson, Bruce
> <[email protected]>; Ananyev, Konstantin
> <[email protected]>; Gavin Hu (Arm Technology China)
> <[email protected]>; nd <[email protected]>; [email protected]; nd
> <[email protected]>
> Subject: RE: [PATCH v3 1/8] stack: introduce rte stack library
> >
> > > > +static ssize_t
> > > > +rte_stack_get_memsize(unsigned int count) {
> > > > +	ssize_t sz = sizeof(struct rte_stack);
> > > > +
> > > > +	/* Add padding to avoid false sharing conflicts */
> > > > +	sz += RTE_CACHE_LINE_ROUNDUP(count * sizeof(void *)) +
> > > > +		2 * RTE_CACHE_LINE_SIZE;
> > > I did not understand how the false sharing is caused and how this
> > > padding is solving the issue. Verbose comments would help.
> >
> > The additional padding (beyond the CACHE_LINE_ROUNDUP) is to prevent
> > false sharing caused by adjacent/next-line hardware prefetchers. I'll
> > address this.
> >
> Is it not a generic problem? Or is it specific to this library?
This is not limited to this library, but it only affects systems with (enabled) next-line prefetchers, for example Intel products with an L2 adjacent cache line prefetcher[1]. For those systems, the additional padding can potentially improve performance. As I understand it, this was the reason behind the 128B alignment added to rte_ring a couple of years ago[2].

[1] https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors
[2] http://mails.dpdk.org/archives/dev/2017-February/058613.html

