> -----Original Message-----
> From: Mattias Rönnblom <[email protected]>
> Sent: Wednesday, October 27, 2021 12:42 PM
> To: Van Haaren, Harry <[email protected]>; Thomas Monjalon
> <[email protected]>; Aman Kumar <[email protected]>
> Cc: [email protected]; [email protected]; Burakov, Anatoly
> <[email protected]>; Song, Keesang <[email protected]>;
> [email protected]; Ananyev, Konstantin <[email protected]>;
> Richardson, Bruce <[email protected]>;
> [email protected]; Ruifeng Wang <[email protected]>;
> David Christensen <[email protected]>; [email protected];
> [email protected]
> Subject: Re: [dpdk-dev] [PATCH v4 2/2] lib/eal: add temporal store memcpy
> support for AMD platform
>
> On 2021-10-27 13:03, Van Haaren, Harry wrote:
> >> -----Original Message-----
<snip>

Hi Mattias,

> > 6) What is the use-case for this? When would a user *want* to use this
> > instead of rte_memcpy()?
> > If the data being loaded is relevant to datapath/packets, presumably other
> > packets might require the loaded data, so temporal (normal) loads should be
> > used to cache the source data?
>
> I'm not sure if your first question is rhetorical or not, but a memcpy()
> in a NT variant is certainly useful. One use case for a memcpy() with
> temporal loads and non-temporal stores is if you need to archive packet
> payload for (distant, potential) future use, and want to avoid causing
> unnecessary LLC evictions while doing so.

Yes, I agree that there are certainly benefits in using cache-locality hints.
There is an open question around whether the src, the dst, or both should be
non-temporal.

In the implementation of this patch, the NT/T types of the loads and stores
are reversed from your use-case:
1) Loads are NT (so loaded data is not cached for future packets)
2) Stores are T (so copied/dst data is now resident in L1/L2)

In theory there might even be valid uses for this type of memcpy, where loaded
data is not needed again soon and stored data is referenced again soon,
although I cannot think of any while typing this mail.

I think some use-case examples, and clear documentation on when/how to choose
between rte_memcpy() and any (potential future) rte_memcpy_nt() variants, are
required to progress this patch.

Assuming a strong use-case exists, and it can be clearly indicated to users of
DPDK APIs which rte_memcpy() to use, we can look at the technical details
around enabling the implementation.

-Harry

<snip remaining points>
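To make the two load/store combinations above concrete, here is a minimal, illustrative sketch. It is not the patch's implementation and not a DPDK API: the function names are hypothetical. It assumes x86-64 (SSE2 is baseline there), uses SSE4.1 MOVNTDQA via a per-function target attribute (GCC/Clang), handles only 16-byte aligned buffers on the fast path, and falls back to plain memcpy() otherwise and on non-x86-64 targets. Whether the NT load hint is actually honoured for ordinary write-back memory is CPU-implementation specific.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <immintrin.h>

/* Hypothetical helper: true if both pointers are 16-byte aligned. */
static int
both_aligned16(const void *a, const void *b)
{
	return (((uintptr_t)a | (uintptr_t)b) & 15) == 0;
}

/* Variant A (archive use-case): temporal loads, non-temporal stores.
 * The destination is written around the caches to limit LLC evictions. */
static void
copy_t_load_nt_store(void *dst, const void *src, size_t len)
{
	size_t n = len / 16;

	if (!both_aligned16(dst, src)) {
		memcpy(dst, src, len); /* keep the sketch simple */
		return;
	}
	for (size_t i = 0; i < n; i++) {
		__m128i v = _mm_load_si128((const __m128i *)src + i);
		_mm_stream_si128((__m128i *)dst + i, v); /* MOVNTDQ */
	}
	_mm_sfence(); /* NT stores are weakly ordered: fence before publishing */
	if (len % 16)
		memcpy((char *)dst + n * 16, (const char *)src + n * 16, len % 16);
}

/* Variant B (NT loads, temporal stores), the combination described
 * for the patch. MOVNTDQA needs SSE4.1, enabled only for this function. */
__attribute__((target("sse4.1")))
static void
copy_nt_load_t_store(void *dst, const void *src, size_t len)
{
	size_t n = len / 16;

	if (!both_aligned16(dst, src)) {
		memcpy(dst, src, len);
		return;
	}
	for (size_t i = 0; i < n; i++) {
		/* MOVNTDQA; some headers declare a non-const pointer parameter */
		__m128i v = _mm_stream_load_si128((__m128i *)(uintptr_t)src + i);
		_mm_store_si128((__m128i *)dst + i, v);
	}
	if (len % 16)
		memcpy((char *)dst + n * 16, (const char *)src + n * 16, len % 16);
}
#else
/* Non-x86-64 fallback: plain memcpy(), no cache hints. */
static void
copy_t_load_nt_store(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

static void
copy_nt_load_t_store(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
#endif
```

The functional result of both variants is identical to memcpy(); only the cache footprint differs, which is exactly why documentation on when to pick each one matters more than the copy loop itself.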

