On Fri, Jan 14, 2022 at 05:36:50PM +0100, Morten Brørup wrote:
> A flush threshold for the mempool cache was introduced in DPDK version
> 1.3, but rte_mempool_do_generic_get() was not completely updated back
> then, and some inefficiencies were introduced.
>
> This patch fixes the following in rte_mempool_do_generic_get():
>
> 1. The code that initially screens the cache request was not updated
> with the change in DPDK version 1.3.
> The initial screening compared the request length to the cache size,
> which was correct before, but became irrelevant with the introduction of
> the flush threshold. E.g. the cache can hold up to flushthresh objects,
> which is more than its size, so some requests were not served from the
> cache, even though they could be.
> The initial screening has now been corrected to match the initial
> screening in rte_mempool_do_generic_put(), which verifies that a cache
> is present, and that the length of the request does not overflow the
> memory allocated for the cache.
>
> 2. The function is a helper for rte_mempool_generic_get(), so it must
> behave according to the description of that function.
> Specifically, objects must first be returned from the cache,
> subsequently from the ring.
> After the change in DPDK version 1.3, this was not the behavior when
> the request was partially satisfied from the cache; instead, the objects
> from the ring were returned ahead of the objects from the cache. This is
> bad for CPUs with a small L1 cache, which benefit from having the hot
> objects first in the returned array. (This is also the reason why
> the function returns the objects in reverse order.)
> Now, all code paths first return objects from the cache, subsequently
> from the ring.
>
> 3. If the cache could not be backfilled, the function would attempt
> to get all the requested objects from the ring (instead of only the
> number of requested objects minus the objects available in the cache),
> and the function would fail if that failed.
> Now, the first part of the request is always satisfied from the cache,
> and if the subsequent backfilling of the cache from the ring fails, only
> the remaining requested objects are retrieved from the ring.
>
> 4. The code flow for satisfying the request from the cache was slightly
> inefficient:
> The likely code path where the objects are simply served from the cache
> was treated as unlikely. Now it is treated as likely.
> And in the code path where the cache was backfilled first, numbers were
> added and subtracted from the cache length; now this code path simply
> sets the cache length to its final value.
>
> 5. Some comments were not correct anymore.
> The comments have been updated.
> Most importantly, the description of the successful return value was
> inaccurate. Success only returns 0, not >= 0.
>
> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> ---
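
For anyone following along, the flow described in points 1-4 above roughly
amounts to the standalone sketch below. All names and constants in it
(toy_cache, backend_dequeue, toy_get_bulk, CACHE_LIMIT, ...) are made up
for illustration; this is not the actual rte_mempool_do_generic_get() code
from the patch.

/*
 * Standalone sketch of the get flow described in points 1-4 above.
 * Names and constants are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE   32    /* nominal size of the per-lcore cache */
#define CACHE_LIMIT  64    /* largest request served via the cache */

struct toy_cache {
	unsigned int size;                     /* = CACHE_SIZE */
	unsigned int len;                      /* objects currently cached */
	void *objs[CACHE_SIZE + CACHE_LIMIT];  /* room for backfill + remainder */
};

/* Trivial stand-in for the shared ring: hands out dummy pointers. */
static int
backend_dequeue(void **obj_table, unsigned int n)
{
	static uintptr_t next = 1;

	for (unsigned int i = 0; i < n; i++)
		obj_table[i] = (void *)next++;
	return 0;   /* 0 on success, negative on failure */
}

static int
toy_get_bulk(struct toy_cache *cache, void **obj_table, unsigned int n)
{
	unsigned int remaining = n;
	unsigned int index, len;
	void **cache_objs;

	/*
	 * Point 1: screen only for a missing cache or a request larger than
	 * the memory set aside for it, not against the nominal cache size.
	 */
	if (cache == NULL || n > CACHE_LIMIT)
		goto backend;

	/*
	 * Point 2: always serve from the cache first, newest objects first,
	 * so the hottest objects end up at the start of obj_table.
	 */
	len = remaining < cache->len ? remaining : cache->len;
	cache_objs = &cache->objs[cache->len];
	cache->len -= len;
	remaining -= len;
	for (index = 0; index < len; index++)
		*obj_table++ = *--cache_objs;

	if (remaining == 0)
		return 0;               /* likely path: the cache was enough */

	/* Backfill the cache with the remainder plus one full cache. */
	if (backend_dequeue(cache->objs, cache->size + remaining) != 0)
		goto backend;           /* point 3: only fetch the rest below */

	/* Serve the remainder from the freshly filled cache, newest first. */
	cache_objs = &cache->objs[cache->size + remaining];
	for (index = 0; index < remaining; index++)
		*obj_table++ = *--cache_objs;

	cache->len = cache->size;       /* point 4: set the final length once */
	return 0;

backend:
	/* Get only the objects not already taken from the cache. */
	return backend_dequeue(obj_table, remaining);
}

int main(void)
{
	struct toy_cache cache = { .size = CACHE_SIZE, .len = 0 };
	void *objs[8];

	/* The cache starts empty, so it is backfilled on the first call. */
	if (toy_get_bulk(&cache, objs, 8) == 0)
		printf("got 8 objects, cache now holds %u\n", cache.len);
	return 0;
}
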
I am a little uncertain about reversing the order of the copies when
taking objects out of the mempool: on machines that are not so cache
constrained, will we lose out on possible optimizations where the
compiler turns the copy loop into a memcpy? (See the small sketch below
for what I mean.) Otherwise the logic all looks correct to me.

/Bruce
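
To illustrate the concern, here are two small standalone copy loops
(made-up names, not code from the patch). The forward version preserves
element order, so the compiler can usually recognize it and lower it to a
single memcpy; the reversed version, which hands out the most recently
cached objects first, generally cannot be lowered that way.

/* Forward copy: element order is preserved, so an optimizing compiler can
 * typically recognize this loop and turn it into a single memcpy. */
void
copy_forward(void **dst, void * const *src, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Reversed copy, as used to hand out the newest (hottest) cached objects
 * first: the order changes, so it cannot become a plain memcpy and is
 * usually emitted as a scalar or shuffle loop instead. */
void
copy_reverse(void **dst, void * const *src, unsigned int len)
{
	for (unsigned int i = 0; i < len; i++)
		dst[i] = src[len - 1 - i];   /* src[len - 1] is the newest object */
}

Whether the difference matters in practice presumably depends on the burst
size and on how cache constrained the CPU actually is.
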