On Fri, Jan 14, 2022 at 05:36:50PM +0100, Morten Brørup wrote:
> A flush threshold for the mempool cache was introduced in DPDK version
> 1.3, but rte_mempool_do_generic_get() was not completely updated back
> then, and some inefficiencies were introduced.
> 
> This patch fixes the following in rte_mempool_do_generic_get():
> 
> 1. The code that initially screens the cache request was not updated
> with the change in DPDK version 1.3.
> The initial screening compared the request length to the cache size,
> which was correct before, but became irrelevant with the introduction of
> the flush threshold. E.g. the cache can hold up to flushthresh objects,
> which is more than its size, so some requests were not served from the
> cache, even though they could be.
> The initial screening has now been corrected to match the initial
> screening in rte_mempool_do_generic_put(), which verifies that a cache
> is present, and that the length of the request does not overflow the
> memory allocated for the cache.
> 
> 2. The function is a helper for rte_mempool_generic_get(), so it must
> behave according to the description of that function.
> Specifically, objects must first be returned from the cache,
> subsequently from the ring.
> After the change in DPDK version 1.3, this was not the behavior when
> the request was partially satisfied from the cache; instead, the objects
> from the ring were returned ahead of the objects from the cache. This is
> bad for CPUs with a small L1 cache, which benefit from having the hot
> objects first in the returned array. (This is also the reason why
> the function returns the objects in reverse order.)
> Now, all code paths first return objects from the cache, subsequently
> from the ring.
> 
> 3. If the cache could not be backfilled, the function would attempt
> to get all the requested objects from the ring (instead of only the
> number of requested objects minus the objects available in the ring),
> and the function would fail if that failed.
> Now, the first part of the request is always satisfied from the cache,
> and if the subsequent backfilling of the cache from the ring fails, only
> the remaining requested objects are retrieved from the ring.
> 
> 4. The code flow for satisfying the request from the cache was slightly
> inefficient:
> The likely code path where the objects are simply served from the cache
> was treated as unlikely. Now it is treated as likely.
> And in the code path where the cache was backfilled first, numbers were
> added and subtracted from the cache length; now this code path simply
> sets the cache length to its final value.
> 
> 5. Some comments were not correct anymore.
> The comments have been updated.
> Most importanly, the description of the succesful return value was
> inaccurate. Success only returns 0, not >= 0.
> 
> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> ---

I am a little uncertain about the reversing of the copies taking things out
of the mempool - for machines where we are not that cache constrainted will
we lose out in possible optimizations where the compiler optimizes the copy
loop as a memcpy?

Otherwise the logic all looks correct to me.

/Bruce

Reply via email to