Hi Olivier,

On 23/5/2016 1:55 PM, Olivier Matz wrote:
> Hi David,
>
> Please find some comments below.
>
> On 05/19/2016 04:48 PM, David Hunt wrote:
>> This is a mempool handler that is useful for pipelining apps, where
>> the mempool cache doesn't really work - example, where we have one
>> core doing rx (and alloc), and another core doing Tx (and return). In
>> such a case, the mempool ring simply cycles through all the mbufs,
>> resulting in a LLC miss on every mbuf allocated when the number of
>> mbufs is large. A stack recycles buffers more effectively in this
>> case.
>>
>> v2: cleanup based on mailing list comments. Mainly removal of
>> unnecessary casts and comments.
>>
>> Signed-off-by: David Hunt <david.hunt at intel.com>
>> ---
>>  lib/librte_mempool/Makefile            |   1 +
>>  lib/librte_mempool/rte_mempool_stack.c | 145 +++++++++++++++++++++++++++++++++
>>  2 files changed, 146 insertions(+)
>>  create mode 100644 lib/librte_mempool/rte_mempool_stack.c
>>
>> diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
>> index f19366e..5aa9ef8 100644
>> --- a/lib/librte_mempool/Makefile
>> +++ b/lib/librte_mempool/Makefile
>> @@ -44,6 +44,7 @@ LIBABIVER := 2
>>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool.c
>>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_handler.c
>>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_default.c
>> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += rte_mempool_stack.c
>>  # install includes
>>  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
>>
>> diff --git a/lib/librte_mempool/rte_mempool_stack.c b/lib/librte_mempool/rte_mempool_stack.c
>> new file mode 100644
>> index 0000000..6e25028
>> --- /dev/null
>> +++ b/lib/librte_mempool/rte_mempool_stack.c
>> @@ -0,0 +1,145 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
>
> Should be 2016?
Yup, will change.

>> ...
>> +
>> +static void *
>> +common_stack_alloc(struct rte_mempool *mp)
>> +{
>> +        struct rte_mempool_common_stack *s;
>> +        unsigned n = mp->size;
>> +        int size = sizeof(*s) + (n+16)*sizeof(void *);
>> +
>> +        /* Allocate our local memory structure */
>> +        s = rte_zmalloc_socket("common-stack",
>
> "mempool-stack" ?

Yes. Also, I think the names of the functions should be changed from
common_stack_x to simply stack_x. The "common_" does not add anything.

>> +                size,
>> +                RTE_CACHE_LINE_SIZE,
>> +                mp->socket_id);
>> +        if (s == NULL) {
>> +                RTE_LOG(ERR, MEMPOOL, "Cannot allocate stack!\n");
>> +                return NULL;
>> +        }
>> +
>> +        rte_spinlock_init(&s->sl);
>> +
>> +        s->size = n;
>> +        mp->pool = s;
>> +        rte_mempool_set_handler(mp, "stack");
>
> rte_mempool_set_handler() is a user function; it should not be called here.

Sure, removed.

>> +
>> +        return s;
>> +}
>> +
>> +static int common_stack_put(void *p, void * const *obj_table,
>> +                unsigned n)
>> +{
>> +        struct rte_mempool_common_stack *s = p;
>> +        void **cache_objs;
>> +        unsigned index;
>> +
>> +        rte_spinlock_lock(&s->sl);
>> +        cache_objs = &s->objs[s->len];
>> +
>> +        /* Is there sufficient space in the stack? */
>> +        if ((s->len + n) > s->size) {
>> +                rte_spinlock_unlock(&s->sl);
>> +                return -ENOENT;
>> +        }
>
> The usual return value for a failing put() is ENOBUFS (see rte_ring).

Done.

> After reading it, I realize that it's nearly exactly the same code as in
> "app/test: test external mempool handler".
> http://patchwork.dpdk.org/dev/patchwork/patch/12896/
>
> We should drop one of them. If this stack handler is really useful for
> a performance use-case, it could go in librte_mempool. At first read,
> the code looks like a demo example: it uses a simple spinlock for
> concurrent accesses to the common pool. Maybe the mempool cache hides
> this cost; in that case, we could also consider removing the use of
> the rte_ring.

Unlike the code in the test app, the stack handler does not use a ring.
It is intended for the case where applications do a lot of core-to-core
transfers of mbufs. The test app was simply there to demonstrate a basic
example of a malloc-based mempool handler, whereas this patch adds a new
LIFO handler for general use.

Using mempool_perf_autotest, I see a 30% increase in throughput when the
local cache is enabled/used. However, there is up to a 50% degradation
when the local cache is NOT used, so it's not suitable for all
situations. Still, with a 30% gain for the cached use-case, I think it's
worth having as an option for people to try if the use-case suits.

> Do you have some performance numbers? Do you know if it scales
> with the number of cores?

About a 30% gain when the local cache is used, and those numbers scale
up with the number of cores on my test machine. It may be better for
other use cases.

> If we can identify the conditions where this mempool handler
> outperforms the default handler, it would be valuable to have them
> in the documentation.

I could certainly add this to the docs, and mention the recommendation
to use a local cache.

Regards,
Dave.
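
P.S. To make the LIFO behaviour a bit more concrete, below is a rough
sketch of what the matching get() side of such a handler could look like
(using the shorter stack_ prefix suggested above). This is not the code
from the patch -- only the put() path is quoted above -- it just assumes
a rte_mempool_common_stack layout consistent with the fields used there
(sl, size, len, objs[]) and pops the most recently freed objects first,
which is what keeps buffers warm in cache for the rx/tx pipelining case.

#include <errno.h>
#include <rte_spinlock.h>
#include <rte_branch_prediction.h>

/* Assumed layout, matching the fields used by the put() quoted above */
struct rte_mempool_common_stack {
        rte_spinlock_t sl;   /* protects concurrent access to the stack */
        unsigned size;       /* capacity of the objs[] array */
        unsigned len;        /* current number of objects on the stack */
        void *objs[];        /* object pointers; objs[len-1] is the top */
};

/* Hypothetical sketch only -- not taken from the patch */
static int
stack_get(void *p, void **obj_table, unsigned n)
{
        struct rte_mempool_common_stack *s = p;
        unsigned index, len;

        rte_spinlock_lock(&s->sl);

        /* Not enough objects left on the stack for this request */
        if (unlikely(n > s->len)) {
                rte_spinlock_unlock(&s->sl);
                return -ENOENT;
        }

        /*
         * Pop from the top of the stack: the most recently freed
         * objects are handed out first, so the buffers returned are
         * the ones most likely to still be warm in the cache.
         */
        for (index = 0, len = s->len - 1; index < n; index++, len--)
                obj_table[index] = s->objs[len];

        s->len -= n;
        rte_spinlock_unlock(&s->sl);

        return 0;       /* success */
}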