Hi Olivier,

On 23/5/2016 1:55 PM, Olivier Matz wrote:
> Hi David,
>
> Please find some comments below.
>
> On 05/19/2016 04:48 PM, David Hunt wrote:
>> This is a mempool handler that is useful for pipelining apps, where
>> the mempool cache doesn't really work - example, where we have one
>> core doing rx (and alloc), and another core doing Tx (and return). In
>> such a case, the mempool ring simply cycles through all the mbufs,
>> resulting in a LLC miss on every mbuf allocated when the number of
>> mbufs is large. A stack recycles buffers more effectively in this
>> case.
>>
>> v2: cleanup based on mailing list comments. Mainly removal of
>> unnecessary casts and comments.
>>
>> Signed-off-by: David Hunt <david.hunt at intel.com>
>> ---
>>   lib/librte_mempool/Makefile            |   1 +
>>   lib/librte_mempool/rte_mempool_stack.c | 145 +++++++++++++++++++++++++++++++++
>>   2 files changed, 146 insertions(+)
>>   create mode 100644 lib/librte_mempool/rte_mempool_stack.c
>>
>> diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
>> index f19366e..5aa9ef8 100644
>> --- a/lib/librte_mempool/Makefile
>> +++ b/lib/librte_mempool/Makefile
>> @@ -44,6 +44,7 @@ LIBABIVER := 2
>>   SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
>>   SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_handler.c
>>   SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_default.c
>> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_stack.c
>>   # install includes
>>   SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
>>   
>> diff --git a/lib/librte_mempool/rte_mempool_stack.c b/lib/librte_mempool/rte_mempool_stack.c
>> new file mode 100644
>> index 0000000..6e25028
>> --- /dev/null
>> +++ b/lib/librte_mempool/rte_mempool_stack.c
>> @@ -0,0 +1,145 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
> Should be 2016?

Yup, will change.

>> ...
>> +
>> +static void *
>> +common_stack_alloc(struct rte_mempool *mp)
>> +{
>> +    struct rte_mempool_common_stack *s;
>> +    unsigned n = mp->size;
>> +    int size = sizeof(*s) + (n+16)*sizeof(void *);
>> +
>> +    /* Allocate our local memory structure */
>> +    s = rte_zmalloc_socket("common-stack",
> "mempool-stack" ?

Yes. Also, I think the function names should be changed from
common_stack_x to simply stack_x. The "common_" prefix does not add anything.
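
For reference, the registration could then look something like this
(just a sketch -- the struct and macro names below are my assumption
from the handler series, since they're not in the hunks quoted here):

static struct rte_mempool_handler handler_stack = {
	.name = "stack",
	.alloc = stack_alloc,
	.free = stack_free,
	.put = stack_put,
	.get = stack_get,
	.get_count = stack_get_count,
};

/* registration macro name assumed from the handler series */
MEMPOOL_REGISTER_HANDLER(handler_stack);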

>> +                    size,
>> +                    RTE_CACHE_LINE_SIZE,
>> +                    mp->socket_id);
>> +    if (s == NULL) {
>> +            RTE_LOG(ERR, MEMPOOL, "Cannot allocate stack!\n");
>> +            return NULL;
>> +    }
>> +
>> +    rte_spinlock_init(&s->sl);
>> +
>> +    s->size = n;
>> +    mp->pool = s;
>> +    rte_mempool_set_handler(mp, "stack");
> rte_mempool_set_handler() is a user function, it should not be called here

Sure, removed.
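
With the rename applied and that call dropped, the alloc would be
roughly as follows (a sketch of the next revision, based on the hunk
above):

static void *
stack_alloc(struct rte_mempool *mp)
{
	struct rte_mempool_common_stack *s;
	unsigned n = mp->size;
	int size = sizeof(*s) + (n + 16) * sizeof(void *);

	/* Allocate our local memory structure */
	s = rte_zmalloc_socket("mempool-stack",
			size,
			RTE_CACHE_LINE_SIZE,
			mp->socket_id);
	if (s == NULL) {
		RTE_LOG(ERR, MEMPOOL, "Cannot allocate stack!\n");
		return NULL;
	}

	rte_spinlock_init(&s->sl);

	s->size = n;
	mp->pool = s;
	/* no rte_mempool_set_handler() here -- that's a user-facing call */

	return s;
}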

>> +
>> +    return s;
>> +}
>> +
>> +static int common_stack_put(void *p, void * const *obj_table,
>> +            unsigned n)
>> +{
>> +    struct rte_mempool_common_stack *s = p;
>> +    void **cache_objs;
>> +    unsigned index;
>> +
>> +    rte_spinlock_lock(&s->sl);
>> +    cache_objs = &s->objs[s->len];
>> +
>> +    /* Is there sufficient space in the stack ? */
>> +    if ((s->len + n) > s->size) {
>> +            rte_spinlock_unlock(&s->sl);
>> +            return -ENOENT;
>> +    }
> The usual return value for a failing put() is ENOBUFS (see in rte_ring).

Done.
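
So the put becomes something like this (the tail of the function isn't
in the quote above, so the copy and len update below are my
reconstruction):

static int
stack_put(void *p, void * const *obj_table, unsigned n)
{
	struct rte_mempool_common_stack *s = p;
	void **cache_objs;
	unsigned index;

	rte_spinlock_lock(&s->sl);
	cache_objs = &s->objs[s->len];

	/* Is there sufficient space in the stack? */
	if ((s->len + n) > s->size) {
		rte_spinlock_unlock(&s->sl);
		return -ENOBUFS;	/* was -ENOENT */
	}

	/* Add the objects to the top of the stack */
	for (index = 0; index < n; ++index, obj_table++)
		cache_objs[index] = *obj_table;

	s->len += n;
	rte_spinlock_unlock(&s->sl);
	return 0;
}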

> After reading it, I realize that it's nearly exactly the same code than
> in "app/test: test external mempool handler".
> http://patchwork.dpdk.org/dev/patchwork/patch/12896/
>
> We should drop one of them. If this stack handler is really useful for
> a performance use-case, it could go in librte_mempool. At the first
> read, the code looks like a demo example : it uses a simple spinlock for
> concurrent accesses to the common pool. Maybe the mempool cache hides
> this cost, in this case we could also consider removing the use of the
> rte_ring.

Unlike the code in the test app, the stack handler does not use a ring.
It is for the case where applications do a lot of core-to-core
transfers of mbufs. The test app was simply meant to demonstrate a
minimal malloc-based mempool handler; this patch adds a new LIFO
handler for general use.
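
The win comes from the get side: a LIFO hands back the most recently
freed mbufs first, so they are likely still warm in the consumer's
cache. The get isn't in the hunks quoted above, so this is only a
sketch of the idea:

static int
stack_get(void *p, void **obj_table, unsigned n)
{
	struct rte_mempool_common_stack *s = p;
	void **cache_objs;
	unsigned index, len;

	rte_spinlock_lock(&s->sl);

	if (unlikely(n > s->len)) {
		rte_spinlock_unlock(&s->sl);
		return -ENOENT;
	}

	/* Pop from the top: the most recently returned objects come
	 * back first, which is what keeps them cache-warm. */
	cache_objs = s->objs;
	for (index = 0, len = s->len - 1; index < n; ++index, len--, obj_table++)
		*obj_table = cache_objs[len];

	s->len -= n;
	rte_spinlock_unlock(&s->sl);
	return 0;
}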

Using mempool_perf_autotest, I see a 30% increase in throughput when
the local cache is enabled/used. However, there is up to a 50%
degradation when the local cache is NOT used, so it's not suitable for
all situations. With a 30% gain for the cached use-case, though, I
think it's worth having as an option for people to try if their
use-case suits.
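
By "local cache enabled" I mean creating the pool with a non-zero
per-lcore cache and then selecting the stack handler, along these
lines (sketch only -- the sizes are placeholders, and I'm assuming the
create_empty/populate flow from the mempool rework):

	struct rte_mempool *mp;

	mp = rte_mempool_create_empty("stack_pool",
			8192,		/* number of elements (placeholder) */
			2048,		/* element size (placeholder) */
			256,		/* per-lcore cache: the ~30% gain case */
			0,		/* private data size */
			rte_socket_id(), 0);
	if (mp == NULL)
		rte_exit(EXIT_FAILURE, "cannot create mempool\n");

	rte_mempool_set_handler(mp, "stack");

	if (rte_mempool_populate_default(mp) < 0)
		rte_exit(EXIT_FAILURE, "cannot populate mempool\n");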


> Do you have some some performance numbers? Do you know if it scales
> with the number of cores?

A 30% gain when the local cache is used, and these numbers scale up
with the number of cores on my test machine. It may be better still
for other use-cases.

> If we can identify the conditions where this mempool handler
> overperforms the default handler, it would be valuable to have them
> in the documentation.
>

I could certainly add this to the docs, and mention the recommendation to
use local cache.

Regards,
Dave.

