Re: [dpdk-dev] [RFC] mempool: implement index-based per core cache

Dharmik Thakkar Wed, 03 Nov 2021 21:42:44 -0700


> On Nov 3, 2021, at 10:52 AM, Morten Brørup <m...@smartsharesystems.com> wrote:
> 
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Dharmik Thakkar
>> Sent: Wednesday, 3 November 2021 16.13
>> 
>> Hi,
>> 
>> Thank you everyone for the comments! I am currently working on making
>> the global pool ring’s implementation as index based.
>> Once done, I will send a patch for community review. I will also make
>> it as a compile time option.
> 
> Sounds good to me.
> 
> This could probably be abstracted to other libraries too. E.g. the ring 
> library holds pointers to objects (void *); an alternative ring library could 
> hold indexes to objects (uint32_t). A ring often holds objects from the same 
> mempool, and the application knows which mempool, so indexing would be useful 
> here too.
>


Yes, ring library within DPDK has the APIs to support configurable element size

>> 
>>> On Oct 31, 2021, at 3:14 AM, Morten Brørup <m...@smartsharesystems.com>
>> wrote:
>>> 
>>>> From: Morten Brørup
>>>> Sent: Saturday, 30 October 2021 12.24
>>>> 
>>>>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Honnappa
>>>>> Nagarahalli
>>>>> Sent: Monday, 4 October 2021 18.36
>>>>> 
>>>>> <snip>
>>>>>> 
>>>>>> 
>>>>>>>>> Current mempool per core cache implementation is based on
>>>>> pointer
>>>>>>>>> For most architectures, each pointer consumes 64b Replace it
>>>>> with
>>>>>>>>> index-based implementation, where in each buffer is addressed
>>>>> by
>>>>>>>>> (pool address + index)
>>>> 
>>>> I like Dharmik's suggestion very much. CPU cache is a critical and
>>>> limited resource.
>>>> 
>>>> DPDK has a tendency of using pointers where indexes could be used
>>>> instead. I suppose pointers provide the additional flexibility of
>>>> mixing entries from different memory pools, e.g. multiple mbuf
>> pools.
>>>> 
>> 
>> Agreed, thank you!
>> 
>>>>>>>> 
>>>>>>>> I don't think it is going to work:
>>>>>>>> On 64-bit systems difference between pool address and it's elem
>>>>>>>> address could be bigger than 4GB.
>>>>>>> Are you talking about a case where the memory pool size is more
>>>>> than 4GB?
>>>>>> 
>>>>>> That is one possible scenario.
>>>> 
>>>> That could be solved by making the index an element index instead of
>> a
>>>> pointer offset: address = (pool address + index * element size).
>>> 
>>> Or instead of scaling the index with the element size, which is only
>> known at runtime, the index could be more efficiently scaled by a
>> compile time constant such as RTE_MEMPOOL_ALIGN (=
>> RTE_CACHE_LINE_SIZE). With a cache line size of 64 byte, that would
>> allow indexing into mempools up to 256 GB in size.
>>> 
>> 
>> Looking at this snippet [1] from rte_mempool_op_populate_helper(),
>> there is an ‘offset’ added to avoid objects to cross page boundaries.
>> If my understanding is correct, using the index of element instead of a
>> pointer offset will pose a challenge for some of the corner cases.
>> 
>> [1]
>>        for (i = 0; i < max_objs; i++) {
>>                /* avoid objects to cross page boundaries */
>>                if (check_obj_bounds(va + off, pg_sz, total_elt_sz) <
>> 0) {
>>                        off += RTE_PTR_ALIGN_CEIL(va + off, pg_sz) -
>> (va + off);
>>                        if (flags & RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ)
>>                                off += total_elt_sz -
>>                                        (((uintptr_t)(va + off - 1) %
>>                                                total_elt_sz) + 1);
>>                }
>> 
> 
> OK. Alternatively to scaling the index with a cache line size, you can scale 
> it with sizeof(uintptr_t) to be able to address 32 or 16 GB mempools on 
> respectively 64 bit and 32 bit architectures. Both x86 and ARM CPUs have 
> instructions to access memory with an added offset multiplied by 4 or 8. So 
> that should be high performance.

Yes, agreed this can be done.
Cache line size can also be used when ‘MEMPOOL_F_NO_CACHE_ALIGN’ is not enabled.
On a side note, I wanted to better understand the need for having the 
'MEMPOOL_F_NO_CACHE_ALIGN' option.

> 
>>>> 
>>>>>> Another possibility - user populates mempool himself with some
>>>>> external
>>>>>> memory by calling rte_mempool_populate_iova() directly.
>>>>> Is the concern that IOVA might not be contiguous for all the memory
>>>>> used by the mempool?
>>>>> 
>>>>>> I suppose such situation can even occur even with normal
>>>>>> rte_mempool_create(), though it should be a really rare one.
>>>>> All in all, this feature needs to be configurable during compile
>>>> time.
>>> 
>

Re: [dpdk-dev] [RFC] mempool: implement index-based per core cache

Reply via email to