Hi,

Thank you everyone for the comments! I am currently working on making the global pool ring implementation index-based. Once done, I will send a patch for community review. I will also make it a compile-time option.

> On Oct 31, 2021, at 3:14 AM, Morten Brørup <m...@smartsharesystems.com> wrote:
>
>> From: Morten Brørup
>> Sent: Saturday, 30 October 2021 12.24
>>
>>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Honnappa
>>> Nagarahalli
>>> Sent: Monday, 4 October 2021 18.36
>>>
>>> <snip>
>>>>
>>>>
>>>>>>> Current mempool per core cache implementation is based on pointer
>>>>>>> For most architectures, each pointer consumes 64b Replace it with
>>>>>>> index-based implementation, where in each buffer is addressed by
>>>>>>> (pool address + index)
>>
>> I like Dharmik's suggestion very much. CPU cache is a critical and
>> limited resource.
>>
>> DPDK has a tendency of using pointers where indexes could be used
>> instead. I suppose pointers provide the additional flexibility of
>> mixing entries from different memory pools, e.g. multiple mbuf pools.
>>

Agreed, thank you!

>>>>>>
>>>>>> I don't think it is going to work:
>>>>>> On 64-bit systems difference between pool address and it's elem
>>>>>> address could be bigger than 4GB.
>>>>> Are you talking about a case where the memory pool size is more than 4GB?
>>>>
>>>> That is one possible scenario.
>>
>> That could be solved by making the index an element index instead of a
>> pointer offset: address = (pool address + index * element size).
>
> Or instead of scaling the index with the element size, which is only known at
> runtime, the index could be more efficiently scaled by a compile time
> constant such as RTE_MEMPOOL_ALIGN (= RTE_CACHE_LINE_SIZE). With a cache line
> size of 64 byte, that would allow indexing into mempools up to 256 GB in size.
>

Looking at this snippet [1] from rte_mempool_op_populate_helper(), there is
an ‘offset’ added to keep objects from crossing page boundaries. If my
understanding is correct, using an element index instead of a pointer offset
will be a problem in some of these corner cases.

[1]
	for (i = 0; i < max_objs; i++) {
		/* avoid objects to cross page boundaries */
		if (check_obj_bounds(va + off, pg_sz, total_elt_sz) < 0) {
			off += RTE_PTR_ALIGN_CEIL(va + off, pg_sz) - (va + off);
			if (flags & RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ)
				off += total_elt_sz -
					(((uintptr_t)(va + off - 1) %
						total_elt_sz) + 1);
		}

>>
>>>> Another possibility - user populates mempool himself with some external
>>>> memory by calling rte_mempool_populate_iova() directly.
>>> Is the concern that IOVA might not be contiguous for all the memory
>>> used by the mempool?
>>>
>>>> I suppose such situation can even occur even with normal
>>>> rte_mempool_create(), though it should be a really rare one.
>>> All in all, this feature needs to be configurable during compile time.
>
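
To make the page-boundary corner case concrete, here is a minimal standalone sketch. It is not DPDK code and not the planned patch. The constants and the helpers layout_objs(), off_to_idx(), idx_to_off(), count_elem_index_mismatches() and max_pool_bytes() are all hypothetical, and the layout loop only mimics the page-boundary skip from [1] (no RTE_MEMPOOL_POPULATE_F_ALIGN_OBJ handling, no object headers). It assumes a page-aligned base address and an element size that is a multiple of the cache line size.

```c
#include <stdint.h>

/* All values below are assumptions chosen for illustration only. */
#define CACHE_LINE 64u   /* stand-in for RTE_CACHE_LINE_SIZE */
#define PG_SZ      4096u /* assumed page size */
#define ELT_SZ     1536u /* assumed total element size (multiple of CACHE_LINE) */
#define N_OBJS     8u

/* Hypothetical: encode a byte offset from the pool base as a 32-bit
 * index counted in cache-line units. */
static inline uint32_t off_to_idx(uint64_t off)
{
	return (uint32_t)(off / CACHE_LINE);
}

/* Hypothetical: decode the 32-bit index back into a byte offset. */
static inline uint64_t idx_to_off(uint32_t idx)
{
	return (uint64_t)idx * CACHE_LINE;
}

/* Largest pool span addressable with a 32-bit cache-line-scaled index. */
static inline uint64_t max_pool_bytes(void)
{
	return ((uint64_t)UINT32_MAX + 1) * CACHE_LINE;
}

/* Place N_OBJS objects back to back, skipping to the next page whenever
 * an object would otherwise cross a page boundary. Offsets are relative
 * to a base address that is assumed to be page-aligned. */
static inline void layout_objs(uint64_t offs[N_OBJS])
{
	uint64_t off = 0;

	for (unsigned int i = 0; i < N_OBJS; i++) {
		uint64_t room = PG_SZ - (off % PG_SZ);

		if (room < ELT_SZ)
			off += room; /* padding: object starts on the next page */
		offs[i] = off;
		off += ELT_SZ;
	}
}

/* Count objects whose real offset differs from (element index * ELT_SZ),
 * i.e. objects that a plain element index would address incorrectly. */
static inline unsigned int
count_elem_index_mismatches(const uint64_t offs[N_OBJS])
{
	unsigned int bad = 0;

	for (unsigned int i = 0; i < N_OBJS; i++)
		if (offs[i] != (uint64_t)i * ELT_SZ)
			bad++;
	return bad;
}
```

With these assumed sizes, only two 1536-byte objects fit in a 4096-byte page, so the third object starts at offset 4096 rather than 2 * 1536 = 3072, and every later object is shifted as well. An index scaled by the element size therefore resolves to the wrong address, while an index scaled by the cache line size still round-trips, but only as long as every object offset stays a multiple of the cache line size. The 32-bit range in this scheme covers 2^32 * 64 bytes = 256 GB, which matches the figure quoted above.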