On 13/04/2016 17:03, Thomas Monjalon wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788

+1

> I think it is time to collect the needs and expectations of
> the DPDK memory allocator. The goal is to satisfy every need while
> cleaning up the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First, the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
>       - numa node
>       - page size
>       - swappable or not
>       - contiguous (cannot be guaranteed) or not
>       - physical address (as root only)
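
As a rough illustration, those characteristics could be folded into a
single request descriptor along these lines (all names below are
hypothetical; nothing like this exists in DPDK today):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for the requested memory (sketch only). */
#define MEM_REQ_F_CONTIG      (1 << 0) /* physically contiguous, best effort */
#define MEM_REQ_F_UNSWAPPABLE (1 << 1) /* must not be swapped out */
#define MEM_REQ_F_PHYS_ADDR   (1 << 2) /* physical address needed (root only) */

struct mem_req {
    int      socket_id;  /* NUMA node, or -1 for any */
    size_t   page_size;  /* requested page size, 0 for any */
    uint32_t flags;      /* MEM_REQ_F_* */
};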

I think this ties in with the different command-line options related to
memory.
We have 3 choices:
1) no option: allocate all free hugepages in the system.
    Free hugepages are read from sysfs (with possible race conditions if there
    are multiple mount points for the same page size). We also need to account
    for a limit on the hugetlbfs mount, and if we are inside a cgroup it looks
    like we have no other way than to handle the SIGBUS signal, to deal with
    the fact that we may succeed in allocating the hugepages even though they
    are not pre-faulted (this happens with the MAP_POPULATE option too); see
    the probe sketched after this list.
2) -m: allocate the given amount of memory regardless of the NUMA node.
3) --socket-mem: allocate memory per NUMA node.
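
Here is a minimal sketch of such a SIGBUS probe (a hypothetical helper
using the usual sigsetjmp/siglongjmp pattern, not current DPDK code):

#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf huge_jmpenv;

static void huge_sigbus_handler(int signo)
{
    (void)signo;
    siglongjmp(huge_jmpenv, 1);
}

/* Touch every page of a mapped hugepage region; return 0 if all pages
 * could be faulted in, -1 if a SIGBUS was raised (e.g. cgroup limit). */
static int probe_hugepages(void *addr, size_t len, size_t page_sz)
{
    struct sigaction sa, old_sa;
    size_t off;
    int ret = 0;

    sa.sa_handler = huge_sigbus_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_sa);

    if (sigsetjmp(huge_jmpenv, 1) == 0) {
        for (off = 0; off < len; off += page_sz)
            *(volatile char *)((char *)addr + off) = 0;
    } else {
        ret = -1; /* a page could not be faulted in */
    }

    sigaction(SIGBUS, &old_sa, NULL);
    return ret;
}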

At the moment we are not able to specify how much memory of a given page
size we want to allocate.

So would we provide contiguous memory as an option, changing the default
behavior?

> Then the drivers or other libraries use the memory through
>       - rte_malloc
>       - rte_memzone
>       - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool would still focus on collections of objects with a cache.
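
To make that concrete, here is a purely hypothetical sketch of what the
integrated signatures could look like (none of these exist today):

#include <stddef.h>

struct rte_memzone; /* existing DPDK type, declared in rte_memzone.h */

/* Characteristics of the requested memory (hypothetical). */
struct rte_mem_req {
    int    socket_id;  /* NUMA node, or -1 for any */
    size_t page_size;  /* preferred page size, 0 for any */
    int    contig;     /* physically contiguous wanted (best effort) */
};

/* rte_malloc extended with the requested characteristics. */
void *rte_malloc_ex(const char *type, size_t size, unsigned int align,
                    const struct rte_mem_req *req);

/* A memzone would then be only a named allocation on top of it. */
const struct rte_memzone *
rte_memzone_reserve_ex(const char *name, size_t size,
                       const struct rte_mem_req *req);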

So the other bit we need to remember is the memory for the hardware queues.
There is already an API in ethdev, rte_eth_dma_zone_reserve(), which I think
would make sense to move to EAL so the memory allocator can guarantee
contiguous memory transparently for the cases where we may have memory of
different hugepage sizes.
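
For reference, the helper is currently declared in rte_ethdev.h roughly as
below; a hypothetical EAL-level equivalent could drop the ethdev-specific
arguments and let the allocator pick a suitable hugepage size:

#include <stddef.h>
#include <stdint.h>

struct rte_memzone;
struct rte_eth_dev;

/* Existing ethdev helper: reserves a DMA memzone named after the
 * device and queue. */
const struct rte_memzone *
rte_eth_dma_zone_reserve(const struct rte_eth_dev *eth_dev, const char *name,
                         uint16_t queue_id, size_t size,
                         unsigned int align, int socket_id);

/* Hypothetical EAL-level equivalent: contiguity guaranteed by the
 * allocator itself, whatever hugepage sizes are available. */
const struct rte_memzone *
rte_eal_dma_zone_reserve(const char *name, size_t size,
                         unsigned int align, int socket_id);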

> If a rework happens, maybe that the build options CONFIG_RTE_LIBRTE_IVSHMEM
> and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
> The Xen support should also be better integrated.

CONFIG_RTE_LIBRTE_IVSHMEM should probably be a runtime option and
CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS could likely be removed once we have a
single mmap file for hugepages.

> Currently, the first class of API is directly implemented as command line
> parameters. Please let's think of C functions first.
> The EAL parameters should simply wrap some API functions and let the
> applications tune the memory initialization with a well documented API.
>
> I probably forget some needs, e.g. for the secondary processes.
> Please comment.

Regards,
Sergio
