>
> What has been your experience of using DPDK based app's in NUMA mode
> with multiple sockets where some cores are present on one socket and
> other cores on some other socket.
>
> I am migrating my application from one intel machine with 8 cores, all in
> one socket to a 32 core machine where 16 cores are in one socket and 16
> other cores in the second socket.
> My core 0 does all initialization for mbuf's, nic ports, queues etc. and uses
> SOCKET_ID_ANY for socket related parameters.

It is recommended that you decide ahead of time which cores, on which NUMA
socket, the different parts of your application are going to run on, and then
set up your objects in memory accordingly. SOCKET_ID_ANY should only be used
to allocate items that are not used in the data path and for which you
therefore don't care about access time. Any rings or mempools should be
created by specifying the correct socket to allocate the memory on. If you are
working with two sockets, in some cases you may want to duplicate your data
structures, for example, use two memory pools - one on each socket - instead
of one, so that all data access is local.

>
> The usecase works, but I think I am running into performance issues on the
> 32 core machine.
> The lscpu output on my 32 core machine shows the following -
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
> I am using core 1 to lift all the data from a single queue of an 82599EB port
> and I see that the cpu utilization for this core 1 is way too high even for
> lifting traffic of 1 Gbps with packet size of 650 bytes.

How are you measuring the CPU utilization? When using the Intel DPDK, in most
cases your CPU utilization will always be 100% because you are constantly
polling, so the actual CPU headroom can be hard to judge at times.

Another thing to consider is the NUMA node to which your NICs are connected.
You can use rte_eth_dev_socket_id() to check which NUMA socket a NIC is
connected to - assuming a modern platform where the PCI connects straight to
the CPUs. Whichever NUMA node the NIC is connected to, you want to run the
code that polls its RX queues on that node, and do all packet transmission
using cores on that node as well.

>
> In general, does one need to be careful in working with multiple sockets and
> so forth, any comments would be helpful.

In general, yes, you need to be a bit more careful, but the basic rules as outlined above should give you a good start.
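
As a rough illustration of those rules, the following C sketch uses the core
layout from the lscpu output quoted above (even cores on node 0, odd cores on
node 1) to check whether a core is local to a NIC, pick a polling core on the
NIC's node, and select a per-socket pool. It deliberately avoids DPDK headers
so that it compiles on its own. The helper names (core_to_socket,
is_numa_local, pick_polling_core, pool_index_for_core) and the idea of a
reserved core are hypothetical and are not DPDK APIs. In a real application
the NIC's node would come from rte_eth_dev_socket_id(), the core's node from
rte_lcore_to_socket_id(), and each pool would be created with an explicit
socket ID rather than SOCKET_ID_ANY.

```c
#include <assert.h>

#define NUM_CORES   32
#define NUM_SOCKETS 2

/* Core-to-node map taken from the lscpu output in this thread:
 * even-numbered cores are on node 0, odd-numbered cores on node 1.
 * Assumption: a DPDK application would ask rte_lcore_to_socket_id()
 * instead of hard-coding the layout like this. */
static int core_to_socket(int core)
{
    assert(core >= 0 && core < NUM_CORES);
    return core % NUM_SOCKETS;
}

/* Returns 1 if the core sits on the same NUMA node as the NIC, else 0.
 * nic_socket stands in for the value rte_eth_dev_socket_id() reports. */
static int is_numa_local(int core, int nic_socket)
{
    return core_to_socket(core) == nic_socket;
}

/* Returns the lowest-numbered core on the NIC's node, skipping one
 * reserved core (for example the core that does initialization).
 * Returns -1 if no core on that node is available. */
static int pick_polling_core(int nic_socket, int reserved_core)
{
    for (int core = 0; core < NUM_CORES; core++) {
        if (core != reserved_core && is_numa_local(core, nic_socket))
            return core;
    }
    return -1;
}

/* With one pool per socket (pools[0] on node 0, pools[1] on node 1),
 * a core indexes the array by its own node so that accesses stay local. */
static int pool_index_for_core(int core)
{
    return core_to_socket(core);
}
```

For example, if rte_eth_dev_socket_id() were to report node 0 for the 82599EB
port, polling it from core 1 would be a remote access, and
pick_polling_core(0, 0) would suggest core 2 instead, keeping core 0 for
initialization. Which node the port is actually attached to is not stated in
this thread, so that needs to be checked on the machine itself.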