Below is NVIDIA Mellanox's roadmap for DPDK 21.11, on which we are currently
working:

ethdev new APIs:
================
[1] Introduce a memory and performance optimization for deployments with a
large number of interfaces.
Motivation: An application (e.g. OVS) polls the queues of all representors.
Each queue holds descriptors, and each descriptor references an mbuf. As the
number of interfaces grows (e.g. 1k Scalable Functions (SFs)), the memory
footprint grows dramatically (#queues x queue depth x mbuf memory x 1k ports),
and CPU usage becomes inefficient due to cache evictions between the queue
contexts. The new optimization will aggregate the queues into a single one.
This reduces both the number of entities to poll and the memory footprint,
allowing streamlined, efficient processing with far fewer cache evictions.

rte_flow new APIs:
==================
[2] Extend the rte_flow API to support the definition of flexible parsers.
Motivation: NVIDIA Mellanox NICs support flexible parser configuration, and
we have already made use of that capability within the mlx5 PMD. Now we are
exposing an API that allows applications to configure the NIC to support
matching on custom or otherwise unsupported protocols. Once that configuration
is done, matching can be applied to traffic using those protocols.

mlx5 PMD updates:
=================
The mlx5 PMD will support the rte_flow changes listed above, along with the
item below.
[3] Extend mlx5 PMD capability to support up to 512 interfaces (VFs, SFs).
Motivation: Allow applications such as vDPA to use a larger number of
interfaces. Another example is the DPU, where hundreds of applications can be
supported using SFs.

rte_mempool updates:
====================
[4] Improve memory registration and sharing between drivers.
Motivation: In a Data Processing Unit (DPU) environment, data needs to be
shared between host memory and DPU/Arm memory to allow fast data transfer for
different drivers, such as regex and network, that operate on the same
physical device. To that end, we are refactoring the memory registration and
sharing method so that memory region registration is abstracted through that
method (rather than left to each driver), which will enable sharing a memory
region between the host and the DPU/Arm memory subset. Together with this
change, we will also optimize huge page initialization and cross-NUMA memory
registration to speed up application start-up time.

testpmd updates:
================
testpmd will be updated to support the changes listed above.