This patchset adds infrastructure to the userspace datapath to defer work.
At a high level, each PMD thread places work items into its own per-thread
work ring to be done later. The work ring is a FIFO queue of pointers to
work items. Each work item has a "work_func()" function pointer, which
abstracts away what work is actually being done. More details about the
infrastructure can be found in the patches and their commit messages.
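
To make the description above concrete, here is a minimal, self-contained
sketch of a per-thread FIFO ring of work item pointers. It is not the code
from the patch: only "work_func()", "do_work()" and the "force" flag are
names taken from this cover letter, and their signatures here are guesses.
RING_SIZE, MAX_ATTEMPTS, defer_work() and fake_dma_work() are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8                 /* Hypothetical; must be a power of two. */
#define RING_MASK (RING_SIZE - 1)
#define MAX_ATTEMPTS 3              /* Hypothetical attempt limit. */

struct work_item {
    /* Returns 0 when the work is complete, EAGAIN when it should be
     * retried later. With 'force' set, the work must complete now.
     * (Assumed signature, not taken from the patch.) */
    int (*work_func)(void *aux, bool force);
    void *aux;
    unsigned attempts;
};

/* Single-threaded FIFO of work item pointers, one per PMD thread. */
struct work_ring {
    struct work_item *slots[RING_SIZE];
    unsigned head;                  /* Next slot to write. */
    unsigned tail;                  /* Next slot to read. */
};

/* Queues 'w' at the back of the ring. Returns false if the ring is full. */
static bool
defer_work(struct work_ring *r, struct work_item *w)
{
    if (r->head - r->tail == RING_SIZE) {
        return false;
    }
    r->slots[r->head++ & RING_MASK] = w;
    return true;
}

/* Attempts the oldest work item once. If it is not complete yet, it goes
 * to the back of the ring. Returns false if the ring was empty. */
static bool
do_work(struct work_ring *r)
{
    struct work_item *w;
    bool force;

    if (r->head == r->tail) {
        return false;
    }
    w = r->slots[r->tail++ & RING_MASK];
    force = ++w->attempts >= MAX_ATTEMPTS;
    if (w->work_func(w->aux, force) == EAGAIN) {
        /* A slot was just freed, so re-deferring cannot fail. */
        bool requeued = defer_work(r, w);
        assert(requeued);
        (void) requeued;
    }
    return true;
}

/* Stand-in for a DMA completion check: 'aux' points to the number of polls
 * left before the pretend transfer finishes. */
static int
fake_dma_work(void *aux, bool force)
{
    int *polls_left = aux;

    if (*polls_left > 0) {
        (*polls_left)--;
    }
    return (*polls_left == 0 || force) ? 0 : EAGAIN;
}
```

In this sketch a PMD thread would call defer_work() right after kicking off
a transfer and call do_work() after polling its RXQs. Because the ring is
FIFO, the item that has been outstanding the longest is always tried first.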

The ability to defer work is necessary when considering asynchronous
use-cases. The use-case this patchset targets is DMA offload of TX on VHOST
ports. In this use-case, packets are passed to a copy engine rather than
being copied in software. Once the copy has completed, the packets have to
be freed and the VHOST port statistics have to be updated in software. This
completion work needs to be deferred.

There are a number of requirements for an effective defer infrastructure.
The requirements, and how each one is accomplished, are:

1. Allow the thread which kicked off the DMA transfer to keep doing useful
   work, rather than waiting or polling for work to be completed.
   This is accomplished by deferring the completion work for the DMA
   transfer rather than waiting for the DMA transfer to complete before
   moving on to process more packets. The completion work is added to the
   work ring to be done later, and more useful work can be done in the
   meantime.

2. Allow some time to pass between kicking off a DMA transfer for a VHOST
   port and checking for completion of the DMA transfer.
   This is accomplished by doing deferred work after processing all RXQs
   assigned to a PMD thread.

3. Upon checking for completion of the DMA transfer, allow re-deferral of
   work in the case where the DMA transfer has not completed.
   This is accomplished by adding checks in the "do_work()" function to
   defer the work again when the DMA transfer has not completed. This
   re-deferring of work helps with requirements 1 and 2.

A ring buffer is used to queue the pointers to work items since its FIFO
property means the DMA transfers which have been in progress the longest
are checked first and have the highest chance of being completed.

Open TODOs:
- The patchset refers to "work" and "work items" but this infrastructure is
  focused on "netdev async work". The variables and functions could be
  named more appropriately. I'm open to any suggestions here.
- This patchset has been tested manually. Some form of automated testing
  would be better.
  Since we have stats for lots of different scenarios, unit tests should be
  quite easy. I'm open to suggestions for other forms of testing.

v2:
- Count cycles spent doing asynchronous work (patch 2/3).
- Add a configurable delay to work deferral (patch 3/3).
- Implement and use a simpler ring buffer in OVS, rather than using the DPDK
  implementation.
- Only print work defer stats if some work has actually been deferred.
- Add a "force" flag to the "process_async()" API to implement an attempt
  limit on the number of times an asynchronous piece of work should be
  attempted.
- Do all outstanding work on a PMD thread before allowing a reload to occur.

Cian Ferriter (3):
  dpif-netdev: Add a per thread work ring
  dpif-netdev: Count cycles spent doing async work
  dpif-netdev: Add a configurable delay to work deferral

 lib/automake.mk                  |   1 +
 lib/dpif-netdev-perf.c           |  20 ++-
 lib/dpif-netdev-perf.h           |   9 ++
 lib/dpif-netdev-private-defer.h  |  95 ++++++++++++++
 lib/dpif-netdev-private-thread.h |   4 +
 lib/dpif-netdev.c                | 204 ++++++++++++++++++++++++++++++-
 lib/netdev-dpdk.c                |  22 ++--
 lib/netdev-provider.h            |  19 ++-
 lib/netdev.c                     |   3 +-
 9 files changed, 362 insertions(+), 15 deletions(-)
 create mode 100644 lib/dpif-netdev-private-defer.h

-- 
2.32.0

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev