On 5/28/26 10:29 AM, Eelco Chaudron wrote:
>
>
> On 27 May 2026, at 16:37, Gaetan Rivet wrote:
>
>> On Thu Apr 2, 2026 at 12:41 PM CEST, Kevin Traynor via dev wrote:
>>> On 4/1/26 1:03 PM, Eelco Chaudron via dev wrote:
>>>>
>>>>
>>>> On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:
>>>>
>>>>> This patch adds support for specific PMD thread initialization,
>>>>> deinitialization, and a callback execution to perform work as
>>>>> part of the PMD thread loop. This allows hardware offload
>>>>> providers to handle any specific asynchronous or batching work.
>>>>>
>>>>> This patch also adds cycle statistics for the provider-specific
>>>>> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.
>>>>
>>>> Bringing back the discussion on the earlier patch between Ilya and Gaetan
>>>> to this revision :)
>>>>
>>>> Ilya:
>>>> Hi, Eelco. As we talked before, this infrastructure resembles the async
>>>> work infra that was proposed in the past for the use case of async vhost
>>>> processing. And I don't see any real use case proposed for it here nor
>>>> in the RFC, where the question was asked, but not answered.
>>>>
>>>> Gaetan:
>>>>
>>>
>>> Hi Gaetan,
>>>
>>> A few questions below. I'm not so clear on the DOCA threading
>>> requirements, so questions may be broad.
>>>
>>>> Hi Ilya, Eelco,
>>>>
>>>> Thanks for the patch and for the review.
>>>>
>>>> The use-case on our side is distributed data-structures in DOCA that
>>>> require each participating thread to do maintenance work periodically.
>>>>
>>>> Specifically, offload threads will insert offload objects.
>>>> Those will reserve entries in a map that can be resized. The DOCA
>>>> implementation requires any thread that owns an entry to perform the
>>>> work of moving it to the new bucket / space after resize is initiated.
>>>>
>>>> This is a pervasive design choice in DOCA, they write most of their APIs
>>>> assuming participating threads are periodically calling into these
>>>> maintenance functions.
>>>>
>>>
>>> What is a "participating thread"? IIUC, the pmd thread passes down the
>>> flow pattern/action and the offload thread inserts the offload into the NIC.
>>>
>>> In that case, is it the offload thread that owns the entry?
>>>
>>
>> Participating threads are any threads that registered to DOCA-flow as
>> offloading threads. In our case, it means:
>>
>>  * The main thread
>>    --> When probing a port, starting it requires installing
>>        DOCA offloads to execute RSS in particular, and a few other
>>        'admin' offloads (optional rate-limiting on VF to avoid
>>        noisy-neighbors, etc).
>>
>>  * The offload thread(s) (in the OVS sense)
>>    A thread in OVS managing dp-flow offloads asynchronously.
>>
>>  * The polling thread(s)
>>    CT-offload is much simpler and faster than dp-flow offload.
>>    Executing offload insertion synchronously from the fastpath
>>    is beneficial.
>>
>> In our case, 'participating threads' are any threads owning an offload
>> queue in DOCA-flow.
>>
>> We have a few exceptions for the main thread, mainly that we force all
>> offload operations to be fully synchronous there: we do not want to
>> publish a new netdev if its 'admin' offloads have not yet been received
>> and successfully acknowledged by the hardware, so we force waiting
>> operations for it: it does not need to do regular upkeep etc.
>>
>>>> Some of such work is also time-sensitive, for example the current
>>>> implementation requires a CT offload thread to receive completions after
>>>> some hardware initialization. Until this completion is done, the CT
>>>> offload entry is not fully usable (cannot be queried for activity /
>>>> counters). We cannot leave batches of CT offload entries waiting for
>>>> completion, assuming that at some later point, we will eventually
>>>> re-execute something in our offload provider: it leaves a few stranded
>>>> connection objects incomplete.
>>>>
>>>> This has the result of having hardware execution of a flow with CT
>>>> actions, but no activity counters: the software datapath then deletes
>>>> the connection and/or flow due to inactivity.
>>>>
>>>
>>> Can this periodic work be done by the offload thread? If it is fast
>>> enough for inserting the offload, then maybe it is fast enough for this.
>>>
>>
>> The PMD thread owns the offload queue. If another thread has to execute
>> its upkeep work, it means sharing the queue between threads.
>>
>>> Some DPDK PMDs use alarms for periodic maintenance work, could they be
>>> used inside DOCA for this?
>>>
>>
>> Those upkeep functions are exposed by DOCA and part of the DOCA-flow
>> API. DOCA does not expose an event framework to schedule this kind of
>> work, it requires DOCA applications to explicitly call those functions.
>>
>>> If it needs to be on the PMD thread, is the work significant (i.e. more
>>> than a few % cpu) and how variable is it? Could it be added inside the
>>> call to rte_eth_rx_burst polling?
>>>
>>
>> It can be significant.
>> The work is anything requiring the use of the offload queue owned by
>> this thread. The principle is that the owning thread must execute it.
>>
>> Currently, with CT offloads we have:
>>
>>  * offload queue polling for HW completion (requests have been
>>    executed: add / mod / del were executed)
>>
>>  * CT-del: A conn was offloaded by PMD 1. The connection either expired
>>    or another PMD 2 closed it: ct-clean or PMD-2 sends a CT-del
>>    request to PMD-1: PMD-1 must poll for CT-del requests and
>>    execute them locally.
>>
>>  * Offload flush: when a port is deleted, all owning threads must
>>    process a blocking flush request from the main thread. The main
>>    thread only proceeds once all participating threads have completed
>>    their flush.
>>
>> Completion is very lightweight work, but we must execute it.
>> Generally we do only completion polling as needed: we only clear enough
>> room in the offload queue for the current batch of requests we want to
>> enqueue, but we have an issue on idle: some stray completions can
>> be left in the queue and won't be processed if we rely only on activity.
>> Currently DOCA-flow does not support leaving the completions until the
>> port is deleted: they need to be processed.
>>
>> CT-del can be significant in some cases. We have a 'rolling-window' case
>> of constant open + close of short connections, and in this worst case,
>> CT-del takes ~30% (both local and distant). Some portion of it comes from
>> CT-del messages, in particular in case of multiple PMDs.
>>
>> Offload flush is generally quick, but we must answer the flush message
>> quickly to block the main thread as little as possible.
>>
>> Some of the messages must be handled even if there is no RX-burst: a PMD
>> that is waiting for reload will need to execute a flush message that it
>> has received.
>
> Hi Gaetan,
>
> I guess Kevin is suggesting to hide this work in netdev_doca_rxq_recv(),
> as it will always be called as long as DOCA ports are present on the
> PMD. Or are there cases where this is not the case?
>
>   dp_netdev_process_rxq_port()
>     netdev_rxq_recv()
>       netdev_doca_rxq_recv()
>
> Kevin, please confirm.

Yes, that's what I was suggesting. The work is rxq specific and we
already have an rxq specific call that is called in a loop so why not do
it there and include the cycles needed for the maintenance work in the
measured cycles needed for that rxq.

>
>> I think completions and flushes would be the main issues with the
>> rx-burst approach.
>
> [...]
>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
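
To make the rx-burst suggestion concrete, here is a minimal, self-contained
sketch of the idea: the per-rxq receive call also drains a bounded number of
offload completions, and the cycles spent on that upkeep are charged to the
same rxq. Every name in it (struct offload_queue, struct rxq,
rxq_recv_with_upkeep(), UPKEEP_BUDGET, the fake cycle counter) is
hypothetical and exists only for illustration; none of it is OVS, DPDK or
DOCA API, and the one-cycle-per-item cost is an assumption chosen to keep
the accounting easy to follow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: max completions drained per receive call. */
#define UPKEEP_BUDGET 4

/* Hypothetical per-thread offload queue with pending HW completions. */
struct offload_queue {
    unsigned int pending_completions;
};

/* Hypothetical rx queue, polled by the thread that owns 'oq'. */
struct rxq {
    struct offload_queue *oq;   /* Offload queue owned by the same thread. */
    unsigned int pkts_ready;    /* Packets waiting to be received. */
    uint64_t cycles;            /* Cycles attributed to this rxq. */
};

/* Stand-in for a real cycle counter so that the sketch is deterministic. */
static uint64_t fake_cycles;

/* Drain up to 'budget' completions; each one costs one fake cycle. */
static unsigned int
offload_queue_upkeep(struct offload_queue *oq, unsigned int budget)
{
    unsigned int n = oq->pending_completions < budget
                     ? oq->pending_completions : budget;

    oq->pending_completions -= n;
    fake_cycles += n;
    return n;
}

/* Receive up to 'max' packets, then run upkeep even if none arrived.
 * Both the receive and the upkeep cycles are charged to the rxq. */
static unsigned int
rxq_recv_with_upkeep(struct rxq *rxq, unsigned int max)
{
    uint64_t start = fake_cycles;
    unsigned int n;

    assert(rxq && rxq->oq);

    n = rxq->pkts_ready < max ? rxq->pkts_ready : max;
    rxq->pkts_ready -= n;
    fake_cycles += n;           /* Each packet costs one fake cycle. */

    offload_queue_upkeep(rxq->oq, UPKEEP_BUDGET);

    rxq->cycles += fake_cycles - start;
    return n;
}
```

Note that upkeep in this sketch only runs when the receive function is
called, so it does not by itself cover the case raised earlier in the thread
of a PMD that has a flush message to process but no rx-burst to piggyback on.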
