On 27 May 2026, at 16:37, Gaetan Rivet wrote:

> On Thu Apr 2, 2026 at 12:41 PM CEST, Kevin Traynor via dev wrote:
>> On 4/1/26 1:03 PM, Eelco Chaudron via dev wrote:
>>>
>>>
>>> On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:
>>>
>>>> This patch adds support for specific PMD thread initialization,
>>>> deinitialization, and a callback execution to perform work as
>>>> part of the PMD thread loop. This allows hardware offload
>>>> providers to handle any specific asynchronous or batching work.
>>>>
>>>> This patch also adds cycle statistics for the provider-specific
>>>> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.
>>>
>>> Bringing back the discussion on the earlier patch between Ilya and Gaetan 
>>> to this revision :)
>>>
>>> Ilya:
>>>   Hi, Eelco.  As we talked before, this infrastructure resembles the async
>>>   work infra that was proposed in the past for the use case of async vhost
>>>   processing.  And I don't see any real use case proposed for it here nor
>>>   in the RFC, where the question was asked but not answered.
>>>
>>> Gaetan:
>>>
>>
>> Hi Gaetan,
>>
>> A few questions below. I'm not so clear on the DOCA threading
>> requirements, so questions may be broad.
>>
>>>   Hi Ilya, Eelco,
>>>
>>>   Thanks for the patch and for the review.
>>>
>>>   The use-case on our side is distributed data-structures in DOCA that
>>>   require each participating thread to do maintenance work periodically.
>>>
>>>   Specifically, offload threads will insert offload objects.
>>>   Those will reserve entries in a map that can be resized. The DOCA
>>>   implementation requires any thread that owns an entry to perform the
>>>   work of moving it to the new bucket / space after resize is initiated.
>>>
>>>   This is a pervasive design choice in DOCA: they write most of their APIs
>>>   assuming participating threads are periodically calling into these
>>>   maintenance functions.
>>>
>>
>> What is a "participating thread" ? IIUC, the pmd thread passes down the
>> flow pattern/action and the offload thread inserts the offload into the NIC.
>>
>> In that case, is it the offload thread that owns the entry ?
>>
>
> Participating threads are any threads that registered to DOCA-flow as
> offloading threads. In our case, it means:
>
>   * The main thread
>       --> When probing a port, starting it requires installing
>           DOCA offloads to execute RSS in particular, and a few other
>           'admin' offloads (optional rate-limiting on VF to avoid
>           noisy-neighbors, etc).
>
>   * The offload thread(s) (in the OVS sense)
>       A thread in OVS managing dp-flow offloads asynchronously.
>
>   * The polling thread(s)
>       CT-offload is much simpler and faster than dp-flow offload.
>       Executing offload insertion synchronously from the fastpath
>       is beneficial.
>
> In our case, 'participating threads' are any threads owning an offload
> queue in DOCA-flow.
>
> We have a few exceptions for the main thread, mainly that we force all
> offload operations to be fully synchronous there. We do not want to
> publish a new netdev if its 'admin' offloads have not yet been received
> and successfully acknowledged by the hardware, so we force waiting
> operations for it. As a result, it does not need to do regular upkeep etc.
>
>>>   Some of such work is also time-sensitive, for example the current
>>>   implementation requires a CT offload thread to receive completions after
>>>   some hardware initialization. Until this completion is done, the CT
>>>   offload entry is not fully usable (cannot be queried for activity /
>>>   counters). We cannot leave batches of CT offload entries waiting for
>>>   completion, assuming that at some later point, we will eventually
>>>   re-execute something in our offload provider: it leaves a few stranded
>>>   connection objects incomplete.
>>>
>>>   This has the result of having hardware execution of a flow with CT
>>>   actions, but no activity counters: the software datapath then deletes
>>>   the connection and/or flow due to inactivity.
>>>
>>
>> Can this periodic work be done by the offload thread ? If it is fast
>> enough for inserting the offload, then maybe it is fast enough for this.
>>
>
> The PMD thread owns the offload queue. If another thread has to execute
> its upkeep work, it means sharing the queue between threads.
>
>> Some DPDK PMDs use alarms for periodic maintenance work, could they be
>> used inside DOCA for this?
>>
>
> Those upkeep functions are exposed by DOCA and part of the DOCA-flow
> API. DOCA does not expose an event framework to schedule this kind of
> work, it requires DOCA applications to explicitly call those functions.
>
>> If it needs to be on the PMD thread, is the work significant (i.e. more
>> than a few % cpu) and how variable is it ? Could it be added inside the
>> call to rte_eth_rx_burst polling ?
>>
>
> It can be significant.
> The work is anything requiring the use of the offload queue owned by
> this thread. The principle is that the owning thread must execute it.
>
> Currently, with CT offloads we have:
>
>   * offload queue polling for HW completion (requests have been
>     executed: add / mod / del were executed)
>
>   * CT-del: A conn was offloaded by PMD-1. The connection either expired
>     or another thread, PMD-2, closed it: ct-clean or PMD-2 sends a CT-del
>     request to PMD-1. PMD-1 must poll for CT-del requests and
>     execute them locally.
>
>   * Offload flush: when a port is deleted, all owning threads must
>     process a blocking flush request from the main thread. The main
>     thread only proceeds once all participating threads have completed
>     their flush.
>
> Completion is very lightweight work, but we must execute it.
> Generally we do only completion polling as needed: we only clear enough
> room in the offload queue for the current batch of requests we want to
> enqueue. But we have an issue on idle: some stray completions can
> be left in the queue and won't be processed if we rely only on activity.
> Currently DOCA-flow does not support leaving the completions until the
> port is deleted: they need to be processed.
>
> CT-del can be significant in some cases. We have a 'rolling-window' case
> of constant open + close of short connections, and in this worst case,
> CT-del takes ~30% (both local and distant). Some portion of it comes from
> CT-del messages, in particular in case of multiple PMDs.
>
> Offload flush is generally quick, but we must answer the flush message
> quickly to block the main thread as little as possible.
>
> Some of the messages must be handled even if there is no RX-burst: a PMD
> that is waiting for reload will need to execute a flush message that it
> has received.

Hi Gaetan,

I guess Kevin is suggesting hiding this work in netdev_doca_rxq_recv(),
as it will always be called as long as DOCA ports are present on the
PMD. Or are there cases where it would not be called?

dp_netdev_process_rxq_port()
  netdev_rxq_recv()
    netdev_doca_rxq_recv()

Kevin, please confirm.
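
To make the suggestion concrete, here is a minimal sketch of upkeep
piggybacked on the receive path. It is an illustration only: struct
offload_queue, offload_queue_upkeep(), rxq_recv_with_upkeep() and
pmd_iteration() are hypothetical names, not DOCA-flow or OVS functions,
and the counters stand in for the real completion, CT-del and flush
handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread offload queue state (not the DOCA-flow API). */
struct offload_queue {
    unsigned int owner_id;       /* Thread that owns this queue. */
    size_t pending_completions;  /* HW completions not yet processed. */
    size_t pending_ct_del;       /* CT-del requests posted by other threads. */
    bool flush_requested;        /* Set by the main thread on port deletion. */
};

/* Runs the three kinds of upkeep described in the thread.  Only the owning
 * thread may call this.  Returns the number of items handled. */
static size_t
offload_queue_upkeep(struct offload_queue *q, unsigned int caller_id)
{
    size_t handled = 0;

    assert(q->owner_id == caller_id);

    /* 1. Drain HW completions (add / mod / del were executed). */
    handled += q->pending_completions;
    q->pending_completions = 0;

    /* 2. Execute CT-del requests sent by ct-clean or other PMDs. */
    handled += q->pending_ct_del;
    q->pending_ct_del = 0;

    /* 3. Answer a flush request so the main thread can proceed. */
    if (q->flush_requested) {
        q->flush_requested = false;
        handled++;
    }
    return handled;
}

/* Stand-in for netdev_doca_rxq_recv(): upkeep runs as a side effect of
 * polling an rxq.  'n_rx_packets' simulates the size of the rx burst. */
static int
rxq_recv_with_upkeep(struct offload_queue *q, unsigned int caller_id,
                     int n_rx_packets)
{
    offload_queue_upkeep(q, caller_id);
    return n_rx_packets;
}

/* One PMD loop iteration: receive is called once per polled rxq. */
static void
pmd_iteration(struct offload_queue *q, unsigned int pmd_id, size_t n_rxqs)
{
    for (size_t i = 0; i < n_rxqs; i++) {
        rxq_recv_with_upkeep(q, pmd_id, 0);
    }
}
```

In this sketch, pmd_iteration() with n_rxqs == 0 never reaches the upkeep
function, so a pending flush request stays unanswered until an rxq is
polled again. Whether that situation can occur with DOCA ports is the
question above.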

> I think completions and flushes would be the main issues with the
> rx-burst approach.

[...]
