On 5/28/26 10:29 AM, Eelco Chaudron wrote:
> 
> 
> On 27 May 2026, at 16:37, Gaetan Rivet wrote:
> 
>> On Thu Apr 2, 2026 at 12:41 PM CEST, Kevin Traynor via dev wrote:
>>> On 4/1/26 1:03 PM, Eelco Chaudron via dev wrote:
>>>>
>>>>
>>>> On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:
>>>>
>>>>> This patch adds support for specific PMD thread initialization,
>>>>> deinitialization, and a callback execution to perform work as
>>>>> part of the PMD thread loop. This allows hardware offload
>>>>> providers to handle any specific asynchronous or batching work.
>>>>>
>>>>> This patch also adds cycle statistics for the provider-specific
>>>>> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.
>>>>
>>>> Bringing back the discussion on the earlier patch between Ilya and Gaetan 
>>>> to this revision :)
>>>>
>>>> Ilya:
>>>>   Hi, Eelco.  As we talked before, this infrastructure resembles the async
>>>>   work infra that was proposed in the past for the use case of async vhost
>>>>   processing.  And I don't see any real use case proposed for it here nor
>>>>   in the RFC, where the question was asked, but not replied.
>>>>
>>>> Gaetan:
>>>>
>>>
>>> Hi Gaetan,
>>>
>>> A few questions below. I'm not so clear on the DOCA threading
>>> requirements, so questions may be broad.
>>>
>>>>   Hi Ilya, Eelco,
>>>>
>>>>   Thanks for the patch and for the review.
>>>>
>>>>   The use-case on our side is distributed data-structures in DOCA that
>>>>   require each participating thread to do maintenance work periodically.
>>>>
>>>>   Specifically, offload threads will insert offload objects.
>>>>   Those will reserve entries in a map that can be resized. The DOCA
>>>>   implementation requires any thread that owns an entry to perform the
>>>>   work of moving it to the new bucket / space after resize is initiated.
>>>>
>>>>   This is a pervasive design choice in DOCA, they write most of their APIs
>>>>   assuming participating threads are periodically calling into these
>>>>   maintenance functions.
>>>>
>>>
>>> What is a "participating thread" ? IIUC, the pmd thread passes down the
>>> flow pattern/action and the offload thread inserts the offload into the NIC.
>>>
>>> In that case, is it the offload thread that owns the entry ?
>>>
>>
>> Participating threads are any threads that registered to DOCA-flow as
>> offloading threads. In our case, it means:
>>
>>   * The main thread
>>       --> When probing a port, starting it requires installing
>>           DOCA offloads to execute RSS in particular, and a few other
>>           'admin' offloads (optional rate-limiting on VF to avoid
>>           noisy-neighbors, etc).
>>
>>   * The offload thread(s) (in the OVS sense)
>>       A thread in OVS managing dp-flow offloads asynchronously.
>>
>>   * The polling thread(s)
>>       CT-offload is much simpler and faster than dp-flow offload.
>>       Executing offload insertion synchronously from the fastpath
>>       is beneficial.
>>
>> In our case, 'participating threads' are any thread owning an offload
>> queue in DOCA-flow.
>>
>> We have a few exceptions for the main thread, mainly that we force all
>> offload operations to be fully synchronous there: we do not want to
>> publish a new netdev if its 'admin' offloads have not yet been received
>> and successfully acknowledged by the hardware, so we force waiting
>> operations for it: it does not need to do regular upkeep etc.
>>
>>>>   Some of such work is also time-sensitive, for example the current
>>>>   implementation requires a CT offload thread to receive completions after
>>>>   some hardware initialization. Until this completion is done, the CT
>>>>   offload entry is not fully usable (cannot be queried for activity /
>>>>   counters). We cannot leave batches of CT offload entries waiting for
>>>>   completion, assuming that at some later point, we will eventually
>>>>   re-execute something in our offload provider: it leaves a few stranded
>>>>   connection objects incomplete.
>>>>
>>>>   This has the result of having hardware execution of a flow with CT
>>>>   actions, but no activity counters: the software datapath then deletes
>>>>   the connection and/or flow due to inactivity.
>>>>
>>>
>>> Can this periodic work be done by the offload thread ? If it is fast
>>> enough for inserting the offload, then maybe it is fast enough for this.
>>>
>>
>> The PMD thread owns the offload queue. If another thread has to execute
>> its upkeep work, it means sharing the queue between threads.
>>
>>> Some DPDK PMDs use alarms for periodic maintenance work, could they be
>>> used inside DOCA for this?
>>>
>>
>> Those upkeep functions are exposed by DOCA and part of the DOCA-flow
>> API. DOCA does not expose an event framework to schedule this kind of
>> work, it requires DOCA applications to explicitly call those functions.
>>
>>> If it needs to be on the PMD thread, is the work significant (i.e. more
>>> than a few % cpu) and how variable is it ? Could it be added inside the
>>> call to rte_eth_rx_burst polling ?
>>>
>>
>> It can be significant.
>> The work is anything requiring the use of the offload queue owned by
>> this thread. The principle is that the owning thread must execute it.
>>
>> Currently, with CT offloads we have:
>>
>>   * offload queue polling for HW completion (requests have been
>>     executed: add / mod / del were executed)
>>
>>   * CT-del: A conn was offloaded by PMD 1. The connection either expired
>>     or another PMD 2 closed it: ct-clean or PMD-2 sends a CT-del
>>     request to PMD-1: PMD-1 must poll for CT-del requests and
>>     execute them locally.
>>
>>   * Offload flush: when a port is deleted, all owning threads must
>>     process a blocking flush request from the main thread. The main
>>     thread only proceeds once all participating threads have completed
>>     their flush.
>>
>> Completion is very lightweight work, but we must execute it.
>> Generally we do only completion polling as needed: we only clear enough
>> room in the offload queue for the current batch of requests we want to
>> enqueue, but we have an issue on idle: some stray completion can
>> be left in the queue and won't be processed if we rely only on activity.
>> Currently DOCA-flow does not support leaving the completions until the
>> port is deleted: they need to be processed.
>>
>> CT-del can be significant in some cases. We have a 'rolling-window' case
>> of constant open + close of short connections, and in this worst case,
>> CT-del takes ~30% (both local and distant). Some portion of it comes from
>> CT-del messages, in particular in case of multiple PMDs.
>>
>> Offload flush is generally quick, but we must answer the flush message
>> quickly to block the main thread as little as possible.
>>
>> Some of the messages must be handled even if there is no RX-burst: a PMD
>> that is waiting for reload will need to execute a flush message that it
>> has received.
> 
> Hi Gaetan,
> 
> I guess Kevin is suggesting to hide this work in netdev_doca_rxq_recv(),
> as it will always be called as long as DOCA ports are present on the
> PMD. Or are there cases where this is not the case?
> 
> dp_netdev_process_rxq_port()
>   netdev_rxq_recv()
>     netdev_doca_rxq_recv()
> 
> Kevin, please confirm.

Yes, that's what I was suggesting. The work is rxq-specific, and we
already have an rxq-specific call that runs in a loop, so why not do it
there and include the cycles needed for the maintenance work in the
cycles measured for that rxq?
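
To make the idea concrete, here is a rough, self-contained sketch. All
names are hypothetical stand-ins (none of them are real OVS or DOCA
symbols), the clock is faked, and the per-item costs are made-up
numbers. It only illustrates upkeep running inside the rxq receive call
on every poll, with its cycles landing on that rxq:

```c
#include <stdint.h>

/* Hypothetical stand-ins only: not real OVS or DOCA types or functions. */
struct sketch_rxq {
    int pending_completions;  /* Stray completions left in the offload queue. */
    int pending_packets;      /* Packets waiting on the rx queue. */
    uint64_t cycles;          /* Cycles attributed to this rxq. */
};

/* Fake cycle counter so the sketch is deterministic. */
static uint64_t sketch_clock;

/* Assumed upkeep step: drain all completions, 1 fake cycle each. */
static int
sketch_upkeep(struct sketch_rxq *rxq)
{
    int done = rxq->pending_completions;

    sketch_clock += (uint64_t) done;
    rxq->pending_completions = 0;
    return done;
}

/* Assumed rx burst: take up to 'max' packets, 10 fake cycles each. */
static int
sketch_rx_burst(struct sketch_rxq *rxq, int max)
{
    int n = rxq->pending_packets < max ? rxq->pending_packets : max;

    rxq->pending_packets -= n;
    sketch_clock += 10 * (uint64_t) n;
    return n;
}

/* Plays the role of netdev_doca_rxq_recv(): upkeep runs on every poll,
 * including polls that return no packets. */
static int
sketch_rxq_recv(struct sketch_rxq *rxq, int max)
{
    sketch_upkeep(rxq);
    return sketch_rx_burst(rxq, max);
}

/* Plays the role of dp_netdev_process_rxq_port(): the measurement wraps
 * the receive call, so upkeep cycles are charged to this rxq. */
static int
sketch_process_rxq(struct sketch_rxq *rxq)
{
    uint64_t start = sketch_clock;
    int n = sketch_rxq_recv(rxq, 32);

    rxq->cycles += sketch_clock - start;
    return n;
}
```

With 3 pending completions and 2 packets, one call to
sketch_process_rxq() returns 2 and charges 3 + 20 = 23 fake cycles to
the rxq. An idle poll with one stray completion still drains it. What
this does not cover is a PMD that is not polling this rxq at all, e.g.
while waiting for reload.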

> 
>> I think completions and flushes would be the main issues with the
>> rx-burst approach.
> 
> [...]
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev