On 7 Oct 2025, at 13:57, Eelco Chaudron via dev wrote:
> This RFC patch series introduces a major architectural
> refactoring of Open vSwitch's hardware offload
> infrastructure. It replaces the tightly coupled
> `netdev-offload` implementation with a new, modular
> `dpif-offload-provider` framework.
Hi Eli and Ilya,
Thanks for your initial feedback, Ilya. This revision renames the rte
references and files to dpdk. It still includes the unit tests, but I agree
that we should find a way to automate them upstream. Eli et al., do you perhaps
know of any kind of software rte_flow PMD implementation?
Eli, thanks also for your offline feedback. For transparency, I'll reply to
some of it here.
This series should fix the bug related to flow modifications. If it doesn't
address your specific case, please let me know the exact steps to reproduce.
I've also added a patch to clean up the netdev-offload-xxx file naming,
changing them to dpif-offload-xxx-netdev.c. This makes it clearer that they're
now part of the dpif-offload framework. I didn't change the internal function
names, since that would be a large change with little real benefit.
You also mentioned that you do not like the flow_mark to megaflow_ufid mapping
being tracked in dpif-netdev. I partly agree, but I haven't cleaned it up yet.
Let me first explain how it currently works (a small sketch follows the list):
- dpif-netdev requests a flow offload.
- If the provider can offload it, it returns a flow_mark (allocated through
the common dpif-offload API).
- dpif-netdev associates this flow_mark with a dp_netdev_flow.
- dp_netdev_hw_flow() performs a dp_netdev_flow lookup based on the packet's
provided flow_mark.
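For reference, here is a tiny standalone model of that association. The names
and the plain array are purely illustrative; the real mapping of course lives
in dpif-netdev's own data structures:

/* Toy model of the flow_mark <-> dp_netdev_flow association described
 * above.  Illustrative only; not the actual OVS code. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_MAX_MARKS 1024

struct toy_flow {                     /* stand-in for struct dp_netdev_flow */
    uint64_t mega_ufid;
};

static struct toy_flow *toy_mark_to_flow[TOY_MAX_MARKS];

/* dpif-netdev associates the provider-allocated flow_mark with the flow. */
static void
toy_mark_associate(uint32_t mark, struct toy_flow *flow)
{
    toy_mark_to_flow[mark % TOY_MAX_MARKS] = flow;
}

/* dp_netdev_hw_flow()-style lookup on the packet's flow_mark. */
static struct toy_flow *
toy_mark_to_flow_find(uint32_t mark)
{
    return toy_mark_to_flow[mark % TOY_MAX_MARKS];
}

int
main(void)
{
    struct toy_flow flow = { .mega_ufid = 0x1234 };
    uint32_t mark = 42;               /* returned on successful offload */

    toy_mark_associate(mark, &flow);
    printf("mark %"PRIu32" -> mega_ufid %#"PRIx64"\n",
           mark, toy_mark_to_flow_find(mark)->mega_ufid);
    return 0;
}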
So if I understand your proposal correctly, you want to hide the flow_mark
management from the dpif-netdev layer. If that's the case, we would need an API
to query the offload provider in dp_netdev_hw_flow(), which would then return a
mega_ufid that dpif-netdev can use to look up the associated dp_netdev_flow. I
assume you also want to use the flow_mark for other internal (pre)processing.
Do you need additional callbacks, or would one general callback be sufficient?
There's already the experimental dpif_offload_netdev_hw_miss_packet_recover().
Maybe we should fold all of this into a single callback? Please share some more
details on what you have in mind.
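To make the question a bit more concrete, below is roughly the shape I'm
imagining. All names in it are hypothetical, just to frame the discussion:

/* Hypothetical callback shapes, for discussion only -- none of these
 * names exist in the current series. */
#include <stdbool.h>
#include <stdint.h>

struct dpif_offload;                   /* provider handle from the series */
struct dp_packet;                      /* OVS packet */
struct toy_u128 { uint64_t hi, lo; };  /* stand-in for the ovs_u128 mega_ufid */

struct dpif_offload_pkt_callbacks {    /* hypothetical */
    /* Narrow option: translate a packet's flow_mark into the mega_ufid that
     * dpif-netdev should use for its dp_netdev_flow lookup.  Returns false
     * if the mark is unknown to this provider. */
    bool (*flow_mark_to_mega_ufid)(struct dpif_offload *, uint32_t flow_mark,
                                   struct toy_u128 *mega_ufid);

    /* Broad option: one general per-packet hook that could also absorb the
     * hw_miss_packet_recover() style preprocessing. */
    int (*packet_preprocess)(struct dpif_offload *, struct dp_packet *,
                             struct toy_u128 *mega_ufid);
};

Whether we would need both, or only the broad one, is exactly the input I'm
looking for.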
Finally, you mentioned that you need some kind of thread registration so the
PMDs can communicate with your specific offload threads. Is an init hook all
you need? If so, we could do something like the following during reload, so
all providers have a chance to perform their setup during PMD reload or
initialization:
7207 static void *
7208 pmd_thread_main(void *f_)
7209 {
7210     struct dp_netdev_pmd_thread *pmd = f_;
7211     struct pmd_perf_stats *s = &pmd->perf_stats;
7212     unsigned int lc = 0;
7213     struct polled_queue *poll_list;
7214     bool wait_for_reload = false;
7215     bool dpdk_attached;
7216     bool reload_tx_qid;
7217     bool exiting;
7218     bool reload;
7219     int poll_cnt;
7220     int i;
7221     int process_packets = 0;
7222     uint64_t sleep_time = 0;
7223
7224     poll_list = NULL;
7225
7226     /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
7227     ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
7228     ovs_numa_thread_setaffinity_core(pmd->core_id);
7229     dpdk_attached = dpdk_attach_thread(pmd->core_id);
7230     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
7231     dfc_cache_init(&pmd->flow_cache);
7232     pmd_alloc_static_tx_qid(pmd);
7233     set_timer_resolution(PMD_TIMER_RES_NS);
7234
7235 reload:
7236     atomic_count_init(&pmd->pmd_overloaded, 0);
7237
7238     pmd->intrvl_tsc_prev = 0;
7239     atomic_store_relaxed(&pmd->intrvl_cycles, 0);
7240
7241     if (!dpdk_attached) {
7242         dpdk_attached = dpdk_attach_thread(pmd->core_id);
7243     }
7244
++++     dpif_offload_pmd_thread_reload(pmd->dp->full_name, pmd->core_id);
++++     /* for_all_offload_providers:
++++      *     offload->pmd_thread_reload(struct dpif_offload *,
++++      *                                unsigned core_id); */
++++
7245     /* List port/core affinity */
7246     for (i = 0; i < poll_cnt; i++) {
7247         VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
7248                  pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx),
7249                  netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
7250         /* Reset the rxq current cycles counter. */
7251         dp_netdev_rxq_set_cycles(poll_list[i].rxq, RXQ_CYCLES_PROC_CURR, 0);
7252         for (int j = 0; j < PMD_INTERVAL_MAX; j++) {
7253             dp_netdev_rxq_set_intrvl_cycles(poll_list[i].rxq, 0);
7254         }
7255     }
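On the provider side that would just be one more optional callback. A rough
sketch, with the struct name here only standing in for whatever the provider
interface in the series ends up being called:

struct dpif_offload;                       /* provider handle from the series */

struct dpif_offload_provider_callbacks {   /* hypothetical stand-in */
    /* ... existing provider callbacks ... */

    /* Called from pmd_thread_main() at startup and on every reload, so a
     * provider can (re)register this PMD thread with its own offload
     * thread(s).  May be NULL if a provider doesn't need it. */
    void (*pmd_thread_reload)(struct dpif_offload *offload,
                              unsigned int core_id);
};

dpif_offload_pmd_thread_reload() would then simply walk the providers attached
to the datapath and call this hook where it's non-NULL.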
Let me know your thoughts...
Cheers,
Eelco
<SNIP>