On 7 Oct 2025, at 13:57, Eelco Chaudron via dev wrote:
> This RFC patch series introduces a major architectural
> refactoring of Open vSwitch's hardware offload
> infrastructure. It replaces the tightly coupled
> `netdev-offload` implementation with a new, modular
> `dpif-offload-provider` framework.
Hi Eli and Ilya,
Thanks for your initial feedback, Ilya. This revision renames the rte
references and files to dpdk. It still includes the unit tests, but I agree
that we should find a way to automate them upstream. Eli et al., do you perhaps
know of any kind of software rte_flow PMD implementation?
Eli, thanks also for your offline feedback. For transparency, I'll reply to
some of it here.
This series should fix the bug related to flow modifications. If it doesn't
address your specific case, please let me know the exact steps to reproduce.
I've also added a patch to clean up the netdev-offload-xxx file naming,
changing them to dpif-offload-xxx-netdev.c. This makes it clearer that they're
now part of the dpif-offload framework. I didn't change the internal function
names, since that would be a large change with little real benefit.
You also mentioned that you do not like the flow_mark to megaflow_ufid mapping
being tracked in dpif-netdev. I partly agree, but I haven't cleaned it up yet.
Let me first explain how it currently works (a small sketch follows the list):
- dpif-netdev requests a flow offload.
- If the provider can offload it, it returns a flow_mark (allocated through
the common dpif-offload API).
- dpif-netdev associates this flow_mark with a dp_netdev_flow.
- dp_netdev_hw_flow() performs a dp_netdev_flow lookup based on the packet's
provided flow_mark.
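For reference, here is a tiny standalone model of that association. The names
and the plain array are purely illustrative; the real mapping of course lives
in dpif-netdev's own data structures:

/* Toy model of the flow_mark <-> dp_netdev_flow association described
 * above.  Illustrative only; not the actual OVS code. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_MAX_MARKS 1024

struct toy_flow {                     /* stand-in for struct dp_netdev_flow */
    uint64_t mega_ufid;
};

static struct toy_flow *toy_mark_to_flow[TOY_MAX_MARKS];

/* dpif-netdev associates the provider-allocated flow_mark with the flow. */
static void
toy_mark_associate(uint32_t mark, struct toy_flow *flow)
{
    toy_mark_to_flow[mark % TOY_MAX_MARKS] = flow;
}

/* dp_netdev_hw_flow()-style lookup on the packet's flow_mark. */
static struct toy_flow *
toy_mark_to_flow_find(uint32_t mark)
{
    return toy_mark_to_flow[mark % TOY_MAX_MARKS];
}

int
main(void)
{
    struct toy_flow flow = { .mega_ufid = 0x1234 };
    uint32_t mark = 42;               /* returned on successful offload */

    toy_mark_associate(mark, &flow);
    printf("mark %"PRIu32" -> mega_ufid %#"PRIx64"\n",
           mark, toy_mark_to_flow_find(mark)->mega_ufid);
    return 0;
}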
So if I understand your proposal correctly, you want to hide the flow_mark
management from the dpif-netdev layer. If that's the case, we would need an API
to query the offload provider in dp_netdev_hw_flow(), which would then return a
mega_ufid that dpif-netdev can use to look up the associated dp_netdev_flow. I
assume you also want to use the flow_mark for other internal (pre)processing.
Do you need additional callbacks, or would one general callback be sufficient?
There's already the experimental dpif_offload_netdev_hw_miss_packet_recover().
Maybe we should fold all of this into a single callback? Please share some more
details on what you have in mind.
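To make the question a bit more concrete, below is roughly the shape I'm
imagining. All names in it are hypothetical, just to frame the discussion:

/* Hypothetical callback shapes, for discussion only -- none of these
 * names exist in the current series. */
#include <stdbool.h>
#include <stdint.h>

struct dpif_offload;                   /* provider handle from the series */
struct dp_packet;                      /* OVS packet */
struct toy_u128 { uint64_t hi, lo; };  /* stand-in for the ovs_u128 mega_ufid */

struct dpif_offload_pkt_callbacks {    /* hypothetical */
    /* Narrow option: translate a packet's flow_mark into the mega_ufid that
     * dpif-netdev should use for its dp_netdev_flow lookup.  Returns false
     * if the mark is unknown to this provider. */
    bool (*flow_mark_to_mega_ufid)(struct dpif_offload *, uint32_t flow_mark,
                                   struct toy_u128 *mega_ufid);

    /* Broad option: one general per-packet hook that could also absorb the
     * hw_miss_packet_recover() style preprocessing. */
    int (*packet_preprocess)(struct dpif_offload *, struct dp_packet *,
                             struct toy_u128 *mega_ufid);
};

Whether we would need both, or only the broad one, is exactly the input I'm
looking for.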
Finally, you mentioned that you need some kind of thread registration so the
PMDs can communicate with your specific offload threads. Is an init hook all
you need? If so, we could do something like the following during reload, so
all providers have a chance to perform their setup during PMD reload or
initialization:
7207 static void *
7208 pmd_thread_main(void *f_)
7209 {
7210     struct dp_netdev_pmd_thread *pmd = f_;
7211     struct pmd_perf_stats *s = &pmd->perf_stats;
7212     unsigned int lc = 0;
7213     struct polled_queue *poll_list;
7214     bool wait_for_reload = false;
7215     bool dpdk_attached;
7216     bool reload_tx_qid;
7217     bool exiting;
7218     bool reload;
7219     int poll_cnt;
7220     int i;
7221     int process_packets = 0;
7222     uint64_t sleep_time = 0;
7223
7224     poll_list = NULL;
7225
7226     /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
7227     ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
7228     ovs_numa_thread_setaffinity_core(pmd->core_id);
7229     dpdk_attached = dpdk_attach_thread(pmd->core_id);
7230     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
7231     dfc_cache_init(&pmd->flow_cache);
7232     pmd_alloc_static_tx_qid(pmd);
7233     set_timer_resolution(PMD_TIMER_RES_NS);
7234
7235 reload:
7236     atomic_count_init(&pmd->pmd_overloaded, 0);
7237
7238     pmd->intrvl_tsc_prev = 0;
7239     atomic_store_relaxed(&pmd->intrvl_cycles, 0);
7240
7241     if (!dpdk_attached) {
7242         dpdk_attached = dpdk_attach_thread(pmd->core_id);
7243     }
7244
++++     dpif_offload_pmd_thread_reload(pmd->dp->full_name, pmd->core_id);
++++     /* for_all_offload_providers:
++++      *     offload->pmd_thread_reload(struct dpif_offload *,
++++      *                                unsigned core_id); */
++++
7245     /* List port/core affinity */
7246     for (i = 0; i < poll_cnt; i++) {
7247         VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
7248                  pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx),
7249                  netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
7250         /* Reset the rxq current cycles counter. */
7251         dp_netdev_rxq_set_cycles(poll_list[i].rxq, RXQ_CYCLES_PROC_CURR, 0);
7252         for (int j = 0; j < PMD_INTERVAL_MAX; j++) {
7253             dp_netdev_rxq_set_intrvl_cycles(poll_list[i].rxq, 0);
7254         }
7255     }
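On the provider side that would just be one more optional callback. A rough
sketch, with the struct name here only standing in for whatever the provider
interface in the series ends up being called:

struct dpif_offload;                       /* provider handle from the series */

struct dpif_offload_provider_callbacks {   /* hypothetical stand-in */
    /* ... existing provider callbacks ... */

    /* Called from pmd_thread_main() at startup and on every reload, so a
     * provider can (re)register this PMD thread with its own offload
     * thread(s).  May be NULL if a provider doesn't need it. */
    void (*pmd_thread_reload)(struct dpif_offload *offload,
                              unsigned int core_id);
};

dpif_offload_pmd_thread_reload() would then simply walk the providers attached
to the datapath and call this hook where it's non-NULL.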
Let me know your thoughts...
Cheers,
Eelco
<SNIP>