On 1/15/26 7:10 PM, Aaron Conole wrote: > Ilya Maximets <[email protected]> writes: > >> On 1/15/26 10:19 AM, Eelco Chaudron wrote: >>> This patch series introduces a major architectural >>> refactoring of Open vSwitch's hardware offload >>> infrastructure. It replaces the tightly coupled >>> `netdev-offload` implementation with a new, modular >>> `dpif-offload-provider` framework. >>> >>> MOTIVATION >>> ------------------------------------------------------------- >>> The existing `netdev-offload` API tightly couples datapath >>> implementations (like `dpif-netdev`) with specific offload >>> technologies (DPDK's rte_flow). This design has several >>> limitations: >>> >>> - Rigid Architecture: It creates complex dependencies, >>> making the code difficult to maintain and extend. >>> >>> - Limited Flexibility: Supporting multiple offload backends >>> simultaneously or adding new ones is cumbersome. >>> >>> - Inconsistent APIs: The logic for handling different >>> offload types is scattered, leading to an inconsistent >>> and hard-to-follow API surface. >>> >>> This refactoring aims to resolve these issues by creating a >>> clean separation of concerns, improving modularity, and >>> establishing a clear path for future hardware offload >>> integrations. >>> >>> PROPOSED SOLUTION: THE `DPIF-OFFLOAD-PROVIDER` FRAMEWORK >>> ------------------------------------------------------------- >>> This series introduces the `dpif-offload-provider` >>> framework, which functions similarly to the existing >>> `dpif-provider` pattern. It treats hardware offload as a >>> distinct layer with multiple, dynamically selectable >>> backends. >>> >>> Key features of the new framework include: >>> >>> 1. Modular Architecture: A clean separation between the >>> generic datapath interface and specific offload >>> provider implementations (e.g., `dpif-offload-tc`, >>> `dpif-offload-dpdk`). `dpif` layers are now generic >>> clients of the offload API. >>> >>> 2. Provider-based System: Allows multiple offload backends >>> to coexist. >>> >>> 3. Unified and Asynchronous API: Establishes a consistent >>> API across all offload providers. For userspace >>> datapaths, the API is extended to support asynchronous >>> flow operations with callbacks, making `dpif-netdev` a >>> more efficient client. >>> >>> 4. Enhanced Configuration: Provides granular control over >>> offload provider selection through a global and per-port >>> priority system (`hw-offload-priority`), allowing >>> fine-tuned policies for different hardware. >>> >>> 5. Improved Testing: Includes a new test framework >>> specifically for validating DPDKs rte_flow offloads, >>> enhancing long-term maintainability. >>> >>> PATCH SERIES ORGANIZATION >>> ------------------------------------------------------------- >>> This large series is organized logically to facilitate >>> review: >>> >>> 1. Framework Foundation: The initial patches establish the >>> core `dpif-offload-provider` framework, including the >>> necessary APIs for port management, flow mark >>> allocation, configuration, and a dummy provider for >>> testing. >>> >>> 2. Provider Implementation: These patches introduce the new >>> `dpif-offload-tc` and `dpif-offload-dpdk` providers, >>> building out their specific implementations on top of >>> the new framework. >>> >>> 3. API Migration and Decoupling: The bulk of the series >>> systematically migrates functionality from the legacy >>> `netdev-offload` layer to the new providers. Key >>> commits here decouple `dpif-netdev` and, crucially, >>> `dpif-netlink` from their hardware offload >>> entanglements. >>> >>> 4. Cleanup: The final patches remove the now-redundant >>> global APIs and structures from `netdev-offload`, >>> completing the transition. >>> >>> BACKWARD COMPATIBILITY >>> ------------------------------------------------------------- >>> This refactoring maintains full API compatibility from a >>> user's perspective. All existing `ovs-vsctl` and >>> `ovs-appctl` commands continue to function as before. The >>> changes are primarily internal architectural improvements >>> designed to make OVS more robust and extensible. >>> >>> REQUEST FOR COMMENTS >>> ------------------------------------------------------------- >>> This is a significant architectural change that affects >>> core OVS infrastructure. We welcome feedback on: >>> >>> - The overall architectural approach and the >>> `dpif-offload-provider` concept. >>> - The API design, particularly the new asynchronous model >>> for `dpif-netdev`. >>> - The migration strategy and any potential backward >>> compatibility concerns. >>> - Performance implications of the new framework. >>> >>> ------------------------------------------------------------- >>> Changes from v1: >>> - Fixed issues reported by Aaron and my AI experiments. >>> - See individual patches for specific changes. >>> >>> Changes from v2: >>> - See individual patches for specific changes. >>> >>> Changes from v3: >>> - In patch 7, removed leftover netdev_close(port->netdev) >>> causing netdev reference count issues. >>> - Changed naming for dpif_offload_impl_type enum entries. >>> - Merged previous patch 36, 'dpif-netdev: Add full name >>> to the dp_netdev structure', with the next patch. >>> >>> Changes from v4: >>> - Applied review comments from Aaron, see individual patches. >>> - Added Eli's ACK to the individual patches. >>> >>> Changes from v5: >>> - Addressed Ilya's comments, see upstream discussion for >>> details. >>> >>> Eelco Chaudron (40): >>> dpif-offload-provider: Add dpif-offload-provider implementation. >>> dpif-offload: Add provider for tc offload. >>> dpif-offload: Add provider for dpdk (rte_flow). >>> dpif-offload: Allow configuration of offload provider priority. >>> dpif-offload: Move hw-offload configuration to dpif-offload. >>> dpif-offload: Add offload provider set_config API. >>> dpif-offload: Add port registration and management APIs. >>> dpif-offload-tc: Add port management framework. >>> dpif-offload-dpdk: Add port management framework. >>> dpif-offload: Validate mandatory port class callbacks on registration. >>> dpif-offload: Allow per-port offload provider priority config. >>> dpif-offload: Introduce provider debug information API. >>> dpif-offload: Call flow-flush netdev-offload APIs via dpif-offload. >>> dpif-offload: Call meter netdev-offload APIs via dpif-offload. >>> dpif-offload: Move the flow_get_n_flows() netdev-offload API to dpif. >>> dpif-offload: Move hw_post_process netdev API to dpif. >>> dpif-offload: Add flow dump APIs to dpif-offload. >>> dpif-offload: Move the tc flow dump netdev APIs to dpif-offload. >>> dpif-netlink: Remove netlink-offload integration. >>> dpif-netlink: Add API to get offloaded netdev from port_id. >>> dpif-offload: Add API to find offload implementation type. >>> dpif-offload: Add operate implementation to dpif-offload. >>> netdev-offload: Temporarily move thread-related APIs to dpif-netdev. >>> dpif-offload: Add port dump APIs to dpif-offload. >>> dpif-netdev: Remove indirect DPDK netdev offload API calls. >>> dpif: Add dpif_get_features() API. >>> dpif-offload: Add flow operations to dpif-offload-tc. >>> dpif-netlink: Remove entangled hardware offload. >>> dpif-offload-tc: Remove netdev-offload dependency. >>> netdev_dummy: Remove hardware offload override. >>> dpif-offload: Move the netdev_any_oor() API to dpif-offload. >>> netdev-offload: Remove the global netdev-offload API. >>> dpif-offload: Add inline flow APIs for userspace datapaths. >>> dpif_netdev: Fix nullable memcpy in queue_netdev_flow_put(). >>> dpif-offload: Move offload_stats_get() API to dpif-offload. >>> dpif-offload-dpdk: Abstract rte_flow implementation from dpif-netdev. >>> dpif-offload-dummy: Add flow add/del/get APIs. >>> netdev-offload: Fold netdev-offload APIs and files into dpif-offload. >>> tests: Fix NSH decap header test for real Ethernet devices. >>> tests: Add a simple DPDK rte_flow test framework. >>> >>> Documentation/topics/testing.rst | 19 + >>> include/openvswitch/json.h | 1 + >>> include/openvswitch/netdev.h | 1 + >>> include/openvswitch/thread.h | 6 + >>> lib/automake.mk | 17 +- >>> lib/dp-packet.h | 1 + >>> lib/dpctl.c | 50 +- >>> lib/dpdk.c | 2 - >>> lib/dpif-netdev-avx512.c | 4 +- >>> lib/dpif-netdev-private-flow.h | 9 +- >>> lib/dpif-netdev.c | 1243 +++--------- >>> lib/dpif-netlink.c | 557 +----- >>> ...load-dpdk.c => dpif-offload-dpdk-netdev.c} | 592 ++++-- >>> lib/dpif-offload-dpdk-private.h | 73 + >>> lib/dpif-offload-dpdk.c | 1114 +++++++++++ >>> lib/dpif-offload-dummy.c | 890 +++++++++ >>> lib/dpif-offload-provider.h | 412 ++++ >>> ...-offload-tc.c => dpif-offload-tc-netdev.c} | 229 ++- >>> lib/dpif-offload-tc-private.h | 73 + >>> lib/dpif-offload-tc.c | 810 ++++++++ >>> lib/dpif-offload.c | 1773 +++++++++++++++++ >>> lib/dpif-offload.h | 227 +++ >>> lib/dpif-provider.h | 66 +- >>> lib/dpif.c | 164 +- >>> lib/dpif.h | 14 +- >>> lib/dummy.h | 9 + >>> lib/json.c | 7 + >>> lib/netdev-dpdk.c | 9 +- >>> lib/netdev-dpdk.h | 2 +- >>> lib/netdev-dummy.c | 199 +- >>> lib/netdev-linux.c | 3 +- >>> lib/netdev-offload-provider.h | 148 -- >>> lib/netdev-offload.c | 910 --------- >>> lib/netdev-offload.h | 169 -- >>> lib/netdev-provider.h | 10 +- >>> lib/netdev.c | 71 +- >>> lib/netdev.h | 22 + >>> lib/tc.c | 2 +- >>> ofproto/ofproto-dpif-upcall.c | 50 +- >>> ofproto/ofproto-dpif.c | 82 +- >>> tests/.gitignore | 3 + >>> tests/automake.mk | 24 + >>> tests/dpif-netdev.at | 37 +- >>> tests/ofproto-dpif.at | 170 ++ >>> tests/ofproto-macros.at | 17 +- >>> tests/sendpkt.py | 12 +- >>> tests/system-dpdk-offloads-macros.at | 236 +++ >>> tests/system-dpdk-offloads-testsuite.at | 28 + >>> tests/system-dpdk-offloads.at | 223 +++ >>> tests/system-dpdk.at | 35 + >>> tests/system-kmod-macros.at | 5 + >>> tests/system-offloads-testsuite-macros.at | 5 + >>> tests/system-offloads-traffic.at | 46 + >>> tests/system-traffic.at | 9 +- >>> tests/system-userspace-macros.at | 5 + >>> vswitchd/bridge.c | 7 +- >>> vswitchd/vswitch.xml | 44 + >>> 57 files changed, 7601 insertions(+), 3345 deletions(-) >>> rename lib/{netdev-offload-dpdk.c => dpif-offload-dpdk-netdev.c} (83%) >>> create mode 100644 lib/dpif-offload-dpdk-private.h >>> create mode 100644 lib/dpif-offload-dpdk.c >>> create mode 100644 lib/dpif-offload-dummy.c >>> create mode 100644 lib/dpif-offload-provider.h >>> rename lib/{netdev-offload-tc.c => dpif-offload-tc-netdev.c} (95%) >>> create mode 100644 lib/dpif-offload-tc-private.h >>> create mode 100644 lib/dpif-offload-tc.c >>> create mode 100644 lib/dpif-offload.c >>> create mode 100644 lib/dpif-offload.h >>> delete mode 100644 lib/netdev-offload-provider.h >>> delete mode 100644 lib/netdev-offload.c >>> delete mode 100644 lib/netdev-offload.h >>> create mode 100644 tests/system-dpdk-offloads-macros.at >>> create mode 100644 tests/system-dpdk-offloads-testsuite.at >>> create mode 100644 tests/system-dpdk-offloads.at >>> >> >> Thanks, Eelco for the update! I read through the set again and it >> reads much easier after the renames. There are still a few small >> style issues throughout the set, but they are all minor like an >> extra empty line at the EOF of dpif-offload.c in patch 7, a few >> more places with a missing double space or underscore or the thread >> safety annotations not on a separate line. There is also a missed >> rename of dpif_offload_flow_get_n_offloaded() into something like >> dpif_ofload_flow_count(), since the underlying callback was renamed. >> And the provider_collection_add() should return a positive errno >> after all, since it's passed directly into ovs_strerror(). >> >> As discussed off-list, instead of me writing all of these nits down >> in the emails and then you fixing them, it's simpler if I just >> handle the merge and fix the nits on commit. > > Thanks for the work on that. > >> Also, found a couple more issues that we should address: >> >> 1. In the first patch, the dpif_offload_dump_next() function is >> describing a case that should be impossible, but still tries >> to handle it. We shouldn't do that. And instead we need to >> just take the collection reference in dpif_offload_dump_start() >> to make sure the collection doesn't go away during the dump. >> And then we can fully rely on the LIST_FOR_EACH_CONTINUE. >> >> 2. The per-port priority configuration should be per-interface >> instead as datapath doesn't deal with ports, it only works >> with interfaces. And some ports, like bonds can have multiple >> interfaces. It may not be a big deal as in a normal case >> I don't think we would need different offload for interfaces >> in the same port, but it's not really correct to configure >> ports, when it is applied on the interface. >> >> And there are two things that I flagged in v5, but we decided to >> handle separately: >> >> 3. Mass-rename of offload provider structures and functions to >> make them coherent and shorter than they are. Most of them >> are not exported and don't need extensive prefixes. And these >> prefixes need unification after moving stuff between netdev* >> and dpif* modules. >> >> 4. Move of the port manager instances from provider-specific >> structures to the common struct dpif_offload. This should >> eliminate some duplication and make the module boundaries >> more clear. This should also eliminate the need for the >> port dump API or at least significantly simplify it, e.g. >> by just iterating over port cmap with a cmap_cursor. >> >> >> So, the situation is the following: We have branching planned for >> tomorrow. Full CI run takes about 24 hours. So, we don't really >> have time for v7. We could postpone branching, but it also seems >> unreasonable as we only really need changes in about 4 patches >> out of 40, and none of the issues listed above are critical or >> breaking any functionality. They are mostly internal code movements. >> It would be much easier to get the set merged and then work on the >> 4 items above next week and backport to the new branch. > > +1 to that. > >> As mentioned earlier, to save some time, I can handle the merging >> and fix all the small nits on commit instead of wasting a lot of >> time on writing them down on the mailing list and then Eelco >> fixing them. >> >> Eelco, Aaron, does that sound like a good plan? Or did I miss >> anything in my summary? > > <gimli-voice/> And my ACKs :)
Thanks! > > (if it makes it easier, I can go through and send them or you can add > them during your merge) I'll be doing an interactive rebase for all the nit fixing anyway, so I can add the ACks during that process. > >> Best regards, Ilya Maximets. > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
