On 12/01/2026 11:23, Ilya Maximets wrote:
On 1/11/26 5:29 PM, Eli Britstein wrote:
On 11/01/2026 17:54, Aaron Conole wrote:
Eelco Chaudron via dev <[email protected]> writes:
This RFC patch series introduces a major architectural
refactoring of Open vSwitch's hardware offload
infrastructure. It replaces the tightly coupled
`netdev-offload` implementation with a new, modular
`dpif-offload-provider` framework.
MOTIVATION
-------------------------------------------------------------
The existing `netdev-offload` API tightly couples datapath
implementations (like `dpif-netdev`) with specific offload
technologies (DPDK's rte_flow). This design has several
limitations:
- Rigid Architecture: It creates complex dependencies,
making the code difficult to maintain and extend.
- Limited Flexibility: Supporting multiple offload backends
simultaneously or adding new ones is cumbersome.
- Inconsistent APIs: The logic for handling different
offload types is scattered, leading to an inconsistent
and hard-to-follow API surface.
This refactoring aims to resolve these issues by creating a
clean separation of concerns, improving modularity, and
establishing a clear path for future hardware offload
integrations.
Thanks for all the work you've done on this over the last 8 months.
I'm joining the thanks.
PROPOSED SOLUTION: THE `DPIF-OFFLOAD-PROVIDER` FRAMEWORK
-------------------------------------------------------------
This series introduces the `dpif-offload-provider`
framework, which functions similarly to the existing
`dpif-provider` pattern. It treats hardware offload as a
distinct layer with multiple, dynamically selectable
backends.
Key features of the new framework include:
1. Modular Architecture: A clean separation between the
generic datapath interface and specific offload
provider implementations (e.g., `dpif-offload-tc`,
`dpif-offload-dpdk`). `dpif` layers are now generic
clients of the offload API.
2. Provider-based System: Allows multiple offload backends
to coexist.
3. Unified and Asynchronous API: Establishes a consistent
API across all offload providers. For userspace
datapaths, the API is extended to support asynchronous
flow operations with callbacks, making `dpif-netdev` a
more efficient client.
4. Enhanced Configuration: Provides granular control over
offload provider selection through a global and per-port
priority system (`hw-offload-priority`), allowing
fine-tuned policies for different hardware.
5. Improved Testing: Includes a new test framework
specifically for validating DPDK's rte_flow offloads,
enhancing long-term maintainability.
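To make item 4 concrete: the cover letter names the `hw-offload-priority`
knob but not its exact syntax, so the commands below are only a guess at
how a global and a per-port priority might be set (the value format and
the per-port column are assumptions, not taken from the series):

```shell
# Hypothetical: global provider preference order, tried left to right.
ovs-vsctl set Open_vSwitch . other_config:hw-offload-priority=tc,dpdk

# Hypothetical: per-port override for one interface.
ovs-vsctl set Interface dpdk0 other_config:hw-offload-priority=dpdk
```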
PATCH SERIES ORGANIZATION
-------------------------------------------------------------
This large series is organized logically to facilitate
review:
1. Framework Foundation: The initial patches establish the
core `dpif-offload-provider` framework, including the
necessary APIs for port management, flow mark
allocation, configuration, and a dummy provider for
testing.
2. Provider Implementation: These patches introduce the new
`dpif-offload-tc` and `dpif-offload-dpdk` providers,
building out their specific implementations on top of
the new framework.
3. API Migration and Decoupling: The bulk of the series
systematically migrates functionality from the legacy
`netdev-offload` layer to the new providers. Key
commits here decouple `dpif-netdev` and, crucially,
`dpif-netlink` from their hardware offload
entanglements.
4. Cleanup: The final patches remove the now-redundant
global APIs and structures from `netdev-offload`,
completing the transition.
BACKWARD COMPATIBILITY
-------------------------------------------------------------
This refactoring maintains full API compatibility from a
user's perspective. All existing `ovs-vsctl` and
`ovs-appctl` commands continue to function as before. The
changes are primarily internal architectural improvements
designed to make OVS more robust and extensible.
REQUEST FOR COMMENTS
-------------------------------------------------------------
This is a significant architectural change that affects
core OVS infrastructure. We welcome feedback on:
- The overall architectural approach and the
`dpif-offload-provider` concept.
- The API design, particularly the new asynchronous model
for `dpif-netdev`.
- The migration strategy and any potential backward
compatibility concerns.
- Performance implications of the new framework.
Just a few quick thoughts while I'm finishing up (I'm on patch 29):
1. There are a few groups that might make more sense to squash together
while applying. For example, patches 23-25 are a particularly egregious
case: they introduce a temporary change and then delete it again in a
different change. While seeing the steps is nice, it is *really*
confusing - even on the review side, it's a bit jarring to see.
Typically we don't allow patches that clean up earlier patches in the
same series, so it may be worth spending the time to squash some of
these, regardless of how messy that can be.
2. The dpdk offload stuff - in my opinion it should be named after
rte_flow rather than dpdk. The dpdk name could make things difficult
in the future if a second offload type becomes available for DPDK
ports. WDYT (I realize that's a rename that takes a bit of time to do)?
Hi, Aaron. I requested the rename on the v2 of the set and I still think
it should be 'dpdk' in all the user-facing parts and internally by extension.
See the arguments here:
https://mail.openvswitch.org/pipermail/ovs-dev/2025-October/426681.html
In short, 'rte_flow' is a very user-unfriendly name as it doesn't mean much
to the average OVS user. And we already expose 'dpdk' to users today as a
name of the offload type. Changing that will be a backward-incompatible
change while this set aims to be a refactor without major changes to the
current use of the feature. Having 'dpdk' on a user-facing side and 'rte_flow'
internally is confusing.
If we ever need to have an implementation of two separate DPDK APIs for flow
offload and have to give users a choice, we can go with a suffix on 'dpdk',
i.e. 'dpdk-ng' or 'dpdk-new-shiny-offload', but that is, IMO, very unlikely
to happen. More on that below.
Actually, giving it more thought, there is already a second offload
type in DPDK - rte_flow_async_XXX.
It's not a different API, async is just a mode of the same API. Having
a second offload API in DPDK would be strange, as rte_flow is supposed
to be an abstraction layer that is covering offload for all supported HW.
Having a second offload API within DPDK would mean a complete failure of
rte_flow and we'd just migrate to a new API instead while rte_flow is
likely getting deprecated and removed from DPDK in this situation.
It was first introduced to support the HW steering mode of mlx cards.
New mlx5 cards (>=CX9) support only HW steering, so the legacy
rte_flow indeed could not work there.
Since then, further work has been done to handle the required things
under the hood of the legacy rte_flow, so it can now work.
It is a different API, with another set of requirements, entry points,
etc. It's not just a "mode".
The concept behind this API is that there are QPs (Queue Pairs) between
SW->HW and HW->SW, carrying WQEs/CQEs (Work-Queue-Elements /
Completion-Queue-Elements).
The async API forms a WQE and puts it on the SW->HW queue, but the HW
doesn't do anything with it unless explicitly requested (see the
comment for struct rte_flow_op_attr).
When the HW completes the operation, it puts a CQE on the other queue.
The SW then must poll for it. The "user_data" specified in the async
API is returned as a cookie by the poll API.
Also, tables and templates must be created first.
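For readers unfamiliar with this API, the enqueue/push/poll sequence
described above could be sketched roughly as below. This is only an
illustration of the DPDK >= 22.03 rte_flow async calls, not a complete
program: template-table setup, variable declarations, and error
handling are elided, and handle_completion() is a made-up helper.

```c
/* Sketch: enqueue a flow-create WQE on the SW->HW queue of one
 * rte_flow queue pair.  'cookie' is the user_data returned later. */
struct rte_flow_op_attr attr = { .postpone = 1 };  /* batch, don't kick HW */
struct rte_flow_error err;

struct rte_flow *flow =
    rte_flow_async_create(port_id, queue_id, &attr,
                          table,                     /* pre-created table */
                          pattern, pattern_tmpl_idx,
                          actions, actions_tmpl_idx,
                          cookie, &err);

/* The HW does nothing until the queue is explicitly pushed. */
rte_flow_push(port_id, queue_id, &err);

/* Poll the HW->SW queue for CQEs; each result carries the user_data. */
struct rte_flow_op_result res[BURST];
int n = rte_flow_pull(port_id, queue_id, res, BURST, &err);
for (int i = 0; i < n; i++) {
    handle_completion(res[i].user_data, res[i].status);  /* illustrative */
}
```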
Regarding "not likely to happen" - I agree with that (since legacy
rte-flow can support new mlx5 cards now, and also this API is not
implemented for most PMDs). However, in the unlikely case it does,
"dpdk-ng"/"new-shiny" are worse than calling it "rte-flow" from the
start. This naming is not for the "average OVS user" but for an OVS
developer who wants to handle offloads.
I'm OK either way, just FYI.
If someone wants to implement in OVS dpif-offload-X based on
rte_flow_async API, it will indeed be confusing.
We'd likely need a separate set of high level APIs on the OVS side in
order to support async operation. And our offload is already happening
asynchronously with the traffic processing, so I'm not sure if we need
to use the rte_flow_async functions. And if we do migrate to this
implementation, I'm not sure why we would allow users to choose rather
than choosing the best implementation ourselves.
The "async" in rte_flow_async is not related to OVS mode of async
threads. See above.
Best regards, Ilya Maximets.
3. During my read-ahead, I got the heebie-jeebies with
[34/40] dpif_netdev: Fix nullable memcpy in queue_netdev_flow_put().
I haven't completed my review of that portion, but I suspect that if
it's a real issue, the fix could come first in the series or even be
split out and applied separately. WDYT?
-------------------------------------------------------------
Changes from v1:
- Fixed issues reported by Aaron and my AI experiments.
- See individual patches for specific changes.
Changes from v2:
- See individual patches for specific changes.
Changes from v3:
- In patch 7, removed leftover netdev_close(port->netdev)
causing netdev reference count issues.
- Changed naming for dpif_offload_impl_type enum entries.
- Merged previous patch 36, 'dpif-netdev: Add full name
to the dp_netdev structure', with the next patch.
Eelco Chaudron (40):
dpif-offload-provider: Add dpif-offload-provider implementation.
dpif-offload: Add provider for tc offload.
dpif-offload: Add provider for dpdk (rte_flow).
dpif-offload: Allow configuration of offload provider priority.
dpif-offload: Move hw-offload configuration to dpif-offload.
dpif-offload: Add offload provider set_config API.
dpif-offload: Add port registration and management APIs.
dpif-offload-tc: Add port management framework.
dpif-offload-dpdk: Add port management framework.
dpif-offload: Validate mandatory port class callbacks on registration.
dpif-offload: Allow per-port offload provider priority config.
dpif-offload: Introduce provider debug information API.
dpif-offload: Call flow-flush netdev-offload APIs via dpif-offload.
dpif-offload: Call meter netdev-offload APIs via dpif-offload.
dpif-offload: Move the flow_get_n_flows() netdev-offload API to dpif.
dpif-offload: Move hw_post_process netdev API to dpif.
dpif-offload: Add flow dump APIs to dpif-offload.
dpif-offload: Move the tc flow dump netdev APIs to dpif-offload.
dpif-netlink: Remove netlink-offload integration.
dpif-netlink: Add API to get offloaded netdev from port_id.
dpif-offload: Add API to find offload implementation type.
dpif-offload: Add operate implementation to dpif-offload.
netdev-offload: Temporarily move thread-related APIs to dpif-netdev.
dpif-offload: Add port dump APIs to dpif-offload.
dpif-netdev: Remove indirect DPDK netdev offload API calls.
dpif: Add dpif_get_features() API.
dpif-offload: Add flow operations to dpif-offload-tc.
dpif-netlink: Remove entangled hardware offload.
dpif-offload-tc: Remove netdev-offload dependency.
netdev_dummy: Remove hardware offload override.
dpif-offload: Move the netdev_any_oor() API to dpif-offload.
netdev-offload: Remove the global netdev-offload API.
dpif-offload: Add inline flow APIs for userspace datapaths.
dpif_netdev: Fix nullable memcpy in queue_netdev_flow_put().
dpif-offload: Move offload_stats_get() API to dpif-offload.
dpif-offload-dpdk: Abstract rte_flow implementation from dpif-netdev.
dpif-offload-dummy: Add flow add/del/get APIs.
netdev-offload: Fold netdev-offload APIs and files into dpif-offload.
tests: Fix NSH decap header test for real Ethernet devices.
tests: Add a simple DPDK rte_flow test framework.
Documentation/topics/testing.rst | 19 +
include/openvswitch/json.h | 1 +
include/openvswitch/netdev.h | 1 +
lib/automake.mk | 17 +-
lib/dp-packet.h | 1 +
lib/dpctl.c | 50 +-
lib/dpdk.c | 2 -
lib/dpif-netdev-avx512.c | 4 +-
lib/dpif-netdev-private-flow.h | 9 +-
lib/dpif-netdev.c | 1244 +++---------
lib/dpif-netlink.c | 557 +----
...load-dpdk.c => dpif-offload-dpdk-netdev.c} | 592 ++++--
lib/dpif-offload-dpdk-private.h | 73 +
lib/dpif-offload-dpdk.c | 1186 +++++++++++
lib/dpif-offload-dummy.c | 920 +++++++++
lib/dpif-offload-provider.h | 421 ++++
...-offload-tc.c => dpif-offload-tc-netdev.c} | 238 ++-
lib/dpif-offload-tc-private.h | 76 +
lib/dpif-offload-tc.c | 877 ++++++++
lib/dpif-offload.c | 1790 +++++++++++++++++
lib/dpif-offload.h | 221 ++
lib/dpif-provider.h | 65 +-
lib/dpif.c | 166 +-
lib/dpif.h | 14 +-
lib/dummy.h | 9 +
lib/json.c | 7 +
lib/netdev-dpdk.c | 9 +-
lib/netdev-dpdk.h | 2 +-
lib/netdev-dummy.c | 199 +-
lib/netdev-linux.c | 3 +-
lib/netdev-offload-provider.h | 148 --
lib/netdev-offload.c | 910 ---------
lib/netdev-offload.h | 169 --
lib/netdev-provider.h | 10 +-
lib/netdev.c | 71 +-
lib/netdev.h | 22 +
lib/tc.c | 2 +-
ofproto/ofproto-dpif-upcall.c | 50 +-
ofproto/ofproto-dpif.c | 90 +-
tests/.gitignore | 3 +
tests/automake.mk | 24 +
tests/dpif-netdev.at | 40 +-
tests/ofproto-dpif.at | 170 ++
tests/ofproto-macros.at | 17 +-
tests/sendpkt.py | 12 +-
tests/system-dpdk-offloads-macros.at | 236 +++
tests/system-dpdk-offloads-testsuite.at | 28 +
tests/system-dpdk-offloads.at | 223 ++
tests/system-dpdk.at | 35 +
tests/system-kmod-macros.at | 5 +
tests/system-offloads-testsuite-macros.at | 5 +
tests/system-offloads-traffic.at | 48 +
tests/system-traffic.at | 9 +-
tests/system-userspace-macros.at | 5 +
vswitchd/bridge.c | 7 +-
vswitchd/vswitch.xml | 43 +
56 files changed, 7808 insertions(+), 3347 deletions(-)
rename lib/{netdev-offload-dpdk.c => dpif-offload-dpdk-netdev.c} (83%)
create mode 100644 lib/dpif-offload-dpdk-private.h
create mode 100644 lib/dpif-offload-dpdk.c
create mode 100644 lib/dpif-offload-dummy.c
create mode 100644 lib/dpif-offload-provider.h
rename lib/{netdev-offload-tc.c => dpif-offload-tc-netdev.c} (95%)
create mode 100644 lib/dpif-offload-tc-private.h
create mode 100644 lib/dpif-offload-tc.c
create mode 100644 lib/dpif-offload.c
create mode 100644 lib/dpif-offload.h
delete mode 100644 lib/netdev-offload-provider.h
delete mode 100644 lib/netdev-offload.c
delete mode 100644 lib/netdev-offload.h
create mode 100644 tests/system-dpdk-offloads-macros.at
create mode 100644 tests/system-dpdk-offloads-testsuite.at
create mode 100644 tests/system-dpdk-offloads.at
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev