By default, DPDK probes all available resources (like PCI devices) and
partially initialises them (or takes them over).
OVS has relied on this behavior since the introduction of netdev-dpdk.
It is no longer needed, as DPDK device hotplug has been supported and used
for some time now.
Besides, this initial probing may not be desirable:
- for PCI devices bound to vfio-pci, the first application taking over
  them "wins", meaning that OVS would prevent qemu from using some VF
  devices,
- for mlx5 devices,
  - the driver maintains link status and liveness of all ports
    (taking some kernel lock) even when OVS only uses a subset of them,
  - if some driver feature needs to be enabled for one port via devargs,
    this would have to be set in dpdk-extra.
Change this behavior and disable the initial PCI probing by passing
a specially crafted allow list: this implementation is not elegant
but it has been successfully used (for the PCI part) in a number of
setups I know, and there is no better DPDK API to achieve the same
at the moment.
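As a rough illustration of the allow-list trick described above (a
simplified sketch, not the code from this patch): once any "-a" entry is
given for a bus, DPDK only probes the listed devices on that bus, so a
placeholder address that is not expected to match any usable device
effectively disables the initial probing. The helper names has_arg() and
add_pci_placeholder() are hypothetical stand-ins, and only the PCI bus is
covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if 'opt' appears verbatim in argv[0..argc). */
static bool
has_arg(const char **argv, size_t argc, const char *opt)
{
    for (size_t i = 0; i < argc; i++) {
        if (!strcmp(argv[i], opt)) {
            return true;
        }
    }
    return false;
}

/* Appends "-a pci:0000:00:00.0" to 'argv' (capacity 'cap') unless the user
 * asked for probing at init, already passed an allow or block list, or
 * disabled the PCI bus.  Returns the resulting argument count. */
static size_t
add_pci_placeholder(const char **argv, size_t argc, size_t cap,
                    bool probe_at_init)
{
    assert(argc <= cap);

    if (probe_at_init
        || has_arg(argv, argc, "-a") || has_arg(argv, argc, "--allow")
        || has_arg(argv, argc, "-b") || has_arg(argv, argc, "--block")
        || has_arg(argv, argc, "--no-pci")
        || argc + 2 > cap) {
        /* Leave the argument list untouched. */
        return argc;
    }

    argv[argc++] = "-a";
    argv[argc++] = "pci:0000:00:00.0";
    return argc;
}
```

Any user-provided allow or block list takes precedence in this sketch, so
existing dpdk-extra configurations keep their meaning.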
This behavior change breaks setups that were using the
class=eth,mac=XX:XX:XX:XX:XX:XX syntax because OVS was relying on the
(fragile) assumption that all DPDK ports were probed at init once and
for all.
Add a warning for users of this syntax, update the documentation and
add an option to restore the original behavior via
'dpdk-probe-at-init=true'.
This option also helps for unexpected cases like https://xkcd.com/1172/.
Acked-by: Eli Britstein <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Signed-off-by: David Marchand <[email protected]>
---
Changes since v1:
- improved documentation wording and references,
- added checks for --allow/--block and --no-pci options,
Changes since RFC v2:
- updated descriptions and comments,
Changes since RFC v1:
- updated commitlog (mentioning devargs),
- handled other DPDK buses,
---
Documentation/howto/dpdk.rst | 6 +++++
Documentation/intro/install/dpdk.rst | 6 +++++
NEWS | 5 ++++
lib/dpdk.c | 31 +++++++++++++++++++++
lib/netdev-dpdk.c | 2 +-
tests/system-dpdk-macros.at | 2 +-
tests/system-dpdk.at | 40 ++++++++++++++--------------
vswitchd/vswitch.xml | 31 +++++++++++++++++++++
8 files changed, 101 insertions(+), 22 deletions(-)
diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 73e630b07f..5d6bf94cdb 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -62,6 +62,12 @@ is suggested::
.. important::
+ Using this syntax requires that DPDK probes the device that owns those
+ multiple ports. This can be achieved by either setting an allowlist
+ of PCI devices in the ``dpdk-extra`` configuration, or by requesting that
+ all available devices (including PCI devices) be probed at initialization
+ (setting ``dpdk-probe-at-init`` to true).
+
Hotplugging physical interfaces is not supported using the above syntax.
This is expected to change with the release of DPDK v18.05. For information
on hotplugging physical interfaces, you should instead refer to
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index 6f4687bdea..8bc15529ba 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -297,6 +297,12 @@ listed below. Defaults will be provided for all values not explicitly set.
sockets. If not specified, this option will not be set by default. DPDK
default will be used instead.
+``dpdk-probe-at-init``
+ Specifies whether DPDK should probe all devices available at initialization.
+ This option is needed when using the ``class=eth,mac=XX:XX:XX:XX:XX:XX``
+ syntax for DPDK ports. Defaults to false. See ``ovs-vswitchd.conf.db(5)``
+ for more details.
+
``dpdk-hugepage-dir``
Directory where hugetlbfs is mounted
diff --git a/NEWS b/NEWS
index 1a3044cbfb..67107da720 100644
--- a/NEWS
+++ b/NEWS
@@ -3,6 +3,11 @@ Post-v3.7.0
- Userspace datapath:
* ARP/ND lookups for native tunnel are now rate limited. The holdout
timer can be configured with 'tnl/neigh/retrans_time'.
+ - DPDK:
+ * Probing of devices at DPDK initialization has been disabled to avoid
+ wasting resources on unused devices. This breaks DPDK netdev ports
+ using the "class=eth,mac=" syntax (though it can be restored via the
+ 'dpdk-probe-at-init' config option, see ovs-vswitchd.conf.db(5)).
v3.7.0 - 16 Feb 2026
diff --git a/lib/dpdk.c b/lib/dpdk.c
index d27b95cd9a..3685baf00c 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -430,6 +430,37 @@ dpdk_init__(const struct smap *ovs_other_config)
svec_add_nocopy(&args, xasprintf("0@%d", cpu));
}
+ if (!args_contains(&args, "-a") && !args_contains(&args, "--allow")
+ && !args_contains(&args, "-b") && !args_contains(&args, "--block")
+ && !smap_get_bool(ovs_other_config, "dpdk-probe-at-init", false)) {
+#ifdef RTE_BUS_AUXILIARY
+ svec_add(&args, "-a");
+ svec_add(&args, "auxiliary:");
+#endif
+#ifdef RTE_BUS_CDX
+ svec_add(&args, "-a");
+ svec_add(&args, "cdx:cdx-");
+#endif
+#ifdef RTE_BUS_FSLMC
+ svec_add(&args, "-a");
+ svec_add(&args, "fslmc:dpni.65535");
+#endif
+#ifdef RTE_BUS_PCI
+ if (!args_contains(&args, "--no-pci")) {
+ svec_add(&args, "-a");
+ svec_add(&args, "pci:0000:00:00.0");
+ }
+#endif
+#ifdef RTE_BUS_UACCE
+ svec_add(&args, "-a");
+ svec_add(&args, "uacce:");
+#endif
+#ifdef RTE_BUS_VMBUS
+ svec_add(&args, "-a");
+ svec_add(&args, "vmbus:00000000-0000-0000-0000-000000000000");
+#endif
+ }
+
svec_terminate(&args);
optind = 1;
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7629d0f974..e375275de1 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2052,7 +2052,7 @@ netdev_dpdk_get_port_by_mac(const char *mac_str, char const **extra_err)
}
}
- *extra_err = "unknown mac";
+ *extra_err = "unknown mac (need dpdk-probe-at-init=true ?)";
return DPDK_ETH_PORT_ID_INVALID;
}
diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
index f8ba766739..716d8a357d 100644
--- a/tests/system-dpdk-macros.at
+++ b/tests/system-dpdk-macros.at
@@ -139,7 +139,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
OVS_DPDK_PRE_CHECK()
OVS_WAIT_WHILE([ip link show ovs-netdev])
dnl For functional tests, no need for DPDK PCI probing.
- OVS_DPDK_START([--no-pci], [--disable-system], [$3])
+ OVS_DPDK_START([], [--disable-system], [$3])
dnl Add bridges, ports, etc.
OVS_WAIT_WHILE([ip link show br0])
AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 47a70f2b03..c5f18e71c5 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -43,7 +43,7 @@ dnl Check if EAL init is successful
AT_SETUP([OVS-DPDK - EAL init])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
AT_CHECK([grep "DPDK Enabled - initializing..." ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "EAL" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "DPDK Enabled - initialized" ovs-vswitchd.log], [], [stdout])
@@ -59,7 +59,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - single])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
OVS_DPDK_START_OVSDB()
-OVS_DPDK_START_VSWITCHD([--no-pci])
+OVS_DPDK_START_VSWITCHD()
CHECK_CPU_DISCOVERED()
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
@@ -77,7 +77,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - multi])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
OVS_DPDK_START_OVSDB()
-OVS_DPDK_START_VSWITCHD([--no-pci])
+OVS_DPDK_START_VSWITCHD()
CHECK_CPU_DISCOVERED(4)
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xf])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
@@ -95,7 +95,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - non-contig])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
OVS_DPDK_START_OVSDB()
-OVS_DPDK_START_VSWITCHD([--no-pci])
+OVS_DPDK_START_VSWITCHD()
CHECK_CPU_DISCOVERED(8)
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xca])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
@@ -113,7 +113,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - zeromask])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
OVS_DPDK_START_OVSDB()
-OVS_DPDK_START_VSWITCHD([--no-pci])
+OVS_DPDK_START_VSWITCHD()
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x0])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
OVS_WAIT_UNTIL([grep "Ignoring database defined option 'dpdk-lcore-mask' due to invalid value '0x0'" ovs-vswitchd.log])
@@ -152,7 +152,7 @@ dnl Add vhost-user-client port
AT_SETUP([OVS-DPDK - add vhost-user-client port])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -181,7 +181,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user ports])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
OVS_DPDK_CHECK_TESTPMD()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -237,7 +237,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user-client ports])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
OVS_DPDK_CHECK_TESTPMD()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -350,7 +350,7 @@ AT_SETUP([OVS-DPDK - Ingress policing create delete vport port])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and add ingress policer
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -387,7 +387,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing rate])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and add ingress policer
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -421,7 +421,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing burst])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and add ingress policer
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -487,7 +487,7 @@ AT_SETUP([OVS-DPDK - QoS create delete vport port])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and add egress policer
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -522,7 +522,7 @@ AT_SETUP([OVS-DPDK - QoS no cir])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and add egress policer
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -551,7 +551,7 @@ AT_SETUP([OVS-DPDK - QoS no cbs])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and add egress policer
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -657,7 +657,7 @@ AT_KEYWORDS([dpdk])
OVS_DPDK_CHECK_TESTPMD()
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS with default MTU value
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -698,7 +698,7 @@ AT_KEYWORDS([dpdk])
OVS_DPDK_CHECK_TESTPMD()
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and modify MTU value
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -816,7 +816,7 @@ AT_KEYWORDS([dpdk])
OVS_DPDK_CHECK_TESTPMD()
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and set MTU value to max upper bound
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -858,7 +858,7 @@ AT_KEYWORDS([dpdk])
OVS_DPDK_CHECK_TESTPMD()
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
dnl Add userspace bridge and attach it to OVS and set MTU value to min lower bound
AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
@@ -897,7 +897,7 @@ AT_SETUP([OVS-DPDK - user configured mempool])
AT_KEYWORDS([dpdk])
OVS_DPDK_PRE_CHECK()
OVS_DPDK_START_OVSDB()
-OVS_DPDK_START_VSWITCHD([--no-pci])
+OVS_DPDK_START_VSWITCHD()
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:shared-mempool-config=8000,6000,1500])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
@@ -946,7 +946,7 @@ dnl --------------------------------------------------------------------------
AT_SETUP([OVS-DPDK - ovs-appctl dpif/offload/show])
AT_KEYWORDS([dpdk dpif-offload])
OVS_DPDK_PRE_CHECK()
-OVS_DPDK_START([--no-pci])
+OVS_DPDK_START()
AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
AT_CHECK([ovs-vsctl add-port br0 p1 \
-- set Interface p1 type=dpdk options:dpdk-devargs=net_null0,no-rx=1],
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index b7a5afc0a5..e33f43b66b 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -453,6 +453,37 @@
</p>
</column>
+ <column name="other_config" key="dpdk-probe-at-init"
+ type='{"type": "boolean"}'>
+ <p>
+ Specifies whether DPDK should probe all devices available at the
+ time DPDK is initialized. When set to <code>false</code> (the
+ default), OVS will instruct DPDK to skip probing devices at
+ initialization, unless <ref column="other_config" key="dpdk-extra"/>
+ already contains allowed or blocked devices. Probing at initialization
+ is required when declaring DPDK ports using the
+ <code>class=eth,mac=XX:XX:XX:XX:XX:XX</code> syntax.
+ </p>
+ <p>
+ Enabling this option implies higher resource consumption, as OVS may
+ not use all probed devices. It may also cause undesired side effects
+ such as additional interrupt handling and link status checks for
+ unused devices. For example, mlx5 devices maintain link status and
+ liveness by taking kernel locks frequently, even when OVS only uses
+ a subset of the available ports.
+ </p>
+ <p>
+ Note that this option has no effect when
+ <ref column="other_config" key="dpdk-extra"/> contains allowed
+ (<code>-a</code> or <code>--allow</code>) or blocked
+ (<code>-b</code> or <code>--block</code>) devices, as DPDK will
+ honor those specifications regardless of this setting.
+ </p>
+ <p>
+ Defaults to <code>false</code>.
+ </p>
+ </column>
+
<column name="other_config" key="dpdk-extra"
type='{"type": "string"}'>
<p>
--
2.53.0