Re: [PATCH V2] ethdev_trace.h: Update the trace point function when _TIME_BITS=64
On 4/24/25 02:00, Stephen Hemminger wrote:
> On Tue, 22 Apr 2025 20:29:56 +0800 wrote:
>
>> +#if defined(_TIME_BITS) && _TIME_BITS == 64
>>  RTE_TRACE_POINT(
>>      rte_eth_trace_timesync_write_time,
>>      RTE_TRACE_POINT_ARGS(uint16_t port_id, const struct timespec *time,
>>          int ret),
>>      rte_trace_point_emit_u16(port_id);
>> +    rte_trace_point_emit_u64(time->tv_sec);
>> +    rte_trace_point_emit_long(time->tv_nsec);
>> +    rte_trace_point_emit_int(ret);
>> +)
>> +#else
>> +RTE_TRACE_POINT(
>> +    rte_eth_trace_timesync_write_time,
>> +    RTE_TRACE_POINT_ARGS(uint16_t port_id, const struct timespec *time,
>> +        int ret),
>> +    rte_trace_point_emit_u16(port_id);
>>      rte_trace_point_emit_size_t(time->tv_sec);
>>      rte_trace_point_emit_long(time->tv_nsec);
>>      rte_trace_point_emit_int(ret);
>>  )
>> +#endif
>
> No. Do not start adding #ifdef to trace points.
> Instead, add a new hook, rte_trace_point_emit_time_t, that can handle
> any ABI changes like this.

Hi, Stephen

Thanks, I will try to add this.

> Best to wait until the 25.11 release, since this could be an ABI change.

Do you mean I should not send a V3 patch, and should send the patch after
25.11 is released?

Regards
Changqing
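The review comment above asks for one time_t emit hook instead of an #ifdef in each trace point. A minimal sketch of that idea in plain C follows. It does not use the DPDK trace API: `struct trace_buf`, `emit_i64` and `emit_time_t` are hypothetical stand-ins, and the thread gives only the name of the proposed rte_trace_point_emit_time_t hook, not its implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical trace buffer: a stand-in for a real trace backend. */
struct trace_buf {
	int64_t fields[8];
	unsigned int count;
};

/* Hypothetical fixed-width emitter; records one signed 64-bit field. */
static void
emit_i64(struct trace_buf *buf, int64_t value)
{
	assert(buf->count < 8);
	buf->fields[buf->count++] = value;
}

/*
 * Hypothetical time_t hook: always widen to 64 bits, so the recorded
 * layout is the same whether time_t is 32 or 64 bits wide.
 */
static void
emit_time_t(struct trace_buf *buf, time_t value)
{
	static_assert(sizeof(time_t) <= sizeof(int64_t),
		      "time_t must fit in 64 bits");
	emit_i64(buf, (int64_t)value);
}

/* The trace point body then needs no #ifdef on _TIME_BITS. */
static void
trace_timesync_write_time(struct trace_buf *buf, uint16_t port_id,
			  const struct timespec *ts, int ret)
{
	emit_i64(buf, port_id);
	emit_time_t(buf, ts->tv_sec);
	emit_i64(buf, ts->tv_nsec);
	emit_i64(buf, ret);
}
```

The width decision lives in one helper, so a change in the size of time_t touches a single place rather than every trace point that records a timestamp.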
Re: rte_eth_stats_get seems slow
On Fri, 25 Apr 2025 13:52:55 +0200
Morten Brørup wrote:

> Bruce,
>
> rte_eth_stats_get() on Intel NICs seems slow to me.
>
> E.g. getting the stats on a single port takes ~132 us (~451,000 CPU cycles)
> using the igb driver, and ~50 us using the i40e driver.
>
> Referring to the igb driver source code [1], it's 44 calls to
> E1000_READ_REG(), so the math says that each one takes 3 us (~10,000 CPU
> cycles).
>
> Is this expected behavior?
>
> It adds up, e.g. it takes a full millisecond to fetch the stats from eight
> ports using the igb driver.
>
> [1]:
> https://elixir.bootlin.com/dpdk/v24.11.1/source/drivers/net/e1000/igb_ethdev.c#L1724
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>

Well reading each stat requires a PCI access. And PCI accesses are non-cached.
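The arithmetic in the quoted message can be checked with a few lines of C. This is only a worked calculation from the figures quoted above (132 us, 451,000 cycles, 44 register reads). The function names are made up for illustration, and the implied clock rate is derived from those figures, not stated in the thread.

```c
#include <assert.h>

/* Figures quoted in the message above (igb driver, one port). */
#define STATS_TIME_US     132.0    /* one rte_eth_stats_get() call */
#define STATS_TIME_CYCLES 451000.0 /* the same call, in CPU cycles */
#define REG_READS         44.0     /* E1000_READ_REG() calls per stats read */

/* Average cost of one uncached register read, in microseconds. */
static double
per_read_us(void)
{
	return STATS_TIME_US / REG_READS;
}

/* Average cost of one uncached register read, in CPU cycles. */
static double
per_read_cycles(void)
{
	return STATS_TIME_CYCLES / REG_READS;
}

/* Total time to read the stats of several ports back to back. */
static double
ports_total_us(unsigned int ports)
{
	return STATS_TIME_US * ports;
}

/* CPU clock implied by the two measurements, in GHz (derived value). */
static double
implied_clock_ghz(void)
{
	return STATS_TIME_CYCLES / (STATS_TIME_US * 1000.0);
}
```

With these inputs each read costs 3 us, or 10,250 cycles, and eight ports take 1,056 us, which matches the "full millisecond" in the message.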
Re: Regarding Mellanox bifurcated driver on Azure
On Fri, 25 Apr 2025 23:17:30 +0530
Prashant Upadhyaya wrote:

> Hi,
>
> I am having a VM on Azure where I have got two 'accelerated networking'
> interfaces of Mellanox
> # lspci -nn|grep -i ether
> 6561:00:02.0 Ethernet controller [0200]: Mellanox Technologies MT27710
> Family [ConnectX-4 Lx Virtual Function] [15b3:1016] (rev 80)
> f08c:00:02.0 Ethernet controller [0200]: Mellanox Technologies MT27710
> Family [ConnectX-4 Lx Virtual Function] [15b3:1016] (rev 80)
>
> I have a DPDK application which needs to obtain 'all' packets from the NIC.
> I installed the drivers, compiled DPDK24.11 (Ubuntu20.04), my app starts
> and is able to detect the NIC's.
> Everything looks good
> myapp.out -c 0x07 -a f08c:00:02.0 -a 6561:00:02.0
> EAL: Detected CPU lcores: 8
> EAL: Detected NUMA nodes: 1
> EAL: Detected shared linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> mlx5_net: Default miss action is not supported.
> mlx5_net: Default miss action is not supported.
> All Ports initialized
> Port 0 is UP (5 Mbps)
> Port 1 is UP (5 Mbps)
>
> The trouble is that the ARP packets are not being picked up by my DPDK
> application, I see them being delivered to the kernel via the eth interface
> corresponding to the port (MLX is a bifurcated driver, you don't really
> bind to the NIC, so you still see the eth interfaces at linux level and can
> run tcpdump on those, I see ARP packets in the tcpdump there on the
> interface)
> I can receive UDP packets in my DPDK app though.
>
> My application is not setting any rte_flow rules etc. so I was expecting
> that by default my dpdk app would get all the packets as is normally the
> case with other NIC's
> Is there something I need to configure for Mellanox NIC somewhere such that
> I get 'all' the packets including ARP packets in my DPDK app ?
>
> Regards
> -Prashant

The Mellanox device in Azure networking cards is only used as a VF switch.
You can go back to earlier DPDK presentations for more detail.

Three reasons bifurcation won't work:
1. Only some of the packets arrive on the VF. All non-IP traffic shows up on
   the synthetic device. The VF is only used after the TCP three-way handshake.
2. The Netvsc PMD doesn't handle flow rules.
3. The VF can be removed and restored at any time (by the hypervisor); it is
   not a stable entity.
[PATCH v3 1/1] bus/pci: introduce get_iova_mode for pci dev
I propose this patch for DPDK to enable coexistence between DPDK and kernel
drivers for regular NICs. This solution requires adding a new pci_ops in
rte_pci_driver, through which DPDK will retrieve the required IOVA mode from
the vendor driver. This mechanism is necessary to handle different IOMMU
configurations and operating modes. Below is a detailed analysis of various
scenarios:

1. When IOMMU is enabled:
1.1 With PT (Pass-Through) enabled:
In this case, the domain type is IOMMU_DOMAIN_IDENTITY, which prevents vendor
drivers from setting IOVA->PA mapping tables. Therefore, DPDK must use PA
mode. To achieve this:
The vendor kernel driver will register a character device (cdev) to
communicate with DPDK. This cdev handles device operations (open, mmap, etc.)
and ultimately programs the hardware registers.

1.2 With PT disabled:
Here, the vendor driver doesn't enforce specific IOVA mode requirements.
Our implementation will:
Integrate a mediated device (mdev) in the vendor driver.
This mdev interacts with DPDK and manages IOVA->PA mapping configurations.

2. When IOMMU is disabled:
The vendor driver mandates PA mode (consistent with DPDK's PA mode
requirement in this scenario).
A character device (cdev) will similarly be registered for DPDK communication.

Summary:
The solution leverages multiple technologies:
  mdev for IOVA management when IOMMU is partially enabled.
  VFIO for device passthrough operations.
  cdev for register programming coordination.
  A new pci_ops interface in DPDK to dynamically determine IOVA modes.
This architecture enables clean coexistence by establishing standardized
communication channels between DPDK and vendor drivers across different
IOMMU configurations.

Motivation for the Patch:
This patch is introduced to prepare for the upcoming open-source contribution
of our NebulaMatrix SNIC driver to DPDK. We aim to ensure that our SNIC can
seamlessly coexist with kernel drivers using this mechanism.
By adopting the proposed architecture, which leverages dynamic IOVA mode
negotiation via pci_ops, mediated devices (mdev), and character device (cdev)
interactions, we enable our SNIC to operate in hybrid environments where both
DPDK and kernel drivers may manage the same hardware. This design aligns with
DPDK’s scalability goals and ensures compatibility across diverse IOMMU
configurations, which is critical for real-world deployment scenarios.

Signed-off-by: Kyo Liu
---
 .mailmap                               |  2 ++
 doc/guides/rel_notes/release_25_07.rst |  5 +++++
 drivers/bus/pci/bus_pci_driver.h       | 11 +++++++++++
 drivers/bus/pci/linux/pci.c            |  2 ++
 4 files changed, 20 insertions(+)

diff --git a/.mailmap b/.mailmap
index d8439b79ce..509ff9a16f 100644
--- a/.mailmap
+++ b/.mailmap
@@ -78,6 +78,7 @@ Allen Hubbe
 Alok Makhariya
 Alok Prasad
 Alvaro Karsz
+Alvin Wang
 Alvin Zhang
 Aman Singh
 Amaranath Somalapuram
@@ -829,6 +830,7 @@ Kumar Amber
 Kumara Parameshwaran
 Kumar Sanghvi
 Kyle Larose
+Kyo Liu
 Lance Richardson
 Laszlo Ersek
 Laura Stroe
diff --git a/doc/guides/rel_notes/release_25_07.rst b/doc/guides/rel_notes/release_25_07.rst
index 093b85d206..7722d338b2 100644
--- a/doc/guides/rel_notes/release_25_07.rst
+++ b/doc/guides/rel_notes/release_25_07.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      ===
 
+* **Added get_iova_mode for rte_pci_driver.**
+
+  Introduce `pci_get_iova_mode` rte_pci_ops for `pci_get_iova_mode`
+  to PCI bus so that PCI drivers could get their wanted iova_mode
+
 Removed Items
 -
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index 2cc1119072..e5e36b0a5c 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -125,6 +125,16 @@ typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
 typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
                               uint64_t iova, size_t len);
 
+/**
+ * retrieve the required IOVA mode from the vendor driver
+ *
+ * @param dev
+ *   Pointer to the PCI device.
+ * @return
+ *   - rte_iova_mode
+ */
+typedef int (pci_get_iova_mode)(const struct rte_pci_device *pdev);
+
 /**
  * A structure describing a PCI driver.
  */
@@ -136,6 +146,7 @@ struct rte_pci_driver {
     pci_dma_map_t *dma_map;            /**< device dma map function. */
     pci_dma_unmap_t *dma_unmap;        /**< device dma unmap function. */
     const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
+    pci_get_iova_mode *get_iova_mode;  /**< Device get iova_mode function */
     uint32_t drv_flags;                /**< Flags RTE_PCI_DRV_*. */
 };
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index c20d159218..fd69a02989 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -624,6 +624,8 @@ pci_device_iova_mode(const struct
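To make the proposal easier to follow, here is a minimal, self-contained sketch of how a driver callback with the shape of pci_get_iova_mode could encode the three scenarios from the commit message, and how a bus might consult it. Several things here are assumptions: the types are simplified stand-ins for the DPDK structures, the `iommu_enabled` and `iommu_passthrough` fields and both functions are hypothetical, and the bus-side logic is a guess because the pci.c hunk is cut off in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for DPDK's enum rte_iova_mode (names are hypothetical here). */
enum iova_mode {
	IOVA_DC = 0, /* don't care: the driver has no requirement */
	IOVA_PA = 1, /* DMA addresses are physical addresses */
	IOVA_VA = 2, /* DMA addresses are virtual addresses */
};

/* Stand-in for struct rte_pci_device, reduced to what the sketch needs. */
struct pci_device {
	bool iommu_enabled;     /* hypothetical field */
	bool iommu_passthrough; /* hypothetical field */
};

/* Same shape as the pci_get_iova_mode typedef added by the patch. */
typedef int (pci_get_iova_mode)(const struct pci_device *pdev);

struct pci_driver {
	pci_get_iova_mode *get_iova_mode; /* optional, may be NULL */
};

/* Hypothetical vendor callback following the scenarios listed above. */
static int
vendor_get_iova_mode(const struct pci_device *pdev)
{
	if (!pdev->iommu_enabled)
		return IOVA_PA; /* scenario 2: IOMMU disabled */
	if (pdev->iommu_passthrough)
		return IOVA_PA; /* scenario 1.1: identity domain */
	return IOVA_DC;         /* scenario 1.2: no specific requirement */
}

/* Hypothetical bus-side helper: ask the driver only if it has the hook. */
static int
bus_device_iova_mode(const struct pci_driver *drv,
		     const struct pci_device *pdev)
{
	if (drv->get_iova_mode == NULL)
		return IOVA_DC;
	return drv->get_iova_mode(pdev);
}
```

Keeping the hook optional means existing drivers that leave the pointer NULL would behave as before; whether the real patch does this cannot be confirmed from the truncated diff.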
Re: [PATCH 02/13] net/sxe: add ethdev probe and remove
On Thu, 24 Apr 2025 19:36:41 -0700
Jie Liu wrote:

> diff --git a/drivers/net/sxe/Makefile b/drivers/net/sxe/Makefile
> new file mode 100644
> index 00..3d4e6f0a1c
> --- /dev/null
> +++ b/drivers/net/sxe/Makefile
> @@ -0,0 +1,67 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2010-2016 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_sxe.a
> +
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +CFLAGS += -DSXE_DPDK
> +CFLAGS += -DSXE_HOST_DRIVER
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +EXPORT_MAP := rte_pmd_sxe_version.map
> +

NAK
No Makefiles in current DPDK.
Re: [PATCH 02/13] net/sxe: add ethdev probe and remove
On Thu, 24 Apr 2025 19:36:41 -0700
Jie Liu wrote:

> diff --git a/drivers/net/sxe/base/sxe_logs.h b/drivers/net/sxe/base/sxe_logs.h
> new file mode 100644
> index 00..e90b563eac
> --- /dev/null
> +++ b/drivers/net/sxe/base/sxe_logs.h
> @@ -0,0 +1,273 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (C), 2022, Linkdata Technology Co., Ltd.
> + */
> +
> +#ifndef _SXE_LOGS_H_
> +#define _SXE_LOGS_H_
> +
> +#include
> +#include
> +#include
> +
> +#include "sxe_types.h"
> +
> +#define LOG_FILE_NAME_LEN 256
> +#define LOG_FILE_PATH "/var/log/"
> +#define LOG_FILE_PREFIX "sxepmd.log"
> +
> +extern s32 sxe_log_init;
> +extern s32 sxe_log_rx;
> +extern s32 sxe_log_tx;
> +extern s32 sxe_log_drv;
> +extern s32 sxe_log_hw;
> +
> +#define RTE_LOGTYPE_sxe_log_init sxe_log_init
> +#define RTE_LOGTYPE_sxe_log_rx sxe_log_rx
> +#define RTE_LOGTYPE_sxe_log_tx sxe_log_tx
> +#define RTE_LOGTYPE_sxe_log_drv sxe_log_drv
> +#define RTE_LOGTYPE_sxe_log_hw sxe_log_hw
> +
> +#define INIT sxe_log_init
> +#define RX sxe_log_rx
> +#define TX sxe_log_tx
> +#define HW sxe_log_hw
> +#define DRV sxe_log_drv
> +
> +#define UNUSED(x)((void)(x))
> +
> +#define TIME(log_time) \
> + do { \
> + struct timeval tv; \
> + struct tm *td; \
> + gettimeofday(&tv, NULL); \
> + td = localtime(&tv.tv_sec); \
> + strftime(log_time, sizeof(log_time), "%Y-%m-%d-%H:%M:%S", td); \
> + } while (0)
> +
> +#define filename_printf(x) (strrchr((x), '/') ? strrchr((x), '/') + 1 : (x))
> +
> +#ifdef SXE_DPDK_DEBUG
> +#define PMD_LOG_DEBUG(logtype, ...) \

NAK
Not carrying custom backport code in the upstream tree.

This driver is abusing the idea behind base/ code.
In DPDK, the base directory is intended for code shared between multiple
platforms, i.e. some vendors support DPDK, BSD, VMware, and even Linux with
common code. The base directory is not intended as a backport hook.
Re: [PATCH 02/13] net/sxe: add ethdev probe and remove
On Thu, 24 Apr 2025 19:36:41 -0700
Jie Liu wrote:

> From: JieLiu
>
> Add basic modules: logs, hardware communication, common components
> and support basic PCIe ethdev probe and remove.
>
> Signed-off-by: Jie Liu
> ---

This and other patches are getting reports of sign-off mismatch.

WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email name mismatch: 'From: JieLiu ' != 'Signed-off-by: Jie Liu '
Re: [PATCH 02/13] net/sxe: add ethdev probe and remove
On Thu, 24 Apr 2025 19:36:41 -0700
Jie Liu wrote:

> diff --git a/drivers/net/meson.build b/drivers/net/meson.build
> index 460eb69e5b..429e6b17eb 100644
> --- a/drivers/net/meson.build
> +++ b/drivers/net/meson.build
> @@ -64,6 +64,7 @@ drivers = [
> 'vmxnet3',
> 'xsc',
> 'zxdh',
> + 'sxe',
> ]

Put new drivers in with the same indentation and keep the list in
alphabetical order.
Re: [PATCH 01/13] net/sxe: add base driver directory and doc
On Thu, 24 Apr 2025 19:36:40 -0700 Jie Liu wrote: > From: JieLiu > > Adding a minimum maintainable directory structure for the > network driver and request maintenance of the sxe driver. > > Signed-off-by: Jie Liu Cross build for windows fails: DPDK 25.07.0-rc0 User defined options Cross files: config/x86/cross-mingw examples : helloworld Found ninja-1.12.1 at /usr/bin/ninja FAILED: drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_main.c.obj x86_64-w64-mingw32-gcc -Idrivers/libtmp_rte_net_sxe.a.p -Idrivers -I../drivers -Idrivers/net/sxe -I../drivers/net/sxe -I../drivers/net/sxe/base -I../drivers/net/sxe/pf -I../drivers/net/sxe/vf -I../drivers/net/sxe/include/sxe -I../drivers/net/sxe/include -Ilib/ethdev -I../lib/ethdev -Ilib/eal/common -I../lib/eal/common -I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/windows/include -I../lib/eal/windows/include -Ilib/eal/x86/include -I../lib/eal/x86/include -Ilib/eal -I../lib/eal -Ilib/log -I../lib/log -Ilib/kvargs -I../lib/kvargs -Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Ilib/meter -I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/windows -Ilib/pci -I../lib/pci -Idrivers/bus/vdev -I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=c11 -O3 -include rte_config.h -Wvla -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-packed-not-aligned -Wno-missing-field-initializers -D_GNU_SOURCE -D_WIN32_WINNT=0x0A00 -D__USE_MINGW_ANSI_STDIO -march=native -mrtm -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -Wno-format-truncation -Wno-address-of-packed-member -DSXE_DPDK 
-DSXE_HOST_DRIVER -DSXE_DPDK_L4_FEATURES -DSXE_DPDK_SRIOV -DSXE_DPDK_SIMD -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.sxe -MD -MQ drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_main.c.obj -MF drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_main.c.obj.d -o drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_main.c.obj -c ../drivers/net/sxe/pf/sxe_main.c
In file included from ../drivers/net/sxe/base/sxe_hw.h:13,
                 from ../drivers/net/sxe/pf/sxe_main.c:30:
../drivers/net/sxe/base/sxe_compat_platform.h:32:9: warning: "min" redefined
   32 | #define min(a, b) RTE_MIN(a, b)
      | ^~~
In file included from /usr/share/mingw-w64/include/windef.h:9,
                 from /usr/share/mingw-w64/include/windows.h:69,
                 from ../lib/eal/windows/include/rte_windows.h:28,
                 from ../lib/eal/windows/include/pthread.h:17,
                 from ../lib/ethdev/ethdev_driver.h:17,
                 from ../drivers/net/sxe/pf/sxe_main.c:23:
/usr/share/mingw-w64/include/minwindef.h:177:9: note: this is the location of the previous definition
  177 | #define min(a, b) (((a) < (b)) ? (a) : (b))
      | ^~~
In file included from ../drivers/net/sxe/base/sxe_hw.h:16:
../drivers/net/sxe/sxe_drv_type.h:21:14: error: conflicting types for ‘BOOL’; have ‘_Bool’
   21 | typedef bool BOOL;
      | ^~~~
/usr/share/mingw-w64/include/minwindef.h:131:15: note: previous declaration of ‘BOOL’ with type ‘BOOL’ {aka ‘int’}
  131 | typedef int BOOL;
      | ^~~~
In file included from ../drivers/net/sxe/pf/sxe.h:16,
                 from ../drivers/net/sxe/pf/sxe_ethdev.h:8,
                 from ../drivers/net/sxe/pf/sxe_main.c:31:
../drivers/net/sxe/pf/sxe_stats.h:52:47: error: unknown type name ‘ulong’; did you mean ‘u_long’?
   52 | const ulong *ids,
      | ^
      | u_long
../drivers/net/sxe/pf/sxe_stats.h:53:41: error: unknown type name ‘ulong’; did you mean ‘u_long’?
   53 | ulong *values, u32 usr_cnt);
      | ^
      | u_long
../drivers/net/sxe/pf/sxe_stats.h:62:15: error: unknown type name ‘ulong’; did you mean ‘u_long’?
62 | const ulong *ids, | ^ | u_long FAILED: drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_filter.c.obj x86_64-w64-mingw32-gcc -Idrivers/libtmp_rte_net_sxe.a.p -Idrivers -I../drivers -Idrivers/net/sxe -I../drivers/net/sxe -I../drivers/net/sxe/base -I../drivers/net/sxe/pf -I../drivers/net/sxe/vf -I../drivers/net/sxe/include/sxe -I../drivers/net/sxe/include -Ilib/ethdev -I../lib/ethdev -Ilib/eal/common -I../lib/eal/common -I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/windows/include -I../lib/eal/windows/inc
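The Windows failures above come from driver-private names (`min`, `BOOL`, `ulong`) that either collide with the MinGW headers or do not exist there. A minimal sketch of one common way around this follows, assuming the driver is free to rename its helpers. The `SXE_MIN`, `sxe_is_valid_queue` and `sxe_copy_ids` names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical driver-prefixed helper instead of a bare min() macro,
 * which the MinGW Windows headers already define.
 */
#define SXE_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Use C99 bool directly rather than a BOOL typedef, which the
 * Windows headers declare as int.
 */
static bool
sxe_is_valid_queue(uint16_t queue, uint16_t nb_queues)
{
	return queue < nb_queues;
}

/*
 * Use fixed-width types instead of ulong, which is not a standard C
 * type and is unknown to the MinGW headers.
 */
static uint32_t
sxe_copy_ids(const uint64_t *ids, uint64_t *out, uint32_t ids_cnt,
	     uint32_t out_cnt)
{
	uint32_t n = SXE_MIN(ids_cnt, out_cnt);
	uint32_t i;

	assert(ids != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = ids[i];
	return n;
}
```

Prefixed names keep the driver out of the namespace that platform headers use, so the same source can build on Linux and Windows without conditional typedefs.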
Re: [PATCH 01/13] net/sxe: add base driver directory and doc
On Thu, 24 Apr 2025 19:36:40 -0700 Jie Liu wrote: > From: JieLiu > > Adding a minimum maintainable directory structure for the > network driver and request maintenance of the sxe driver. > > Signed-off-by: Jie Liu Build test failed for 32 bit x86 DPDK 25.07.0-rc0 User defined options c_args : -m32 c_link_args: -m32 Found ninja-1.12.1 at /usr/bin/ninja FAILED: drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_ethdev.c.o cc -Idrivers/libtmp_rte_net_sxe.a.p -Idrivers -I../drivers -Idrivers/net/sxe -I../drivers/net/sxe -I../drivers/net/sxe/base -I../drivers/net/sxe/pf -I../drivers/net/sxe/vf -I../drivers/net/sxe/include/sxe -I../drivers/net/sxe/include -Ilib/ethdev -I../lib/ethdev -Ilib/eal/common -I../lib/eal/common -I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include -I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include -I../kernel/linux -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/meter -I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci -Idrivers/bus/vdev -I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu -I/usr/include/i386-linux-gnu -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=c11 -O3 -include rte_config.h -Wvla -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-packed-not-aligned -Wno-missing-field-initializers -Wno-pointer-to-int-cast -D_GNU_SOURCE -m32 -fPIC -march=native -mrtm -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -Wno-format-truncation -Wno-address-of-packed-member -Wno-vla -DSXE_DPDK -DSXE_HOST_DRIVER 
-DSXE_DPDK_L4_FEATURES -DSXE_DPDK_SRIOV -DSXE_DPDK_SIMD -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.sxe -MD -MQ drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_ethdev.c.o -MF drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_ethdev.c.o.d -o drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_ethdev.c.o -c ../drivers/net/sxe/pf/sxe_ethdev.c
../drivers/net/sxe/pf/sxe_ethdev.c: In function ‘sxe_dev_stop’:
../drivers/net/sxe/pf/sxe_ethdev.c:342:9: warning: value computed is not used [-Wunused-value]
  342 | rte_atomic_exchange_explicit(&adapter->is_stopping, 1, rte_memory_order_seq_cst);
../drivers/net/sxe/pf/sxe_ethdev.c: At top level:
../drivers/net/sxe/pf/sxe_ethdev.c:740:35: error: initialization of ‘int (*)(struct rte_eth_dev *, const uint64_t *, uint64_t *, unsigned int)’ {aka ‘int (*)(struct rte_eth_dev *, const long long unsigned int *, long long unsigned int *, unsigned int)’} from incompatible pointer type ‘s32 (*)(struct rte_eth_dev *, const ulong *, ulong *, u32)’ {aka ‘int (*)(struct rte_eth_dev *, const long unsigned int *, long unsigned int *, unsigned int)’} [-Wincompatible-pointer-types]
  740 | .xstats_get_by_id = sxe_xstats_get_by_id,
      | ^~~~
../drivers/net/sxe/pf/sxe_ethdev.c:740:35: note: (near initialization for ‘sxe_eth_dev_ops.xstats_get_by_id’)
../drivers/net/sxe/pf/sxe_ethdev.c:742:35: error: initialization of ‘int (*)(struct rte_eth_dev *, const uint64_t *, struct rte_eth_xstat_name *, unsigned int)’ {aka ‘int (*)(struct rte_eth_dev *, const long long unsigned int *, struct rte_eth_xstat_name *, unsigned int)’} from incompatible pointer type ‘s32 (*)(struct rte_eth_dev *, const ulong *, struct rte_eth_xstat_name *, u32)’ {aka ‘int (*)(struct rte_eth_dev *, const long unsigned int *, struct rte_eth_xstat_name *, unsigned int)’} [-Wincompatible-pointer-types]
  742 | .xstats_get_names_by_id = sxe_xstats_names_get_by_id,
      | ^~
../drivers/net/sxe/pf/sxe_ethdev.c:742:35: note: (near initialization for ‘sxe_eth_dev_ops.xstats_get_names_by_id’)
FAILED:
drivers/libtmp_rte_net_sxe.a.p/net_sxe_pf_sxe_stats.c.o cc -Idrivers/libtmp_rte_net_sxe.a.p -Idrivers -I../drivers -Idrivers/net/sxe -I../drivers/net/sxe -I../drivers/net/sxe/base -I../drivers/net/sxe/pf -I../drivers/net/sxe/vf -I../drivers/net/sxe/include/sxe -I../drivers/net/sxe/include -Ilib/ethdev -I../lib/ethdev -Ilib/eal/common -I../lib/eal/common -I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include -I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include -I../kernel/linux -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/me
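The 32-bit errors above arise because the ops table expects callbacks taking uint64_t pointers, while the driver declares them with ulong. On this 32-bit target uint64_t is unsigned long long, so the two pointer types differ. A minimal sketch of the matching declaration follows. `struct eth_dev`, `xstats_get_by_id_t` and the `demo_` names are simplified stand-ins, not the DPDK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for struct rte_eth_dev. */
struct eth_dev;

/* Callback shape taken from the expected type in the error message. */
typedef int (*xstats_get_by_id_t)(struct eth_dev *dev, const uint64_t *ids,
				  uint64_t *values, unsigned int n);

/*
 * Hypothetical driver callback declared with uint64_t, so it matches
 * the ops prototype on every target. A version written with unsigned
 * long would only match where uint64_t happens to be unsigned long.
 */
static int
demo_xstats_get_by_id(struct eth_dev *dev, const uint64_t *ids,
		      uint64_t *values, unsigned int n)
{
	unsigned int i;

	(void)dev;
	for (i = 0; i < n; i++)
		values[i] = ids[i] * 10; /* placeholder counter value */
	return (int)n;
}

/* Assigning through the typedef makes the compiler check the signature. */
static const xstats_get_by_id_t demo_op = demo_xstats_get_by_id;

/* The width that stays fixed whatever the target ABI. */
static_assert(sizeof(uint64_t) == 8, "uint64_t is always 64 bits");
```

Building the driver with -m32, as in the log above, is a quick way to catch this class of mismatch before posting.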
[PATCH v2 1/1] bus/pci: introduce get_iova_mode for pci dev
I propose this patch for DPDK to enable coexistence between DPDK and kernel
drivers for regular NICs. This solution requires adding a new pci_ops in
rte_pci_driver, through which DPDK will retrieve the required IOVA mode from
the vendor driver. This mechanism is necessary to handle different IOMMU
configurations and operating modes. Below is a detailed analysis of various
scenarios:

1. When IOMMU is enabled:
1.1 With PT (Pass-Through) enabled:
In this case, the domain type is IOMMU_DOMAIN_IDENTITY, which prevents vendor
drivers from setting IOVA->PA mapping tables. Therefore, DPDK must use PA
mode. To achieve this:
The vendor kernel driver will register a character device (cdev) to
communicate with DPDK. This cdev handles device operations (open, mmap, etc.)
and ultimately programs the hardware registers.

1.2 With PT disabled:
Here, the vendor driver doesn't enforce specific IOVA mode requirements.
Our implementation will:
Integrate a mediated device (mdev) in the vendor driver.
This mdev interacts with DPDK and manages IOVA->PA mapping configurations.

2. When IOMMU is disabled:
The vendor driver mandates PA mode (consistent with DPDK's PA mode
requirement in this scenario).
A character device (cdev) will similarly be registered for DPDK communication.

Summary:
The solution leverages multiple technologies:
  mdev for IOVA management when IOMMU is partially enabled.
  VFIO for device passthrough operations.
  cdev for register programming coordination.
  A new pci_ops interface in DPDK to dynamically determine IOVA modes.
This architecture enables clean coexistence by establishing standardized
communication channels between DPDK and vendor drivers across different
IOMMU configurations.

Motivation for the Patch:
This patch is introduced to prepare for the upcoming open-source contribution
of our NebulaMatrix SNIC driver to DPDK. We aim to ensure that our SNIC can
seamlessly coexist with kernel drivers using this mechanism.
By adopting the proposed architecture, which leverages dynamic IOVA mode
negotiation via pci_ops, mediated devices (mdev), and character device (cdev)
interactions, we enable our SNIC to operate in hybrid environments where both
DPDK and kernel drivers may manage the same hardware. This design aligns with
DPDK’s scalability goals and ensures compatibility across diverse IOMMU
configurations, which is critical for real-world deployment scenarios.

Signed-off-by: Kyo Liu
---
 .mailmap                               |  2 ++
 doc/guides/rel_notes/release_25_07.rst |  4 ++++
 drivers/bus/pci/bus_pci_driver.h       | 11 +++++++++++
 drivers/bus/pci/linux/pci.c            |  2 ++
 4 files changed, 19 insertions(+)

diff --git a/.mailmap b/.mailmap
index d8439b79ce..509ff9a16f 100644
--- a/.mailmap
+++ b/.mailmap
@@ -78,6 +78,7 @@ Allen Hubbe
 Alok Makhariya
 Alok Prasad
 Alvaro Karsz
+Alvin Wang
 Alvin Zhang
 Aman Singh
 Amaranath Somalapuram
@@ -829,6 +830,7 @@ Kumar Amber
 Kumara Parameshwaran
 Kumar Sanghvi
 Kyle Larose
+Kyo Liu
 Lance Richardson
 Laszlo Ersek
 Laura Stroe
diff --git a/doc/guides/rel_notes/release_25_07.rst b/doc/guides/rel_notes/release_25_07.rst
index 093b85d206..e220b3883f 100644
--- a/doc/guides/rel_notes/release_25_07.rst
+++ b/doc/guides/rel_notes/release_25_07.rst
@@ -54,6 +54,10 @@ New Features
      This section is a comment. Do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
      ===
 
+* **Added get_iova_mode for rte_pci_driver.**
+
+  Introduce `pci_get_iova_mode` rte_pci_ops for `pci_get_iova_mode`
+  to PCI bus so that PCI drivers could get their wanted iova_mode
 
 Removed Items
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index 2cc1119072..e5e36b0a5c 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -125,6 +125,16 @@ typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
 typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
                               uint64_t iova, size_t len);
 
+/**
+ * retrieve the required IOVA mode from the vendor driver
+ *
+ * @param dev
+ *   Pointer to the PCI device.
+ * @return
+ *   - rte_iova_mode
+ */
+typedef int (pci_get_iova_mode)(const struct rte_pci_device *pdev);
+
 /**
  * A structure describing a PCI driver.
  */
@@ -136,6 +146,7 @@ struct rte_pci_driver {
     pci_dma_map_t *dma_map;            /**< device dma map function. */
     pci_dma_unmap_t *dma_unmap;        /**< device dma unmap function. */
     const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
+    pci_get_iova_mode *get_iova_mode;  /**< Device get iova_mode function */
     uint32_t drv_flags;                /**< Flags RTE_PCI_DRV_*. */
 };
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index c20d159218..fd69a02989 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -624