Re: Re: dumpcap coredump for 82599 NIC

2024-03-15 Thread junwan...@cestc.cn
Yes, I think you are right. After adding some debug output, I am fairly sure
it is an initialization issue in the ixgbe driver: secondary processes should
initialize some callback functions, but those callbacks seem to be missing.

I made a small change, moving the ixgbe_init_shared_code(hw) call ahead of the
point where secondary processes return early.
This changed the behavior, but a core dump still occurred.
I suspect there are other issues, or that this modification is not
appropriate.

[root@xc03-compute3 /]# /dpdk/app/dpdk-dumpcap -i :18:00.0
mlx5_net: Cannot attach mlx5 shared data
mlx5_net: Unable to init PMD global data: No such file or directory
mlx5_common: Failed to load driver mlx5_eth
EAL: Requested device :3b:00.0 cannot be used
mlx5_net: Cannot attach mlx5 shared data
mlx5_net: Unable to init PMD global data: No such file or directory
mlx5_common: Failed to load driver mlx5_eth
EAL: Requested device :3b:00.1 cannot be used
File: /tmp/dpdk-dumpcap_0_:18:00.0_20240314091910.pcapng
Capturing on ':18:00.0'
Packets captured: 2 Primary process is no longer active, exiting...
EAL: Fail to recv reply for request /var/run/dpdk/rte/mp_socket:mp_pdump
pdump_prepare_client_request(): client request for pdump enable/disable failed
Floating point exception (core dumped)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index d6cf00317e77b64f9822c155115f388ae62241eb..0bf885d7eaba3689fb9b98cdcaa6a928aa787985 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1104,6 +1104,24 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
eth_dev->tx_pkt_prepare = &ixgbe_prep_pkts;
 
+   /* Vendor and Device ID need to be set before init of shared code */
+   hw->device_id = pci_dev->id.device_id;
+   hw->vendor_id = pci_dev->id.vendor_id;
+   hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+   hw->allow_unsupported_sfp = 1;
+
+   /* Initialize the shared code (base driver) */
+#ifdef RTE_LIBRTE_IXGBE_BYPASS
+   diag = ixgbe_bypass_init_shared_code(hw);
+#else
+   diag = ixgbe_init_shared_code(hw);
+#endif /* RTE_LIBRTE_IXGBE_BYPASS */
+
+   if (diag != IXGBE_SUCCESS) {
+   PMD_INIT_LOG(ERR, "Shared code init failed: %d", diag);
+   return -EIO;
+   }
+
/*
 * For secondary processes, we don't initialise any further as primary
 * has already done this work. Only check we don't need a different
@@ -1135,24 +1153,6 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
rte_eth_copy_pci_info(eth_dev, pci_dev);
eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 
-   /* Vendor and Device ID need to be set before init of shared code */
-   hw->device_id = pci_dev->id.device_id;
-   hw->vendor_id = pci_dev->id.vendor_id;
-   hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
-   hw->allow_unsupported_sfp = 1;
-
-   /* Initialize the shared code (base driver) */
-#ifdef RTE_LIBRTE_IXGBE_BYPASS
-   diag = ixgbe_bypass_init_shared_code(hw);
-#else
-   diag = ixgbe_init_shared_code(hw);
-#endif /* RTE_LIBRTE_IXGBE_BYPASS */
-
-   if (diag != IXGBE_SUCCESS) {
-   PMD_INIT_LOG(ERR, "Shared code init failed: %d", diag);
-   return -EIO;
-   }
-
if (hw->mac.ops.fw_recovery_mode && hw->mac.ops.fw_recovery_mode(hw)) {
PMD_INIT_LOG(ERR, "\nERROR: "
"Firmware recovery mode detected. Limiting functionality.\n"
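Why the secondary process cannot simply reuse the primary's callbacks: in ixgbe, hw is taken from eth_dev->data->dev_private, which the primary allocates in shared memory, and hw->mac.ops holds plain function pointers. The DPDK multi-process guide notes that function pointers are not supported between processes built from different binaries, and here the primary is ovs-vswitchd while the secondary is dpdk-dumpcap. The patch above also has the secondary call ixgbe_init_shared_code(), which writes the secondary's own addresses into that same shared hw, so it could break the primary too; that may be one reason the change does not look appropriate. The self-contained sketch below models the safer split under those assumptions. All names in it (shared_dev_data, local_dev, dev_init, demo_link_update) are hypothetical and are not DPDK APIs: each process installs function pointers in its own local state, and only the primary writes the shared data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of per-process vs shared device state.
 * None of these names are DPDK definitions. */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

/* Shared state (the role dev_private plays in DPDK): plain data only,
 * never function pointers, because code addresses differ per binary. */
struct shared_dev_data {
	int initialized;
	unsigned int device_id;
};

/* Per-process state: every process fills in its own callbacks. */
struct local_dev {
	struct shared_dev_data *data;
	int (*link_update)(struct local_dev *dev);
};

/* Example callback: only reads shared data, so it is safe anywhere. */
static int demo_link_update(struct local_dev *dev)
{
	return dev->data->initialized ? 0 : -1;
}

/* Every process installs its local ops; only the primary writes the
 * shared area, so a secondary never overwrites the primary's state. */
static int dev_init(struct local_dev *dev, struct shared_dev_data *shared,
		    enum proc_type type, unsigned int device_id)
{
	assert(dev != NULL && shared != NULL);

	dev->data = shared;
	dev->link_update = demo_link_update;

	if (type != PROC_PRIMARY)
		return 0; /* secondary: attach to what the primary set up */

	memset(shared, 0, sizeof(*shared));
	shared->device_id = device_id;
	shared->initialized = 1;
	return 0;
}
```

Usage: call dev_init() once in each process against the same shared_dev_data. The secondary gets callbacks that are valid in its own address space while leaving the primary's shared state untouched. In the real driver, eth_dev->dev_ops and the burst functions follow this per-process pattern (they are set before the secondary-process return), whereas hw->mac.ops lives only in the shared hw.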


Also, although I am using a debug build, the printed call stack is still not
clear, which is quite strange.

meson -Dc_args="-mno-avx512f" \
    -Ddisable_drivers=net/ark,net/atlantic,net/avp,net/axgbe,net/pfe,net/netvsc \
    -Dmax_numa_nodes=8 -Dmax_ethports=128 --buildtype=debug --optimization=0 build
ninja -C build install




junwan...@cestc.cn

From: Stephen Hemminger
Date: 2024-03-14 00:29
To: junwan...@cestc.cn
CC: dev
Subject: Re: dumpcap coredump for 82599 NIC
On Wed, 13 Mar 2024 10:00:17 +0800
"junwan...@cestc.cn"  wrote:

> Hi, when I use dumpcap to capture packets on an 82599 network card, a core
> dump occurs. The network card bound to ovs-dpdk is an 82599; when I capture
> packets on other, non-82599 network cards (Mellanox CX5/CX6 or Intel E810),
> it works normally.
> The DPDK version I am using is 22.11.1, but the call stack looks strange, so
> I am asking you for help.
>
> I thought a newer version of DPDK might solve it, so I upgraded DPDK to
> 23.11, but the problem is still t

Re: Re: dumpcap coredump for 82599 NIC

2024-03-18 Thread junwan...@cestc.cn
   "Other link thread is running now!");
}
-   } else {
-   PMD_DRV_LOG(ERR,
-   "Other link thread is running now!");
}
+   return rte_eth_linkstatus_set(dev, &link);
}
-   return rte_eth_linkstatus_set(dev, &link);
}
 
link.link_status = RTE_ETH_LINK_UP;




junwan...@cestc.cn

From: junwan...@cestc.cn
Date: 2024-03-14 17:22
To: Stephen Hemminger
CC: dev
Subject: Re: Re: dumpcap coredump for 82599 NIC

Re: [PATCH] net/ixgbe: do not update link status in secondary process

2024-03-20 Thread junwan...@cestc.cn
I tried this modification and it works as well.

[root@compute3 /]# /dpdk/app/dpdk-dumpcap -i :18:00.0
File: /tmp/dpdk-dumpcap_0_:18:00.0_20240321043451.pcapng
Capturing on ':18:00.0'
Packets captured: 499 ^C
Packets received/dropped on interface ':18:00.0': 499/0 (100.0)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index c61c52b2966b..86ccbdd78292 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4293,6 +4293,9 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
 	int wait = 1;
 	u32 esdp_reg;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return -1;
+
 	memset(&link, 0, sizeof(link));
 	link.link_status = RTE_ETH_LINK_DOWN;
 	link.link_speed = RTE_ETH_SPEED_NUM_NONE;




junwan...@cestc.cn

From: Stephen Hemminger
Date: 2024-03-21 01:33
To: dev
CC: junwang01; Stephen Hemminger
Subject: [PATCH] net/ixgbe: do not update link status in secondary process
The code to update link status is not safe in a secondary process.
If called from a secondary process it will crash; example from dumpcap:
ixgbe_dev_link_update_share()
ixgbe_dev_link_update()
rte_eth_link_get()

Signed-off-by: Stephen Hemminger 
Reported-by: Jun Wang 
---
Simpler version of the earlier patch, with an explanation added.

drivers/net/ixgbe/ixgbe_ethdev.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index c61c52b2966b..86ccbdd78292 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4293,6 +4293,9 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
 	int wait = 1;
 	u32 esdp_reg;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return -1;
+
 	memset(&link, 0, sizeof(link));
 	link.link_status = RTE_ETH_LINK_DOWN;
 	link.link_speed = RTE_ETH_SPEED_NUM_NONE;
-- 
2.43.0
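The effect of the guard in this patch can be modelled in isolation. In the minimal C sketch below, current_proc, link_update_share(), link_get() and the structures are hypothetical stand-ins, not DPDK definitions. It assumes that rte_eth_link_get() ignores the driver callback's return value and then copies dev->data->dev_link from shared memory; that matches recent ethdev code as far as I know, but it should be checked against the tree in use. Under that assumption, a secondary such as dumpcap still reads the link state the primary last published, without touching the hardware or the shared hw->mac.ops pointers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for rte_eal_process_type(), struct rte_eth_link
 * and the shared rte_eth_dev_data; these are not DPDK definitions. */
enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

struct link_info {
	int status;          /* 1 = up, 0 = down */
	unsigned int speed;  /* Mbps */
};

struct shared_data {
	struct link_info dev_link; /* last link state written by the primary */
};

static enum proc_type current_proc = PROC_PRIMARY;
static int hw_accesses; /* counts simulated register / ops accesses */

/* Models the patched ixgbe_dev_link_update_share(): a secondary process
 * returns -1 before touching hardware or shared function pointers. */
static int link_update_share(struct shared_data *sh)
{
	struct link_info link;

	if (current_proc != PROC_PRIMARY)
		return -1;

	memset(&link, 0, sizeof(link));
	hw_accesses++;       /* stands in for reading the link registers */
	link.status = 1;
	link.speed = 10000;
	sh->dev_link = link; /* publish to shared memory */
	return 0;
}

/* Assumed caller behaviour: ignore the callback's return value and copy
 * the shared link state. */
static void link_get(struct shared_data *sh, struct link_info *out)
{
	assert(sh != NULL && out != NULL);
	(void)link_update_share(sh);
	*out = sh->dev_link;
}
```

With this model, a primary call to link_get() refreshes and publishes the link state, and a later secondary call returns the same values while the simulated hardware-access counter stays unchanged.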