Dear Li,
Thank you for your patch.
Am 07.01.26 um 02:04 schrieb Li Li via Intel-wired-lan:
When an idpf HW reset is triggered, it clears the vport but does
not clear the netdev held by vport:
// In idpf_vport_dealloc() called by idpf_init_hard_reset(),
// idpf_init_hard_reset() sets IDPF_HR_RESET_IN_PROG, so
// idpf_decfg_netdev() doesn't get called.
No need to format this as code comments. At least it confused me a little.
if (!test_bit(IDPF_HR_RESET_IN_PROG, adapter->flags))
idpf_decfg_netdev(vport);
// idpf_decfg_netdev() would clear netdev but it isn't called:
unregister_netdev(vport->netdev);
free_netdev(vport->netdev);
vport->netdev = NULL;
// Later in idpf_init_hard_reset(), the vport is cleared:
kfree(adapter->vports);
adapter->vports = NULL;
During an idpf HW reset, when "ethtool -g/-G" is called on the netdev,
the vport associated with the netdev is NULL, and so a kernel panic
would happen:
[ 513.185327] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[ 513.232756] RIP: 0010:idpf_get_ringparam+0x45/0x80
This can be reproduced reliably by injecting a TX timeout to cause
an idpf HW reset, and injecting a virtchnl error to cause the HW
reset to fail and retry, while calling "ethtool -g/-G" on the netdev
at the same time.
If you shared the commands, how to do that, it would make reproducing
the issue easier.
With this patch applied, we see the following error but no kernel
panics anymore:
[ 476.323630] idpf 0000:05:00.0 eth1: failed to get ring params due to no
vport in netdev
Signed-off-by: Li Li <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index d5711be0b8e69..6a4b630b786c2 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -639,6 +638,10 @@ static void idpf_get_ringparam(struct net_device *netdev,
idpf_vport_ctrl_lock(netdev);
vport = idpf_netdev_to_vport(netdev);
+ if (!vport) {
+ netdev_err(netdev, "failed to get ring params due to no vport in
netdev\n");
If vport == NULL is expected, why log it as an error. What should the
user do? Wait until reset is done?
+ goto unlock;
+ }
ring->rx_max_pending = IDPF_MAX_RXQ_DESC;
ring->tx_max_pending = IDPF_MAX_TXQ_DESC;
@@ -647,6 +651,7 @@ static void idpf_get_ringparam(struct net_device *netdev,
kring->tcp_data_split = idpf_vport_get_hsplit(vport);
+unlock:
idpf_vport_ctrl_unlock(netdev);
}
@@ -673,6 +674,11 @@ static int idpf_set_ringparam(struct net_device *netdev,
idpf_vport_ctrl_lock(netdev);
vport = idpf_netdev_to_vport(netdev);
+ if (!vport) {
+ netdev_err(netdev, "ring params not changed due to no vport in
netdev\n");
+ err = -EFAULT;
+ goto unlock_mutex;
+ }
idx = vport->idx;
Is there another – possible more involved – solution possible to wait
until the hardware reset finished?
Kind regards,
Paul