On 2/3/26 02:00, Jacob Keller wrote:
On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
On Mon, 2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
+ netdev_unlock(netdev);
+ ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
+ !iavf_is_reset_in_progress(adapter),
+ msecs_to_jiffies(5000));
+ netdev_lock(netdev);
Dropping locks taken by the core around the driver callback
is obviously unacceptable. SMH.
Right. It seems like the correct fix is to either a) have reset take
and hold the netdev lock (now that its distinct from the global RTNL
lock) or b) refactor reset so that it can defer any of the netdev
related stuff somehow.
I modeled this after the existing pattern in iavf_close() (ndo_stop),
which also temporarily releases the netdev instance lock taken by the
core to wait for an async operation to complete:
static int iavf_close(struct net_device *netdev)
{
netdev_assert_locked(netdev);
...
iavf_down(adapter);
iavf_change_state(adapter, __IAVF_DOWN_PENDING);
iavf_free_traffic_irqs(adapter);
netdev_unlock(netdev);
status = wait_event_timeout(adapter->down_waitqueue,
adapter->state == __IAVF_DOWN,
msecs_to_jiffies(500));
if (!status)
netdev_warn(netdev, "Device resources not yet released\n");
netdev_lock(netdev);
...
}
This was introduced by commit 120f28a6f314fe ("iavf: get rid of the crit
lock"), and ndo_stop is called with netdev instance lock held by the
core just like ndo_change_mtu is. Could you clarify why the
unlock-wait-lock pattern is acceptable in ndo_stop but not here?