Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Petr Oros Tue, 03 Feb 2026 03:33:04 -0800


On 2/3/26 11:19, Przemek Kitszel wrote:

On 2/3/26 09:44, Petr Oros wrote:

On 2/3/26 02:00, Jacob Keller wrote:
On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
+    netdev_unlock(netdev);
+    ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
+                           !iavf_is_reset_in_progress(adapter),
+                           msecs_to_jiffies(5000));
+    netdev_lock(netdev);
Dropping locks taken by the core around the driver callback
is obviously unacceptable. SMH.
Right. It seems like the correct fix is to either a) have reset take and hold the netdev lock (now that it's distinct from the global RTNL lock) or b) refactor reset so that it can defer any of the netdev related stuff somehow.
I modeled this after the existing pattern in iavf_close() (ndo_stop), which also temporarily releases the netdev instance lock taken by the core to wait for an async operation to complete:


First of all, thank you for working on that, I was hit by the very same
problem (no series yet), but my local fix is the same as of now.

I don't see an easy fix (w/o substantial driver refactor).


static int iavf_close(struct net_device *netdev)
{
         netdev_assert_locked(netdev);
         ...
         iavf_down(adapter);
         iavf_change_state(adapter, __IAVF_DOWN_PENDING);
         iavf_free_traffic_irqs(adapter);

         netdev_unlock(netdev);

         status = wait_event_timeout(adapter->down_waitqueue,
                                     adapter->state == __IAVF_DOWN,
                                     msecs_to_jiffies(500));
         if (!status)
                 netdev_warn(netdev, "Device resources not yet released\n");

         netdev_lock(netdev);
         ...
}

This was introduced by commit 120f28a6f314fe ("iavf: get rid of the crit lock"), and ndo_stop is called with netdev instance lock held by the core just like ndo_change_mtu is.


technically it was introduced by commit afc664987ab3 ("eth: iavf:
extend the netdev_lock usage")

Could you clarify why the unlock-wait-lock pattern is acceptable in ndo_stop but not here?


perhaps just closing netdev is a special kind of operation

Other thing is that the lock was added to allow further NAPI
development, and one silly driver should not stop that effort.
Sadly, we have not managed to re-design the driver yet. I would like to
do so personally, but have much work accumulated/pending to free my time

I agree, the unlock-wait-lock pattern is fundamentally flawed (I now
understand why it is unacceptable) and should be avoided.

What can we do now?

* Eliminating the wait is not an option: As noted in the description of
  commit c2ed2403f12c, this wait was originally added to fix a race
  condition where adding an interface to bonding failed because the
  device remained in __RESETTING state after the callback returned.

* Passing the lock into reset is impractical: The reset path is
  triggered from numerous contexts, many of which are not under the
  netdev_lock, making this even more complex than a full refactor.

If dropping the lock is a no-go, the only viable path forward is to
split the reset_task so that the waiting portion is decoupled from the
netdev_lock critical section.
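
For illustration only, here is one possible reading of that split, written
as a userspace pthread model rather than as iavf code. Every name in it
(model_adapter, model_reset_worker, model_change_mtu, model_core_set_mtu)
is hypothetical. The assumption is that the reset can be divided into an
asynchronous half that never needs the instance lock and a locked half that
the callback can run itself, so the callback may wait without ever dropping
the lock the core took. Whether the real reset_task can be divided this way
is exactly the open question, and the sketch omits the timeout the patch
uses.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only: every name here is hypothetical, not iavf code. */
struct model_adapter {
	pthread_mutex_t instance_lock; /* stands in for the netdev instance lock */
	pthread_mutex_t state_lock;    /* protects hw_reset_done */
	pthread_cond_t state_cond;     /* signalled when hw_reset_done is set */
	bool hw_reset_done;
	int mtu;
};

static void model_adapter_init(struct model_adapter *a, int mtu)
{
	pthread_mutex_init(&a->instance_lock, NULL);
	pthread_mutex_init(&a->state_lock, NULL);
	pthread_cond_init(&a->state_cond, NULL);
	a->hw_reset_done = false;
	a->mtu = mtu;
}

/* Asynchronous half of the reset: it never takes instance_lock. */
static void *model_reset_worker(void *arg)
{
	struct model_adapter *a = arg;

	pthread_mutex_lock(&a->state_lock);
	a->hw_reset_done = true;
	pthread_cond_broadcast(&a->state_cond);
	pthread_mutex_unlock(&a->state_lock);
	return NULL;
}

/* Driver callback: entered with instance_lock held and never drops it. */
static int model_change_mtu(struct model_adapter *a, int new_mtu)
{
	pthread_t worker;
	int err;

	/* Rough analogue of netdev_assert_locked(): the lock must be held. */
	assert(pthread_mutex_trylock(&a->instance_lock) == EBUSY);

	pthread_mutex_lock(&a->state_lock);
	a->hw_reset_done = false;
	pthread_mutex_unlock(&a->state_lock);

	err = pthread_create(&worker, NULL, model_reset_worker, a);
	if (err)
		return -err;

	/* Waiting is safe here because the worker does not need instance_lock. */
	pthread_mutex_lock(&a->state_lock);
	while (!a->hw_reset_done)
		pthread_cond_wait(&a->state_cond, &a->state_lock);
	pthread_mutex_unlock(&a->state_lock);
	pthread_join(worker, NULL);

	/* Locked half of the reset runs in the caller's context. */
	a->mtu = new_mtu;
	return 0;
}

/* Stands in for the core, which takes the lock around the callback. */
static int model_core_set_mtu(struct model_adapter *a, int new_mtu)
{
	int ret;

	pthread_mutex_lock(&a->instance_lock);
	ret = model_change_mtu(a, new_mtu);
	pthread_mutex_unlock(&a->instance_lock);
	return ret;
}
```

In this model, calling model_adapter_init(&a, 1500) followed by
model_core_set_mtu(&a, 9000) completes without deadlock, because nothing
the callback waits for depends on the lock the caller already holds.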

The fact remains that MTU configuration and ring parameter changes are
currently broken in iavf. Changing the MTU on a Virtual Function is a
fundamental configuration, not an obscure edge case that can remain
non-functional.


I would appreciate any further guidance on how you would prefer...
