Our developer still thinks the race condition, if there is one, is elsewhere.
In both i40evf_reset_task and i40evf_open, i40evf_configure is called, and
i40evf_configure calls i40evf_set_rx_mode.
Then in i40evf_set_rx_mode, there is this while loop to check
__I40EVF_IN_CRITICAL_TASK:
    while (test_and_set_bit(__I40EVF_IN_CRITICAL_TASK,
                            &adapter->crit_section))
            dev_err(&adapter->pdev->dev, "Failed to get lock in %s\n",
                    __func__);
i40evf_up_complete is called after i40evf_configure returns.
Todd Fujinaka
Software Application Engineer
Datacenter Engineering Group
Intel Corporation
[email protected]
(503) 712-4565
-----Original Message-----
From: Codrut Grosu [mailto:[email protected]]
Sent: Thursday, August 3, 2017 7:12 AM
To: Alexander Duyck <[email protected]>
Cc: [email protected]; Tudor Cornea <[email protected]>
Subject: Re: [E1000-devel] Possible bug in the i40evf driver
Hi Alex,
I apologize for my mistake and for being unclear. I meant "ndo_stop".
You are right that they should be called with the RTNL lock held, and we were
doing that. What is probably wrong is that we were not calling
clear_bit(__LINK_STATE_START, &dev->state); that bit is only cleared when we
go through dev_close.
If this bit is not cleared, netif_running will return true in the reset task,
and napi_disable will be called again. So it's not about the RTNL lock but
about the __LINK_STATE_START bit.
I agree that calling ndo_stop directly is probably wrong. And so far this is
the only way we can make the driver hang.
I still think that reset and open could race. As I said, I can't reproduce
this. But the second call to netif_running and napi_enable in the reset task is
not protected by __I40EVF_IN_CRITICAL_TASK, and neither is the call to
i40evf_up_complete in i40evf_open.
Best wishes,
Codrut
-----Original Message-----
From: Alexander Duyck [mailto:[email protected]]
Sent: Thursday, August 03, 2017 4:08 AM
To: Codrut Grosu <[email protected]>
Cc: [email protected]
Subject: Re: [E1000-devel] Possible bug in the i40evf driver
On Tue, Aug 1, 2017 at 8:29 AM, Codrut Grosu <[email protected]> wrote:
> Hi,
>
> We believe there might be a bug in the i40evf driver.
>
> We think that there is race issue between i40evf_reset_task and i40evf_down
> / i40evf_open.
> The reason is that the functions napi_enable / napi_disable must be
> called in pairs in order not to loop indefinitely (or crash).
>
> Consider the following:
>
> ifconfig eth1 mtu 1000 & ifconfig eth1 up
>
> What happens now is that the change of mtu schedules a reset. Suppose
> the reset task starts, and the first call to netif_running (after
> continue_reset) returns false. Before the thread reaches the second
> call to netif_running, i40evf_open starts in another thread. Then the second
> netif_running in reset will return true, and we will have 2 consecutive calls
> of napi_enable.
>
> We could not reproduce this particular situation in practice (probably due to
> the short timing).
> However, we did hang the driver using a call to ndo_close() followed
> quickly by "ethtool -G eth1 rx 4096". In this case netif_running will
> return true always (as we bypassed the call to dev_close), the reset
> will be scheduled before the interface finishes going down, and 2 calls to
> napi_disable will happen.
So this last statement isn't exactly clear. There isn't an ndo_close() in the
kernel last I knew. There is an ndo_stop() and an i40evf_close(), but there
isn't an ndo_close(). Can you clarify which one of these you were calling?
I ask because ndo_stop() and i40evf_close() should only be called with the RTNL
lock held. It sounds like you may not be doing that, and that could cause a
collision with something like an "ethtool -G" command: the ethtool ioctl takes
the RTNL lock to avoid such collisions, so if your call into i40evf_close()
isn't holding the RTNL, that might explain the issue.
- Alex
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech
sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired