RE: [Intel-wired-lan] [PATCH v2] igb: reinit_locked() should be called with rtnl_lock

2020-07-28 Thread Brown, Aaron F
> From: Intel-wired-lan  On Behalf Of
> Francesco Ruggeri
> Sent: Thursday, July 2, 2020 3:39 PM
> To: linux-kernel@vger.kernel.org; net...@vger.kernel.org; intel-wired-
> l...@lists.osuosl.org; k...@kernel.org; da...@davemloft.net; Kirsher, Jeffrey
> T ; frugg...@arista.com
> Subject: [Intel-wired-lan] [PATCH v2] igb: reinit_locked() should be called 
> with
> rtnl_lock
> 
> We observed two panics involving races with igb_reset_task.
> The first panic is caused by this race condition:
> 
>   kworker reboot -f
> 
>   igb_reset_task
>   igb_reinit_locked
>   igb_down
>   napi_synchronize
>   __igb_shutdown
>   igb_clear_interrupt_scheme
>   igb_free_q_vectors
>   igb_free_q_vector
>   adapter->q_vector[v_idx] = NULL;
>   napi_disable
>   Panics trying to access
>   adapter->q_vector[v_idx].napi_state
> 
> The second panic (a divide error) is caused by this race:
> 
> kworker   reboot -f   tx packet
> 
> igb_reset_task
>   __igb_shutdown
>   rtnl_lock()
>   ...
>   igb_clear_interrupt_scheme
>   igb_free_q_vectors
>   adapter->num_tx_queues = 0
>   ...
>   rtnl_unlock()
> rtnl_lock()
> igb_reinit_locked
> igb_down
> igb_up
> netif_tx_start_all_queues
>   dev_hard_start_xmit
>   igb_xmit_frame
>   igb_tx_queue_mapping
>   Panics on
>   r_idx % adapter->num_tx_queues
> 
> This commit applies to igb_reset_task the same changes that
> were applied to ixgbe in commit 2f90b8657ec9 ("ixgbe: this patch
> adds support for DCB to the kernel and ixgbe driver"),
> commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
> rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
> reset subtask").
> 
> v2: add fix for second race condition above.
> 
> Signed-off-by: Francesco Ruggeri 
> 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 9 +
>  1 file changed, 9 insertions(+)
Tested-by: Aaron Brown 


[PATCH v2] igb: reinit_locked() should be called with rtnl_lock

2020-07-02 Thread Francesco Ruggeri
We observed two panics involving races with igb_reset_task.
The first panic is caused by this race condition:

kworker reboot -f

igb_reset_task
igb_reinit_locked
igb_down
napi_synchronize
__igb_shutdown
igb_clear_interrupt_scheme
igb_free_q_vectors
igb_free_q_vector
adapter->q_vector[v_idx] = NULL;
napi_disable
Panics trying to access
adapter->q_vector[v_idx].napi_state

The second panic (a divide error) is caused by this race:

kworker reboot -f   tx packet

igb_reset_task
__igb_shutdown
rtnl_lock()
...
igb_clear_interrupt_scheme
igb_free_q_vectors
adapter->num_tx_queues = 0
...
rtnl_unlock()
rtnl_lock()
igb_reinit_locked
igb_down
igb_up
netif_tx_start_all_queues
dev_hard_start_xmit
igb_xmit_frame
igb_tx_queue_mapping
Panics on
r_idx % adapter->num_tx_queues

This commit applies to igb_reset_task the same changes that
were applied to ixgbe in commit 2f90b8657ec9 ("ixgbe: this patch
adds support for DCB to the kernel and ixgbe driver"),
commit 8f4c5c9fb87a ("ixgbe: reinit_locked() should be called with
rtnl_lock") and commit 88adce4ea8f9 ("ixgbe: fix possible race in
reset subtask").

v2: add fix for second race condition above.

Signed-off-by: Francesco Ruggeri 

---
 drivers/net/ethernet/intel/igb/igb_main.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 8bb3db2cbd41..6e5861bfb0fa 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6224,9 +6224,18 @@ static void igb_reset_task(struct work_struct *work)
struct igb_adapter *adapter;
adapter = container_of(work, struct igb_adapter, reset_task);
 
+   rtnl_lock();
+   /* If we're already down or resetting, just bail */
+   if (test_bit(__IGB_DOWN, &adapter->state) ||
+   test_bit(__IGB_RESETTING, &adapter->state)) {
+   rtnl_unlock();
+   return;
+   }
+
igb_dump(adapter);
netdev_err(adapter->netdev, "Reset adapter\n");
igb_reinit_locked(adapter);
+   rtnl_unlock();
 }
 
 /**