On 27. 11. 18 23:51, David Miller wrote:
From: Petr Oros <po...@redhat.com>
Date: Thu, 22 Nov 2018 15:24:07 +0100

@@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
        struct net_device *netdev = adapter->netdev;
        int status;
- if (netif_running(netdev))
+       if (netif_running(netdev)) {
+               /* prevent netdev watchdog during tx queue destroy */
+               netif_carrier_off(netdev);
                be_close(netdev);
+       }

This will make userspace networking daemons will think that the link
went down.

This absolutely should not be a side effect of making a simple
TX queue configuration change via ethtool.


Yes, you're right Dave. But the same thing (netif_carrier_off()) is done by number of other drivers (igb, tg3, ixgbe...) during .set_channels() callback. The patch that Petr sent does the practically the same thing like this one:

commit c0dfb90e5b2d41c907de9b624657a6688541837e
Author: John Fastabend <john.r.fastab...@intel.com>
Date:   Tue Apr 27 02:13:39 2010 +0000

    ixgbe: ixgbe_down needs to stop dev_watchdog

    There is a small race between when the tx queues are stopped
    and when netif_carrier_off() is called in ixgbe_down.  If the
    dev_watchdog() timer fires during this time it is possible for
    a false tx timeout to occur.

    This patch moves the netif_carrier_off() so that it is called before
    the tx queues are stopped preventing the dev_watchdog timer from
    detecting false tx timeouts.  The race is seen occosionally when
    FCoE or DCB settings are being configured or changed.

    Testing note, running ifconfig up/down will not reproduce this
    issue because dev_open/dev_close call dev_deactivate() and then
    dev_activate().

where netif_carrier_off() is called prior netif_tx_disable() from ixgbe_down() to avoid false Tx timeout. And ixgbe_down() is called from ixgbe_set_channels() this way:
ixgbe_set_channels()->ixgbe_setup_tc()->ixgbe_close()->ixgbe_close_suspend()->ixgbe_down()

As I said the similar approach is used by other drivers as well.

The mlx4 driver resolves this situation differently. It calls mlx4_en_stop_port() from mlx4_en_set_channels() with parameter 'detach'==1 that causes that netif_device_detach() is called prior stopping of Tx queues. This also effectively prevents dev_watchdog() to call .ndo_tx_timeout() callback. An advantage is netif_device_detach() does not fire linkwatch events.

So... what ways is the _right_ one ?

Thanks,
Ivan

Reply via email to