On 08/01/2017 02:37 PM, Timur Tabi wrote:
> The EMAC has a curious quirk when RX flow control is enabled and the
> kernel hangs.  With the kernel hung, the EMAC's RX queue soon fills.
> If RX flow control is enabled, the EMAC will then send a non-stop
> stream of pause frames until the system is reset.  The EMAC does not
> have a built-in watchdog.
> 
> In various tests, the pause frame stream sometimes overloads nearby
> switches, effectively disabling the network.  Since the RX queue is
> large and the host processor is more than capable of handling incoming
> packets quickly, the only time the EMAC will send any pause frames is
> when the kernel is hung and unrecoverable.

This is not specific to your EMAC; a lot of adapters actually have this
problem.

I wonder if it would make sense to reach for a broader solution: a
networking stack panic/oops notifier that actively cleans up the active
network devices' RX queue(s) and, if tx_pause was enabled, disables it.
Drivers could announce that they need this via either a NETIF_F_*
feature bit or some other private flag.
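
To make the idea a bit more concrete, here is a rough userspace model of
what such a notifier might do. It is only a sketch, not kernel code: the
struct, the SKETCH_PRIV_QUIESCE_ON_PANIC flag and the sketch_* helpers
are all hypothetical names made up for illustration. In the kernel this
would presumably hang off panic_notifier_list and call into each opted-in
driver to reprogram the hardware.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical private flag: the driver asks to be quiesced on panic. */
#define SKETCH_PRIV_QUIESCE_ON_PANIC (1u << 0)

/* Hypothetical, heavily simplified stand-in for a network device. */
struct sketch_netdev {
	const char *name;
	unsigned int priv_flags;
	bool tx_pause;		/* device may transmit pause frames */
	size_t rx_pending;	/* packets sitting in the RX queue */
	struct sketch_netdev *next;
};

static struct sketch_netdev *sketch_dev_list;

/* Add a device to the list the notifier walks. */
static void sketch_register(struct sketch_netdev *dev)
{
	dev->next = sketch_dev_list;
	sketch_dev_list = dev;
}

/*
 * Model of the panic/oops notifier: for every device that opted in,
 * drop whatever is queued on RX and stop it from sending pause frames.
 * Returns the number of devices that were quiesced.
 */
static int sketch_panic_notify(void)
{
	int quiesced = 0;

	for (struct sketch_netdev *dev = sketch_dev_list; dev; dev = dev->next) {
		if (!(dev->priv_flags & SKETCH_PRIV_QUIESCE_ON_PANIC))
			continue;

		dev->rx_pending = 0;
		dev->tx_pause = false;
		quiesced++;
	}

	return quiesced;
}
```

Devices that do not set the flag are left untouched, so only drivers
that are known to flood pause frames would pay for the extra work in
the panic path.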

> 
> To avoid all these problems, we disable flow control autonegotiation
> by default, and only enable receiving pause frames.
> 
> Cc: sta...@vger.kernel.org # 4.11.x
> Signed-off-by: Timur Tabi <ti...@codeaurora.org>
> ---
>  drivers/net/ethernet/qualcomm/emac/emac.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
> index 60850bfa3d32..475c0ea29235 100644
> --- a/drivers/net/ethernet/qualcomm/emac/emac.c
> +++ b/drivers/net/ethernet/qualcomm/emac/emac.c
> @@ -441,8 +441,13 @@ static void emac_init_adapter(struct emac_adapter *adpt)
>       /* others */
>       adpt->preamble = EMAC_PREAMBLE_DEF;
>  
> -     /* default to automatic flow control */
> -     adpt->automatic = true;
> +     /* Disable transmission of pause frames by default, to avoid the
> +      * risk of a pause frame flood that can occur if the kernel hangs.
> +      * We still want to be able to respond to them, however.
> +      */
> +     adpt->automatic = false;
> +     adpt->tx_flow_control = false;
> +     adpt->rx_flow_control = true;
>  }
>  
>  /* Get the clock */
> 


-- 
Florian
