On Wed, 2020-09-23 at 17:23 -0700, David Miller wrote:
> From: David Miller <da...@davemloft.net>
> Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)
> 
> > If an async code path tests 'present', gets true, and then the RTNL
> > holding synchronous code path puts the device into D3hot
> immediately
> > afterwards, the async code path will still continue and access the
> > chips registers and fault.
> 
> Wait, is the sequence:
> 
>         ->ndo_stop()
>                 mark device not present and put into D3hot
>                 triggers linkwatch event
>           ...
>                          ->ndo_get_stats64()
> 
> ???
> 

I assume it is, since normally device drivers do carrier_off() on
ndo_stop()

1) One problematic sequence would be 
(for drivers doing D3hot on ndo_stop())

__dev_close_many()
   ->ndo_stop()
      netif_device_detach() //Mark !present;
      ... D3hot
      carrier_off()->linkwatch_event()
            ... // !present && IFF_UP 
      
2) Another problematic scenario which i see is repeated in many
drivers:

shutdown/suspend()
    rtnl_lock()
    netif_device_detach()//Mark !present;
    stop()->carrier_off()->linkwatch_event()
    // at this point device is still IFF_UP and !present
    // due to the early detach above..  
    rtnl_unlock();
   
For scenario 1) we can fix by marking IFF_UP at the beginning, but for
2), i think we need to fix the drivers to detach only after stop :(
   
> Then yeah we might have to clear IFF_UP at the beginning of taking
> a netdev down.


Reply via email to