On 31/08/2017 19:53, Florian Fainelli wrote: > On 08/31/2017 10:49 AM, Mason wrote: >> On 31/08/2017 18:57, Florian Fainelli wrote: >>> And the race is between phy_detach() setting phydev->attached_dev = NULL >>> and phy_state_machine() running in PHY_HALTED state and calling >>> netif_carrier_off(). >> >> I must be missing something. >> (Since a thread cannot race against itself.) >> >> phy_disconnect calls phy_stop_machine which >> 1) stops the work queue from running in a separate thread >> 2) calls phy_state_machine *synchronously* >> which runs the PHY_HALTED case with everything well-defined >> end of phy_stop_machine >> >> phy_disconnect only then calls phy_detach() >> which makes future calls of phy_state_machine perilous. >> >> This all happens in the same thread, so I'm not yet >> seeing where the race happens? > > The race is as described in David's earlier email, so let's recap: > > Thread 1 Thread 2 > phy_disconnect() > phy_stop_interrupts() > phy_stop_machine() > phy_state_machine() > -> queue_delayed_work() > phy_detach() > phy_state_machine() > -> netif_carrier_off() > > If phy_detach() finishes earlier than the workqueue had a chance to be > scheduled and process PHY_HALTED again, then we trigger the NULL pointer > de-reference. > > workqueues are not tasklets, the CPU scheduling them gets no guarantee > they will run on the same CPU.
Something does not add up. The synchronous call to phy_state_machine() does: case PHY_HALTED: if (phydev->link) { phydev->link = 0; netif_carrier_off(phydev->attached_dev); phy_adjust_link(phydev); do_suspend = true; } then sets phydev->link = 0; therefore subsequent calls to phy_state_machin() will be no-op. Also, queue_delayed_work() is only called in polling mode. David stated that he's using interrupt mode. Regards.