On 02/07/2018 01:13 PM, Russell King - ARM Linux wrote: > On Wed, Feb 07, 2018 at 09:56:37PM +0100, Heiner Kallweit wrote: >> Am 04.02.2018 um 03:48 schrieb Florian Fainelli: >>> >>> >>> On 02/03/2018 03:58 PM, Heiner Kallweit wrote: >>>> Am 03.02.2018 um 21:17 schrieb Andrew Lunn: >>>>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote: >>>>>> This commit forces callers of phy_resume() and phy_suspend() to hold >>>>>> mutex phydev->lock. This was done for calls to phy_resume() and >>>>>> phy_suspend() in phylib, however there are more callers in network >>>>>> drivers. I'd assume that these other calls issue a warning now >>>>>> because of the lock not being held. >>>>>> So is there something I miss or would this have to be fixed? >>>>> >>>>> Hi Heiner >>>>> >>>>> This is a good point. >>>>> >>>>> Yes, it looks like some fixes are needed. But what exactly? >>>>> >>>>> The phy state machine will suspend and resume the phy is you call >>>>> phy_stop() and phy_start() in the MAC suspend and resume functions. >>>>> >>>> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state >>>> to PHY_HALTED and (at least if we're not in polling mode) doesn't >>>> call phy_suspend(). Maybe a call to phy_trigger_machine() is >>>> needed like in phy_start() ? Then the state machine would call >>>> phy_suspend(), provided the link is still up. >>> >>> Right, phy_stop() merely just moves the state machine to PHY_HALTED and >>> this is actually a great source of problems which I tried to address here: >>> >>> https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html >>> >>> because phy_stop() is not a synchronous call, so when it returns the >>> state machine might still be running (it can take up to a 1 HZ depending >>> on when you called phy_stop()) and so if you took that as a >>> synchronization point to e.g: turn of your Ethernet MAC/MDIO bus clocks, >>> you will likely see problems. phy_stop_machine() would provide that >>> synchronization point, but is not currently exported, despite being a >>> global symbol. This patch series above is all well and good, except that >>> Geert reported issues with suspend/resume interactions which I have not >>> been able to track down. >>> >> To not confuse readers I changed the subject of the mail to reflect that >> the discussion isn't about the original topic any longer. >> >> It seems to me that (at least one) reason for the issues is that pm >> callbacks for the phy device and the network device interfere. >> phy device pm (mdio_bus_phy_suspend et al) feels responsible for dealing >> with suspending/resuming the PHY, and the network driver pm callbacks >> as well. >> >> Maybe, if the network driver takes care, it should inform phy device pm >> to do nothing. For this we could add a flag to phydev.mdio.flags. >> If the network driver sets this flag then mdio_bus_phy_suspend() >> and mdio_bus_phy_resume() could turn into no-ops. >> Not totally sure yet about mdio_bus_phy_restore() .. > > What if the MDIO bus is handled by a separate device and the MDIO bus > is suspended prior to the network driver, thereby making the PHY > inaccessible?
Indeed. We can know that in the PHY library though, because there is logic to hold the module reference count, see phy_attach_direct(), we cannot quite trust whether the Ethernet controller does the right thing though, as I can think of several ways for things to be done wrong like: CONFIG_PM is enabled, but the Ethernet driver does not implement any suspend/resume callback, or does it wrong etc... -- Florian