Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On Sun, 24 Jan 2021, Greg Kroah-Hartman wrote: > On Sat, Jan 23, 2021 at 03:37:30PM -0800, Hugh Dickins wrote: > > On Tue, 12 Jan 2021, Greg Kroah-Hartman wrote: > > > On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote: > > > > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > since 5.11-rc1 I get kernel crashes with infinite recursion in > > > > > device_reorder_to_tail() in some situations... It's a bit complicated > > > > > to > > > > > explain so I want to apologize in advance for the long mail. :) > > > > > > > > > > Kernel panic - not syncing: kernel stack overflow > > > > > CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1 > > > > > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > > > > > Call trace: > > > > >... > > > > >device_reorder_to_tail+0x4c/0xf0 > > > > >device_reorder_to_tail+0x98/0xf0 > > > > >device_reorder_to_tail+0x60/0xf0 > > > > >device_reorder_to_tail+0x60/0xf0 > > > > >device_reorder_to_tail+0x60/0xf0 > > > > >... > > > > > > > > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core: > > > > > Reorder devices on successful probe"). It stops happening when I > > > > > revert > > > > > this commit. > > > > > > > > Thanks for the report! > > > > > > > > Greg, please revert commit 5b6164d3465f, it clearly is not an > > > > improvement, at least at this point. > > > > > > Now reverted, thanks. > > > > > > greg k-h > > > > I think that there has been a misunderstanding here: although > > 5b6164d3465f ("driver core: Reorder devices on successful probe") > > has been reverted from linux-next (thank you), it has not yet been > > reverted from 5.11-rc, and still causing problems there (in my case, > > not the infinite recursion Stephan reported in this thread, but the > > ThinkPad rmi4 suspend failure that I reported in another thread). > > It will be sent to Linus in a few hours, thanks, so should show up in > 5.11-rc5. I had other patches to go along with this to send him at the > same time :) And indeed it's now in, thanks Greg: I'm sorry for being importunate, the misunderstanding was mine. Hugh
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On Sat, Jan 23, 2021 at 03:37:30PM -0800, Hugh Dickins wrote: > On Tue, 12 Jan 2021, Greg Kroah-Hartman wrote: > > On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote: > > > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold > > > wrote: > > > > > > > > Hi, > > > > > > > > since 5.11-rc1 I get kernel crashes with infinite recursion in > > > > device_reorder_to_tail() in some situations... It's a bit complicated to > > > > explain so I want to apologize in advance for the long mail. :) > > > > > > > > Kernel panic - not syncing: kernel stack overflow > > > > CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1 > > > > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > > > > Call trace: > > > >... > > > >device_reorder_to_tail+0x4c/0xf0 > > > >device_reorder_to_tail+0x98/0xf0 > > > >device_reorder_to_tail+0x60/0xf0 > > > >device_reorder_to_tail+0x60/0xf0 > > > >device_reorder_to_tail+0x60/0xf0 > > > >... > > > > > > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core: > > > > Reorder devices on successful probe"). It stops happening when I revert > > > > this commit. > > > > > > Thanks for the report! > > > > > > Greg, please revert commit 5b6164d3465f, it clearly is not an > > > improvement, at least at this point. > > > > Now reverted, thanks. > > > > greg k-h > > I think that there has been a misunderstanding here: although > 5b6164d3465f ("driver core: Reorder devices on successful probe") > has been reverted from linux-next (thank you), it has not yet been > reverted from 5.11-rc, and still causing problems there (in my case, > not the infinite recursion Stephan reported in this thread, but the > ThinkPad rmi4 suspend failure that I reported in another thread). It will be sent to Linus in a few hours, thanks, so should show up in 5.11-rc5. I had other patches to go along with this to send him at the same time :) greg k-h
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On Tue, 12 Jan 2021, Greg Kroah-Hartman wrote: > On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote: > > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold wrote: > > > > > > Hi, > > > > > > since 5.11-rc1 I get kernel crashes with infinite recursion in > > > device_reorder_to_tail() in some situations... It's a bit complicated to > > > explain so I want to apologize in advance for the long mail. :) > > > > > > Kernel panic - not syncing: kernel stack overflow > > > CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1 > > > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > > > Call trace: > > >... > > >device_reorder_to_tail+0x4c/0xf0 > > >device_reorder_to_tail+0x98/0xf0 > > >device_reorder_to_tail+0x60/0xf0 > > >device_reorder_to_tail+0x60/0xf0 > > >device_reorder_to_tail+0x60/0xf0 > > >... > > > > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core: > > > Reorder devices on successful probe"). It stops happening when I revert > > > this commit. > > > > Thanks for the report! > > > > Greg, please revert commit 5b6164d3465f, it clearly is not an > > improvement, at least at this point. > > Now reverted, thanks. > > greg k-h I think that there has been a misunderstanding here: although 5b6164d3465f ("driver core: Reorder devices on successful probe") has been reverted from linux-next (thank you), it has not yet been reverted from 5.11-rc, and still causing problems there (in my case, not the infinite recursion Stephan reported in this thread, but the ThinkPad rmi4 suspend failure that I reported in another thread). Thanks, Hugh
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
Hi Peter, On Thu, Jan 14, 2021 at 08:54:54AM +0800, Peter Chen wrote: > On 21-01-13 12:18:35, Stephan Gerhold wrote: > > > > Also, on a completely different note I looked again at the chipidea USB > > driver that produces this situation. To request the PHY (which ends up > > in the circular device link) it does: > > > > /* Look for a generic PHY first */ > > ci->phy = devm_phy_get(dev->parent, "usb-phy"); > > > > To me it doesn't really seem great to use the devm_* helpers on the > > parent device either, so I will check if I can refactor this somehow. > > Perhaps this situation can be prevented entirely. > > > > You could try to get the PHY at parent driver > (drivers/usb/chipidea/ci_hdrc_msm.c) to see the difference. > Unfortunately, I don't think this works in my case because I have an ULPI PHY. It's not available until ret = ci_ulpi_init(ci); is called from the ci_hdrc.0 platform device. I tried the following diff yesterday. It prevents the circular device link, therefore also the crash and even devm_* on the parent device: diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c index aa40e510b806..79f556d0c93e 100644 --- a/drivers/usb/chipidea/core.c +++ b/drivers/usb/chipidea/core.c @@ -847,6 +847,8 @@ struct platform_device *ci_hdrc_add_device(struct device *dev, } pdev->dev.parent = dev; + device_set_of_node_from_dev(&pdev->dev, dev); + pdev->driver_override = kstrdup("ci_hdrc", GFP_KERNEL); ret = platform_device_add_resources(pdev, res, nres); if (ret) @@ -1027,7 +1029,7 @@ static int ci_hdrc_probe(struct platform_device *pdev) ci->usb_phy = ci->platdata->usb_phy; } else { /* Look for a generic PHY first */ - ci->phy = devm_phy_get(dev->parent, "usb-phy"); + ci->phy = devm_phy_get(dev, "usb-phy"); if (PTR_ERR(ci->phy) == -EPROBE_DEFER) { ret = -EPROBE_DEFER; Basically my idea was to share the of_node with the ci_hdrc.0 platform device, so that it can request the PHY itself instead of going through the parent. It seems to work fine but I had to add pdev->driver_override to prevent the ci_hdrc.0 device from probing using ci_hdrc_msm (since it considers the "compatible" value on the of_node otherwise). This is a bit weird (I think driver_override is mainly intended to be written through sysfs, not from kernel code directly). That is why I did not post this as a proper patch yet... Thanks, Stephan
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On 21-01-13 12:18:35, Stephan Gerhold wrote: > > Also, on a completely different note I looked again at the chipidea USB > driver that produces this situation. To request the PHY (which ends up > in the circular device link) it does: > > /* Look for a generic PHY first */ > ci->phy = devm_phy_get(dev->parent, "usb-phy"); > > To me it doesn't really seem great to use the devm_* helpers on the > parent device either, so I will check if I can refactor this somehow. > Perhaps this situation can be prevented entirely. > Hi Stephan, You could try to get the PHY at parent driver (drivers/usb/chipidea/ci_hdrc_msm.c) to see the difference. -- Thanks, Peter Chen
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On Tue, Jan 12, 2021 at 07:24:24PM +0100, Rafael J. Wysocki wrote: > On Tue, Jan 12, 2021 at 3:32 PM Rafael J. Wysocki wrote: > > > > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold wrote: > > > > > > Hi, > > > > > > since 5.11-rc1 I get kernel crashes with infinite recursion in > > > device_reorder_to_tail() in some situations... It's a bit complicated to > > > explain so I want to apologize in advance for the long mail. :) > > > > > > Kernel panic - not syncing: kernel stack overflow > > > CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1 > > > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > > > Call trace: > > >... > > >device_reorder_to_tail+0x4c/0xf0 > > >device_reorder_to_tail+0x98/0xf0 > > >device_reorder_to_tail+0x60/0xf0 > > >device_reorder_to_tail+0x60/0xf0 > > >device_reorder_to_tail+0x60/0xf0 > > >... > > > > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core: > > > Reorder devices on successful probe"). It stops happening when I revert > > > this commit. > > > > Thanks for the report! > > > > Greg, please revert commit 5b6164d3465f, it clearly is not an > > improvement, at least at this point. > > Thanks a lot for the quick reply and for reverting the patch! > > > But I don't think this commit is the actual problem... > > > > Well, it may not be the root cause, but it is a change in behavior > > that exposes the breakage and this is not the only problem introduced > > by it. > > > > > It's easy to reproduce on any device based on the Qualcomm MSM8916 SoC > > > by adding a random regulator to the USB node, e.g.: > > > > > > diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > > b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > > index 3a9538e1ec97..9f43fce9e6e3 100644 > > > --- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > > +++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > > @@ -372,6 +372,7 @@ codec { > > > > > > &usb { > > > status = "okay"; > > > + vbus-supply = <&pm8916_l5>; > > > extcon = <&usb_id>, <&usb_id>; > > > > > > pinctrl-names = "default", "device"; > > > > > > I searched for problems in the regulator core but the problem actually > > > has nothing to do with regulators: The additional regulator supply just > > > delays probing of the USB driver long enough to trigger the issue. > > > > > > Adding some debug output to device_reorder_to_tail() reveals that it > > > keeps recursing over the same 4 devices: > > > > > > msm_hsusb 78d9000.usb: device_reorder_to_tail() > > > ci_hdrc ci_hdrc.0: device_reorder_to_tail() > > > qcom_usb_hs_phy ci_hdrc.0.ulpi: device_reorder_to_tail() > > > phy phy-ci_hdrc.0.ulpi.0: device_reorder_to_tail() > > > msm_hsusb 78d9000.usb: device_reorder_to_tail() > > > ... > > > > > > The device hierarchy of these is (children devices): > > > > > > 78d9000.usb -> ci_hdrc.0 -> ci_hdrc.0.ulpi -> phy-ci_hdrc.0.ulpi.0 > > > > > > ci_hdrc.0 calls phy_get(dev->parent, "usb-phy"). In phy_get(), > > > the phy-core then attempts to add the following device link: > > > > > > phy-ci_hdrc.0.ulpi.0 -> 78d9000.usb > > > > > > The device link creation in phy-core is optional (see commit 1d7cb11e1090 > > > ("phy: core: Fix phy_get() to not return error on link creation failure")) > > > because this device link is circular in case of ULPI PHYs (like here). > > > > > > And indeed, creating this device link usually fails (as it should). > > > However, adding the "vbus-supply" changes probe order in some way that > > > this circular device link ends up being created: > > > /sys/class/devlink/phy-ci_hdrc.0.ulpi.0--78d9000.usb/ exists only when > > > I add the "vbus-supply" as in the diff above. > > > > > > Apparently, there is a special situation where device_is_dependent() > > > does not work properly, and therefore the driver core allows creating > > > the circular device link. > > > > > > To show the problem, I enabled some debug messages and added the > > > following log message: > > > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > > index 25e08e5f40bd..ff1344eabb31 100644 > > > --- a/drivers/base/core.c > > > +++ b/drivers/base/core.c > > > @@ -3089,9 +3089,11 @@ int device_add(struct device *dev) > > > } > > > > > > bus_probe_device(dev); > > > - if (parent) > > > + if (parent) { > > > + dev_info(dev, "add to parent %s\n", dev_name(parent)); > > > klist_add_tail(&dev->p->knode_parent, > > >&parent->p->klist_children); > > > + } > > > > > > if (dev->class) { > > > mutex_lock(&dev->class->p->mutex); > > > > > > Running this with "vbus-supply" (where it crashes) produces: > > > > > > bus: 'platform': probing driver msm_hsusb with device 78d9000.usb > > > > > > bus: 'platform': probing driver msm_hsusb with device 78d9000.usb > > > bus: 'platform': probing driver ci_hdrc with de
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On Tue, Jan 12, 2021 at 3:32 PM Rafael J. Wysocki wrote: > > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold wrote: > > > > Hi, > > > > since 5.11-rc1 I get kernel crashes with infinite recursion in > > device_reorder_to_tail() in some situations... It's a bit complicated to > > explain so I want to apologize in advance for the long mail. :) > > > > Kernel panic - not syncing: kernel stack overflow > > CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1 > > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > > Call trace: > >... > >device_reorder_to_tail+0x4c/0xf0 > >device_reorder_to_tail+0x98/0xf0 > >device_reorder_to_tail+0x60/0xf0 > >device_reorder_to_tail+0x60/0xf0 > >device_reorder_to_tail+0x60/0xf0 > >... > > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core: > > Reorder devices on successful probe"). It stops happening when I revert > > this commit. > > Thanks for the report! > > Greg, please revert commit 5b6164d3465f, it clearly is not an > improvement, at least at this point. > > > But I don't think this commit is the actual problem... > > Well, it may not be the root cause, but it is a change in behavior > that exposes the breakage and this is not the only problem introduced > by it. > > > It's easy to reproduce on any device based on the Qualcomm MSM8916 SoC > > by adding a random regulator to the USB node, e.g.: > > > > diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > index 3a9538e1ec97..9f43fce9e6e3 100644 > > --- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > +++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > > @@ -372,6 +372,7 @@ codec { > > > > &usb { > > status = "okay"; > > + vbus-supply = <&pm8916_l5>; > > extcon = <&usb_id>, <&usb_id>; > > > > pinctrl-names = "default", "device"; > > > > I searched for problems in the regulator core but the problem actually > > has nothing to do with regulators: The additional regulator supply just > > delays probing of the USB driver long enough to trigger the issue. > > > > Adding some debug output to device_reorder_to_tail() reveals that it > > keeps recursing over the same 4 devices: > > > > msm_hsusb 78d9000.usb: device_reorder_to_tail() > > ci_hdrc ci_hdrc.0: device_reorder_to_tail() > > qcom_usb_hs_phy ci_hdrc.0.ulpi: device_reorder_to_tail() > > phy phy-ci_hdrc.0.ulpi.0: device_reorder_to_tail() > > msm_hsusb 78d9000.usb: device_reorder_to_tail() > > ... > > > > The device hierarchy of these is (children devices): > > > > 78d9000.usb -> ci_hdrc.0 -> ci_hdrc.0.ulpi -> phy-ci_hdrc.0.ulpi.0 > > > > ci_hdrc.0 calls phy_get(dev->parent, "usb-phy"). In phy_get(), > > the phy-core then attempts to add the following device link: > > > > phy-ci_hdrc.0.ulpi.0 -> 78d9000.usb > > > > The device link creation in phy-core is optional (see commit 1d7cb11e1090 > > ("phy: core: Fix phy_get() to not return error on link creation failure")) > > because this device link is circular in case of ULPI PHYs (like here). > > > > And indeed, creating this device link usually fails (as it should). > > However, adding the "vbus-supply" changes probe order in some way that > > this circular device link ends up being created: > > /sys/class/devlink/phy-ci_hdrc.0.ulpi.0--78d9000.usb/ exists only when > > I add the "vbus-supply" as in the diff above. > > > > Apparently, there is a special situation where device_is_dependent() > > does not work properly, and therefore the driver core allows creating > > the circular device link. > > > > To show the problem, I enabled some debug messages and added the > > following log message: > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > index 25e08e5f40bd..ff1344eabb31 100644 > > --- a/drivers/base/core.c > > +++ b/drivers/base/core.c > > @@ -3089,9 +3089,11 @@ int device_add(struct device *dev) > > } > > > > bus_probe_device(dev); > > - if (parent) > > + if (parent) { > > + dev_info(dev, "add to parent %s\n", dev_name(parent)); > > klist_add_tail(&dev->p->knode_parent, > >&parent->p->klist_children); > > + } > > > > if (dev->class) { > > mutex_lock(&dev->class->p->mutex); > > > > Running this with "vbus-supply" (where it crashes) produces: > > > > bus: 'platform': probing driver msm_hsusb with device 78d9000.usb > > > > bus: 'platform': probing driver msm_hsusb with device 78d9000.usb > > bus: 'platform': probing driver ci_hdrc with device ci_hdrc.0 > > bus: 'ulpi': probing driver qcom_usb_hs_phy with device ci_hdrc.0.ulpi > > phy phy-ci_hdrc.0.ulpi.0: add to parent ci_hdrc.0.ulpi > > qcom_usb_hs_phy ci_hdrc.0.ulpi: add to parent ci_hdrc.0 > > (1) msm_hsusb 78d9000.usb: Linked as a consumer to phy-ci_hdrc.0.ulpi.0 > > (2) ci_hdrc ci_hdrc.0: add to parent 78d9000.usb > > Ker
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote: > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold wrote: > > > > Hi, > > > > since 5.11-rc1 I get kernel crashes with infinite recursion in > > device_reorder_to_tail() in some situations... It's a bit complicated to > > explain so I want to apologize in advance for the long mail. :) > > > > Kernel panic - not syncing: kernel stack overflow > > CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1 > > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > > Call trace: > >... > >device_reorder_to_tail+0x4c/0xf0 > >device_reorder_to_tail+0x98/0xf0 > >device_reorder_to_tail+0x60/0xf0 > >device_reorder_to_tail+0x60/0xf0 > >device_reorder_to_tail+0x60/0xf0 > >... > > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core: > > Reorder devices on successful probe"). It stops happening when I revert > > this commit. > > Thanks for the report! > > Greg, please revert commit 5b6164d3465f, it clearly is not an > improvement, at least at this point. Now reverted, thanks. greg k-h
Re: Infinite recursion in device_reorder_to_tail() due to circular device links
On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold wrote: > > Hi, > > since 5.11-rc1 I get kernel crashes with infinite recursion in > device_reorder_to_tail() in some situations... It's a bit complicated to > explain so I want to apologize in advance for the long mail. :) > > Kernel panic - not syncing: kernel stack overflow > CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1 > Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > Call trace: >... >device_reorder_to_tail+0x4c/0xf0 >device_reorder_to_tail+0x98/0xf0 >device_reorder_to_tail+0x60/0xf0 >device_reorder_to_tail+0x60/0xf0 >device_reorder_to_tail+0x60/0xf0 >... > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core: > Reorder devices on successful probe"). It stops happening when I revert > this commit. Thanks for the report! Greg, please revert commit 5b6164d3465f, it clearly is not an improvement, at least at this point. > But I don't think this commit is the actual problem... Well, it may not be the root cause, but it is a change in behavior that exposes the breakage and this is not the only problem introduced by it. > It's easy to reproduce on any device based on the Qualcomm MSM8916 SoC > by adding a random regulator to the USB node, e.g.: > > diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > index 3a9538e1ec97..9f43fce9e6e3 100644 > --- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > +++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi > @@ -372,6 +372,7 @@ codec { > > &usb { > status = "okay"; > + vbus-supply = <&pm8916_l5>; > extcon = <&usb_id>, <&usb_id>; > > pinctrl-names = "default", "device"; > > I searched for problems in the regulator core but the problem actually > has nothing to do with regulators: The additional regulator supply just > delays probing of the USB driver long enough to trigger the issue. > > Adding some debug output to device_reorder_to_tail() reveals that it > keeps recursing over the same 4 devices: > > msm_hsusb 78d9000.usb: device_reorder_to_tail() > ci_hdrc ci_hdrc.0: device_reorder_to_tail() > qcom_usb_hs_phy ci_hdrc.0.ulpi: device_reorder_to_tail() > phy phy-ci_hdrc.0.ulpi.0: device_reorder_to_tail() > msm_hsusb 78d9000.usb: device_reorder_to_tail() > ... > > The device hierarchy of these is (children devices): > > 78d9000.usb -> ci_hdrc.0 -> ci_hdrc.0.ulpi -> phy-ci_hdrc.0.ulpi.0 > > ci_hdrc.0 calls phy_get(dev->parent, "usb-phy"). In phy_get(), > the phy-core then attempts to add the following device link: > > phy-ci_hdrc.0.ulpi.0 -> 78d9000.usb > > The device link creation in phy-core is optional (see commit 1d7cb11e1090 > ("phy: core: Fix phy_get() to not return error on link creation failure")) > because this device link is circular in case of ULPI PHYs (like here). > > And indeed, creating this device link usually fails (as it should). > However, adding the "vbus-supply" changes probe order in some way that > this circular device link ends up being created: > /sys/class/devlink/phy-ci_hdrc.0.ulpi.0--78d9000.usb/ exists only when > I add the "vbus-supply" as in the diff above. > > Apparently, there is a special situation where device_is_dependent() > does not work properly, and therefore the driver core allows creating > the circular device link. > > To show the problem, I enabled some debug messages and added the > following log message: > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index 25e08e5f40bd..ff1344eabb31 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -3089,9 +3089,11 @@ int device_add(struct device *dev) > } > > bus_probe_device(dev); > - if (parent) > + if (parent) { > + dev_info(dev, "add to parent %s\n", dev_name(parent)); > klist_add_tail(&dev->p->knode_parent, >&parent->p->klist_children); > + } > > if (dev->class) { > mutex_lock(&dev->class->p->mutex); > > Running this with "vbus-supply" (where it crashes) produces: > > bus: 'platform': probing driver msm_hsusb with device 78d9000.usb > > bus: 'platform': probing driver msm_hsusb with device 78d9000.usb > bus: 'platform': probing driver ci_hdrc with device ci_hdrc.0 > bus: 'ulpi': probing driver qcom_usb_hs_phy with device ci_hdrc.0.ulpi > phy phy-ci_hdrc.0.ulpi.0: add to parent ci_hdrc.0.ulpi > qcom_usb_hs_phy ci_hdrc.0.ulpi: add to parent ci_hdrc.0 > (1) msm_hsusb 78d9000.usb: Linked as a consumer to phy-ci_hdrc.0.ulpi.0 > (2) ci_hdrc ci_hdrc.0: add to parent 78d9000.usb > Kernel panic - not syncing: kernel stack overflow > ... > > Note how ci_hdrc is added to the children list of 78d9000.usb (2) after > the device link was already created in (1). This is why device_is_dependent() > does not realize the devices will eventually be dependent on each other. Well, it