Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-24 Thread Hugh Dickins
On Sun, 24 Jan 2021, Greg Kroah-Hartman wrote:
> On Sat, Jan 23, 2021 at 03:37:30PM -0800, Hugh Dickins wrote:
> > On Tue, 12 Jan 2021, Greg Kroah-Hartman wrote:
> > > On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote:
> > > > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > since 5.11-rc1 I get kernel crashes with infinite recursion in
> > > > > device_reorder_to_tail() in some situations... It's a bit complicated 
> > > > > to
> > > > > explain so I want to apologize in advance for the long mail. :)
> > > > >
> > > > >   Kernel panic - not syncing: kernel stack overflow
> > > > >   CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
> > > > >   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > > >   Call trace:
> > > > >...
> > > > >device_reorder_to_tail+0x4c/0xf0
> > > > >device_reorder_to_tail+0x98/0xf0
> > > > >device_reorder_to_tail+0x60/0xf0
> > > > >device_reorder_to_tail+0x60/0xf0
> > > > >device_reorder_to_tail+0x60/0xf0
> > > > >...
> > > > >
> > > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
> > > > > Reorder devices on successful probe"). It stops happening when I 
> > > > > revert
> > > > > this commit.
> > > > 
> > > > Thanks for the report!
> > > > 
> > > > Greg, please revert commit 5b6164d3465f, it clearly is not an
> > > > improvement, at least at this point.
> > > 
> > > Now reverted, thanks.
> > > 
> > > greg k-h
> > 
> > I think that there has been a misunderstanding here: although
> > 5b6164d3465f ("driver core: Reorder devices on successful probe")
> > has been reverted from linux-next (thank you), it has not yet been
> > reverted from 5.11-rc, and still causing problems there (in my case,
> > not the infinite recursion Stephan reported in this thread, but the
> > ThinkPad rmi4 suspend failure that I reported in another thread).
> 
> It will be sent to Linus in a few hours, thanks, so should show up in
> 5.11-rc5.  I had other patches to go along with this to send him at the
> same time :)

And indeed it's now in, thanks Greg: I'm sorry for being importunate,
the misunderstanding was mine.

Hugh


Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-24 Thread Greg Kroah-Hartman
On Sat, Jan 23, 2021 at 03:37:30PM -0800, Hugh Dickins wrote:
> On Tue, 12 Jan 2021, Greg Kroah-Hartman wrote:
> > On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote:
> > > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold  
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > since 5.11-rc1 I get kernel crashes with infinite recursion in
> > > > device_reorder_to_tail() in some situations... It's a bit complicated to
> > > > explain so I want to apologize in advance for the long mail. :)
> > > >
> > > >   Kernel panic - not syncing: kernel stack overflow
> > > >   CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
> > > >   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > > >   Call trace:
> > > >...
> > > >device_reorder_to_tail+0x4c/0xf0
> > > >device_reorder_to_tail+0x98/0xf0
> > > >device_reorder_to_tail+0x60/0xf0
> > > >device_reorder_to_tail+0x60/0xf0
> > > >device_reorder_to_tail+0x60/0xf0
> > > >...
> > > >
> > > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
> > > > Reorder devices on successful probe"). It stops happening when I revert
> > > > this commit.
> > > 
> > > Thanks for the report!
> > > 
> > > Greg, please revert commit 5b6164d3465f, it clearly is not an
> > > improvement, at least at this point.
> > 
> > Now reverted, thanks.
> > 
> > greg k-h
> 
> I think that there has been a misunderstanding here: although
> 5b6164d3465f ("driver core: Reorder devices on successful probe")
> has been reverted from linux-next (thank you), it has not yet been
> reverted from 5.11-rc, and still causing problems there (in my case,
> not the infinite recursion Stephan reported in this thread, but the
> ThinkPad rmi4 suspend failure that I reported in another thread).

It will be sent to Linus in a few hours, thanks, so should show up in
5.11-rc5.  I had other patches to go along with this to send him at the
same time :)

greg k-h


Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-23 Thread Hugh Dickins
On Tue, 12 Jan 2021, Greg Kroah-Hartman wrote:
> On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold  wrote:
> > >
> > > Hi,
> > >
> > > since 5.11-rc1 I get kernel crashes with infinite recursion in
> > > device_reorder_to_tail() in some situations... It's a bit complicated to
> > > explain so I want to apologize in advance for the long mail. :)
> > >
> > >   Kernel panic - not syncing: kernel stack overflow
> > >   CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
> > >   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > >   Call trace:
> > >...
> > >device_reorder_to_tail+0x4c/0xf0
> > >device_reorder_to_tail+0x98/0xf0
> > >device_reorder_to_tail+0x60/0xf0
> > >device_reorder_to_tail+0x60/0xf0
> > >device_reorder_to_tail+0x60/0xf0
> > >...
> > >
> > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
> > > Reorder devices on successful probe"). It stops happening when I revert
> > > this commit.
> > 
> > Thanks for the report!
> > 
> > Greg, please revert commit 5b6164d3465f, it clearly is not an
> > improvement, at least at this point.
> 
> Now reverted, thanks.
> 
> greg k-h

I think that there has been a misunderstanding here: although
5b6164d3465f ("driver core: Reorder devices on successful probe")
has been reverted from linux-next (thank you), it has not yet been
reverted from 5.11-rc, and still causing problems there (in my case,
not the infinite recursion Stephan reported in this thread, but the
ThinkPad rmi4 suspend failure that I reported in another thread).

Thanks,
Hugh


Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-14 Thread Stephan Gerhold
Hi Peter,

On Thu, Jan 14, 2021 at 08:54:54AM +0800, Peter Chen wrote:
> On 21-01-13 12:18:35, Stephan Gerhold wrote:
> > 
> > Also, on a completely different note I looked again at the chipidea USB
> > driver that produces this situation. To request the PHY (which ends up
> > in the circular device link) it does:
> > 
> > /* Look for a generic PHY first */
> > ci->phy = devm_phy_get(dev->parent, "usb-phy");
> > 
> > To me it doesn't really seem great to use the devm_* helpers on the
> > parent device either, so I will check if I can refactor this somehow.
> > Perhaps this situation can be prevented entirely.
> > 
> 
> You could try to get the PHY at parent driver
> (drivers/usb/chipidea/ci_hdrc_msm.c) to see the difference.
> 

Unfortunately, I don't think this works in my case because I have an
ULPI PHY. It's not available until ret = ci_ulpi_init(ci); is called
from the ci_hdrc.0 platform device.

I tried the following diff yesterday. It prevents the circular device
link, therefore also the crash and even devm_* on the parent device:

diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
index aa40e510b806..79f556d0c93e 100644
--- a/drivers/usb/chipidea/core.c
+++ b/drivers/usb/chipidea/core.c
@@ -847,6 +847,8 @@ struct platform_device *ci_hdrc_add_device(struct device 
*dev,
}
 
pdev->dev.parent = dev;
+   device_set_of_node_from_dev(&pdev->dev, dev);
+   pdev->driver_override = kstrdup("ci_hdrc", GFP_KERNEL);
 
ret = platform_device_add_resources(pdev, res, nres);
if (ret)
@@ -1027,7 +1029,7 @@ static int ci_hdrc_probe(struct platform_device *pdev)
ci->usb_phy = ci->platdata->usb_phy;
} else {
/* Look for a generic PHY first */
-   ci->phy = devm_phy_get(dev->parent, "usb-phy");
+   ci->phy = devm_phy_get(dev, "usb-phy");
 
if (PTR_ERR(ci->phy) == -EPROBE_DEFER) {
ret = -EPROBE_DEFER;

Basically my idea was to share the of_node with the ci_hdrc.0 platform
device, so that it can request the PHY itself instead of going through
the parent.

It seems to work fine but I had to add pdev->driver_override to prevent
the ci_hdrc.0 device from probing using ci_hdrc_msm (since it considers
the "compatible" value on the of_node otherwise).

This is a bit weird (I think driver_override is mainly intended to be
written through sysfs, not from kernel code directly).
That is why I did not post this as a proper patch yet...

Thanks,
Stephan


Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-13 Thread Peter Chen
On 21-01-13 12:18:35, Stephan Gerhold wrote:
> 
> Also, on a completely different note I looked again at the chipidea USB
> driver that produces this situation. To request the PHY (which ends up
> in the circular device link) it does:
> 
>   /* Look for a generic PHY first */
>   ci->phy = devm_phy_get(dev->parent, "usb-phy");
> 
> To me it doesn't really seem great to use the devm_* helpers on the
> parent device either, so I will check if I can refactor this somehow.
> Perhaps this situation can be prevented entirely.
> 

Hi Stephan,

You could try to get the PHY at parent driver
(drivers/usb/chipidea/ci_hdrc_msm.c) to see the difference.

-- 

Thanks,
Peter Chen



Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-13 Thread Stephan Gerhold
On Tue, Jan 12, 2021 at 07:24:24PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 12, 2021 at 3:32 PM Rafael J. Wysocki  wrote:
> >
> > On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold  wrote:
> > >
> > > Hi,
> > >
> > > since 5.11-rc1 I get kernel crashes with infinite recursion in
> > > device_reorder_to_tail() in some situations... It's a bit complicated to
> > > explain so I want to apologize in advance for the long mail. :)
> > >
> > >   Kernel panic - not syncing: kernel stack overflow
> > >   CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
> > >   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > >   Call trace:
> > >...
> > >device_reorder_to_tail+0x4c/0xf0
> > >device_reorder_to_tail+0x98/0xf0
> > >device_reorder_to_tail+0x60/0xf0
> > >device_reorder_to_tail+0x60/0xf0
> > >device_reorder_to_tail+0x60/0xf0
> > >...
> > >
> > > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
> > > Reorder devices on successful probe"). It stops happening when I revert
> > > this commit.
> >
> > Thanks for the report!
> >
> > Greg, please revert commit 5b6164d3465f, it clearly is not an
> > improvement, at least at this point.
> >

Thanks a lot for the quick reply and for reverting the patch!

> > > But I don't think this commit is the actual problem...
> >
> > Well, it may not be the root cause, but it is a change in behavior
> > that exposes the breakage and this is not the only problem introduced
> > by it.
> >
> > > It's easy to reproduce on any device based on the Qualcomm MSM8916 SoC
> > > by adding a random regulator to the USB node, e.g.:
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi 
> > > b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> > > index 3a9538e1ec97..9f43fce9e6e3 100644
> > > --- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> > > @@ -372,6 +372,7 @@ codec {
> > >
> > >  &usb {
> > > status = "okay";
> > > +   vbus-supply = <&pm8916_l5>;
> > > extcon = <&usb_id>, <&usb_id>;
> > >
> > > pinctrl-names = "default", "device";
> > >
> > > I searched for problems in the regulator core but the problem actually
> > > has nothing to do with regulators: The additional regulator supply just
> > > delays probing of the USB driver long enough to trigger the issue.
> > >
> > > Adding some debug output to device_reorder_to_tail() reveals that it
> > > keeps recursing over the same 4 devices:
> > >
> > >   msm_hsusb 78d9000.usb: device_reorder_to_tail()
> > >   ci_hdrc ci_hdrc.0: device_reorder_to_tail()
> > >   qcom_usb_hs_phy ci_hdrc.0.ulpi: device_reorder_to_tail()
> > >   phy phy-ci_hdrc.0.ulpi.0: device_reorder_to_tail()
> > >   msm_hsusb 78d9000.usb: device_reorder_to_tail()
> > >   ...
> > >
> > > The device hierarchy of these is (children devices):
> > >
> > >   78d9000.usb -> ci_hdrc.0 -> ci_hdrc.0.ulpi -> phy-ci_hdrc.0.ulpi.0
> > >
> > > ci_hdrc.0 calls phy_get(dev->parent, "usb-phy"). In phy_get(),
> > > the phy-core then attempts to add the following device link:
> > >
> > >   phy-ci_hdrc.0.ulpi.0 -> 78d9000.usb
> > >
> > > The device link creation in phy-core is optional (see commit 1d7cb11e1090
> > > ("phy: core: Fix phy_get() to not return error on link creation failure"))
> > > because this device link is circular in case of ULPI PHYs (like here).
> > >
> > > And indeed, creating this device link usually fails (as it should).
> > > However, adding the "vbus-supply" changes probe order in some way that
> > > this circular device link ends up being created:
> > >   /sys/class/devlink/phy-ci_hdrc.0.ulpi.0--78d9000.usb/ exists only when
> > > I add the "vbus-supply" as in the diff above.
> > >
> > > Apparently, there is a special situation where device_is_dependent()
> > > does not work properly, and therefore the driver core allows creating
> > > the circular device link.
> > >
> > > To show the problem, I enabled some debug messages and added the
> > > following log message:
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 25e08e5f40bd..ff1344eabb31 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -3089,9 +3089,11 @@ int device_add(struct device *dev)
> > > }
> > >
> > > bus_probe_device(dev);
> > > -   if (parent)
> > > +   if (parent) {
> > > +   dev_info(dev, "add to parent %s\n", dev_name(parent));
> > > klist_add_tail(&dev->p->knode_parent,
> > >&parent->p->klist_children);
> > > +   }
> > >
> > > if (dev->class) {
> > > mutex_lock(&dev->class->p->mutex);
> > >
> > > Running this with "vbus-supply" (where it crashes) produces:
> > >
> > >  bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
> > >
> > >  bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
> > >  bus: 'platform': probing driver ci_hdrc with de

Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-12 Thread Rafael J. Wysocki
On Tue, Jan 12, 2021 at 3:32 PM Rafael J. Wysocki  wrote:
>
> On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold  wrote:
> >
> > Hi,
> >
> > since 5.11-rc1 I get kernel crashes with infinite recursion in
> > device_reorder_to_tail() in some situations... It's a bit complicated to
> > explain so I want to apologize in advance for the long mail. :)
> >
> >   Kernel panic - not syncing: kernel stack overflow
> >   CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
> >   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> >   Call trace:
> >...
> >device_reorder_to_tail+0x4c/0xf0
> >device_reorder_to_tail+0x98/0xf0
> >device_reorder_to_tail+0x60/0xf0
> >device_reorder_to_tail+0x60/0xf0
> >device_reorder_to_tail+0x60/0xf0
> >...
> >
> > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
> > Reorder devices on successful probe"). It stops happening when I revert
> > this commit.
>
> Thanks for the report!
>
> Greg, please revert commit 5b6164d3465f, it clearly is not an
> improvement, at least at this point.
>
> > But I don't think this commit is the actual problem...
>
> Well, it may not be the root cause, but it is a change in behavior
> that exposes the breakage and this is not the only problem introduced
> by it.
>
> > It's easy to reproduce on any device based on the Qualcomm MSM8916 SoC
> > by adding a random regulator to the USB node, e.g.:
> >
> > diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi 
> > b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> > index 3a9538e1ec97..9f43fce9e6e3 100644
> > --- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> > @@ -372,6 +372,7 @@ codec {
> >
> >  &usb {
> > status = "okay";
> > +   vbus-supply = <&pm8916_l5>;
> > extcon = <&usb_id>, <&usb_id>;
> >
> > pinctrl-names = "default", "device";
> >
> > I searched for problems in the regulator core but the problem actually
> > has nothing to do with regulators: The additional regulator supply just
> > delays probing of the USB driver long enough to trigger the issue.
> >
> > Adding some debug output to device_reorder_to_tail() reveals that it
> > keeps recursing over the same 4 devices:
> >
> >   msm_hsusb 78d9000.usb: device_reorder_to_tail()
> >   ci_hdrc ci_hdrc.0: device_reorder_to_tail()
> >   qcom_usb_hs_phy ci_hdrc.0.ulpi: device_reorder_to_tail()
> >   phy phy-ci_hdrc.0.ulpi.0: device_reorder_to_tail()
> >   msm_hsusb 78d9000.usb: device_reorder_to_tail()
> >   ...
> >
> > The device hierarchy of these is (children devices):
> >
> >   78d9000.usb -> ci_hdrc.0 -> ci_hdrc.0.ulpi -> phy-ci_hdrc.0.ulpi.0
> >
> > ci_hdrc.0 calls phy_get(dev->parent, "usb-phy"). In phy_get(),
> > the phy-core then attempts to add the following device link:
> >
> >   phy-ci_hdrc.0.ulpi.0 -> 78d9000.usb
> >
> > The device link creation in phy-core is optional (see commit 1d7cb11e1090
> > ("phy: core: Fix phy_get() to not return error on link creation failure"))
> > because this device link is circular in case of ULPI PHYs (like here).
> >
> > And indeed, creating this device link usually fails (as it should).
> > However, adding the "vbus-supply" changes probe order in some way that
> > this circular device link ends up being created:
> >   /sys/class/devlink/phy-ci_hdrc.0.ulpi.0--78d9000.usb/ exists only when
> > I add the "vbus-supply" as in the diff above.
> >
> > Apparently, there is a special situation where device_is_dependent()
> > does not work properly, and therefore the driver core allows creating
> > the circular device link.
> >
> > To show the problem, I enabled some debug messages and added the
> > following log message:
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 25e08e5f40bd..ff1344eabb31 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -3089,9 +3089,11 @@ int device_add(struct device *dev)
> > }
> >
> > bus_probe_device(dev);
> > -   if (parent)
> > +   if (parent) {
> > +   dev_info(dev, "add to parent %s\n", dev_name(parent));
> > klist_add_tail(&dev->p->knode_parent,
> >&parent->p->klist_children);
> > +   }
> >
> > if (dev->class) {
> > mutex_lock(&dev->class->p->mutex);
> >
> > Running this with "vbus-supply" (where it crashes) produces:
> >
> >  bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
> >
> >  bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
> >  bus: 'platform': probing driver ci_hdrc with device ci_hdrc.0
> >  bus: 'ulpi': probing driver qcom_usb_hs_phy with device ci_hdrc.0.ulpi
> >  phy phy-ci_hdrc.0.ulpi.0: add to parent ci_hdrc.0.ulpi
> >  qcom_usb_hs_phy ci_hdrc.0.ulpi: add to parent ci_hdrc.0
> >  (1) msm_hsusb 78d9000.usb: Linked as a consumer to phy-ci_hdrc.0.ulpi.0
> >  (2) ci_hdrc ci_hdrc.0: add to parent 78d9000.usb
> >  Ker

Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-12 Thread Greg Kroah-Hartman
On Tue, Jan 12, 2021 at 03:32:04PM +0100, Rafael J. Wysocki wrote:
> On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold  wrote:
> >
> > Hi,
> >
> > since 5.11-rc1 I get kernel crashes with infinite recursion in
> > device_reorder_to_tail() in some situations... It's a bit complicated to
> > explain so I want to apologize in advance for the long mail. :)
> >
> >   Kernel panic - not syncing: kernel stack overflow
> >   CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
> >   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> >   Call trace:
> >...
> >device_reorder_to_tail+0x4c/0xf0
> >device_reorder_to_tail+0x98/0xf0
> >device_reorder_to_tail+0x60/0xf0
> >device_reorder_to_tail+0x60/0xf0
> >device_reorder_to_tail+0x60/0xf0
> >...
> >
> > The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
> > Reorder devices on successful probe"). It stops happening when I revert
> > this commit.
> 
> Thanks for the report!
> 
> Greg, please revert commit 5b6164d3465f, it clearly is not an
> improvement, at least at this point.

Now reverted, thanks.

greg k-h


Re: Infinite recursion in device_reorder_to_tail() due to circular device links

2021-01-12 Thread Rafael J. Wysocki
On Mon, Jan 11, 2021 at 7:46 PM Stephan Gerhold  wrote:
>
> Hi,
>
> since 5.11-rc1 I get kernel crashes with infinite recursion in
> device_reorder_to_tail() in some situations... It's a bit complicated to
> explain so I want to apologize in advance for the long mail. :)
>
>   Kernel panic - not syncing: kernel stack overflow
>   CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.11.0-rc3 #1
>   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>   Call trace:
>...
>device_reorder_to_tail+0x4c/0xf0
>device_reorder_to_tail+0x98/0xf0
>device_reorder_to_tail+0x60/0xf0
>device_reorder_to_tail+0x60/0xf0
>device_reorder_to_tail+0x60/0xf0
>...
>
> The crash happens only in 5.11 with commit 5b6164d3465f ("driver core:
> Reorder devices on successful probe"). It stops happening when I revert
> this commit.

Thanks for the report!

Greg, please revert commit 5b6164d3465f, it clearly is not an
improvement, at least at this point.

> But I don't think this commit is the actual problem...

Well, it may not be the root cause, but it is a change in behavior
that exposes the breakage and this is not the only problem introduced
by it.

> It's easy to reproduce on any device based on the Qualcomm MSM8916 SoC
> by adding a random regulator to the USB node, e.g.:
>
> diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi 
> b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> index 3a9538e1ec97..9f43fce9e6e3 100644
> --- a/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> +++ b/arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi
> @@ -372,6 +372,7 @@ codec {
>
>  &usb {
> status = "okay";
> +   vbus-supply = <&pm8916_l5>;
> extcon = <&usb_id>, <&usb_id>;
>
> pinctrl-names = "default", "device";
>
> I searched for problems in the regulator core but the problem actually
> has nothing to do with regulators: The additional regulator supply just
> delays probing of the USB driver long enough to trigger the issue.
>
> Adding some debug output to device_reorder_to_tail() reveals that it
> keeps recursing over the same 4 devices:
>
>   msm_hsusb 78d9000.usb: device_reorder_to_tail()
>   ci_hdrc ci_hdrc.0: device_reorder_to_tail()
>   qcom_usb_hs_phy ci_hdrc.0.ulpi: device_reorder_to_tail()
>   phy phy-ci_hdrc.0.ulpi.0: device_reorder_to_tail()
>   msm_hsusb 78d9000.usb: device_reorder_to_tail()
>   ...
>
> The device hierarchy of these is (children devices):
>
>   78d9000.usb -> ci_hdrc.0 -> ci_hdrc.0.ulpi -> phy-ci_hdrc.0.ulpi.0
>
> ci_hdrc.0 calls phy_get(dev->parent, "usb-phy"). In phy_get(),
> the phy-core then attempts to add the following device link:
>
>   phy-ci_hdrc.0.ulpi.0 -> 78d9000.usb
>
> The device link creation in phy-core is optional (see commit 1d7cb11e1090
> ("phy: core: Fix phy_get() to not return error on link creation failure"))
> because this device link is circular in case of ULPI PHYs (like here).
>
> And indeed, creating this device link usually fails (as it should).
> However, adding the "vbus-supply" changes probe order in some way that
> this circular device link ends up being created:
>   /sys/class/devlink/phy-ci_hdrc.0.ulpi.0--78d9000.usb/ exists only when
> I add the "vbus-supply" as in the diff above.
>
> Apparently, there is a special situation where device_is_dependent()
> does not work properly, and therefore the driver core allows creating
> the circular device link.
>
> To show the problem, I enabled some debug messages and added the
> following log message:
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 25e08e5f40bd..ff1344eabb31 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3089,9 +3089,11 @@ int device_add(struct device *dev)
> }
>
> bus_probe_device(dev);
> -   if (parent)
> +   if (parent) {
> +   dev_info(dev, "add to parent %s\n", dev_name(parent));
> klist_add_tail(&dev->p->knode_parent,
>&parent->p->klist_children);
> +   }
>
> if (dev->class) {
> mutex_lock(&dev->class->p->mutex);
>
> Running this with "vbus-supply" (where it crashes) produces:
>
>  bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
>
>  bus: 'platform': probing driver msm_hsusb with device 78d9000.usb
>  bus: 'platform': probing driver ci_hdrc with device ci_hdrc.0
>  bus: 'ulpi': probing driver qcom_usb_hs_phy with device ci_hdrc.0.ulpi
>  phy phy-ci_hdrc.0.ulpi.0: add to parent ci_hdrc.0.ulpi
>  qcom_usb_hs_phy ci_hdrc.0.ulpi: add to parent ci_hdrc.0
>  (1) msm_hsusb 78d9000.usb: Linked as a consumer to phy-ci_hdrc.0.ulpi.0
>  (2) ci_hdrc ci_hdrc.0: add to parent 78d9000.usb
>  Kernel panic - not syncing: kernel stack overflow
>   ...
>
> Note how ci_hdrc is added to the children list of 78d9000.usb (2) after
> the device link was already created in (1). This is why device_is_dependent()
> does not realize the devices will eventually be dependent on each other.

Well, it