RE: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?

2020-04-06 Thread Yoshihiro Shimoda
Hi again,


> I'm guessing we should add the following flush_work for 
> deferred_probe_timeout_work().
> # Sorry, I didn't test this for some reasons yet though...
> 
> +   /* wait for the deferred probe timeout workqueue to finish */
> +   if (driver_deferred_probe_timeout > 0)
> +   flush_work(_probe_timeout_work);

I'm sorry. This code caused build error because the deferred_probe_timeout_work
is struct delayed_work. Also, I don't think using flush_delayed_work() is
my expectation (wait until the timeout of deferred)...

Best regards,
Yoshihiro Shimoda

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?

2020-04-06 Thread Yoshihiro Shimoda
Hi John, Geert,

> From: John Stultz, Sent: Saturday, April 4, 2020 1:19 PM
> 
> On Fri, Apr 3, 2020 at 4:47 AM Geert Uytterhoeven  
> wrote:
> > On Thu, Apr 2, 2020 at 7:27 PM John Stultz  wrote:
> > > On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
> > >  wrote:
> > > >
> > > > I found an issue after applied the following patches:
> > > > ---
> > > > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > > > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > > > bec6c0e pinctrl: Remove use of 
> > > > driver_deferred_probe_check_state_continue()
> > > > e2cec7d driver core: Set deferred_probe_timeout to a longer default if 
> > > > CONFIG_MODULES is set
> >
> > Note that just setting deferred_probe_timeout = -1 like for the
> > CONFIG_MODULES=n case doesn't help.
> 
> Yea. I can see why in that case, as we're checking
> !IS_ENABLED(CONFIG_MODULES) directly in
> driver_deferred_probe_check_state.
> 
> I guess we could switch that to checking
> (driver_deferred_probe_timeout == -1) which would have the same logic
> and at least make it consistent if someone specifies -1 on the command
> line (since now it will effectively have it EPROBE_DEFER forever in
> that case). But also having a timeout=infinity could be useful if
> folks don't want the deferring to time out.  Maybe in the !modules
> case setting it to =0 would be the most clear.
> 
> But that's sort of a further cleanup. I'm still more worried about the
> NFS failure below.
> 
> 
> > > Hey,
> > >   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> > > to remove the WARN messages, but I'm a bit more concerned about why
> > > after the 30 second delay, the ethernet driver loads:
> > >   [   36.218666] ravb e680.ethernet eth0: Base address at
> > > 0xe680, 2e:09:0a:02:eb:2d, IRQ 117.
> > > but NFS fails.
> > >
> > > Is it just that the 30 second delay is too long and NFS gives up?
> >
> > I added some debug code to mount_nfs_root(), which shows that the first
> > 3 tries happen before ravb is instantiated, and the last 3 tries happen
> > after.  So NFS root should work, if the network works.
> >
> > However, it seems the Ethernet PHY is never initialized, hence the link
> > never becomes ready.  Dmesg before/after:
> >
> >  ravb e680.ethernet eth0: Base address at 0xe680,
> > 2e:09:0a:02:ea:ff, IRQ 108.
> >
> > Good.
> >
> >  ...
> > -gpio_rcar e6052000.gpio: sense irq = 11, type = 8
> >
> > This is the GPIO the PHY IRQ is connected to.
> > Note that that GPIO controller has been instantiated before.
> >
> >  ...
> > -Micrel KSZ9031 Gigabit PHY e680.ethernet-:00:
> > attached PHY driver [Micrel KSZ9031 Gigabit PHY]
> > (mii_bus:phy_addr=e680.ethernet-:00, irq=197)
> >  ...
> > -ravb e680.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
> >
> > Oops.
> >
> > -Sending DHCP requests .., OK
> > -IP-Config: Got DHCP answer from ...
> >  ...
> > +VFS: Unable to mount root fs via NFS, trying floppy.
> > +VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6
> >
> > > Does booting with deferred_probe_timeout=0 work?
> >
> > It does, as now everything using optional links (DMA and IOMMU) is now
> > instantiated on first try.
> 
> Thanks so much for helping clarify this!
> 
> So it's at least good to hear that booting with
> deferred_probe_timeout=0 is working!  But I'm bummed the NFS (or as
> you pointed out in your later mail,  ip_auto_config) falls over
> because the network isn't immediately there.
> 
> Looking a little closer at the ip_auto_config() code, I think the
> issue may be that wait_for_device_probe() is effectively returning too
> early, since the probe_defer_timeout is still active? I need to dig a
> bit more on that code, on Monday, as I don't fully understand it yet.

I think so. I also investigated this issue more and then the following
patch seems to be related because return value is changed a bit.

c8c43ce driver core: Fix driver_deferred_probe_check_state() logic

# By the way, this is other topic though, IIUC we should revise
# the deferred_probe_timeout= in Documentation/admin-guide/kernel-parameters.txt
# for the commit c8c43ce. Especially " A timeout of 0 will timeout at the end 
of initcalls."
# doesn't match after we applied the commit.

I'm guessing we should add the following flush_work for 
deferred_probe_timeout_work().
# Sorry, I didn't test this for some reasons yet though...

+   /* wait for the deferred probe timeout workqueue to finish */
+   if (driver_deferred_probe_timeout > 0)
+   flush_work(_probe_timeout_work);

> If I can't find a way to address that, I think the best course will be
> to set the driver_deferred_probe_timeout value to default to 0
> regardless of the value of CONFIG_MODULES, so we don't cause any
> apparent regression from previous behavior. That will also sort out
> the less intuitive = -1 

Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?

2020-04-03 Thread John Stultz
On Fri, Apr 3, 2020 at 4:47 AM Geert Uytterhoeven  wrote:
> On Thu, Apr 2, 2020 at 7:27 PM John Stultz  wrote:
> > On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
> >  wrote:
> > >
> > > I found an issue after applied the following patches:
> > > ---
> > > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > > bec6c0e pinctrl: Remove use of 
> > > driver_deferred_probe_check_state_continue()
> > > e2cec7d driver core: Set deferred_probe_timeout to a longer default if 
> > > CONFIG_MODULES is set
>
> Note that just setting deferred_probe_timeout = -1 like for the
> CONFIG_MODULES=n case doesn't help.

Yea. I can see why in that case, as we're checking
!IS_ENABLED(CONFIG_MODULES) directly in
driver_deferred_probe_check_state.

I guess we could switch that to checking
(driver_deferred_probe_timeout == -1) which would have the same logic
and at least make it consistent if someone specifies -1 on the command
line (since now it will effectively have it EPROBE_DEFER forever in
that case). But also having a timeout=infinity could be useful if
folks don't want the deferring to time out.  Maybe in the !modules
case setting it to =0 would be the most clear.

But that's sort of a further cleanup. I'm still more worried about the
NFS failure below.


> > Hey,
> >   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> > to remove the WARN messages, but I'm a bit more concerned about why
> > after the 30 second delay, the ethernet driver loads:
> >   [   36.218666] ravb e680.ethernet eth0: Base address at
> > 0xe680, 2e:09:0a:02:eb:2d, IRQ 117.
> > but NFS fails.
> >
> > Is it just that the 30 second delay is too long and NFS gives up?
>
> I added some debug code to mount_nfs_root(), which shows that the first
> 3 tries happen before ravb is instantiated, and the last 3 tries happen
> after.  So NFS root should work, if the network works.
>
> However, it seems the Ethernet PHY is never initialized, hence the link
> never becomes ready.  Dmesg before/after:
>
>  ravb e680.ethernet eth0: Base address at 0xe680,
> 2e:09:0a:02:ea:ff, IRQ 108.
>
> Good.
>
>  ...
> -gpio_rcar e6052000.gpio: sense irq = 11, type = 8
>
> This is the GPIO the PHY IRQ is connected to.
> Note that that GPIO controller has been instantiated before.
>
>  ...
> -Micrel KSZ9031 Gigabit PHY e680.ethernet-:00:
> attached PHY driver [Micrel KSZ9031 Gigabit PHY]
> (mii_bus:phy_addr=e680.ethernet-:00, irq=197)
>  ...
> -ravb e680.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
>
> Oops.
>
> -Sending DHCP requests .., OK
> -IP-Config: Got DHCP answer from ...
>  ...
> +VFS: Unable to mount root fs via NFS, trying floppy.
> +VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6
>
> > Does booting with deferred_probe_timeout=0 work?
>
> It does, as now everything using optional links (DMA and IOMMU) is now
> instantiated on first try.

Thanks so much for helping clarify this!

So it's at least good to hear that booting with
deferred_probe_timeout=0 is working!  But I'm bummed the NFS (or as
you pointed out in your later mail,  ip_auto_config) falls over
because the network isn't immediately there.

Looking a little closer at the ip_auto_config() code, I think the
issue may be that wait_for_device_probe() is effectively returning too
early, since the probe_defer_timeout is still active? I need to dig a
bit more on that code, on Monday, as I don't fully understand it yet.

If I can't find a way to address that, I think the best course will be
to set the driver_deferred_probe_timeout value to default to 0
regardless of the value of CONFIG_MODULES, so we don't cause any
apparent regression from previous behavior. That will also sort out
the less intuitive = -1 initialization in the non-modules case.

In any case, I'll try to have a patch to send out on Monday.

thanks
-john
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?

2020-04-03 Thread Geert Uytterhoeven
Hi John,

On Fri, Apr 3, 2020 at 1:47 PM Geert Uytterhoeven  wrote:
> On Thu, Apr 2, 2020 at 7:27 PM John Stultz  wrote:
> > On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
> >  wrote:
> > >
> > > I found an issue after applied the following patches:
> > > ---
> > > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > > bec6c0e pinctrl: Remove use of 
> > > driver_deferred_probe_check_state_continue()
> > > e2cec7d driver core: Set deferred_probe_timeout to a longer default if 
> > > CONFIG_MODULES is set
>
> Note that just setting deferred_probe_timeout = -1 like for the
> CONFIG_MODULES=n case doesn't help.
>
> > > c8c43ce driver core: Fix driver_deferred_probe_check_state() logic
> > > ---
> > >
> > > Before these patches, on my environment [1], some device drivers
> > > which has iommus property output the following message when probing:
> > >
> > > [3.05] ravb e680.ethernet: ignoring dependency for device, 
> > > assuming no driver
> > > [3.257174] ravb e680.ethernet eth0: Base address at 0xe680, 
> > > 2e:09:0a:02:eb:2d, IRQ 117.
> > >
> > > So, since ravb driver is probed within 4 seconds, we can use NFS rootfs 
> > > correctly.
> > >
> > > However, after these patches are applied, since the patches are always 
> > > waiting for 30 seconds
> > > for of_iommu_configure() when IOMMU hardware is disabled, 
> > > drivers/base/dd.c output WARN.
> > > Also, since ravb cannot be probed for 30 seconds, we cannot use NFS 
> > > rootfs anymore.
> > > JFYI, I copied the kernel log to the end of this email.
> >
> > Hey,
> >   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> > to remove the WARN messages, but I'm a bit more concerned about why
> > after the 30 second delay, the ethernet driver loads:
> >   [   36.218666] ravb e680.ethernet eth0: Base address at
> > 0xe680, 2e:09:0a:02:eb:2d, IRQ 117.
> > but NFS fails.
> >
> > Is it just that the 30 second delay is too long and NFS gives up?
>
> I added some debug code to mount_nfs_root(), which shows that the first
> 3 tries happen before ravb is instantiated, and the last 3 tries happen
> after.  So NFS root should work, if the network works.
>
> However, it seems the Ethernet PHY is never initialized, hence the link
> never becomes ready.

So the issue is not nfsroot in-se, but the ip-config that needs to
happen before that.

The call to wait_for_devices() in ip_auto_config() (which is a
late_initcall()) returns -ENODEV, as the network device hasn't probed
successfully yet, so ip-config is aborted.

The (whitespace-damaged) patch below fixes that, but may have unintended
side-effects.

--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1469,7 +1469,11 @@ static int __init ip_auto_config(void)
/* Wait for devices to appear */
err = wait_for_devices();
if (err)
+#ifdef IPCONFIG_DYNAMIC
+   goto try_try_again;
+#else
return err;
+#endif

/* Setup all network devices */
err = ic_open_devs();

Probably we want at least some CONFIG_ROOT_NFS || CONFIG_CIFS_ROOT,
and ROOT_DEV == Root_NFS || ROOT_DEV == Root_CIFS checks.

Thanks for your comments!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?

2020-04-03 Thread Geert Uytterhoeven
Hi John,

On Thu, Apr 2, 2020 at 7:27 PM John Stultz  wrote:
> On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
>  wrote:
> >
> > I found an issue after applied the following patches:
> > ---
> > 64c775f driver core: Rename deferred_probe_timeout and make it global
> > 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> > bec6c0e pinctrl: Remove use of driver_deferred_probe_check_state_continue()
> > e2cec7d driver core: Set deferred_probe_timeout to a longer default if 
> > CONFIG_MODULES is set

Note that just setting deferred_probe_timeout = -1 like for the
CONFIG_MODULES=n case doesn't help.

> > c8c43ce driver core: Fix driver_deferred_probe_check_state() logic
> > ---
> >
> > Before these patches, on my environment [1], some device drivers
> > which has iommus property output the following message when probing:
> >
> > [3.05] ravb e680.ethernet: ignoring dependency for device, 
> > assuming no driver
> > [3.257174] ravb e680.ethernet eth0: Base address at 0xe680, 
> > 2e:09:0a:02:eb:2d, IRQ 117.
> >
> > So, since ravb driver is probed within 4 seconds, we can use NFS rootfs 
> > correctly.
> >
> > However, after these patches are applied, since the patches are always 
> > waiting for 30 seconds
> > for of_iommu_configure() when IOMMU hardware is disabled, drivers/base/dd.c 
> > output WARN.
> > Also, since ravb cannot be probed for 30 seconds, we cannot use NFS rootfs 
> > anymore.
> > JFYI, I copied the kernel log to the end of this email.
>
> Hey,
>   Terribly sorry for the trouble. So as Robin mentioned I have a patch
> to remove the WARN messages, but I'm a bit more concerned about why
> after the 30 second delay, the ethernet driver loads:
>   [   36.218666] ravb e680.ethernet eth0: Base address at
> 0xe680, 2e:09:0a:02:eb:2d, IRQ 117.
> but NFS fails.
>
> Is it just that the 30 second delay is too long and NFS gives up?

I added some debug code to mount_nfs_root(), which shows that the first
3 tries happen before ravb is instantiated, and the last 3 tries happen
after.  So NFS root should work, if the network works.

However, it seems the Ethernet PHY is never initialized, hence the link
never becomes ready.  Dmesg before/after:

 ravb e680.ethernet eth0: Base address at 0xe680,
2e:09:0a:02:ea:ff, IRQ 108.

Good.

 ...
-gpio_rcar e6052000.gpio: sense irq = 11, type = 8

This is the GPIO the PHY IRQ is connected to.
Note that that GPIO controller has been instantiated before.

 ...
-Micrel KSZ9031 Gigabit PHY e680.ethernet-:00:
attached PHY driver [Micrel KSZ9031 Gigabit PHY]
(mii_bus:phy_addr=e680.ethernet-:00, irq=197)
 ...
-ravb e680.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

Oops.

-Sending DHCP requests .., OK
-IP-Config: Got DHCP answer from ...
 ...
+VFS: Unable to mount root fs via NFS, trying floppy.
+VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6

> Does booting with deferred_probe_timeout=0 work?

It does, as now everything using optional links (DMA and IOMMU) is now
instantiated on first try.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?

2020-04-02 Thread John Stultz
On Thu, Apr 2, 2020 at 3:17 AM Yoshihiro Shimoda
 wrote:
>
> I found an issue after applied the following patches:
> ---
> 64c775f driver core: Rename deferred_probe_timeout and make it global
> 0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
> bec6c0e pinctrl: Remove use of driver_deferred_probe_check_state_continue()
> e2cec7d driver core: Set deferred_probe_timeout to a longer default if 
> CONFIG_MODULES is set
> c8c43ce driver core: Fix driver_deferred_probe_check_state() logic
> ---
>
> Before these patches, on my environment [1], some device drivers
> which has iommus property output the following message when probing:
>
> [3.05] ravb e680.ethernet: ignoring dependency for device, 
> assuming no driver
> [3.257174] ravb e680.ethernet eth0: Base address at 0xe680, 
> 2e:09:0a:02:eb:2d, IRQ 117.
>
> So, since ravb driver is probed within 4 seconds, we can use NFS rootfs 
> correctly.
>
> However, after these patches are applied, since the patches are always 
> waiting for 30 seconds
> for of_iommu_configure() when IOMMU hardware is disabled, drivers/base/dd.c 
> output WARN.
> Also, since ravb cannot be probed for 30 seconds, we cannot use NFS rootfs 
> anymore.
> JFYI, I copied the kernel log to the end of this email.

Hey,
  Terribly sorry for the trouble. So as Robin mentioned I have a patch
to remove the WARN messages, but I'm a bit more concerned about why
after the 30 second delay, the ethernet driver loads:
  [   36.218666] ravb e680.ethernet eth0: Base address at
0xe680, 2e:09:0a:02:eb:2d, IRQ 117.
but NFS fails.

Is it just that the 30 second delay is too long and NFS gives up?

Does booting with deferred_probe_timeout=0 work?

thanks
-john
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to fix WARN from drivers/base/dd.c in next-20200401 if CONFIG_MODULES=y?

2020-04-02 Thread Robin Murphy

On 2020-04-02 11:16 am, Yoshihiro Shimoda wrote:

Hi John,

I found an issue after applied the following patches:
---
64c775f driver core: Rename deferred_probe_timeout and make it global
0e9f8d0 driver core: Remove driver_deferred_probe_check_state_continue()
bec6c0e pinctrl: Remove use of driver_deferred_probe_check_state_continue()
e2cec7d driver core: Set deferred_probe_timeout to a longer default if 
CONFIG_MODULES is set
c8c43ce driver core: Fix driver_deferred_probe_check_state() logic
---

Before these patches, on my environment [1], some device drivers
which has iommus property output the following message when probing:

[3.05] ravb e680.ethernet: ignoring dependency for device, assuming 
no driver
[3.257174] ravb e680.ethernet eth0: Base address at 0xe680, 
2e:09:0a:02:eb:2d, IRQ 117.

So, since ravb driver is probed within 4 seconds, we can use NFS rootfs 
correctly.

However, after these patches are applied, since the patches are always waiting 
for 30 seconds
for of_iommu_configure() when IOMMU hardware is disabled, drivers/base/dd.c 
output WARN.
Also, since ravb cannot be probed for 30 seconds, we cannot use NFS rootfs 
anymore.
JFYI, I copied the kernel log to the end of this email.

I guess the patches will be merged into v5.7-rc1 because the patches are 
contained from
next-20200316, I'd like to fix the issue in v5.7-rcN cycle somehow.


This already came up in a different context, and there's a proposal from 
John here:


https://lore.kernel.org/lkml/20200330202715.86609-1-john.stu...@linaro.org/

Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu