Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-08-11 Thread Takashi Iwai
At Sun, 10 Aug 2014 16:58:02 +0200,
Luis R. Rodriguez wrote:
 
 On Sun, Aug 10, 2014 at 08:43:31PM +0800, Greg KH wrote:
  On Sat, Aug 09, 2014 at 06:41:19PM +0200, Luis R. Rodriguez wrote:
   On Wed, Jul 30, 2014 at 03:11:07PM -0700, David Miller wrote:
From: Luis R. Rodriguez mcg...@do-not-panic.com
Date: Mon, 28 Jul 2014 11:28:28 -0700

 Tetsuo bisected and found that commit 786235ee kthread: make
 kthread_create() killable modified kthread_create() to bail as
 soon as SIGKILL is received. This is causing some issues with
 some drivers and at times boot. Joseph then found that failures
 occur as the systemd-udevd process sends SIGKILL to modprobe if
 probe on a driver takes over 30 seconds. When this happens probe
 will fail on any driver, its why booting on some system will fail
 if the driver happens to be a storage related driver. Some folks
 have suggested fixing this by modifying kthread_create() to not
 leave upon SIGKILL [3], upon review Oleg rejected this change and
 the discussion was punted out to systemd to see if the default
 timeout could be increased from 30 seconds to 120. The opinion of
 the systemd maintainers is that the driver's behavior should
 be fixed [4]. Linus seems to agree [5], however more recently even
 networking drivers have been reported to fail on probe since just
 writing the firmware to a device and kicking it can take easy over
 60 seconds [6]. Benjamim was able to trace the issues recently
 reported on cxgb4 down to the same systemd-udevd 30 second timeout 
 [6].
 
 This is an alternative solution which enables drivers that are
 known to take long to use deferred probe workqueue. This avoids
 the 30 second timeout and lets us annotate drivers with long
 init sequences.
 
 As drivers determine a component is not yet available and needs
 to defer probe you'll be notified this happen upon init for each
 device but now with a message such as:
 
 pci :03:00.0: Driver cxgb4 requests probe deferral on init
 
 You should see one of these per struct device probed.

It seems we're still discussing this.

I think I understand all of the underlying issues, and what I'll say
is that perhaps we should use what Greg KH requested but via a helper
that is easy to grep for.

I don't care if it's something like module_long_probe_init() and
module_long_probe_exit(), but it just needs to be some properly
named interface which does the whole kthread or whatever bit.
   
   I've tested the alternative kthread_run() proposal but unfortunately it
   does not help resolve the issue, the timeout is still hit and a SIGKILL
   still kills the driver probe. Please let me know how you'd all like us
   to proceed, these defer probe patches do help cure the issue though.
  
  Why doesn't it work?  Doesn't modprobe come right back and the init
  sequence still takes a while to run?  What exactly fails?
 
 systemd uses kmod kmod_module_probe_insert_module(), what I see is that using
 kthread_run() as a work around still causes probe to fail on a driver that
 otherwise would work fine if all you do is sprinkle ssleep(33) (seconds) on 
 the
 init sequence. To see the issue you can test this on any of your network
 drivers that get loaded automatically, I did my testing with iwlwifi by
 purposely breaking it and then using the work around. It seems the probe
 somehow still gets killed as before, I haven't debugged this further.
 
 For example by breaking and fixing iwlwifi this yields:
 
 ergon:~ # journalctl -b -0 -u systemd-udevd 
 
 -- Logs begin at Mon 2014-08-04 21:55:28 EDT, end at Sun 2014-08-10 10:50:14 
 EDT. --
 Aug 10 10:48:49 ergon systemd-udevd[257]: specified group 'input' unknown
 Aug 10 10:48:55 ergon mtp-probe[493]: checking bus 3, device 2: 
 /sys/devices/pci:00/:00:14.0/usb3/3-7
 Aug 10 10:48:55 ergon mtp-probe[492]: checking bus 3, device 4: 
 /sys/devices/pci:00/:00:14.0/usb3/3-12
 Aug 10 10:49:24 ergon systemd-udevd[465]: seq 1755 
 '/devices/pci:00/:00:1c.1/:03:00.0' killed
 
 ergon:~ # dmesg | grep iwlwifi
 [   10.315538] iwlwifi Going to sleep for 33 seconds
 [   43.323958] iwlwifi Done sleeping
 [   43.324400] iwlwifi :03:00.0: irq 50 for MSI/MSI-X
 [   43.324534] iwlwifi :03:00.0: Error allocating IRQ 50
 [   43.326951] iwlwifi: probe of :03:00.0 failed with error -4
 
 The patch used:
 
 diff --git a/drivers/net/wireless/iwlwifi/mvm/ops.c 
 b/drivers/net/wireless/iwlwifi/mvm/ops.c
 index 610dbcb..65e0ab2 100644
 --- a/drivers/net/wireless/iwlwifi/mvm/ops.c
 +++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
 @@ -63,6 +63,7 @@
  #include linux/module.h
  #include linux/vmalloc.h
  #include net/mac80211.h
 +#include linux/kthread.h
  
  #include iwl-notif-wait.h
  #include iwl-trans.h
 @@ -111,7 +112,7 @@ MODULE_PARM_DESC(power_scheme,
  /*
   * module init and 

Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-08-11 Thread Luis R. Rodriguez
On Mon, Aug 11, 2014 at 11:27 AM, Takashi Iwai ti...@suse.de wrote:
 Luis R. Rodriguez wrote:

 On Sun, Aug 10, 2014 at 08:43:31PM +0800, Greg KH wrote:
  On Sat, Aug 09, 2014 at 06:41:19PM +0200, Luis R. Rodriguez wrote:
   On Wed, Jul 30, 2014 at 03:11:07PM -0700, David Miller wrote:
From: Luis R. Rodriguez mcg...@do-not-panic.com
Date: Mon, 28 Jul 2014 11:28:28 -0700
   
 Tetsuo bisected and found that commit 786235ee kthread: make
 kthread_create() killable modified kthread_create() to bail as
 soon as SIGKILL is received. This is causing some issues with
 some drivers and at times boot. Joseph then found that failures
 occur as the systemd-udevd process sends SIGKILL to modprobe if
 probe on a driver takes over 30 seconds. When this happens probe
 will fail on any driver, its why booting on some system will fail
 if the driver happens to be a storage related driver. Some folks
 have suggested fixing this by modifying kthread_create() to not
 leave upon SIGKILL [3], upon review Oleg rejected this change and
 the discussion was punted out to systemd to see if the default
 timeout could be increased from 30 seconds to 120. The opinion of
 the systemd maintainers is that the driver's behavior should
 be fixed [4]. Linus seems to agree [5], however more recently even
 networking drivers have been reported to fail on probe since just
 writing the firmware to a device and kicking it can take easy over
 60 seconds [6]. Benjamim was able to trace the issues recently
 reported on cxgb4 down to the same systemd-udevd 30 second timeout 
 [6].

 This is an alternative solution which enables drivers that are
 known to take long to use deferred probe workqueue. This avoids
 the 30 second timeout and lets us annotate drivers with long
 init sequences.

 As drivers determine a component is not yet available and needs
 to defer probe you'll be notified this happen upon init for each
 device but now with a message such as:

 pci :03:00.0: Driver cxgb4 requests probe deferral on init

 You should see one of these per struct device probed.
   
It seems we're still discussing this.
   
I think I understand all of the underlying issues, and what I'll say
is that perhaps we should use what Greg KH requested but via a helper
that is easy to grep for.
   
I don't care if it's something like module_long_probe_init() and
module_long_probe_exit(), but it just needs to be some properly
named interface which does the whole kthread or whatever bit.
  
   I've tested the alternative kthread_run() proposal but unfortunately it
   does not help resolve the issue, the timeout is still hit and a SIGKILL
   still kills the driver probe. Please let me know how you'd all like us
   to proceed, these defer probe patches do help cure the issue though.
 
  Why doesn't it work?  Doesn't modprobe come right back and the init
  sequence still takes a while to run?  What exactly fails?

 systemd uses kmod kmod_module_probe_insert_module(), what I see is that using
 kthread_run() as a work around still causes probe to fail on a driver that
 otherwise would work fine if all you do is sprinkle ssleep(33) (seconds) on 
 the
 init sequence. To see the issue you can test this on any of your network
 drivers that get loaded automatically, I did my testing with iwlwifi by
 purposely breaking it and then using the work around. It seems the probe
 somehow still gets killed as before, I haven't debugged this further.

 For example by breaking and fixing iwlwifi this yields:

 ergon:~ # journalctl -b -0 -u systemd-udevd

 -- Logs begin at Mon 2014-08-04 21:55:28 EDT, end at Sun 2014-08-10 10:50:14 
 EDT. --
 Aug 10 10:48:49 ergon systemd-udevd[257]: specified group 'input' unknown
 Aug 10 10:48:55 ergon mtp-probe[493]: checking bus 3, device 2: 
 /sys/devices/pci:00/:00:14.0/usb3/3-7
 Aug 10 10:48:55 ergon mtp-probe[492]: checking bus 3, device 4: 
 /sys/devices/pci:00/:00:14.0/usb3/3-12
 Aug 10 10:49:24 ergon systemd-udevd[465]: seq 1755 
 '/devices/pci:00/:00:1c.1/:03:00.0' killed

 ergon:~ # dmesg | grep iwlwifi
 [   10.315538] iwlwifi Going to sleep for 33 seconds
 [   43.323958] iwlwifi Done sleeping
 [   43.324400] iwlwifi :03:00.0: irq 50 for MSI/MSI-X
 [   43.324534] iwlwifi :03:00.0: Error allocating IRQ 50
 [   43.326951] iwlwifi: probe of :03:00.0 failed with error -4

 The patch used:

 diff --git a/drivers/net/wireless/iwlwifi/mvm/ops.c 
 b/drivers/net/wireless/iwlwifi/mvm/ops.c
 index 610dbcb..65e0ab2 100644
 --- a/drivers/net/wireless/iwlwifi/mvm/ops.c
 +++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
 @@ -63,6 +63,7 @@
  #include linux/module.h
  #include linux/vmalloc.h
  #include net/mac80211.h
 +#include linux/kthread.h

  #include iwl-notif-wait.h
  #include iwl-trans.h
 @@ -111,7 +112,7 @@ MODULE_PARM_DESC(power_scheme,
  /*
   * module 

Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-08-10 Thread Greg KH
On Sat, Aug 09, 2014 at 06:41:19PM +0200, Luis R. Rodriguez wrote:
 On Wed, Jul 30, 2014 at 03:11:07PM -0700, David Miller wrote:
  From: Luis R. Rodriguez mcg...@do-not-panic.com
  Date: Mon, 28 Jul 2014 11:28:28 -0700
  
   Tetsuo bisected and found that commit 786235ee kthread: make
   kthread_create() killable modified kthread_create() to bail as
   soon as SIGKILL is received. This is causing some issues with
   some drivers and at times boot. Joseph then found that failures
   occur as the systemd-udevd process sends SIGKILL to modprobe if
   probe on a driver takes over 30 seconds. When this happens probe
   will fail on any driver, its why booting on some system will fail
   if the driver happens to be a storage related driver. Some folks
   have suggested fixing this by modifying kthread_create() to not
   leave upon SIGKILL [3], upon review Oleg rejected this change and
   the discussion was punted out to systemd to see if the default
   timeout could be increased from 30 seconds to 120. The opinion of
   the systemd maintainers is that the driver's behavior should
   be fixed [4]. Linus seems to agree [5], however more recently even
   networking drivers have been reported to fail on probe since just
   writing the firmware to a device and kicking it can take easy over
   60 seconds [6]. Benjamim was able to trace the issues recently
   reported on cxgb4 down to the same systemd-udevd 30 second timeout [6].
   
   This is an alternative solution which enables drivers that are
   known to take long to use deferred probe workqueue. This avoids
   the 30 second timeout and lets us annotate drivers with long
   init sequences.
   
   As drivers determine a component is not yet available and needs
   to defer probe you'll be notified this happen upon init for each
   device but now with a message such as:
   
   pci :03:00.0: Driver cxgb4 requests probe deferral on init
   
   You should see one of these per struct device probed.
  
  It seems we're still discussing this.
  
  I think I understand all of the underlying issues, and what I'll say
  is that perhaps we should use what Greg KH requested but via a helper
  that is easy to grep for.
  
  I don't care if it's something like module_long_probe_init() and
  module_long_probe_exit(), but it just needs to be some properly
  named interface which does the whole kthread or whatever bit.
 
 I've tested the alternative kthread_run() proposal but unfortunately it
 does not help resolve the issue, the timeout is still hit and a SIGKILL
 still kills the driver probe. Please let me know how you'd all like us
 to proceed, these defer probe patches do help cure the issue though.

Why doesn't it work?  Doesn't modprobe come right back and the init
sequence still takes a while to run?  What exactly fails?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-08-09 Thread Luis R. Rodriguez
On Wed, Jul 30, 2014 at 03:11:07PM -0700, David Miller wrote:
 From: Luis R. Rodriguez mcg...@do-not-panic.com
 Date: Mon, 28 Jul 2014 11:28:28 -0700
 
  Tetsuo bisected and found that commit 786235ee kthread: make
  kthread_create() killable modified kthread_create() to bail as
  soon as SIGKILL is received. This is causing some issues with
  some drivers and at times boot. Joseph then found that failures
  occur as the systemd-udevd process sends SIGKILL to modprobe if
  probe on a driver takes over 30 seconds. When this happens probe
  will fail on any driver, its why booting on some system will fail
  if the driver happens to be a storage related driver. Some folks
  have suggested fixing this by modifying kthread_create() to not
  leave upon SIGKILL [3], upon review Oleg rejected this change and
  the discussion was punted out to systemd to see if the default
  timeout could be increased from 30 seconds to 120. The opinion of
  the systemd maintainers is that the driver's behavior should
  be fixed [4]. Linus seems to agree [5], however more recently even
  networking drivers have been reported to fail on probe since just
  writing the firmware to a device and kicking it can take easy over
  60 seconds [6]. Benjamim was able to trace the issues recently
  reported on cxgb4 down to the same systemd-udevd 30 second timeout [6].
  
  This is an alternative solution which enables drivers that are
  known to take long to use deferred probe workqueue. This avoids
  the 30 second timeout and lets us annotate drivers with long
  init sequences.
  
  As drivers determine a component is not yet available and needs
  to defer probe you'll be notified this happen upon init for each
  device but now with a message such as:
  
  pci :03:00.0: Driver cxgb4 requests probe deferral on init
  
  You should see one of these per struct device probed.
 
 It seems we're still discussing this.
 
 I think I understand all of the underlying issues, and what I'll say
 is that perhaps we should use what Greg KH requested but via a helper
 that is easy to grep for.
 
 I don't care if it's something like module_long_probe_init() and
 module_long_probe_exit(), but it just needs to be some properly
 named interface which does the whole kthread or whatever bit.

I've tested the alternative kthread_run() proposal but unfortunately it
does not help resolve the issue, the timeout is still hit and a SIGKILL
still kills the driver probe. Please let me know how you'd all like us
to proceed, these defer probe patches do help cure the issue though.

I should also note that these work around patches can only be done once we
already know a driver fails to go over the timeout, root causing and
associating driver issues to the timeout has been very difficult with a few
drivers already, for this reason I've submitted a change for systemd to issue a
warning instead of killing kmod usage on udev after a timeout, that would make
this regression non-fatal, and let us more easily then hunt drivers that need
fixing much easily [0] [1]. As noted we'd still want to have drivers easily
annotated which require fixing, this orignal series would allow us to do that
by hunting for delay_probe. If there alternative and preferred strategies
please let me know.

[0] http://lists.freedesktop.org/archives/systemd-devel/2014-August/021812.html
[1] http://lists.freedesktop.org/archives/systemd-devel/2014-August/021821.html

  Luis
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-30 Thread David Miller
From: Luis R. Rodriguez mcg...@do-not-panic.com
Date: Mon, 28 Jul 2014 11:28:28 -0700

 Tetsuo bisected and found that commit 786235ee kthread: make
 kthread_create() killable modified kthread_create() to bail as
 soon as SIGKILL is received. This is causing some issues with
 some drivers and at times boot. Joseph then found that failures
 occur as the systemd-udevd process sends SIGKILL to modprobe if
 probe on a driver takes over 30 seconds. When this happens probe
 will fail on any driver, its why booting on some system will fail
 if the driver happens to be a storage related driver. Some folks
 have suggested fixing this by modifying kthread_create() to not
 leave upon SIGKILL [3], upon review Oleg rejected this change and
 the discussion was punted out to systemd to see if the default
 timeout could be increased from 30 seconds to 120. The opinion of
 the systemd maintainers is that the driver's behavior should
 be fixed [4]. Linus seems to agree [5], however more recently even
 networking drivers have been reported to fail on probe since just
 writing the firmware to a device and kicking it can take easy over
 60 seconds [6]. Benjamim was able to trace the issues recently
 reported on cxgb4 down to the same systemd-udevd 30 second timeout [6].
 
 This is an alternative solution which enables drivers that are
 known to take long to use deferred probe workqueue. This avoids
 the 30 second timeout and lets us annotate drivers with long
 init sequences.
 
 As drivers determine a component is not yet available and needs
 to defer probe you'll be notified this happen upon init for each
 device but now with a message such as:
 
 pci :03:00.0: Driver cxgb4 requests probe deferral on init
 
 You should see one of these per struct device probed.

It seems we're still discussing this.

I think I understand all of the underlying issues, and what I'll say
is that perhaps we should use what Greg KH requested but via a helper
that is easy to grep for.

I don't care if it's something like module_long_probe_init() and
module_long_probe_exit(), but it just needs to be some properly
named interface which does the whole kthread or whatever bit.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-29 Thread Hannes Reinecke

On 07/29/2014 03:13 AM, Luis R. Rodriguez wrote:

On Mon, Jul 28, 2014 at 5:35 PM, Greg KH gre...@linuxfoundation.org wrote:

On Mon, Jul 28, 2014 at 05:26:34PM -0700, Luis R. Rodriguez wrote:

On Mon, Jul 28, 2014 at 4:46 PM, Greg KH gre...@linuxfoundation.org wrote:

On Mon, Jul 28, 2014 at 12:48:32PM -0700, Luis R. Rodriguez wrote:

On Mon, Jul 28, 2014 at 12:04 PM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:

On Mon, Jul 28, 2014 at 11:55 AM, Greg KH gre...@linuxfoundation.org wrote:

So, what drivers are having problems in their init sequence, and why
aren't they using async firmware loading?


Fixing drivers is one thing, fixing drivers *now* because *now*
drivers are failing due to a regression is another thing and that's
what we have now so lets just be clear on that. The 30 second rule is
a major driver requirement change and it should not be taken slightly,
all of a sudden this is a new requirement but you won't know that
unless you're reading these threads or hit an issue. That's an issue
in itself.


That regression is something that userspace has decided to do, not
anything the kernel changed,


Actually commit 786235ee seems to have been the one that caused this
issue, systemd would just send the SIGKILL and that change forced a
bail on probe then hence Canonical's work around to modify
kthread_create() to not leave upon SIGKILL:

http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123


so perhaps you should just patch your
modprobe and be done with it until you can fix up those drivers?


To ignore SIGKILL ?


Sorry, I thought this was a userspace change that caused this.

As it's a kernel change, well, maybe that patch should be reverted...


That's certainly viable. Oleg?

If its reverted we won't know which drivers are hitting over the new
30 second limit requirement imposed by userspace, which the culprit
commit forces failure on probe now. This series at least would allow
us to annotate which drivers are brake the 30 second init limit, and
enable a work around alternative if we wish to not revert the commit.
This  series essentially should be considered an alternative solution
to what was proposed initially by Joseph Salisbury, it may not be
perfect but its a proposal. I welcome others.


And putting a horrid hack in the driver core, just because of some
really bad drivers, is not ok, that's an interface _I_ will have to
support for the next few decades.


I understand, hence review.


And it's going to take you a while to get something like this ever
merged in anyway, odds are you can fix up the driver faster...


That requires quite a bit of changes and commitment and again, there
are quite a bit of drivers that we can run into in the community,
we've just spotted 2 so far here for now.


The cxgb4: driver is an example where although I did propose patches
to use asynch firmware loading the entire registration of the
netdevice would need to be changed as well in order to get this right.
In short we have to scramble now to first identify drivers that have
long init sequences and then fix. There's an assumption that we can
easily fix drivers, this can take time. This series, although does
take advantage of a kernel interface for other uses, lets us identify
these drivers on the kernel ring buffer, so we can go and address
fixing them with time.


Another thing that came up during asynch firmware review / integration
on cxgb4 was that it would not be the only thing that would need to be
fixed, the driver also has a ton of ports and at least as per my
review the proper fix would be to use platform regiister stuff. It is
a major rewrite of the driver but an example of a driver that needs
quite a bit of work to meet this new 30 second driver requirement.


It shouldn't be using any platform driver stuff, it's a pci device, not
a platform device.


The general PCI stuff is already used, the reason for suggesting the
platform_device_register_simple() stuff is it has tons of ports and
each port will register in turn a new struct netdevice, essentially
one device can end up having tons of different network devices, the
platform stuff would be to allow handling each netdevice candidate
separately as part of the internal driver architecture, right now its
some scary loop thing that in my eyes can be very error prone.
drivers/net/ethernet/8390/ne.c. This discussion:


No, don't use platform devices as a catch-all for when you don't want
to write your own bus code.  This looks like a device-specific bus,
great, write the code to do that, it can be done in about 200 lines,
with comments and whitespace.

Only use platform devices for, wait for it, platform devices.  And even
then, reconsider and use something else if at all possible.


OK thanks I had asked for advice for this a while back on that old
thread as I wasn't sure if that was the best.


Why not just put the initial register the device in a single-shot
workqueue or thread or something like that so that modprobe 

Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-29 Thread Greg KH
On Mon, Jul 28, 2014 at 06:13:43PM -0700, Luis R. Rodriguez wrote:
   Why not just put the initial register the device in a single-shot
   workqueue or thread or something like that so that modprobe returns
   instantly back with a success and all is fine?
 
  That surely is possible but why not a general solution for such kludges?
 
  Because the driver should be fixed.  How hard would it be to do what I
  just suggested here, 20 lines added at most?
 
 I appreciate the feedback, but don't look at me, I'd happy to go nutty
 on ripping things apart from hairy drivers, however Chelsie folks
 expressed concerns over major rework on the driver... so even if we
 did have something I expect things to take a while to bake / gain
 confidence from others.

rework?  Heh, here's a patch that adds 10 lines to the mptsas driver
that should also work for any other driver as well.  Why not just do
this instead?

 This also just addresses this *one* Ethernet driver, there was that
 SCSI driver that created the original report -- Canonical merged
 Joseph's fix as a work around,

What fix was that?

 there is no general solution for this yet, and again with that work
 around you won't find which drivers run into this issue.

Great, fix them as they are found, that's fine.  Again, don't add stuff
to the driver core to paper over crappy drivers, I'm not going to take
that type of change.  I _only_ took the defer binding code as there was
no other way for an individual driver to deal with things if it's
resources were not present yet due to binding order in the system.

So, anyone care to test the patch below on a system that has this
problem to let me know if it would work or not?  Odds are, this should
be a workqueue, to make it cleaner, but a kthread is just so easy to
use...

thanks,

greg k-h

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 711fcb5cec87..4fd4f36a2d9e 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -51,6 +51,7 @@
 #include linux/jiffies.h
 #include linux/workqueue.h
 #include linux/delay.h   /* for mdelay */
+#include linux/kthread.h
 
 #include scsi/scsi.h
 #include scsi/scsi_cmnd.h
@@ -5393,8 +5394,7 @@ static struct pci_driver mptsas_driver = {
 #endif
 };
 
-static int __init
-mptsas_init(void)
+static int mptsas_real_init(void *data)
 {
int error;
 
@@ -5429,9 +5429,19 @@ mptsas_init(void)
return error;
 }
 
+static struct task_struct *init_thread;
+
+static int __init
+mptsas_init(void)
+{
+   init_thread = kthread_run(mptsas_real_init, NULL, mptsas_init);
+   return 0;
+}
+
 static void __exit
 mptsas_exit(void)
 {
+   kthread_stop(init_thread);
pci_unregister_driver(mptsas_driver);
sas_release_transport(mptsas_transport_template);
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-29 Thread Luis R. Rodriguez
On Tue, Jul 29, 2014 at 04:14:22PM -0700, Greg KH wrote:
 On Mon, Jul 28, 2014 at 06:13:43PM -0700, Luis R. Rodriguez wrote:
Why not just put the initial register the device in a single-shot
workqueue or thread or something like that so that modprobe returns
instantly back with a success and all is fine?
  
   That surely is possible but why not a general solution for such kludges?
  
   Because the driver should be fixed.  How hard would it be to do what I
   just suggested here, 20 lines added at most?
  
  I appreciate the feedback, but don't look at me, I'd happy to go nutty
  on ripping things apart from hairy drivers, however Chelsie folks
  expressed concerns over major rework on the driver... so even if we
  did have something I expect things to take a while to bake / gain
  confidence from others.
 
 rework?  Heh, here's a patch that adds 10 lines to the mptsas driver
 that should also work for any other driver as well.  Why not just do
 this instead?

That's not a rework, that's a kludge, doing something similar for other
drivers would be replicating kludges, the deferred probe use was trying
to generalize a kludge with 3 lines of code. But I'm no not yet convinced
its the best solution now.

  This also just addresses this *one* Ethernet driver, there was that
  SCSI driver that created the original report -- Canonical merged
  Joseph's fix as a work around,
 
 What fix was that?

https://launchpadlibrarian.net/169714201/kthread-Do-not-leave-kthread_create-immediately.patch

It was reviewed and Oleg preferred the timeout instead be reviewed
on systemd devel mailing list. That didn't go anywhere but today Hannes
posted a patch and that got merged. Its still not solving all issues
though as it lets us override the timeout value, systems / drivers
can still crash without a long timeout.

  there is no general solution for this yet, and again with that work
  around you won't find which drivers run into this issue.
 
 Great, fix them as they are found, that's fine. 

Are we really OK in letting distributions have to deal with crashes
because of this new driver 30 second timeout ? I think warning about
it would be better and more friendlier, no? What gains do we have to
kill the damn thing?

 Again, don't add stuff
 to the driver core to paper over crappy drivers, I'm not going to take
 that type of change.  I _only_ took the defer binding code as there was
 no other way for an individual driver to deal with things if it's
 resources were not present yet due to binding order in the system.

Ok but its a bit unfair to force killing drivers over a new driver 30 second
timeout requirement for a change that was made implicitly through a series
of collateral changes.

 So, anyone care to test the patch below on a system that has this
 problem to let me know if it would work or not?  Odds are, this should
 be a workqueue, to make it cleaner, but a kthread is just so easy to
 use...
 
 thanks,
 
 greg k-h
 
 diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
 index 711fcb5cec87..4fd4f36a2d9e 100644
 --- a/drivers/message/fusion/mptsas.c
 +++ b/drivers/message/fusion/mptsas.c
 @@ -51,6 +51,7 @@
  #include linux/jiffies.h
  #include linux/workqueue.h
  #include linux/delay.h /* for mdelay */
 +#include linux/kthread.h
  
  #include scsi/scsi.h
  #include scsi/scsi_cmnd.h
 @@ -5393,8 +5394,7 @@ static struct pci_driver mptsas_driver = {
  #endif
  };
  
 -static int __init
 -mptsas_init(void)
 +static int mptsas_real_init(void *data)
  {
   int error;
  
 @@ -5429,9 +5429,19 @@ mptsas_init(void)
   return error;
  }
  
 +static struct task_struct *init_thread;
 +
 +static int __init
 +mptsas_init(void)
 +{
 + init_thread = kthread_run(mptsas_real_init, NULL, mptsas_init);
 + return 0;
 +}
 +
  static void __exit
  mptsas_exit(void)
  {
 + kthread_stop(init_thread);
   pci_unregister_driver(mptsas_driver);
   sas_release_transport(mptsas_transport_template);

So we're OK to see these kludges sprinkled over the kernel instead of
genrealizing somethiing for them in the meantime?

  Luis
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-29 Thread Greg KH
On Wed, Jul 30, 2014 at 02:05:41AM +0200, Luis R. Rodriguez wrote:
 On Tue, Jul 29, 2014 at 04:14:22PM -0700, Greg KH wrote:
  On Mon, Jul 28, 2014 at 06:13:43PM -0700, Luis R. Rodriguez wrote:
 Why not just put the initial register the device in a single-shot
 workqueue or thread or something like that so that modprobe returns
 instantly back with a success and all is fine?
   
That surely is possible but why not a general solution for such 
kludges?
   
Because the driver should be fixed.  How hard would it be to do what I
just suggested here, 20 lines added at most?
   
   I appreciate the feedback, but don't look at me, I'd happy to go nutty
   on ripping things apart from hairy drivers, however Chelsie folks
   expressed concerns over major rework on the driver... so even if we
   did have something I expect things to take a while to bake / gain
   confidence from others.
  
  rework?  Heh, here's a patch that adds 10 lines to the mptsas driver
  that should also work for any other driver as well.  Why not just do
  this instead?
 
 That's not a rework, that's a kludge, doing something similar for other
 drivers would be replicating kludges, the deferred probe use was trying
 to generalize a kludge with 3 lines of code. But I'm no not yet convinced
 its the best solution now.

I'm not saying it's pretty, but it confied to just the broken module,
and also, it's obvious what needs to be fixed if someone cares.

It sounds like no one cares about these moduls :)

Adding it to the driver core ensures that the drivers will _never_ be
changed, which isn't ok with me, sorry.


   This also just addresses this *one* Ethernet driver, there was that
   SCSI driver that created the original report -- Canonical merged
   Joseph's fix as a work around,
  
  What fix was that?
 
 https://launchpadlibrarian.net/169714201/kthread-Do-not-leave-kthread_create-immediately.patch
 
 It was reviewed and Oleg preferred the timeout instead be reviewed
 on systemd devel mailing list. That didn't go anywhere but today Hannes
 posted a patch and that got merged. Its still not solving all issues
 though as it lets us override the timeout value, systems / drivers
 can still crash without a long timeout.

Great, work it out with them, again, stay away from the driver core for
this...

   there is no general solution for this yet, and again with that work
   around you won't find which drivers run into this issue.
  
  Great, fix them as they are found, that's fine. 
 
 Are we really OK in letting distributions have to deal with crashes
 because of this new driver 30 second timeout ?

If you don't want to, then revert the kernel change that caused it for
your distro kernels.

 I think warning about it would be better and more friendlier, no? What
 gains do we have to kill the damn thing?

Take it up with the owners of that code...

  Again, don't add stuff
  to the driver core to paper over crappy drivers, I'm not going to take
  that type of change.  I _only_ took the defer binding code as there was
  no other way for an individual driver to deal with things if it's
  resources were not present yet due to binding order in the system.
 
 Ok but its a bit unfair to force killing drivers over a new driver 30 second
 timeout requirement for a change that was made implicitly through a series
 of collateral changes.

I'm not disagreeing with that, but hey, life isn't fair, and things
needs to get fixed all the time...

I say fix it _properly_ by fixing the drivers.

best of luck,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Luis R. Rodriguez
From: Luis R. Rodriguez mcg...@suse.com

Tetsuo bisected and found that commit 786235ee kthread: make
kthread_create() killable modified kthread_create() to bail as
soon as SIGKILL is received. This is causing some issues with
some drivers and at times boot. Joseph then found that failures
occur as the systemd-udevd process sends SIGKILL to modprobe if
probe on a driver takes over 30 seconds. When this happens probe
will fail on any driver, its why booting on some system will fail
if the driver happens to be a storage related driver. Some folks
have suggested fixing this by modifying kthread_create() to not
leave upon SIGKILL [3], upon review Oleg rejected this change and
the discussion was punted out to systemd to see if the default
timeout could be increased from 30 seconds to 120. The opinion of
the systemd maintainers is that the driver's behavior should
be fixed [4]. Linus seems to agree [5], however more recently even
networking drivers have been reported to fail on probe since just
writing the firmware to a device and kicking it can take easy over
60 seconds [6]. Benjamim was able to trace the issues recently
reported on cxgb4 down to the same systemd-udevd 30 second timeout [6].

This is an alternative solution which enables drivers that are
known to take long to use deferred probe workqueue. This avoids
the 30 second timeout and lets us annotate drivers with long
init sequences.

As drivers determine a component is not yet available and needs
to defer probe you'll be notified this happen upon init for each
device but now with a message such as:

pci :03:00.0: Driver cxgb4 requests probe deferral on init

You should see one of these per struct device probed.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
[1] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
[2] http://lists.freedesktop.org/archives/systemd-devel/2014-March/018006.html
[3] http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123
[4] http://article.gmane.org/gmane.comp.sysutils.systemd.devel/17860
[5] http://article.gmane.org/gmane.linux.kernel/1671333
[6] https://bugzilla.novell.com/show_bug.cgi?id=877622

Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: Tetsuo Handa penguin-ker...@i-love.sakura.ne.jp
Cc: Joseph Salisbury joseph.salisb...@canonical.com
Cc: Kay Sievers k...@vrfy.org
Cc: One Thousand Gnomes gno...@lxorguk.ukuu.org.uk
Cc: Tim Gardner tim.gard...@canonical.com
Cc: Pierre Fersing pierre-fers...@pierref.org
Cc: Andrew Morton a...@linux-foundation.org
Cc: Oleg Nesterov o...@redhat.com
Cc: Benjamin Poirier bpoir...@suse.de
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: Nagalakshmi Nandigama nagalakshmi.nandig...@avagotech.com
Cc: Praveen Krishnamoorthy praveen.krishnamoor...@avagotech.com
Cc: Sreekanth Reddy sreekanth.re...@avagotech.com
Cc: Abhijit Mahajan abhijit.maha...@avagotech.com
Cc: Hariprasad S haripra...@chelsio.com
Cc: Santosh Rastapur sant...@chelsio.com
Cc: mpt-fusionlinux@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: net...@vger.kernel.org
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
 drivers/base/dd.c  | 18 +-
 include/linux/device.h |  7 +++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9947c2e..d59e4b5 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -159,6 +159,9 @@ static void driver_deferred_probe_add(struct device *dev)
list_add_tail(dev-p-deferred_probe, 
deferred_probe_pending_list);
}
mutex_unlock(deferred_probe_mutex);
+
+   if (driver_deferred_probe_enable)
+   driver_deferred_probe_trigger();
 }
 
 void driver_deferred_probe_del(struct device *dev)
@@ -374,6 +377,19 @@ void wait_for_device_probe(void)
 }
 EXPORT_SYMBOL_GPL(wait_for_device_probe);
 
+static int __driver_probe_device(struct device_driver *drv, struct device *dev)
+{
+   if (drv-delay_probe  !dev-init_delayed_probe) {
+   dev_info(dev, Driver %s requests probe deferral on init\n,
+drv-name);
+   dev-init_delayed_probe = true;
+   driver_deferred_probe_add(dev);
+   return -EPROBE_DEFER;
+   }
+
+   return really_probe(dev, drv);
+}
+
 /**
  * driver_probe_device - attempt to bind device  driver together
  * @drv: driver to bind a device to
@@ -396,7 +412,7 @@ int driver_probe_device(struct device_driver *drv, struct 
device *dev)
 drv-bus-name, __func__, dev_name(dev), drv-name);
 
pm_runtime_barrier(dev);
-   ret = really_probe(dev, drv);
+   ret = __driver_probe_device(drv, dev);
pm_request_idle(dev);
 
return ret;
diff --git a/include/linux/device.h b/include/linux/device.h
index af424ac..c317dfa 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -200,6 +200,9 @@ extern struct klist *bus_get_device_klist(struct bus_type 
*bus);
  * 

Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Greg KH
On Mon, Jul 28, 2014 at 11:28:28AM -0700, Luis R. Rodriguez wrote:
 From: Luis R. Rodriguez mcg...@suse.com
 
 Tetsuo bisected and found that commit 786235ee kthread: make
 kthread_create() killable modified kthread_create() to bail as
 soon as SIGKILL is received. This is causing some issues with
 some drivers and at times boot. Joseph then found that failures
 occur as the systemd-udevd process sends SIGKILL to modprobe if
 probe on a driver takes over 30 seconds.

Because no driver should ever take that long for their probe function to
return.  Why not fix those drivers?

 When this happens probe will fail on any driver, its why booting on
 some system will fail if the driver happens to be a storage related
 driver. Some folks
 have suggested fixing this by modifying kthread_create() to not
 leave upon SIGKILL [3], upon review Oleg rejected this change and
 the discussion was punted out to systemd to see if the default
 timeout could be increased from 30 seconds to 120. The opinion of
 the systemd maintainers is that the driver's behavior should
 be fixed [4]. Linus seems to agree [5], however more recently even
 networking drivers have been reported to fail on probe since just
 writing the firmware to a device and kicking it can take easy over
 60 seconds [6]. Benjamim was able to trace the issues recently
 reported on cxgb4 down to the same systemd-udevd 30 second timeout [6].

Then use the async firmware interface, why is any driver taking longer
than less than a second in their init function?

 This is an alternative solution which enables drivers that are
 known to take long to use deferred probe workqueue. This avoids
 the 30 second timeout and lets us annotate drivers with long
 init sequences.
 
 As drivers determine a component is not yet available and needs
 to defer probe you'll be notified this happen upon init for each
 device but now with a message such as:
 
 pci :03:00.0: Driver cxgb4 requests probe deferral on init
 
 You should see one of these per struct device probed.

I'm all for abusing kernel interfaces, but please, no, don't try to use
the deferred init code to cover up for broken drivers.  Just fix them
properly, we have the interfaces to handle it properly (i.e. async
firmware loading), please use it.

And no PCI driver should ever need deferred init as the resources for
such a device is already present in the system.  Now if userspace is up
and running yet is a different issue, one that deferred init is not
there for at all, sorry.

So, what drivers are having problems in their init sequence, and why
aren't they using async firmware loading?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Luis R. Rodriguez
On Mon, Jul 28, 2014 at 11:55 AM, Greg KH gre...@linuxfoundation.org wrote:
 So, what drivers are having problems in their init sequence, and why
 aren't they using async firmware loading?

Fixing drivers is one thing, fixing drivers *now* because *now*
drivers are failing due to a regression is another thing and that's
what we have now so lets just be clear on that. The 30 second rule is
a major driver requirement change and it should not be taken slightly,
all of a sudden this is a new requirement but you won't know that
unless you're reading these threads or hit an issue. That's an issue
in itself.

The cxgb4: driver is an example where although I did propose patches
to use asynch firmware loading the entire registration of the
netdevice would need to be changed as well in order to get this right.
In short we have to scramble now to first identify drivers that have
long init sequences and then fix. There's an assumption that we can
easily fix drivers, this can take time. This series, although does
take advantage of a kernel interface for other uses, lets us identify
these drivers on the kernel ring buffer, so we can go and address
fixing them with time.

 Luis
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Luis R. Rodriguez
On Mon, Jul 28, 2014 at 12:04 PM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:
 On Mon, Jul 28, 2014 at 11:55 AM, Greg KH gre...@linuxfoundation.org wrote:
 So, what drivers are having problems in their init sequence, and why
 aren't they using async firmware loading?

 Fixing drivers is one thing, fixing drivers *now* because *now*
 drivers are failing due to a regression is another thing and that's
 what we have now so lets just be clear on that. The 30 second rule is
 a major driver requirement change and it should not be taken slightly,
 all of a sudden this is a new requirement but you won't know that
 unless you're reading these threads or hit an issue. That's an issue
 in itself.

 The cxgb4: driver is an example where although I did propose patches
 to use asynch firmware loading the entire registration of the
 netdevice would need to be changed as well in order to get this right.
 In short we have to scramble now to first identify drivers that have
 long init sequences and then fix. There's an assumption that we can
 easily fix drivers, this can take time. This series, although does
 take advantage of a kernel interface for other uses, lets us identify
 these drivers on the kernel ring buffer, so we can go and address
 fixing them with time.

Another thing that came up during asynch firmware review / integration
on cxgb4 was that it would not be the only thing that would need to be
fixed, the driver also has a ton of ports and at least as per my
review the proper fix would be to use platform regiister stuff. It is
a major rewrite of the driver but an example of a driver that needs
quite a bit of work to meet this new 30 second driver requirement.

 Luis
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Greg KH
On Mon, Jul 28, 2014 at 12:48:32PM -0700, Luis R. Rodriguez wrote:
 On Mon, Jul 28, 2014 at 12:04 PM, Luis R. Rodriguez
 mcg...@do-not-panic.com wrote:
  On Mon, Jul 28, 2014 at 11:55 AM, Greg KH gre...@linuxfoundation.org 
  wrote:
  So, what drivers are having problems in their init sequence, and why
  aren't they using async firmware loading?
 
  Fixing drivers is one thing, fixing drivers *now* because *now*
  drivers are failing due to a regression is another thing and that's
  what we have now so lets just be clear on that. The 30 second rule is
  a major driver requirement change and it should not be taken slightly,
  all of a sudden this is a new requirement but you won't know that
  unless you're reading these threads or hit an issue. That's an issue
  in itself.

That regression is something that userspace has decided to do, not
anything the kernel changed, so perhaps you should just patch your
modprobe and be done with it until you can fix up those drivers?

And putting a horrid hack in the driver core, just because of some
really bad drivers, is not ok, that's an interface _I_ will have to
support for the next few decades.

And it's going to take you a while to get something like this ever
merged in anyway, odds are you can fix up the driver faster...

  The cxgb4: driver is an example where although I did propose patches
  to use asynch firmware loading the entire registration of the
  netdevice would need to be changed as well in order to get this right.
  In short we have to scramble now to first identify drivers that have
  long init sequences and then fix. There's an assumption that we can
  easily fix drivers, this can take time. This series, although does
  take advantage of a kernel interface for other uses, lets us identify
  these drivers on the kernel ring buffer, so we can go and address
  fixing them with time.
 
 Another thing that came up during asynch firmware review / integration
 on cxgb4 was that it would not be the only thing that would need to be
 fixed, the driver also has a ton of ports and at least as per my
 review the proper fix would be to use platform regiister stuff. It is
 a major rewrite of the driver but an example of a driver that needs
 quite a bit of work to meet this new 30 second driver requirement.

It shouldn't be using any platform driver stuff, it's a pci device, not
a platform device.

Why not just put the initial register the device in a single-shot
workqueue or thread or something like that so that modprobe returns
instantly back with a success and all is fine?

Again, trying to hack the deferred init logic for PCI drivers isn't
ok, I'm not going to take that into the driver core if at all possible,
sorry.

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Luis R. Rodriguez
On Mon, Jul 28, 2014 at 4:46 PM, Greg KH gre...@linuxfoundation.org wrote:
 On Mon, Jul 28, 2014 at 12:48:32PM -0700, Luis R. Rodriguez wrote:
 On Mon, Jul 28, 2014 at 12:04 PM, Luis R. Rodriguez
 mcg...@do-not-panic.com wrote:
  On Mon, Jul 28, 2014 at 11:55 AM, Greg KH gre...@linuxfoundation.org 
  wrote:
  So, what drivers are having problems in their init sequence, and why
  aren't they using async firmware loading?
 
  Fixing drivers is one thing, fixing drivers *now* because *now*
  drivers are failing due to a regression is another thing and that's
  what we have now so lets just be clear on that. The 30 second rule is
  a major driver requirement change and it should not be taken slightly,
  all of a sudden this is a new requirement but you won't know that
  unless you're reading these threads or hit an issue. That's an issue
  in itself.

 That regression is something that userspace has decided to do, not
 anything the kernel changed,

Actually commit 786235ee seems to have been the one that caused this
issue, systemd would just send the SIGKILL and that change forced a
bail on probe then hence Canonical's work around to modify
kthread_create() to not leave upon SIGKILL:

http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123

 so perhaps you should just patch your
 modprobe and be done with it until you can fix up those drivers?

To ignore SIGKILL ?

 And putting a horrid hack in the driver core, just because of some
 really bad drivers, is not ok, that's an interface _I_ will have to
 support for the next few decades.

I understand, hence review.

 And it's going to take you a while to get something like this ever
 merged in anyway, odds are you can fix up the driver faster...

That requires quite a bit of changes and commitment and again, there
are quite a bit of drivers that we can run into in the community,
we've just spotted 2 so far here for now.

  The cxgb4: driver is an example where although I did propose patches
  to use asynch firmware loading the entire registration of the
  netdevice would need to be changed as well in order to get this right.
  In short we have to scramble now to first identify drivers that have
  long init sequences and then fix. There's an assumption that we can
  easily fix drivers, this can take time. This series, although does
  take advantage of a kernel interface for other uses, lets us identify
  these drivers on the kernel ring buffer, so we can go and address
  fixing them with time.

 Another thing that came up during asynch firmware review / integration
 on cxgb4 was that it would not be the only thing that would need to be
 fixed, the driver also has a ton of ports and at least as per my
 review the proper fix would be to use platform regiister stuff. It is
 a major rewrite of the driver but an example of a driver that needs
 quite a bit of work to meet this new 30 second driver requirement.

 It shouldn't be using any platform driver stuff, it's a pci device, not
 a platform device.

The general PCI stuff is already used, the reason for suggesting the
platform_device_register_simple() stuff is it has tons of ports and
each port will register in turn a new struct netdevice, essentially
one device can end up having tons of different network devices, the
platform stuff would be to allow handling each netdevice candidate
separately as part of the internal driver architecture, right now its
some scary loop thing that in my eyes can be very error prone.
drivers/net/ethernet/8390/ne.c. This discussion:

https://lkml.org/lkml/2014/6/25/815

 Why not just put the initial register the device in a single-shot
 workqueue or thread or something like that so that modprobe returns
 instantly back with a success and all is fine?

That surely is possible but why not a general solution for such kludges?

 Again, trying to hack the deferred init logic for PCI drivers isn't
 ok, I'm not going to take that into the driver core if at all possible,
 sorry.

No need to apologize I'm looking for the best solution here after all.
One userspace kludge is surely better than a one per driver while
drivers are fixed for this new driver requirement. Its just kind of
odd the circle of events for a kernel issue from kernel -- systemd
-- modprobe as a work around.

 Luis
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Greg KH
On Mon, Jul 28, 2014 at 05:26:34PM -0700, Luis R. Rodriguez wrote:
 On Mon, Jul 28, 2014 at 4:46 PM, Greg KH gre...@linuxfoundation.org wrote:
  On Mon, Jul 28, 2014 at 12:48:32PM -0700, Luis R. Rodriguez wrote:
  On Mon, Jul 28, 2014 at 12:04 PM, Luis R. Rodriguez
  mcg...@do-not-panic.com wrote:
   On Mon, Jul 28, 2014 at 11:55 AM, Greg KH gre...@linuxfoundation.org 
   wrote:
   So, what drivers are having problems in their init sequence, and why
   aren't they using async firmware loading?
  
   Fixing drivers is one thing, fixing drivers *now* because *now*
   drivers are failing due to a regression is another thing and that's
   what we have now so lets just be clear on that. The 30 second rule is
   a major driver requirement change and it should not be taken slightly,
   all of a sudden this is a new requirement but you won't know that
   unless you're reading these threads or hit an issue. That's an issue
   in itself.
 
  That regression is something that userspace has decided to do, not
  anything the kernel changed,
 
 Actually commit 786235ee seems to have been the one that caused this
 issue, systemd would just send the SIGKILL and that change forced a
 bail on probe then hence Canonical's work around to modify
 kthread_create() to not leave upon SIGKILL:
 
 http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123
 
  so perhaps you should just patch your
  modprobe and be done with it until you can fix up those drivers?
 
 To ignore SIGKILL ?

Sorry, I thought this was a userspace change that caused this.

As it's a kernel change, well, maybe that patch should be reverted...

  And putting a horrid hack in the driver core, just because of some
  really bad drivers, is not ok, that's an interface _I_ will have to
  support for the next few decades.
 
 I understand, hence review.
 
  And it's going to take you a while to get something like this ever
  merged in anyway, odds are you can fix up the driver faster...
 
 That requires quite a bit of changes and commitment and again, there
 are quite a bit of drivers that we can run into in the community,
 we've just spotted 2 so far here for now.
 
   The cxgb4: driver is an example where although I did propose patches
   to use asynch firmware loading the entire registration of the
   netdevice would need to be changed as well in order to get this right.
   In short we have to scramble now to first identify drivers that have
   long init sequences and then fix. There's an assumption that we can
   easily fix drivers, this can take time. This series, although does
   take advantage of a kernel interface for other uses, lets us identify
   these drivers on the kernel ring buffer, so we can go and address
   fixing them with time.
 
  Another thing that came up during asynch firmware review / integration
  on cxgb4 was that it would not be the only thing that would need to be
  fixed, the driver also has a ton of ports and at least as per my
  review the proper fix would be to use platform regiister stuff. It is
  a major rewrite of the driver but an example of a driver that needs
  quite a bit of work to meet this new 30 second driver requirement.
 
  It shouldn't be using any platform driver stuff, it's a pci device, not
  a platform device.
 
 The general PCI stuff is already used, the reason for suggesting the
 platform_device_register_simple() stuff is it has tons of ports and
 each port will register in turn a new struct netdevice, essentially
 one device can end up having tons of different network devices, the
 platform stuff would be to allow handling each netdevice candidate
 separately as part of the internal driver architecture, right now its
 some scary loop thing that in my eyes can be very error prone.
 drivers/net/ethernet/8390/ne.c. This discussion:

No, don't use platform devices as a catch-all for when you don't want
to write your own bus code.  This looks like a device-specific bus,
great, write the code to do that, it can be done in about 200 lines,
with comments and whitespace.

Only use platform devices for, wait for it, platform devices.  And even
then, reconsider and use something else if at all possible.

  Why not just put the initial register the device in a single-shot
  workqueue or thread or something like that so that modprobe returns
  instantly back with a success and all is fine?
 
 That surely is possible but why not a general solution for such kludges?

Because the driver should be fixed.  How hard would it be to do what I
just suggested here, 20 lines added at most?

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

2014-07-28 Thread Luis R. Rodriguez
On Mon, Jul 28, 2014 at 5:35 PM, Greg KH gre...@linuxfoundation.org wrote:
 On Mon, Jul 28, 2014 at 05:26:34PM -0700, Luis R. Rodriguez wrote:
 On Mon, Jul 28, 2014 at 4:46 PM, Greg KH gre...@linuxfoundation.org wrote:
  On Mon, Jul 28, 2014 at 12:48:32PM -0700, Luis R. Rodriguez wrote:
  On Mon, Jul 28, 2014 at 12:04 PM, Luis R. Rodriguez
  mcg...@do-not-panic.com wrote:
   On Mon, Jul 28, 2014 at 11:55 AM, Greg KH gre...@linuxfoundation.org 
   wrote:
   So, what drivers are having problems in their init sequence, and why
   aren't they using async firmware loading?
  
   Fixing drivers is one thing, fixing drivers *now* because *now*
   drivers are failing due to a regression is another thing and that's
   what we have now so lets just be clear on that. The 30 second rule is
   a major driver requirement change and it should not be taken slightly,
   all of a sudden this is a new requirement but you won't know that
   unless you're reading these threads or hit an issue. That's an issue
   in itself.
 
  That regression is something that userspace has decided to do, not
  anything the kernel changed,

 Actually commit 786235ee seems to have been the one that caused this
 issue, systemd would just send the SIGKILL and that change forced a
 bail on probe then hence Canonical's work around to modify
 kthread_create() to not leave upon SIGKILL:

 http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123

  so perhaps you should just patch your
  modprobe and be done with it until you can fix up those drivers?

 To ignore SIGKILL ?

 Sorry, I thought this was a userspace change that caused this.

 As it's a kernel change, well, maybe that patch should be reverted...

That's certainly viable. Oleg?

If its reverted we won't know which drivers are hitting over the new
30 second limit requirement imposed by userspace, which the culprit
commit forces failure on probe now. This series at least would allow
us to annotate which drivers are brake the 30 second init limit, and
enable a work around alternative if we wish to not revert the commit.
This  series essentially should be considered an alternative solution
to what was proposed initially by Joseph Salisbury, it may not be
perfect but its a proposal. I welcome others.

  And putting a horrid hack in the driver core, just because of some
  really bad drivers, is not ok, that's an interface _I_ will have to
  support for the next few decades.

 I understand, hence review.

  And it's going to take you a while to get something like this ever
  merged in anyway, odds are you can fix up the driver faster...

 That requires quite a bit of changes and commitment and again, there
 are quite a bit of drivers that we can run into in the community,
 we've just spotted 2 so far here for now.

   The cxgb4: driver is an example where although I did propose patches
   to use asynch firmware loading the entire registration of the
   netdevice would need to be changed as well in order to get this right.
   In short we have to scramble now to first identify drivers that have
   long init sequences and then fix. There's an assumption that we can
   easily fix drivers, this can take time. This series, although does
   take advantage of a kernel interface for other uses, lets us identify
   these drivers on the kernel ring buffer, so we can go and address
   fixing them with time.
 
  Another thing that came up during asynch firmware review / integration
  on cxgb4 was that it would not be the only thing that would need to be
  fixed, the driver also has a ton of ports and at least as per my
  review the proper fix would be to use platform regiister stuff. It is
  a major rewrite of the driver but an example of a driver that needs
  quite a bit of work to meet this new 30 second driver requirement.
 
  It shouldn't be using any platform driver stuff, it's a pci device, not
  a platform device.

 The general PCI stuff is already used, the reason for suggesting the
 platform_device_register_simple() stuff is it has tons of ports and
 each port will register in turn a new struct netdevice, essentially
 one device can end up having tons of different network devices, the
 platform stuff would be to allow handling each netdevice candidate
 separately as part of the internal driver architecture, right now its
 some scary loop thing that in my eyes can be very error prone.
 drivers/net/ethernet/8390/ne.c. This discussion:

 No, don't use platform devices as a catch-all for when you don't want
 to write your own bus code.  This looks like a device-specific bus,
 great, write the code to do that, it can be done in about 200 lines,
 with comments and whitespace.

 Only use platform devices for, wait for it, platform devices.  And even
 then, reconsider and use something else if at all possible.

OK thanks I had asked for advice for this a while back on that old
thread as I wasn't sure if that was the best.

  Why not just put the initial register the device in a single-shot