Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-30 Thread Pavel Machek

On Mon 2014-09-22 13:23:54, Dmitry Torokhov wrote:
 On Monday, September 22, 2014 09:49:06 PM Pavel Machek wrote:
  On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
   On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:

Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
scanning of attached devices (once the cards are probed)
   
   What would it do if the card was a bit slow to probe?
   
but has a sync
point for ordering.
   
   Quite often we do not really care about ordering of devices. I mean,
   does it matter if your mouse is discovered before your keyboard or
   after?
  
  Actually yes, I suspect it does.
  
  I do evtest /dev/input/eventX by hand, occasionally. It would be
  annoying if they moved between reboots.
 
 I am sorry but you will have to cope with such annoyances. It's not like we
 fail to boot the box here.
 
 The systems are now mostly hot-pluggable and userland is supposed to
 handle it, and it does, at least for input devices. If you want stable naming
 use udev facilities to rename devices as needed or add needed symlinks
 (by-id, etc.).

Well, it would be nice if udev was not mandatory. Do the sync points
for ordering actually cost us something?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-30 Thread Dmitry Torokhov
On Tue, Sep 30, 2014 at 11:06:34PM +0200, Pavel Machek wrote:
 
 On Mon 2014-09-22 13:23:54, Dmitry Torokhov wrote:
  On Monday, September 22, 2014 09:49:06 PM Pavel Machek wrote:
   On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
 
 Yes, but we mostly do this anyway.  SCSI for instance does 
 asynchronous
 scanning of attached devices (once the cards are probed)

What would it do if the card was a bit slow to probe?

 but has a sync
 point for ordering.

Quite often we do not really care about ordering of devices. I mean,
does it matter if your mouse is discovered before your keyboard or
after?
   
   Actually yes, I suspect it does.
   
   I do evtest /dev/input/eventX by hand, occasionally. It would be
   annoying if they moved between reboots.
  
  I am sorry but you will have to cope with such annoyances. It's not like we
  fail to boot the box here.
  
  The systems are now mostly hot-pluggable and userland is supposed to
  handle it, and it does, at least for input devices. If you want stable naming
  use udev facilities to rename devices as needed or add needed symlinks
  (by-id, etc.).
 
 Well, it would be nice if udev was not mandatory. Do the sync points
 for ordering actually cost us something?

Yes, boot time. We can save a second or two off the boot time if we probe
several devices/drivers simultaneously.
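
As a rough illustration of that trade-off, the following user-space C
sketch models it. It is not kernel code: the function names and the probe
times in the usage note are hypothetical, and it assumes the probes are
independent and can all start at the same moment.

```c
#include <assert.h>
#include <stddef.h>

/* Total time when probes run one after another. */
static unsigned long serial_probe_ms(const unsigned long *probe_ms, size_t n)
{
	unsigned long total = 0;
	size_t i;

	assert(probe_ms != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += probe_ms[i];
	return total;
}

/* Total time when all probes start together: the slowest one dominates. */
static unsigned long parallel_probe_ms(const unsigned long *probe_ms, size_t n)
{
	unsigned long longest = 0;
	size_t i;

	assert(probe_ms != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (probe_ms[i] > longest)
			longest = probe_ms[i];
	return longest;
}

/*
 * With a sync point for ordering, device i is only registered after
 * devices 0..i-1, so it becomes visible at the running maximum of the
 * probe times seen so far.
 */
static void ordered_ready_ms(const unsigned long *probe_ms, size_t n,
			     unsigned long *ready_ms)
{
	unsigned long running_max = 0;
	size_t i;

	assert((probe_ms != NULL && ready_ms != NULL) || n == 0);
	for (i = 0; i < n; i++) {
		if (probe_ms[i] > running_max)
			running_max = probe_ms[i];
		ready_ms[i] = running_max;
	}
}
```

With made-up probe times of 100, 800 and 300 ms, serial probing costs
1200 ms while fully parallel probing costs 800 ms. Under this model an
ordering sync point does not lengthen the total, but the third device
only becomes visible at 800 ms instead of 300 ms.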

Thanks.

-- 
Dmitry


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-22 Thread Luis R. Rodriguez
On Mon, Sep 8, 2014 at 7:57 PM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:
 Why do we care about the priority of probing tasks?  Does that
 actually make any meaningful difference?  If so, how?

 As I noted before -- I have yet to provide clear metrics but at least
 changing both init paths + probe from finit_module() to kthread
 certainly had a measurable time increase, I suspect using
 queue_work(system_unbound_wq, async_probe_work) will make probe
 slower. I'll get to these metrics this week.

The results are in and I'm glad to report my suspicions were incorrect
about queue_work(system_unbound_wq) being slower than kthread; it is
actually slightly faster. Results will likely vary depending on
subsystems, but in this particular case the cxgb4 driver was tested
both with and without firmware loading required, and for each of these
two types of driver loading all mechanisms make probe take just about
the same amount of time. What was surprising was that when firmware
loading is required the amount of time it takes to run probe does vary,
and quite considerably in terms of microseconds. The discrepancies are
by no means terrible... but should be considered if one is thinking of
large systems and if we do wish to optimize things further and offer
equivalent behavior, especially when probing multiple devices with the
same driver. The method used to collect the amount of time for probe
was to use:

ktime_t calltime, delta, rettime;
unsigned long long duration;

calltime = ktime_get();
driver_attach();
rettime = ktime_get();
delta = ktime_sub(rettime, calltime);
/* ns >> 10 approximates microseconds (it divides by 1024) */
duration = (unsigned long long) ktime_to_ns(delta) >> 10;

And then print that time in microseconds right after it finishes,
whether that be through the default kernel synchronous run or the
async runs.
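
For anyone wanting to try the same measurement pattern outside the
kernel, here is a minimal user-space sketch. It assumes the conversion in
the snippet above is a right shift by 10, as in the kernel's initcall
timing code, and it swaps ktime_get() for POSIX
clock_gettime(CLOCK_MONOTONIC). The helper names (ns_to_approx_usec,
now_ns, time_call_usec, sleep_1ms) are invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Cheap ns -> "usec" conversion: a shift by 10 divides by 1024, not 1000. */
static uint64_t ns_to_approx_usec(uint64_t ns)
{
	return ns >> 10;
}

/* Monotonic timestamp in nanoseconds (user-space stand-in for ktime_get()). */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Time one call of fn(arg) and return the approximate duration in usec. */
static uint64_t time_call_usec(void (*fn)(void *), void *arg)
{
	uint64_t calltime = now_ns();

	fn(arg);
	return ns_to_approx_usec(now_ns() - calltime);
}

/* Hypothetical workload standing in for a probe call: sleep for 1 ms. */
static void sleep_1ms(void *unused)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = 1000000 };

	(void)unused;
	nanosleep(&req, NULL);
}
```

Because the shift divides by 1024, the reported values come out a little
over 2% lower than true microseconds, which does not matter when
comparing strategies measured the same way.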

The collection and testing was then done by Santosh. Details of the
collections are at:

https://bugzilla.novell.com/show_bug.cgi?id=877622

The summary:

The driver actually probed 2 cards in the tests, so we don't have
results for 1 card. The kernel calls probe serially for each device, so
to get the amount of time for one run let's just divide the results by
2. Each strategy was run once with firmware loading required and once
without. The results for both cards are:

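==================================================================|
strategy                        fw (usec)       no-fw (usec)      |
------------------------------------------------------------------|
synchronous                     48945138        2615126           |
kthread                         50132831        2619737           |
queue_work(system_unbound_wq)   49827323        2615262           |
------------------------------------------------------------------|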

For one device then that comes out to:

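==================================================================|
strategy                        fw (usec)       no-fw (usec)      |
------------------------------------------------------------------|
synchronous                     24472569        1307563           |
kthread                         25066415.5      1309868.5         |
queue_work(system_unbound_wq)   24913661.5      1307631           |
------------------------------------------------------------------|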

Converting that to seconds:

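==================================================================|
strategy                        fw (s)          no-fw (s)         |
------------------------------------------------------------------|
synchronous                     24.47           1.31              |
kthread                         25.07           1.31              |
queue_work(system_unbound_wq)   24.91           1.31              |
------------------------------------------------------------------|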
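
The steps between the three tables are plain arithmetic, sketched below
in C for anyone who wants to re-derive them. The function names are made
up for illustration, and the inputs are read from the two-card table
(for example 48945138 usec for the synchronous run with firmware).

```c
#include <assert.h>

/* Per-device probe time in seconds, given the total for all devices in usec. */
static double per_device_seconds(unsigned long long total_usec,
				 unsigned int ndevices)
{
	assert(ndevices > 0);
	return (double)total_usec / ndevices / 1e6;
}

/* Extra time of a strategy relative to a baseline, as a percentage. */
static double overhead_percent(unsigned long long strategy_usec,
			       unsigned long long baseline_usec)
{
	assert(baseline_usec > 0);
	return ((double)strategy_usec - (double)baseline_usec) * 100.0 /
	       (double)baseline_usec;
}
```

Fed with the firmware totals, this gives roughly 2.4% extra time for
kthread and roughly 1.8% for queue_work(system_unbound_wq) relative to
the synchronous run; the no-firmware totals differ by well under 1%.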

Graph friendly versions of the results for probe of 1 device:

Probe with firmware:

http://drvbp1.linux-foundation.org/~mcgrof/images/probe-measurements/probe-cgxb4-firmware.png

Probe without firmware:

http://drvbp1.linux-foundation.org/~mcgrof/images/probe-measurements/probe-cgxb4-no-firmware.png

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-22 Thread Pavel Machek
On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
 On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
  
  On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
   On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
 
 The thing is that we have to have dynamic mechanism to listen for
 device attachments no matter what and such mechanism has been in place
 for a long time at this point.  The synchronous wait simply doesn't
 serve any purpose anymore and kinda gets in the way in that it makes
 it a possibly extremely slow process to tell whether loading of a
 module succeeded or not because the wait for the initial round of
 probe is piggybacked.

OK, so we just fire and forget in userland ... why bother inventing an
elaborate new infrastructure in the kernel to do exactly what

modprobe mod 

would do?
   
   Just so we do not forget: we also want the no-modules case to also be able
   to probe asynchronously so that a slow device does not stall kernel 
   booting.
  
  Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
  scanning of attached devices (once the cards are probed)
 
 What would it do if the card was a bit slow to probe?
 
  but has a sync
  point for ordering.
 
 Quite often we do not really care about ordering of devices. I mean,
 does it matter if your mouse is discovered before your keyboard or
 after?

Actually yes, I suspect it does.

I do evtest /dev/input/eventX by hand, occasionally. It would be
annoying if they moved between reboots.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-22 Thread Dmitry Torokhov
On Monday, September 22, 2014 09:49:06 PM Pavel Machek wrote:
 On Thu 2014-09-11 13:23:54, Dmitry Torokhov wrote:
  On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
   On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
 On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
  The thing is that we have to have dynamic mechanism to listen for
  device attachments no matter what and such mechanism has been in
  place
  for a long time at this point.  The synchronous wait simply
  doesn't
  serve any purpose anymore and kinda gets in the way in that it
  makes
  it a possibly extremely slow process to tell whether loading of a
  module succeeded or not because the wait for the initial round of
  probe is piggybacked.
 
 OK, so we just fire and forget in userland ... why bother inventing
 an
 elaborate new infrastructure in the kernel to do exactly what
 
 modprobe mod 
 
 would do?

Just so we do not forget: we also want the no-modules case to also be
able
to probe asynchronously so that a slow device does not stall kernel
booting.  
   Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
   scanning of attached devices (once the cards are probed)
  
  What would it do if the card was a bit slow to probe?
  
   but has a sync
   point for ordering.
  
  Quite often we do not really care about ordering of devices. I mean,
  does it matter if your mouse is discovered before your keyboard or
  after?
 
 Actually yes, I suspect it does.
 
 I do evtest /dev/input/eventX by hand, occasionally. It would be
 annoying if they moved between reboots.

I am sorry but you will have to cope with such annoyances. It's not like we
fail to boot the box here.

The systems are now mostly hot-pluggable and userland is supposed to
handle it, and it does, at least for input devices. If you want stable naming
use udev facilities to rename devices as needed or add needed symlinks (by-id, 
etc.).

Thanks.

-- 
Dmitry


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-12 Thread Luis R. Rodriguez
On Tue, Sep 9, 2014 at 4:03 PM, Tejun Heo t...@kernel.org wrote:
 On Tue, Sep 09, 2014 at 12:25:29PM +0900, Tejun Heo wrote:
 Hello,

 On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
  On the systemd side of things it should enable this sysctl and for
  older kernels what should it do?

 Supposing the change is backported via -stable, it can try to set the
 sysctl on all kernels.  If the knob doesn't exist, the fix is not
 there and nothing can be done about it.

 The more I think about it, the more I think this should be a
 per-insmod instance thing rather than a system-wide switch.

Agreed. A good use case that comes to mind is systemd's modules-load.d
lists, which systemd services use to require modules. The hooks there
likely expect probe to complete as part of the service, and since the
timeout is not applicable to them, synchronous probe would be good for
those while systemd would use async probe for regular modules.

 Currently
 the kernel param code doesn't allow a generic param outside the ones
 specified by the module itself but adding support for something like
 driver.async_load=1 shouldn't be too difficult, applying that to
 existing systems shouldn't be much more difficult than a system-wide
 switch, and it'd be significantly cleaner than fiddling with driver
 blacklist.

Agreed.

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-11 Thread James Bottomley

On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
 On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
  On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
   
   The thing is that we have to have dynamic mechanism to listen for
   device attachments no matter what and such mechanism has been in place
   for a long time at this point.  The synchronous wait simply doesn't
   serve any purpose anymore and kinda gets in the way in that it makes
   it a possibly extremely slow process to tell whether loading of a
   module succeeded or not because the wait for the initial round of
   probe is piggybacked.
  
  OK, so we just fire and forget in userland ... why bother inventing an
  elaborate new infrastructure in the kernel to do exactly what
  
  modprobe mod 
  
  would do?
 
 Just so we do not forget: we also want the no-modules case to also be able
 to probe asynchronously so that a slow device does not stall kernel booting.

Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
scanning of attached devices (once the cards are probed) but has a sync
point for ordering.

The problem of speeding up boot is different from the one of init
processes killing modprobes.  There are elements in common, but by and
large the biggest headaches at least in large device number boots have
already been tackled by the enterprise crowd (they don't like their
S390's or 1024 core NUMA systems taking half an hour to come up).

James




Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-11 Thread Dmitry Torokhov
On Thu, Sep 11, 2014 at 12:59:25PM -0700, James Bottomley wrote:
 
 On Tue, 2014-09-09 at 16:01 -0700, Dmitry Torokhov wrote:
  On Tuesday, September 09, 2014 03:46:23 PM James Bottomley wrote:
   On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:

The thing is that we have to have dynamic mechanism to listen for
device attachments no matter what and such mechanism has been in place
for a long time at this point.  The synchronous wait simply doesn't
serve any purpose anymore and kinda gets in the way in that it makes
it a possibly extremely slow process to tell whether loading of a
module succeeded or not because the wait for the initial round of
probe is piggybacked.
   
   OK, so we just fire and forget in userland ... why bother inventing an
   elaborate new infrastructure in the kernel to do exactly what
   
   modprobe mod 
   
   would do?
  
  Just so we do not forget: we also want the no-modules case to also be able
  to probe asynchronously so that a slow device does not stall kernel booting.
 
 Yes, but we mostly do this anyway.  SCSI for instance does asynchronous
 scanning of attached devices (once the cards are probed)

What would it do if the card was a bit slow to probe?

 but has a sync
 point for ordering.

Quite often we do not really care about ordering of devices. I mean,
does it matter if your mouse is discovered before your keyboard or
after?


 The problem of speeding up boot is different from the one of init
 processes killing modprobes.

Right. One is systemd doing stupid things, another is kernel could be
smarter.

  There are elements in common, but by and
 large the biggest headaches at least in large device number boots have
 already been tackled by the enterprise crowd (they don't like their
 S390's or 1024 core NUMA systems taking half an hour to come up).

Please do not position this as a mostly solved large systems problem.
For us it is touchpad detection stalling the kernel for 0.5-1 sec, which
is a lot given that we boot in seconds.

Thanks.

-- 
Dmitry


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-11 Thread Luis R. Rodriguez
On Thu, Sep 11, 2014 at 1:23 PM, Dmitry Torokhov
dmitry.torok...@gmail.com wrote:

  There are elements in common, but by and
 large the biggest headaches at least in large device number boots have
 already been tackled by the enterprise crowd (they don't like their
 S390's or 1024 core NUMA systems taking half an hour to come up).

 Please do not position this as a mostly solved large systems problem,
 For us it is touchpad detection stalling kernel for 0.5-1 sec. Which is
 a lot given that we boot in seconds.

Dmitry, would working on top of the async series be reasonable? Then
we could address these as separate things which we'd build on top of.
The one aspect I see us needing to share is the async probe universe
is OK flag.

 Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-11 Thread Dmitry Torokhov
On Thu, Sep 11, 2014 at 01:42:20PM -0700, Luis R. Rodriguez wrote:
 On Thu, Sep 11, 2014 at 1:23 PM, Dmitry Torokhov
 dmitry.torok...@gmail.com wrote:
 
   There are elements in common, but by and
  large the biggest headaches at least in large device number boots have
  already been tackled by the enterprise crowd (they don't like their
  S390's or 1024 core NUMA systems taking half an hour to come up).
 
  Please do not position this as a mostly solved large systems problem,
  For us it is touchpad detection stalling kernel for 0.5-1 sec. Which is
  a lot given that we boot in seconds.
 
 Dmitry, would working on top of the async series be reasonable? Then
 we could address these as separate things which we'd build on top of.
 The one aspect I see us needing to share is the async probe universe
 is OK flag.

Sure. Are you planning on refreshing your series? I think the
code-related discussion kind of stalled...

-- 
Dmitry


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-11 Thread Luis R. Rodriguez
On Thu, Sep 11, 2014 at 1:53 PM, Dmitry Torokhov
dmitry.torok...@gmail.com wrote:
 On Thu, Sep 11, 2014 at 01:42:20PM -0700, Luis R. Rodriguez wrote:
 On Thu, Sep 11, 2014 at 1:23 PM, Dmitry Torokhov
 dmitry.torok...@gmail.com wrote:
 
   There are elements in common, but by and
  large the biggest headaches at least in large device number boots have
  already been tackled by the enterprise crowd (they don't like their
  S390's or 1024 core NUMA systems taking half an hour to come up).
 
  Please do not position this as a mostly solved large systems problem,
  For us it is touchpad detection stalling kernel for 0.5-1 sec. Which is
  a lot given that we boot in seconds.

 Dmitry, would working on top of the async series be reasonable? Then
 we could address these as separate things which we'd build on top of.
 The one aspect I see us needing to share is the async probe universe
 is OK flag.

 Sure. Are you planning on refreshing your series?

Yes.

 I think the code-related discussion kind of stalled...

I was just waiting for any possible brain farts to flush out before a
new respin. I'll tackle this now.

 Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Luis R. Rodriguez
On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
 On Tue, 2014-09-09 at 10:10 +0900, Tejun Heo wrote:
 Hello, Luis.

 On Mon, Sep 08, 2014 at 06:04:23PM -0700, Luis R. Rodriguez wrote:
   I have no idea how the selection should be.  It could be per-insmod or
   maybe just a system-wide flag with explicit exceptions marked on
   drivers is good enough.  I don't know.
 
  It's perfectly understandable if we don't know what path to take yet
  and it's also understandable for it to take time to figure out --
  meanwhile, though, systemd has already merged a policy of a 30 second
  timeout for *all drivers*, so we therefore need:

 I'm not too convinced this is such a difficult problem to figure out.
 We already have most of logic in place and the only thing missing is
 how to switch it.  Wouldn't something like the following work?

 * Add a sysctl knob to enable asynchronous device probing on module
   load and enable asynchronous probing globally if the knob is set.

 * Identify cases which can't be asynchronous and make them
   synchronous.  e.g. keep who's doing request_module() and avoid
   asynchronous probing if current is probing one of those.

 What's wrong with just fixing systemd?  Arbitrary timeouts in init
 scripts for system bring up are plain wrong ... I thought we had this
 sorted out ten years ago when we were first having the arguments about
 how long to wait for root; I'm surprised it's coming back again.

By design it seems systemd should not allow worker processes to block
indefinitely and in fact it currently uses the same timeout for all
types of worker processes. I last recommended a multiplier to at least
allow systemd to distinguish and allow us to modify the timeout based
on the type of process by using an enum used to classify these, kmod
for example would be one type of command:

http://lists.freedesktop.org/archives/systemd-devel/2014-August/021852.html

This was deemed to introduce unnecessary complexity, but I believe
this was before we realized that the timeout was penalizing kmod usage
unfairly: the original assumption that only init would be penalized was
incorrect, given that we batch both init + probe together. I have been
relaying updates back on that thread as we move along with this
discussion on the issues found with the timeout, but haven't gotten
feedback yet as to which path folks on systemd would like to take in
light of recent discussions / clarifications.
Perhaps your arguments might help folks here reconsider things a bit
as well.

If we want *tight* integration between init system / kernel these
discussions are necessary not only when we find issues but also should
be part of the design phase for major changes.

 If we want to sort out some sync/async mechanism for probing devices, as
 an agreement between the init systems and the kernel, that's fine, but
 it's a to-be negotiated enhancement.

Unfortunately as Tejun notes the train has left which already made
assumptions on this. I'm afraid distributions that want to avoid this
sigkill at least on the kernel front will have to work around this
issue either on systemd by increasing the default timeout which is now
possible thanks to Hannes' changes or by some other means such as the
combination of a modified non-chatty version of this patch + a check
at the end of load_module() as mentioned earlier on these threads.

 For the current bug fix, just fix  the component that broke ... which would 
 be systemd.

For new systems it seems the proposed fix is to have systemd tell the
kernel what it thought it should be seeing and that is all pure async
probes through a sysctl, and then we'd do async probe on all modules
unless a driver is specifically flagged with a need to run synchronous
(we'll enable this for request_firmware() users for example to start
off with).
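
One possible reading of that policy, combined with Tejun's
request_module() exception quoted above, is sketched here as a small
decision function in plain C. Nothing in it is an existing kernel
interface: the struct, its field names and should_probe_async() are all
hypothetical, chosen only to make the proposed rules explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the sync/async decision for one module load. */
struct probe_request {
	bool sysctl_async_enabled;	/* system-wide knob, set by the init system */
	bool driver_requires_sync;	/* per-driver opt-out, e.g. request_firmware() users */
	bool via_request_module;	/* a caller is blocked in request_module() */
};

/* Probe asynchronously only if the knob is set and no exception applies. */
static bool should_probe_async(const struct probe_request *req)
{
	assert(req != NULL);
	if (!req->sysctl_async_enabled)
		return false;	/* old behavior unless userspace opts in */
	if (req->driver_requires_sync)
		return false;	/* driver flagged as needing synchronous probe */
	if (req->via_request_module)
		return false;	/* caller expects the device when the load returns */
	return true;
}
```

Keeping the default at "synchronous unless the knob is set" means
kernels paired with an init system that never sets the knob would behave
as they do today.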

 Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread James Bottomley
On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
 On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  If we want to sort out some sync/async mechanism for probing devices, as
  an agreement between the init systems and the kernel, that's fine, but
  it's a to-be negotiated enhancement.
 
 Unfortunately as Tejun notes the train has left which already made
 assumptions on this.

Well, that's why it's a bug.  It's a material regression impacting
users.

  I'm afraid distributions that want to avoid this
 sigkill at least on the kernel front will have to work around this
 issue either on systemd by increasing the default timeout which is now
 possible thanks to Hannes' changes or by some other means such as the
 combination of a modified non-chatty version of this patch + a check
 at the end of load_module() as mentioned earlier on these threads.

Increasing the default timeout in systemd seems like the obvious bug fix
to me.  If the patch exists already, having distros that want it use it
looks to be correct ... not every bug is a kernel bug, after all.

Negotiating a probe vs init split for drivers is fine too, but it's a
longer term thing rather than a bug fix.

  For the current bug fix, just fix  the component that broke ... which would 
  be systemd.
 
 For new systems it seems the proposed fix is to have systemd tell the
 kernel what it thought it should be seeing and that is all pure async
 probes through a sysctl, and then we'd do async probe on all modules
 unless a driver is specifically flagged with a need to run synchronous
 (we'll enable this for request_firmware() users for example to start
 off with).

I don't have very strong views on this one.  However, I've got to say
from a systems point of view that if the desire is to flag when the
module is having problems, probing and initializing synchronously in a
thread spawned by init which the init process can watchdog and thus can
flash up warning messages seems to be more straightforward than an
elaborate asynchronous mechanism with completion signalling which
achieves the same thing in a more complicated (and thus bug prone)
fashion.

James




Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Luis R. Rodriguez
On Tue, Sep 9, 2014 at 12:35 PM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
 On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
 On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  If we want to sort out some sync/async mechanism for probing devices, as
  an agreement between the init systems and the kernel, that's fine, but
  it's a to-be negotiated enhancement.

 Unfortunately as Tejun notes the train has left which already made
 assumptions on this.

 Well, that's why it's a bug.  It's a material regression impacting
 users.

Indeed. I believe the issue with this regression however was that the
original commit e64fae55 (January 2012) was only recently accepted by
*kernel folks* to be a real regression. More than two years have gone
by of growing design and assumptions on top of that original commit.
I'm not sure if *systemd folks* yet believe it was a design
regression?

  I'm afraid distributions that want to avoid this
 sigkill at least on the kernel front will have to work around this
 issue either on systemd by increasing the default timeout which is now
 possible thanks to Hannes' changes or by some other means such as the
 combination of a modified non-chatty version of this patch + a check
 at the end of load_module() as mentioned earlier on these threads.

 Increasing the default timeout in systemd seems like the obvious bug fix
 to me.  If the patch exists already, having distros that want it use it
 looks to be correct ... not every bug is a kernel bug, after all.

It's merged upstream on systemd now, along with a few fixes on top of
it. I also see Kay merged a change of the default timeout to 60 seconds
on August 30. It's unclear if these discussions had any impact on that
decision or if that was just because udev firmware loading has now been
ripped out. I'll note that the new 60 second timeout wouldn't suffice
for cxgb4 even if it didn't do firmware loading; its probe takes over
one full minute.

 Negotiating a probe vs init split for drivers is fine too, but it's a
 longer term thing rather than a bug fix.

Indeed. What I proposed with a multiplier for the timeout for the
different types of built-in commands was deemed complex, but I saw no
alternatives proposed despite my interest in working on one and the
clarifications noting that this was a design regression. Not quite sure
what else I could have done here. I'm interested in learning what the
better approach is for the future, as if we want to marry init + kernel
we need a smooth way for us to discuss design without getting worked
up about it or taking it personally. I really want this to work as I
personally like systemd so far.

  For the current bug fix, just fix  the component that broke ... which 
  would be systemd.

 For new systems it seems the proposed fix is to have systemd tell the
 kernel what it thought it should be seeing and that is all pure async
 probes through a sysctl, and then we'd do async probe on all modules
 unless a driver is specifically flagged with a need to run synchronous
 (we'll enable this for request_firmware() users for example to start
 off with).

 I don't have very strong views on this one.  However, I've got to say
 from a systems point of view that if the desire is to flag when the
 module is having problems, probing and initializing synchronously in a
 thread spawned by init which the init process can watchdog and thus can
 flash up warning messages seems to be more straightforward

Indeed, however it was not understood that module loading did init +
probe synchronously, and indeed what you recommend is also what I was
hoping systemd *should do* instead of a hard sigkill at the default
timeout.

 than an
 elaborate asynchronous mechanism with completion signalling which
 achieves the same thing in a more complicated (and thus bug prone)
 fashion.

I couldn't be in any more agreement with you. It takes two to tango though.

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
Hey, James.

On Tue, Sep 09, 2014 at 12:35:46PM -0700, James Bottomley wrote:
 I don't have very strong views on this one.  However, I've got to say
 from a systems point of view that if the desire is to flag when the
 module is having problems, probing and initializing synchronously in a
 thread spawned by init which the init process can watchdog and thus can
 flash up warning messages seems to be more straightforward than an
 elaborate asynchronous mechanism with completion signalling which
 achieves the same thing in a more complicated (and thus bug prone)
 fashion.

We no longer report back error on probe failure on module load.  It
used to make sense to indicate error for module load on probe failure
when the hardware was a lot simpler and drivers did their own device
enumeration.  With the current bus / device setup, it doesn't make any
sense and driver core silently suppresses all probe failures.  There's
nothing the probing thread can monitor anymore.

In that sense, we already separated out device probing from module
loading simply because the hardware reality mandated it and we have
dynamic mechanisms to listen for device probes exactly for the same
reason, so I think it makes sense to separate out the waiting too, at
least in the long term.  In a modern dynamic setup, the waits are
essentially arbitrary and don't buy us anything.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Jiri Kosina
On Tue, 9 Sep 2014, Luis R. Rodriguez wrote:

 By design it seems systemd should not allow worker processes to block
 indefinitely and in fact it currently uses the same timeout for all
 types of worker processes. 

And I whole-heartedly believe this is something that fundamentally needs 
to be addressed in systemd, not in the kernel.

This approach is actually introducing a user-visible regression. Look, for 
example, exec() never times out. Therefore if your system is on its knees, 
heavily overloaded (or completely broken), you are likely to be able to 
`reboot' it, because exec(/sbin/reboot) ultimately succeeds.

But with all the timeouts, dbus, and "Failed to issue method call: Did 
not receive a reply" messages, this is getting close to impossible.

-- 
Jiri Kosina
SUSE Labs



Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread James Bottomley
On Wed, 2014-09-10 at 06:42 +0900, Tejun Heo wrote:
 Hey, James.
 
 On Tue, Sep 09, 2014 at 12:35:46PM -0700, James Bottomley wrote:
  I don't have very strong views on this one.  However, I've got to say
  from a systems point of view that if the desire is to flag when the
  module is having problems, probing and initializing synchronously in a
  thread spawned by init which the init process can watchdog and thus can
  flash up warning messages seems to be more straightforwards than an
  elaborate asynchronous mechanism with completion signalling which
  achieves the same thing in a more complicated (and thus bug prone)
  fashion.
 
 We no longer report back error on probe failure on module load.

Yes, we do; for every probe failure of a device on a driver we'll print
a warning (see drivers/base/dd.c).  Now if someone is proposing we
should report this in a better fashion, that's probably a good idea, but
I must have missed that patch.

   It
 used to make sense to indicate error for module load on probe failure
 when the hardware was a lot simpler and drivers did their own device
 enumeration.  With the current bus / device setup, it doesn't make any
 sense and driver core silently suppresses all probe failures.  There's
 nothing the probing thread can monitor anymore.

Except the length of time taken to probe.  That seems to be what systemd
is interested in, hence this whole thread, right?

 In that sense, we already separated out device probing from module
 loading simply because the hardware reality mandated it and we have
 dynamic mechanisms to listen for device probes exactly for the same
 reason, so I think it makes sense to separate out the waiting too, at
 least in the long term.  In a modern dynamic setup, the waits are
 essentially arbitrary and don't buy us anything.

But that's nothing to do with sync or async.  Nowadays we register a
driver, the driver may bind to multiple devices.  If one of those
devices encounters an error during probe, we just report the fact in
dmesg and move on.  The module_init thread currently returns when all
the probe routines for all enumerated devices have been called, so
module_init has no indication of any failures (because they might be
mixed with successes); successes are indicated as the device appears but
we have nothing other than the kernel log to indicate the failures.  How
does moving to async probing alter this?  It doesn't as far as I can
see, except that module_init returns earlier but now we no longer have
an indication of when the probe completes, so we have to add yet another
mechanism to tell us if we're interested in that.  I really don't see
what this buys us.
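
For illustration only (not from the thread): a toy model of the behaviour
described in the paragraph above. A driver is registered, probe is called
for each device, a failure is only logged, and the caller gets the same
return value either way. All names are invented for the sketch, and the
log text is only meant to resemble a kernel warning.

```python
def register_driver(name, devices, probe, log):
    """Toy model of driver registration with synchronous probing.

    probe(dev) returns 0 on success or a negative error code. A failing
    probe is recorded in `log`; it does not change the return value.
    """
    for dev in devices:
        err = probe(dev)
        if err:
            log.append("%s: probe of %s failed with error %d"
                       % (name, dev, err))
    return 0  # registration succeeded, whatever the individual probes did


def module_init_model(devices, probe, log):
    """Returns only after probe has been called for every device."""
    return register_driver("toy_driver", devices, probe, log)
```

With three devices of which one fails, module_init_model() still returns
0 and the only trace of the failure is the single line in the log.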

James




Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
Hello,

On Tue, Sep 09, 2014 at 03:26:02PM -0700, James Bottomley wrote:
  We no longer report back error on probe failure on module load.
 
 Yes, we do; for every probe failure of a device on a driver we'll print
 a warning (see drivers/base/dd.c).  Now if someone is proposing we
 should report this in a better fashion, that's probably a good idea, but
 I must have missed that patch.

We can do printks all the same from anywhere.  There's nothing special
about printing from the module loading thread.  The only way to
actually take advantage of the synchronicity would be propagating
error return to the waiting issuer, which we used to do but no longer
can.

It
  used to make sense to indicate error for module load on probe failure
  when the hardware was a lot simpler and drivers did their own device
  enumeration.  With the current bus / device setup, it doesn't make any
  sense and driver core silently suppresses all probe failures.  There's
  nothing the probing thread can monitor anymore.
 
 Except the length of time taken to probe.  That seems to be what systemd
 is interested in, hence this whole thread, right?

No, systemd in this case isn't interested in the time taken to probe
at all.  It is expecting module load to just do that - load the
module.  Modern userlands, systemd or not, no longer depend on or make
use of the wait.

 But that's nothing to do with sync or async.  Nowadays we register a
 driver, the driver may bind to multiple devices.  If one of those
 devices encounters an error during probe, we just report the fact in
 dmesg and move on.  The module_init thread currently returns when all
 the probe routines for all enumerated devices have been called, so
 module_init has no indication of any failures (because they might be
 mixed with successes); successes are indicated as the device appears but
 we have nothing other than the kernel log to indicate the failures.  How
 does moving to async probing alter this?  It doesn't as far as I can
 see, except that module_init returns earlier but now we no longer have
 an indication of when the probe completes, so we have to add yet another
 mechanism to tell us if we're interested in that.  I really don't see
 what this buys us.

The thing is that we have to have dynamic mechanism to listen for
device attachments no matter what and such mechanism has been in place
for a long time at this point.  The synchronous wait simply doesn't
serve any purpose anymore and kinda gets in the way in that it makes
it a possibly extremely slow process to tell whether loading of a
module succeeded or not because the wait for the initial round of
probe is piggybacked.
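
For illustration only (not from the thread): a toy model of the point
above, that attach notifications exist regardless and the synchronous wait
only delays the "load finished" answer. Threads and a queue stand in for
kernel mechanisms here; every name is an assumption of the sketch.

```python
import threading


def load_module(devices, probe, events, async_probe):
    """Toy model: "load" a module whose driver probes `devices`.

    `events` is any object with a put() method (for example queue.Queue);
    each successfully probed device is announced on it, which is what a
    listener would watch. With async_probe=False the call returns only
    after the initial round of probing; with async_probe=True it returns
    at once and hands back the worker thread doing the probing.
    """
    def probe_all():
        for dev in devices:
            if probe(dev) == 0:
                events.put(("add", dev))

    if not async_probe:
        probe_all()
        return None
    worker = threading.Thread(target=probe_all)
    worker.start()
    return worker
```

In both modes the listener sees the same events; only the time at which
load_module() returns differs.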

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread James Bottomley
On Wed, 2014-09-10 at 07:41 +0900, Tejun Heo wrote:
 Hello,
 
 On Tue, Sep 09, 2014 at 03:26:02PM -0700, James Bottomley wrote:
   We no longer report back error on probe failure on module load.
  
  Yes, we do; for every probe failure of a device on a driver we'll print
  a warning (see drivers/base/dd.c).  Now if someone is proposing we
  should report this in a better fashion, that's probably a good idea, but
  I must have missed that patch.
 
 We can do printks all the same from anywhere.  There's nothing special
 about printing from the module loading thread.  The only way to
 actually take advantage of the synchronicity would be propagating
 error return to the waiting issuer, which we used to do but no longer
 can.

If you want the return of an individual device probe a log scraper gives
it to you ... and nothing else does currently.  The advantage of the
printk in dd.c is that it's standard for everything and can be scanned
for ... if you take that out, you'll get complaints about the lack of
standard messages (you'd be surprised at the number of enterprise
monitoring systems that actually do log scraping).

 It
   used to make sense to indicate error for module load on probe failure
   when the hardware was a lot simpler and drivers did their own device
   enumeration.  With the current bus / device setup, it doesn't make any
   sense and driver core silently suppresses all probe failures.  There's
   nothing the probing thread can monitor anymore.
  
  Except the length of time taken to probe.  That seems to be what systemd
  is interested in, hence this whole thread, right?
 
 No, systemd in this case isn't interested in the time taken to probe
 at all.  It is expecting module load to just do that - load the
 module.  Modern userlands, systemd or not, no longer depend on or make
 use of the wait.

So what's the problem?  It can just fire and forget; that's what fork()
is for.

  But that's nothing to do with sync or async.  Nowadays we register a
  driver, the driver may bind to multiple devices.  If one of those
  devices encounters an error during probe, we just report the fact in
  dmesg and move on.  The module_init thread currently returns when all
  the probe routines for all enumerated devices have been called, so
  module_init has no indication of any failures (because they might be
  mixed with successes); successes are indicated as the device appears but
  we have nothing other than the kernel log to indicate the failures.  How
  does moving to async probing alter this?  It doesn't as far as I can
  see, except that module_init returns earlier but now we no longer have
  an indication of when the probe completes, so we have to add yet another
  mechanism to tell us if we're interested in that.  I really don't see
  what this buys us.
 
 The thing is that we have to have dynamic mechanism to listen for
 device attachments no matter what and such mechanism has been in place
 for a long time at this point.  The synchronous wait simply doesn't
 serve any purpose anymore and kinda gets in the way in that it makes
 it a possibly extremely slow process to tell whether loading of a
 module succeeded or not because the wait for the initial round of
 probe is piggybacked.

OK, so we just fire and forget in userland ... why bother inventing an
elaborate new infrastructure in the kernel to do exactly what

modprobe mod &

would do?
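
For illustration only (not from the thread): a minimal sketch of the
"fire and forget" pattern being referred to, using fork() and exec from
userland. The function name is made up; the command to run is left to the
caller.

```python
import os


def fire_and_forget(argv):
    """Start argv in a child process and return its pid immediately."""
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec itself failed
    # Parent: does not wait for the child to finish.
    return pid
```

The parent learns nothing about how long the child takes or whether it
succeeds unless it later collects the exit status with os.waitpid().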

James




Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
Hello, James.

On Tue, Sep 09, 2014 at 03:46:23PM -0700, James Bottomley wrote:
 If you want the return of an individual device probe a log scraper gives
 it to you ... and nothing else does currently.  The advantage of the
 printk in dd.c is that it's standard for everything and can be scanned
 for ... if you take that out, you'll get complaints about the lack of
 standard messages (you'd be surprised at the number of enterprise
 monitoring systems that actually do log scraping).

Why would a log scraper care about which task is printing the messages?
The printk can stay there.  There's nothing wrong with it.  Log
scrapers tend to be asynchronous in nature but if a log scraper wants
to operate synchronously for whatever reason, it can simply not turn
on async probing.

 OK, so we just fire and forget in userland ... why bother inventing an
 elaborate new infrastructure in the kernel to do exactly what
 
 modprobe mod &
 
 would do?

I think the argument there is that the issuer wants to know whether
such operations succeeded or not and wants to report and record the
result and possibly take other actions in response.  We're currently
mixing wait and error reporting for one type of operation with wait
for another.  I'm not saying it's a fatal flaw or anything but it can
get in the way.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tejun Heo
On Tue, Sep 09, 2014 at 12:25:29PM +0900, Tejun Heo wrote:
 Hello,
 
 On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
  On the systemd side of things it should enable this sysctl and for
  older kernels what should it do?
 
 Supposing the change is backported via -stable, it can try to set the
 sysctl on all kernels.  If the knob doesn't exist, the fix is not
 there and nothing can be done about it.

The more I think about it, the more I think this should be a
per-insmod instance thing rather than a system-wide switch.  Currently
the kernel param code doesn't allow a generic param outside the ones
specified by the module itself but adding support for something like
driver.async_load=1 shouldn't be too difficult, applying that to
existing systems shouldn't be much more difficult than a system-wide
switch, and it'd be significantly cleaner than fiddling with driver
blacklist.
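
For illustration only (not from the thread): a sketch of what separating a
generic, loader-level option from a module's own parameters could look
like. The "driver." prefix follows the driver.async_load=1 example above;
the function name is made up, and the whitespace splitting is a
simplification compared to real module parameter parsing.

```python
def split_module_args(args, generic_prefix="driver."):
    """Split an argument string into generic options and module params.

    Returns (generic, module_params): `generic` maps option names (without
    the prefix) to their values, `module_params` keeps the remaining
    "name=value" strings untouched for the module itself.
    """
    generic = {}
    module_params = []
    for arg in args.split():
        key, _, value = arg.partition("=")
        if key.startswith(generic_prefix):
            generic[key[len(generic_prefix):]] = value
        else:
            module_params.append(arg)
    return generic, module_params
```

A loader built this way could check generic.get("async_load") == "1" for
this one insmod call without the module having to declare the parameter.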

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-09 Thread Tom Gundersen
On Tue, Sep 9, 2014 at 3:26 AM, Luis R. Rodriguez
mcg...@do-not-panic.com wrote:
 On Mon, Sep 8, 2014 at 6:22 PM, Tejun Heo t...@kernel.org wrote:
 On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
 I'm not too convinced this is such a difficult problem to figure out.
 We already have most of logic in place and the only thing missing is
 how to switch it.  Wouldn't something like the following work?

 * Add a sysctl knob to enable asynchronous device probing on module
   load and enable asynchronous probing globally if the knob is set.

 Alternatively, add a module-generic param async_probe or whatever
 and use that to switch the behavior should work too.  I don't know
 which way is better but either should work fine.

 I take it by this you meant a generic system-wide sysctl or kernel cmd
 line option to enable this for all drivers?

If the expectation is that this feature should be enabled
unconditionally for all systemd systems, wouldn't it make more sense
to make it a Kconfig option (possibly overridable from the kernel
commandline in case that makes testing simpler)?

Cheers,

Tom


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Luis R. Rodriguez
On Fri, Sep 5, 2014 at 3:40 PM, Tejun Heo t...@kernel.org wrote:
 Hello, Luis.

 On Fri, Sep 05, 2014 at 11:12:17AM -0700, Luis R. Rodriguez wrote:
 Meanwhile we are allowing a major design consideration such as a 30
 second timeout for both init + probe all of a sudden become a hard
 requirement for device drivers. I see your point but can't also be
 introducing major design changes willy nilly either. We *need* a
 solution for the affected drivers.

 Yes, make the behavior specifically specified from userland.  When did
 I ever say that there should be no solution for the problem?  I've
 been saying that the behavior should be selected from userland from
 the get-go, haven't I?

 I have no idea how the selection should be.  It could be per-insmod or
 maybe just a system-wide flag with explicit exceptions marked on
 drivers is good enough.  I don't know.

It's perfectly understandable if we don't know what path to take yet
and it's also understandable for it to take time to figure out --
meanwhile, though, systemd has already merged a policy of a 30 second
timeout for *all drivers*, so we need:

0) a solution for the affected combinations of systemd / drivers
1) an agreed path forward

If we want a tight integration between both kernel / init system we
need to be able to communicate effectively, folks, and I'm afraid this
isn't happening. I last noted on systemd-devel how the 30 second
timeout issue was merged under incorrect assumptions -- that it was
not just init that at times caused delays, and that since we currently
batch both init and probe on the driver core we need a non fatal
userspace solution [0], while we work on design on the kernel side of
things for async'ing for drivers that make sense. A proper kernel
solution may take longer than expected; we can't just assume a
probe_async flag will suffice on drivers. In fact, as Tejun notes, it's
wrong since historically we have had some random userland depend on
the synchronous behaviour of module loading of some drivers, and that
*could* have taken a while.

Kay, Lennart, any recommendations ?

[0] http://lists.freedesktop.org/archives/systemd-devel/2014-August/022696.html

 Also what stops drivers from going ahead and just implementing their
 own async probe? Would that now be frowned upon as it strives away

 The drivers can't.  How many times should I explain the same thing
 over and over again.  libata can't simply make probing asynchronous
 w.r.t. module loading no matter how it does it.  Yeah, sure, there can
 be other drivers which can do that without most people noticing it but
 a storage driver isn't one of them and the storage drivers are the
 problematic ones already, right?

It's one of the subsystems that has suffered from this, but not the only one.

 from the original design? The bool would let those drivers do this
 easily, and we would still need to identify these drivers, although
 this particular change can be NAK'd Oleg's suggestion on
 WARN_ON(fatal_signal_pending() at the end of load_module() seems to me
 at least needed. And if its not async probe... what do those with
 failed drivers do?

 I'm getting tired of explaining the same thing over and over again.
 The said change was nacked because the whole approach of let's see
 which drivers get reported on the issue which exists basically for all
 drivers and just change the behavior of them is braindead.  It makes
 no sense whatsoever.  It doesn't address the root cause of the problem
 while making the same class of drivers behave significantly
 differently for no good reason.  Please stop chasing your own tail and
 try to understand the larger picture.

Understood.

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello, Luis.

On Mon, Sep 08, 2014 at 06:04:23PM -0700, Luis R. Rodriguez wrote:
  I have no idea how the selection should be.  It could be per-insmod or
  maybe just a system-wide flag with explicit exceptions marked on
  drivers is good enough.  I don't know.
 
 Its perfectly understandable if we don't know what path to take yet
 and its also understandable for it to take time to figure out --
 meanwhile though systemd already has merged a policy of a 30 second
 timeout for *all drivers* though so we therefore need:

I'm not too convinced this is such a difficult problem to figure out.
We already have most of logic in place and the only thing missing is
how to switch it.  Wouldn't something like the following work?

* Add a sysctl knob to enable asynchronous device probing on module
  load and enable asynchronous probing globally if the knob is set.

* Identify cases which can't be asynchronous and make them
  synchronous.  e.g. keep who's doing request_module() and avoid
  asynchronous probing if current is probing one of those.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
 * Identify cases which can't be asynchronous and make them
   synchronous.  e.g. keep who's doing request_module() and avoid
   asynchronous probing if current is probing one of those.

That wouldn't work as we don't know what's gonna happen in userland
but we can start with just disallowing async probing for char devices
for now.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
 I'm not too convinced this is such a difficult problem to figure out.
 We already have most of logic in place and the only thing missing is
 how to switch it.  Wouldn't something like the following work?
 
 * Add a sysctl knob to enable asynchronous device probing on module
   load and enable asynchronous probing globally if the knob is set.

Alternatively, add a module-generic param async_probe or whatever
and use that to switch the behavior should work too.  I don't know
which way is better but either should work fine.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Luis R. Rodriguez
On Mon, Sep 8, 2014 at 6:22 PM, Tejun Heo t...@kernel.org wrote:
 On Tue, Sep 09, 2014 at 10:10:59AM +0900, Tejun Heo wrote:
 I'm not too convinced this is such a difficult problem to figure out.
 We already have most of logic in place and the only thing missing is
 how to switch it.  Wouldn't something like the following work?

 * Add a sysctl knob to enable asynchronous device probing on module
   load and enable asynchronous probing globally if the knob is set.

 Alternatively, add a module-generic param async_probe or whatever
 and use that to switch the behavior should work too.  I don't know
 which way is better but either should work fine.

I take it by this you meant a generic system-wide sysctl or kernel cmd
line option to enable this for all drivers?

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Mon, Sep 08, 2014 at 06:26:04PM -0700, Luis R. Rodriguez wrote:
  Alternatively, add a module-generic param async_probe or whatever
  and use that to switch the behavior should work too.  I don't know
  which way is better but either should work fine.
 
 I take it by this you meant a generic system-wide sysctl or kernel cmd
 line option to enable this for all drivers?

Well, either global or per-insmod switch should work.  There probably
are details that I haven't mentioned - e.g. probably global switch is
easier to backport and deploy to existing systems - but as long as it
works I don't have fundamental objections either way.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Luis R. Rodriguez
On Mon, Sep 8, 2014 at 6:29 PM, Tejun Heo t...@kernel.org wrote:
 On Mon, Sep 08, 2014 at 06:26:04PM -0700, Luis R. Rodriguez wrote:
  Alternatively, add a module-generic param async_probe or whatever
  and use that to switch the behavior should work too.  I don't know
  which way is better but either should work fine.

 I take it by this you meant a generic system-wide sysctl or kernel cmd
 line option to enable this for all drivers?

 Well, either global or per-insmod switch should work.  There probably
 are details that I haven't mentioned - e.g. probably global switch is
 easier to backport and deploy to existing systems

Yes a global sysctl solution might make it easier to backport.

 - but as long as it
 works I don't have fundamental objections either way.

OK, then the only concern I would have with this is that the presence
of such a flag doesn't necessarily mean that all drivers on a system
have been tested for asynch probe yet. I'd feel much more comfortable
if this global flag allowed say specific drivers that *did* have such
a bool enabled, for example. Then that would enable synchronous
behaviour for the kernel by default, require the flag for enabling the
new async feature but only for drivers that have been tested.

That also still would not technically solve the issue of the current
existence of the timeout, unless of course we wish to ask systemd to
only make the timeout take effect *iff* the global sysctl flag /
whatever was enabled.

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello,

On Mon, Sep 08, 2014 at 06:38:34PM -0700, Luis R. Rodriguez wrote:
 OK then one only concern I would have with this is that the presence
 of such a flag doesn't necessarily mean that all drivers on a system
 have been tested for asynch probe yet. I'd feel much more comfortable

Given that the behavior change is from driver core and that device
probing can happen post-loading anyway, I don't think we need to worry
about drivers breaking from probing made asynchronous to loading.  The
problem is the expectation of the entity which initiated loading of
the module.  If it's depending on device being probed synchronously
but insmod returns before that, it can break things.  We probably
should audit request_module() users and see which ones expect such
behavior.

 if this global flag allowed say specific drivers that *did* have such
 a bool enabled, for example. Then that would enable synchronous
 behaviour for the kernel by default, require the flag for enabling the
 new async feature but only for drivers that have been tested.

If we're gonna do the global switch, I personally think the right
approach is blacklisting instead of the other way around because each
specific driver doesn't really have much to do with it and the
exceptions are about specific use cases that we don't have a good way
to identify them from module loading path.
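
For illustration only (not from the thread): a small sketch contrasting
the two policies under discussion for a global switch. With a blacklist,
everything probes asynchronously unless listed as sync-only; with a
whitelist, only drivers marked as tested do. Function and set names are
invented for the sketch.

```python
def mode_with_blacklist(driver, global_async, sync_only):
    """Async for every driver unless it is explicitly listed as sync-only."""
    if global_async and driver not in sync_only:
        return "async"
    return "sync"


def mode_with_whitelist(driver, global_async, async_tested):
    """Async only for drivers explicitly marked as tested for it."""
    if global_async and driver in async_tested:
        return "async"
    return "sync"
```

With the switch off both policies yield "sync" for every driver; with it
on, they differ exactly for drivers that appear in neither list.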

 That also still would not technically solve the issue of the current
 existence of the timeout, unless of course we wish to ask systemd to
 only make the timeout take effect *iff* the global sysctl flag /
 whatever was enabled.

Userland could backport a fix to set the sysctl.  Given that we need
both synchronous and asynchronous behaviors, it's unlikely that we can
come up with a solution which doesn't need cooperation from userland.
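
For illustration only (not from the thread): a minimal sketch of userland
trying to set such a knob while tolerating kernels that do not have it.
The thread never names the sysctl, so the path is a parameter and any
concrete path passed in would be hypothetical.

```python
import os


def try_enable_async_probe(knob_path):
    """Write "1" to knob_path; return False if the knob does not exist."""
    try:
        # No O_CREAT: a missing knob must stay missing.
        fd = os.open(knob_path, os.O_WRONLY)
    except FileNotFoundError:
        return False  # kernel without the change: nothing to be done
    try:
        os.write(fd, b"1\n")
    finally:
        os.close(fd)
    return True
```

Other errors, such as lacking permission, are deliberately left to
propagate so that the caller can report them.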

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Luis R. Rodriguez
On Mon, Sep 8, 2014 at 6:47 PM, Tejun Heo t...@kernel.org wrote:
 Hello,

 On Mon, Sep 08, 2014 at 06:38:34PM -0700, Luis R. Rodriguez wrote:
 OK then one only concern I would have with this is that the presence
 of such a flag doesn't necessarily mean that all drivers on a system
 have been tested for asynch probe yet. I'd feel much more comfortable

 Given that the behavior change is from driver core and that device
 probing can happen post-loading anyway,

Ah, but let's not forget Dmitry's requirement which is for in-kernel
drivers. We'd need to deal with both built-in and modules. Dmitry's
case is completely orthogonal to the systemd issue and is just needed
to help not stall boot but I see no reason to blend these two issues
into one requirement together.

 I don't think we need to worry
 about drivers breaking from probing made asynchronous to loading.  The
 problem is the expectation of the entity which initiated loading of
 the module.  If it's depending on device being probed synchronously
 but insmod returns before that, it can break things.  We probably
 should audit request_module() users and see which ones expect such
 behavior.

Sure. Based on a quick glance I see sloppy uses of this, this should
probably be fixed anyway.

 if this global flag allowed say specific drivers that *did* have such
 a bool enabled, for example. Then that would enable synchronous
 behaviour for the kernel by default, require the flag for enabling the
 new async feature but only for drivers that have been tested.

 If we're gonna do the global switch, I personally think the right
 approach is blacklisting instead of the other way around because each
 specific driver doesn't really have much to do with it and the
 exceptions are about specific use cases that we don't have a good way
 to identify them from module loading path.

OK sure... even if we did whitelist I'm afraid such a white list might
be subjective in terms of design to specific systems anyway... I
suppose the only real way to do it right is to push and strive towards
a full system whitelist and address the black list as you mention.

In terms of approach we would still need to decide on a path for how
to do asynch probing for both in-kernel drivers and modules, do we
want async_schedule(), or queue_work()? If async_schedule() do we want
to use a new domain or a new one shared for all drivers? Priority on
the scheduler was one of my other concerns which we'd need to make
right to match existing load on drivers through finit_module() and
synchronous probe.

 That also still would not technically solve the issue of the current
 existence of the timeout, unless of course we wish to ask systemd to
 only make the timeout take effect *iff* the global sysctl flag /
 whatever was enabled.

 Userland could backport a fix to set the sysctl.  Given that we need
 both synchronous and asynchronous behaviors, it's unlikely that we can
 come up with a solution which doesn't need cooperation from userland.

True and then the timeout would also have to be skipped for device
drivers that have the sync_probe flag set, so I guess we'd need to
expose that too. I'm not too sure if systemd is equipped to be happy
with no timeout on module loading based on previous discussions [0] so
we'd need to ensure we're all in agreement there that such drivers
exist and we may need *something*, if at the very least a really long
fucking timeout (TM) for such drivers.

[0] http://lists.freedesktop.org/archives/systemd-devel/2014-August/021852.html

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello,

On Mon, Sep 08, 2014 at 07:28:58PM -0700, Luis R. Rodriguez wrote:
  Given that the behavior change is from driver core and that device
  probing can happen post-loading anyway,
 
 Ah but lets not forget Dmitry's requirement which is for in-kernel
 drivers. We'd need to deal with both built-in and modules. Dmitry's
 case is completely orthogonal to the systemd issue and is just needed
 to help not stall boot but I see no reason to blend these two issues
 into one requirement together.

Maybe we can piggy back the two on the same mechanism but as you said
the two issues are orthogonal.  Let's keep it that way for now.  We
need them separate anyway for backports.

 In terms of approach we would still need to decide on a path for how
 to do asynch probing for both in-kernel drivers and modules, do we
 want async_schedule(), or queue_work()? If async_schedule() do we want
 to use a new domain or a new one shared for all drivers? Priority on

I don't think async_schedule() is the right mechanism for this use
case as the mechanism is inherently opportunistic.  It also gets
tangled up with async synchronization at the end of module loading.

 the scheduler was one of my other concerns which we'd need to make
 right to match existing load on drivers through finit_module() and
 synchronous probe.

Why do we care about the priority of probing tasks?  Does that
actually make any meaningful difference?  If so, how?

  Userland could backport a fix to set the sysctl.  Given that we need
  both synchronous and asynchronous behaviors, it's unlikely that we can
  come up with a solution which doesn't need cooperation from userland.
 
 True and then the timeout would also have to be skipped for device
 drivers that have the sync_probe flag set, so I guess we'd need to

I'm not sure about skipping for sync_probe flag.  That seems like an
implementation detail to me.  Sure, we do that now because we don't
have a better way of figuring out whether request_module() is waiting
for it or not but hopefully we'd be able to in the future.  I think we
just should make exceptions sensible so that it works fine in practice
for now (and I don't think that'd be too hard).  So, the only
cooperation necessary from userland would be just saying I don't
wanna wait for device probing on module load.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Luis R. Rodriguez
On Mon, Sep 8, 2014 at 7:39 PM, Tejun Heo t...@kernel.org wrote:
 Hello,

 On Mon, Sep 08, 2014 at 07:28:58PM -0700, Luis R. Rodriguez wrote:
  Given that the behavior change is from driver core and that device
  probing can happen post-loading anyway,

 Ah but lets not forget Dmitry's requirement which is for in-kernel
 drivers. We'd need to deal with both built-in and modules. Dmitry's
 case is completely orthogonal to the systemd issue and is just needed
 to help not stall boot but I see no reason to blend these two issues
 into one requirement together.

 Maybe we can piggy back the two on the same mechanism but as you said
 the two issues are orthogonal.  Let's keep it that way for now.  We
 need them separate anyway for backports.

OK.

 In terms of approach we would still need to decide on a path for how
 to do asynch probing for both in-kernel drivers and modules, do we
 want async_schedule(), or queue_work()? If async_schedule() do we want
 to use a new domain or a new one shared for all drivers? Priority on

 I don't think async_schedule() is the right mechanism for this use
 case as the mechanism is inherently opportunistic.  It also gets
 tangled up with async synchronization at the end of module loading.

 the scheduler was one of my other concerns which we'd need to make
 right to match existing load on drivers through finit_module() and
 synchronous probe.

 Why do we care about the priority of probing tasks?  Does that
 actually make any meaningful difference?  If so, how?

As I noted before -- I have yet to provide clear metrics but at least
changing both init paths + probe from finit_module() to kthread
certainly had a measurable time increase, I suspect using
queue_work(system_unbound_wq, async_probe_work) will make probe
slower. I'll get to these metrics this week.

  Userland could backport a fix to set the sysctl.  Given that we need
  both synchronous and asynchronous behaviors, it's unlikely that we can
  come up with a solution which doesn't need cooperation from userland.

 True and then the timeout would also have to be skipped for device
 drivers that have the sync_probe flag set, so I guess we'd need to

 I'm not sure about skipping for sync_probe flag.  That seems like an
 implementation detail to me.  Sure, we do that now because we don't
 have a better way of figuring out whether request_module() is waiting
 for it or not but hopefully we'd be able to in the future.

Oh I was not thinking about just request_module() users but also any
of those stragglers which we might have ended up finding through run
time analysis. The alternative right now is these drivers won't load.
No bueno.

 I think we
 just should make exceptions sensible so that it works fine in practice
 for now (and I don't think that'd be too hard).  So, the only
 cooperation necessary from userland would be just saying I don't
 wanna wait for device probing on module load.

But we're talking about drivers that have a flag that says 'you gotta
wait sucker', what do we want systemd to do then? I'd be happy if it
would not send the SIGKILL for these drivers, for example.

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
On Mon, Sep 08, 2014 at 07:57:28PM -0700, Luis R. Rodriguez wrote:
  I think we
  just should make exceptions sensible so that it works fine in practice
  for now (and I don't think that'd be too hard).  So, the only
  cooperation necessary from userland would be just saying I don't
  wanna wait for device probing on module load.
 
 But we're talking about drivers that have a flag that says 'you gotta
 wait sucker', what do we want systemd to do then? I'd be happy if it
 would not send the SIGKILL for these drivers, for example.

Hah?  Can you give me an example?  I'm having a hard time imagining a
driver with such requirement given our current driver core
implementation.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Luis R. Rodriguez
On Mon, Sep 8, 2014 at 8:03 PM, Tejun Heo t...@kernel.org wrote:
 On Mon, Sep 08, 2014 at 07:57:28PM -0700, Luis R. Rodriguez wrote:
  I think we
  just should make exceptions sensible so that it works fine in practice
  for now (and I don't think that'd be too hard).  So, the only
  cooperation necessary from userland would be just saying I don't
  wanna wait for device probing on module load.

 But we're talking about drivers that have a flag that says 'you gotta
 wait sucker', what do we want systemd to do then? I'd be happy if it
 would not send the SIGKILL for these drivers, for example.

 Hah?  Can you give me an example?  I'm having a hard time imagining a
 driver with such requirement given our current driver core
 implementation.

I didn't say I had one in mind, but if you're certain these *shouldn't
exist* that's sufficient by me as well.

OK so I'll respin this series to enable a sysctl that would enable
async probe for *all drivers* using queue_work(system_unbound_wq) and
only use sync probe for now on request_module() users, we'll address
scheduling issues as they come up. I'll be ignoring built-in.
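
The policy described above can be modelled in a few lines of plain C. This is
only a sketch under stated assumptions: struct probe_policy, probe_async() and
the list of "modules someone is waiting on via request_module()" are
hypothetical names for illustration and are not kernel API. In the kernel, the
async branch would be the one that hands the probe to
queue_work(system_unbound_wq, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the proposed policy; none of this is kernel API. */
struct probe_policy {
    bool async_knob;                 /* state of the proposed sysctl */
    const char *const *sync_modules; /* modules a request_module() caller waits on */
    size_t nr_sync;
};

/* Return true if probing for `module` may be deferred to a workqueue. */
static bool probe_async(const struct probe_policy *p, const char *module)
{
    assert(p != NULL && module != NULL);

    /* Knob off: keep today's behaviour, probe synchronously. */
    if (!p->async_knob)
        return false;

    /* Someone is blocked in request_module() on this one: stay synchronous. */
    for (size_t i = 0; i < p->nr_sync; i++)
        if (strcmp(p->sync_modules[i], module) == 0)
            return false;

    return true;
}
```

With the knob clear nothing changes for existing users; with it set, only the
explicitly tracked exceptions keep the synchronous path.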

On the systemd side of things it should enable this sysctl and for
older kernels what should it do?

 Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread Tejun Heo
Hello,

On Mon, Sep 08, 2014 at 08:19:12PM -0700, Luis R. Rodriguez wrote:
 On the systemd side of things it should enable this sysctl and for
 older kernels what should it do?

Supposing the change is backported via -stable, it can try to set the
sysctl on all kernels.  If the knob doesn't exist, the fix is not
there and nothing can be done about it.
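
A minimal userspace sketch of that fallback, assuming a hypothetical knob path
(the thread never names the sysctl) and hypothetical helper and enum names:
try to write the knob, and treat a missing file as "kernel without the fix"
rather than as an error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical path; the actual sysctl name was not decided in this thread. */
#define ASYNC_PROBE_KNOB "/proc/sys/kernel/async_module_probe"

enum knob_result {
    KNOB_SET,    /* knob exists and was written */
    KNOB_ABSENT, /* knob does not exist: kernel lacks the fix, carry on */
    KNOB_ERROR   /* knob exists but could not be written */
};

/* Try to enable the knob at `path` by writing "1" to it. */
static enum knob_result enable_knob(const char *path)
{
    assert(path != NULL);

    /* No O_CREAT: a missing knob must fail with ENOENT, not be created. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno == ENOENT ? KNOB_ABSENT : KNOB_ERROR;

    ssize_t n = write(fd, "1\n", 2);
    close(fd);
    return n == 2 ? KNOB_SET : KNOB_ERROR;
}
```

An init system would call enable_knob(ASYNC_PROBE_KNOB) once early in boot and
only log, not fail, on KNOB_ABSENT.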

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-08 Thread James Bottomley
On Tue, 2014-09-09 at 10:10 +0900, Tejun Heo wrote:
 Hello, Luis.
 
 On Mon, Sep 08, 2014 at 06:04:23PM -0700, Luis R. Rodriguez wrote:
   I have no idea how the selection should be.  It could be per-insmod or
   maybe just a system-wide flag with explicit exceptions marked on
   drivers is good enough.  I don't know.
  
  It's perfectly understandable if we don't know what path to take yet
  and it's also understandable for it to take time to figure out --
  meanwhile, though, systemd has already merged a policy of a 30 second
  timeout for *all drivers*, so we therefore need:
 
 I'm not too convinced this is such a difficult problem to figure out.
 We already have most of logic in place and the only thing missing is
 how to switch it.  Wouldn't something like the following work?
 
 * Add a sysctl knob to enable asynchronous device probing on module
   load and enable asynchronous probing globally if the knob is set.
 
 * Identify cases which can't be asynchronous and make them
   synchronous.  e.g. keep who's doing request_module() and avoid
   asynchronous probing if current is probing one of those.

What's wrong with just fixing systemd?  Arbitrary timeouts in init
scripts for system bring up are plain wrong ... I thought we had this
sorted out ten years ago when we were first having the arguments about
how long to wait for root; I'm surprised it's coming back again.

If we want to sort out some sync/async mechanism for probing devices, as
an agreement between the init systems and the kernel, that's fine, but
it's a to-be-negotiated enhancement.  For the current bug fix, just fix
the component that broke ... which would be systemd.

James




Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
...
 + /*
 +  * I got SIGKILL, but wait for 60 more seconds for completion
 +  * unless chosen by the OOM killer. This delay is there as a
 +  * workaround for boot failure caused by SIGKILL upon device
 +  * driver initialization timeout.
 +  *
 +  * N.B. this will actually let the thread complete regularly,
 +  * wait_for_completion() will be used eventually, the 60 second
 +  * try here is just to check for the OOM over that time.
 +  */
 + WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
 +   "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
 + for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
 +     if (wait_for_completion_timeout(&done, HZ))
 +         goto wait_done;
 +

Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
instead of 30?  Why do we even need this with the proposed async
probing changes?

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Mike Galbraith
On Fri, 2014-09-05 at 00:47 -0700, Luis R. Rodriguez wrote: 
 On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo t...@kernel.org wrote:
  On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
  ...
  + /*
  +  * I got SIGKILL, but wait for 60 more seconds for completion
  +  * unless chosen by the OOM killer. This delay is there as a
  +  * workaround for boot failure caused by SIGKILL upon device
  +  * driver initialization timeout.
  +  *
  +  * N.B. this will actually let the thread complete regularly,
  +  * wait_for_completion() will be used eventually, the 60 second
  +  * try here is just to check for the OOM over that time.
  +  */
  + WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
  +   "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
  + for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
  +     if (wait_for_completion_timeout(&done, HZ))
  +         goto wait_done;
  +
 
  Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
  instead of 30?
 
 Nope! I fell into the same trap and only with tons of patience by part
 of Tetsuo with me was I able to grok that the 60 seconds here are not
 for increasing the timeout, this is just time spent checking to ensure
 that the OOM wasn't the one who triggered the SIGKILL. Even if the
 drivers took eons it should be fine now, I tried it :D
 
   Why do we even need this with the proposed async
  probing changes?
 
 Ah -- well without it the way we find drivers that need this new
 async feature is by a bug report and folks saying their system can't
 boot, or they say their device doesn't come up. That's all. Tracing
 this to systemd and a timeout was one of the ugliest things ever.
 There are two insane bug reports you can go check:
 
 mptsas was the first:
 
 http://article.gmane.org/gmane.linux.kernel/1669550
 https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248

<quote>
(2) Currently systemd-udevd unconditionally sends SIGKILL upon hardcoded
30 seconds timeout. As a result, finit_module() of mptsas kernel
module receives SIGKILL when waiting for error handler thread to be
started.
</quote>

Hm.  Why is this not a systemd-udevd bug for running around killing
stuff when it has no idea whether progress is being made or not?

-Mike



Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Oleg Nesterov
On 09/04, Luis R. Rodriguez wrote:

 From: Luis R. Rodriguez mcg...@suse.com

 The new umh kill option has allowed kthreads to receive
 kill signals but they are generally accepting all sources
 of kill signals

And I think this is right,

 while the original motivation was to enable
 through the OOM from sending the kill.

even if the main concern was OOM.

 Users can provide a log output and it should be clear on
 the trace what probe / driver got the kill signal.

Well, if you need a WARN output, perhaps you could just add
WARN_ON(fatal_signal_pending()) at the end of load_module() ?

Not only kthread_create() can fail if systemd sends SIGKILL.

 Although Oleg had rejected a
 similar change a while ago

And honestly, I still dislike this change.

Oleg.



Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
On Fri, Sep 05, 2014 at 12:47:16AM -0700, Luis R. Rodriguez wrote:
 Ah -- well without it the way we find drivers that need this new
 async feature is by a bug report and folks saying their system can't
 boot, or they say their device doesn't come up. That's all. Tracing
 this to systemd and a timeout was one of the ugliest things ever.
 There are two insane bug reports you can go check:
 
 mptsas was the first:
 
 http://article.gmane.org/gmane.linux.kernel/1669550
 https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
 
 Then cxgb4:
 
 https://bugzilla.novell.com/show_bug.cgi?id=877622
 
 I only had Cc'd you on the newest gem pata_marvell :
 
 https://bugzilla.kernel.org/show_bug.cgi?id=59581
 
 We can't seriously expect to be doing all this work for every driver.
 a WARN_ONCE() would enable us to find the drivers that need this new
 async probe feature.

This whole approach of trying to mark specific drivers as needing
async probing is completely broken for the problem at hand.  It
can't address the problem adequately while breaking backward
compatibility.  I don't think this makes much sense.

Nacked-by: Tejun Heo t...@kernel.org

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Dmitry Torokhov
On Friday, September 05, 2014 11:12:41 PM Tejun Heo wrote:
 On Fri, Sep 05, 2014 at 12:47:16AM -0700, Luis R. Rodriguez wrote:
  Ah -- well without it the way we find drivers that need this new
  async feature is by a bug report and folks saying their system can't
  boot, or they say their device doesn't come up. That's all. Tracing
  this to systemd and a timeout was one of the ugliest things ever.
  There are two insane bug reports you can go check:
  
  mptsas was the first:
  
  http://article.gmane.org/gmane.linux.kernel/1669550
  https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1297248
  
  Then cxgb4:
  
  https://bugzilla.novell.com/show_bug.cgi?id=877622
  
  I only had Cc'd you on the newest gem pata_marvell :
  
  https://bugzilla.kernel.org/show_bug.cgi?id=59581
  
  We can't seriously expect to be doing all this work for every driver.
  a WARN_ONCE() would enable us to find the drivers that need this new
  async probe feature.
 
 This whole approach of trying to mark specific drivers as needing
 async probing is completely broken for the problem at hand.  It
 can't address the problem adequately while breaking backward
 compatibility.  I don't think this makes much sense.
 

Which problem are we talking about here though? It does solve the slow device
stalling the rest of the kernel booting (non-module case) for me.

I also reject the notion that anyone should be relying on drivers to be fully
bound on module loading. It is not the nineties anymore. We have hot-pluggable
buses, deferred probing, and even for non-hot-pluggable ones the module
providing the device itself might not be loaded yet. Any scripts that expect to
find the device 100% ready after module loading are simply broken.

Thanks.

-- 
Dmitry


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Luis R. Rodriguez
On Fri, Sep 05, 2014 at 12:59:49PM +0200, Oleg Nesterov wrote:
 On 09/04, Luis R. Rodriguez wrote:
 
  From: Luis R. Rodriguez mcg...@suse.com
 
  The new umh kill option has allowed kthreads to receive
  kill signals but they are generally accepting all sources
  of kill signals
 
 And I think this is right,
 
  while the original motivation was to enable
  through the OOM from sending the kill.
 
 even if the main concern was OOM.
 
  Users can provide a log output and it should be clear on
  the trace what probe / driver got the kill signal.
 
 Well, if you need a WARN output, perhaps you could just add
 WARN_ON(fatal_signal_pending()) at the end of load_module() ?

We could and that's a good idea, thanks! This however would
at least allow the device to be functional in the case the
kill was received during kthread usage, but it would certainly
also set precedents for doing similar things in the kernel
which I do agree is hacky. If we had upstream at
least WARN_ON(fatal_signal_pending()) as you note then
I think it would at least be a reasonable compromise.

 Not only kthread_create() can fail if systemd sends SIGKILL.

Sure, although it's currently the only source found and debugged.

  Although Oleg had rejected a
  similar change a while ago
 
 And honestly, I still dislike this change.

Don't blame you. The code is sensitive and hacky.

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello,

On Fri, Sep 05, 2014 at 09:44:05AM -0700, Dmitry Torokhov wrote:
 Which problem are we talking about here though? It does solve the slow device
 stalling the rest of the kernel booting (non-module case) for me.

The other one.  The one with the timeout.  Neither cxgb4 nor pata_marvell
has the problem of slow probing stalling boot.

 I also reject the notion that anyone should be relying on drivers to be fully
 bound on module loading. It is not nineties anymore. We have hot pluggable
 buses, deferred probing, and even for not hot-pluggable ones the module
 providing the device itself might not be yet loaded. Any scripts that expect 
 to
 find device 100% ready after module loading are simply broken.

We've been treating loading + probing as a single operation when
loading drivers and the assumption has always been that the existing
devices at the time of loading finished probing by the time insmod
finishes.  We now need to split loading and probing and wait for each
of them differently.  The *only* thing we can do is somehow making the
issuer specify that it's gonna wait for probing separately.  I'm not
sure this can even be up for discussion.  We're talking about a major
userland visible behavior change.  We simply can't change it
underneath the existing users.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Luis R. Rodriguez
On Fri, Sep 5, 2014 at 10:49 AM, Tejun Heo t...@kernel.org wrote:
 Hello,

 On Fri, Sep 05, 2014 at 09:44:05AM -0700, Dmitry Torokhov wrote:
 Which problem are we talking about here though? It does solve the slow device
 stalling the rest of the kernel booting (non-module case) for me.

 The other one.  The one with the timeout.  Neither cxgb4 nor pata_marvell
 has the problem of slow probing stalling boot.

 I also reject the notion that anyone should be relying on drivers to be fully
 bound on module loading. It is not nineties anymore. We have hot pluggable
 buses, deferred probing, and even for not hot-pluggable ones the module
 providing the device itself might not be yet loaded. Any scripts that expect 
 to
 find device 100% ready after module loading are simply broken.

 We've been treating loading + probing as a single operation when
 loading drivers and the assumption has always been that the existing
 devices at the time of loading finished probing by the time insmod
 finishes.  We now need to split loading and probing and wait for each
 of them differently.  The *only* thing we can do is somehow making the
 issuer specify that it's gonna wait for probing separately.  I'm not
 sure this can even be up for discussion.  We're talking about a major
 userland visible behavior change.  We simply can't change it
 underneath the existing users.

Meanwhile we are allowing a major design consideration such as a 30
second timeout for both init + probe to all of a sudden become a hard
requirement for device drivers. I see your point but we can't also be
introducing major design changes willy nilly either. We *need* a
solution for the affected drivers.

Also what stops drivers from going ahead and just implementing their
own async probe? Would that now be frowned upon as it strays away
from the original design? The bool would let those drivers do this
easily, and we would still need to identify these drivers. Although
this particular change can be NAK'd, Oleg's suggestion of
WARN_ON(fatal_signal_pending()) at the end of load_module() seems to me
at least needed. And if it's not async probe... what do those with
failed drivers do?

  Luis


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
 It is for storage devices which always have guaranteed synchronous
 probing on module load and well-defined probing order.  Sure, modern
 setups are a lot more dynamic but I'm quite certain that there are
 setups in the wild which depend on storage driver loading being
 synchronous.  We can't simply declare one day that such behavior is
 broken and break, most likely, their boots.

To add a bit, if the argument here is that dependency on such behavior
shouldn't exist and module loading and device probing should always be
asynchronous, the right approach is implementing synchronous_probing
flag not the other way around.  I actually wouldn't hate to see that
change happening but whoever submits and routes such a change should
be ready for a major shitstorm, I'm afraid.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello, Luis.

On Fri, Sep 05, 2014 at 11:12:17AM -0700, Luis R. Rodriguez wrote:
 Meanwhile we are allowing a major design consideration such as a 30
 second timeout for both init + probe all of a sudden become a hard
 requirement for device drivers. I see your point but can't also be
 introducing major design changes willy nilly either. We *need* a
 solution for the affected drivers.

Yes, make the behavior specifically specified from userland.  When did
I ever say that there should be no solution for the problem?  I've
been saying that the behavior should be selected from userland from
the get-go, haven't I?

I have no idea how the selection should be.  It could be per-insmod or
maybe just a system-wide flag with explicit exceptions marked on
drivers is good enough.  I don't know.

 Also what stops drivers from going ahead and just implementing their
 own async probe? Would that now be frowned upon as it strives away

The drivers can't.  How many times should I explain the same thing
over and over again?  libata can't simply make probing asynchronous
w.r.t. module loading no matter how it does it.  Yeah, sure, there can
be other drivers which can do that without most people noticing it but
a storage driver isn't one of them and the storage drivers are the
problematic ones already, right?

 from the original design? The bool would let those drivers do this
 easily, and we would still need to identify these drivers, although
 this particular change can be NAK'd Oleg's suggestion on
 WARN_ON(fatal_signal_pending()) at the end of load_module() seems to me
 at least needed. And if its not async probe... what do those with
 failed drivers do?

I'm getting tired of explaining the same thing over and over again.
The said change was nacked because the whole approach of "let's see
which drivers get reported on the issue, which exists basically for all
drivers, and just change the behavior of them" is braindead.  It makes
no sense whatsoever.  It doesn't address the root cause of the problem
while making the same class of drivers behave significantly
differently for no good reason.  Please stop chasing your own tail and
try to understand the larger picture.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Arjan van de Ven

On 9/5/2014 3:29 PM, Tejun Heo wrote:

Hello, Dmitry.

On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:

I do not agree that it is actually user-visible change: generally speaking you
do not really know if device is there or not. They come and go. Like I said,
consider all permutations, with hot-pluggable buses, deferred probing, etc,


It is for storage devices which always have guaranteed synchronous
probing on module load and well-defined probing order.  Sure, modern
setups are a lot more dynamic but I'm quite certain that there are
setups in the wild which depend on storage driver loading being
synchronous.  We can't simply declare one day that such behavior is
broken and break, most likely, their boots.


we even depend on this in the mount-by-label cases

many setups assume that the internal storage prevails over the USB stick in the
case of conflicts.
it's a security issue; you don't want the built in secure bootloader that has a
kernel root argument by label/uuid.
the security there tends to assume that built-in wins over USB



Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Dmitry Torokhov
On Fri, Sep 05, 2014 at 03:45:08PM -0700, Arjan van de Ven wrote:
 On 9/5/2014 3:29 PM, Tejun Heo wrote:
 Hello, Dmitry.
 
 On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
 I do not agree that it is actually user-visible change: generally speaking 
 you
 do not really know if device is there or not. They come and go. Like I said,
 consider all permutations, with hot-pluggable buses, deferred probing, etc,
 
 It is for storage devices which always have guaranteed synchronous
 probing on module load and well-defined probing order.  Sure, modern
 setups are a lot more dynamic but I'm quite certain that there are
 setups in the wild which depend on storage driver loading being
 synchronous.  We can't simply declare one day that such behavior is
 broken and break, most likely, their boots.
 
 we even depend on this in the mount-by-label cases
 
 many setups assume that the internal storage prevails over the USB stick in 
 the case of conflicts.
 it's a security issue; you don't want the built in secure bootloader that has 
 a kernel root argument
 by label/uuid.
 the security there tends to assume that built-in wins over USB

Ahem... and are they sure it works reliably with large storage arrays? With
SCSI doing probing asynchronously already?

Thanks.

-- 
Dmitry


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello, Dmitry.

On Fri, Sep 05, 2014 at 03:49:17PM -0700, Dmitry Torokhov wrote:
 On Sat, Sep 06, 2014 at 07:31:39AM +0900, Tejun Heo wrote:
  On Sat, Sep 06, 2014 at 07:29:56AM +0900, Tejun Heo wrote:
   It is for storage devices which always have guaranteed synchronous
   probing on module load and well-defined probing order.
 
 Agree about probing order (IIRC that is why we had to revert the
 wholesale asynchronous probing a few years back) but totally disagree
 about synchronous module loading.

I don't get it.  This is a behavior userland already depends on for
boots.  What's there to agree or disagree?  This is just a fact that
we can't do this w/o disturbing some userlands in a major way.

 Anyway, I just posted a patch that I think preserves module loading
 behavior and solves my issue with built-in modules. It does not help
 Luis' issue though (but then I think the main problem is with systemd
 being stupid there).

This sure can be worked around from the userland side too by not imposing
any timeout on module loading.  That said, for the same reasons that
you've been arguing until now, I actually do think that it's kinda
silly to make device probing synchronous to module loading in this
day and age.  What we disagree on is not whether we want to separate
those waits.  It is about how to achieve it.

  To add a bit, if the argument here is that dependency on such behavior
  shouldn't exist and module loading and device probing should always be
  asynchronous, the right approach is implementing synchronous_probing
  flag not the other way around.  I actually wouldn't hate to see that
  change happening but whoever submits and routes such a change should
  be ready for a major shitstorm, I'm afraid.
 
 I think we already had this storm and that is why here we have opt-in
 behavior for the drivers.

It's a different shitstorm where we actively break bootings on some
userlands.  Trust me.  That's gonna be a lot worse.

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hello,

On Fri, Sep 05, 2014 at 03:52:48PM -0700, Dmitry Torokhov wrote:
 Ahem... and are they sure it works reliably with large storage arrays? With
 SCSI doing probing asynchronously already?

I believe this has been mentioned before too but, yes, SCSI device
probing is asynchronous and parallelized but the registration of the
discovered devices is fully serialized according to driver attach
order.  Storage devices are probed in parallel and attached in a fully
deterministic order.  That part has never changed.
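
To illustrate the pattern described above (probe in parallel, register in
a fixed order), here is a minimal userspace sketch in Python.  It is not
the SCSI code and makes no claim about how the kernel implements this;
the names probe(), probe_and_register() and the sdX labels are
hypothetical and exist only for the illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def probe(host):
    """Hypothetical stand-in for a slow hardware probe with a random delay."""
    time.sleep(random.uniform(0.0, 0.02))
    return f"disk-on-{host}"


def probe_and_register(hosts):
    """Probe all hosts concurrently, then register results in attach order."""
    registered = []
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # All probes are submitted up front and run concurrently.
        futures = [pool.submit(probe, host) for host in hosts]
        # Registration walks the futures in submission (attach) order, so a
        # probe that finishes early on a later host still waits its turn.
        for index, future in enumerate(futures):
            name = f"sd{chr(ord('a') + index)}"
            registered.append((name, future.result()))
    return registered
```

However the random delays fall out, the device on the first host in the
list always gets the first name, which is the property userland ends up
relying on.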

Thanks.

-- 
tejun


Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Arjan van de Ven

On 9/5/2014 3:52 PM, Dmitry Torokhov wrote:
 On Fri, Sep 05, 2014 at 03:45:08PM -0700, Arjan van de Ven wrote:
  On 9/5/2014 3:29 PM, Tejun Heo wrote:
   Hello, Dmitry.

   On Fri, Sep 05, 2014 at 11:10:03AM -0700, Dmitry Torokhov wrote:
    I do not agree that it is actually user-visible change: generally speaking
    you do not really know if device is there or not. They come and go. Like I
    said, consider all permutations, with hot-pluggable buses, deferred
    probing, etc,

   It is for storage devices which always have guaranteed synchronous
   probing on module load and well-defined probing order.  Sure, modern
   setups are a lot more dynamic but I'm quite certain that there are
   setups in the wild which depend on storage driver loading being
   synchronous.  We can't simply declare one day that such behavior is
   broken and break, most likely, their boots.

  we even depend on this in the mount-by-label cases

  many setups assume that the internal storage prevails over the USB stick in
  the case of conflicts.
  it's a security issue; you don't want the built in secure bootloader that has
  a kernel root argument by label/uuid.
  the security there tends to assume that built-in wins over USB

 Ahem... and are they sure it works reliably with large storage arrays? With
 SCSI doing probing asynchronously already?

you tend to trust your large storage array
you tend to not trust the walk up USB stick.



Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

2014-09-05 Thread Tejun Heo
Hey,

On Fri, Sep 05, 2014 at 04:22:42PM -0700, Dmitry Torokhov wrote:
  I don't get it.  This is a behavior userland already depends on for
  boots.  What's there to agree or disagree?  This is just a fact that
  we can't do this w/o disturbing some userlands in a major way.
 
 I am just expressing my disbelief that somebody relies on module loading
 being synchronous with probing. Out of curiosity, do you have any
 pointers?

I've seen initrd scripts which depended on the behavior to wait for
storage devices over the years.  AFAIK, none of the modern distros
does it but this has been such a basic feature all along and it seems
highly unlikely to me that there's no userland remaining out there
depending on such behavior.  We do have a lot of different userlands,
many of them quite ad-hoc.
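
For contrast, a script that does not rely on module loading being
synchronous with probing has to wait for the device node explicitly.
A minimal sketch of such a wait, in Python for illustration only (the
helper name wait_for_device() and its default timeout and polling
interval are assumptions, not taken from any real initrd):

```python
import os
import time


def wait_for_device(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds pass; True if found."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script that just loads the storage module and mounts right away has no
such loop, which is why it breaks as soon as probing may complete after
the module load returns.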

Thanks.

-- 
tejun