Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-31 Thread Rusty Russell
On Thu, 1 Apr 2010 05:33:51 am Brandon Philips wrote:
> On 09:36 Tue 30 Mar 2010, Rusty Russell wrote:
> > The real fix here is to drop the lock, like Brandon suggested, but
> > we need to do it more carefully: when we re-acquire the lock we need
> > to re-lookup the symbol in case the module has vanished or changed.
> > 
> > Brandon, I can't see how libcrc32c's module_init calls
> > crypto_alloc_shash, but the problem is reproducible with simple
> > example modules.  Does it fix your problem?
> 
> Did you see my email yesterday explaining how this came about? Does
> that answer your question?

Yep, thanks.

> Reviewed the patch and it looks good and I tested it on the machine
> and it works. A couple of trivial things inline below if you care.
> 
> Signed-off-by: Brandon Philips 
> Cc: sta...@kernel.org
> 
> Thanks Rusty.

Thanks; already had the two caught by checkpatch; changing the other
printk is not something I like to do in the same patch.

Thanks!
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-31 Thread Brandon Philips
On 09:36 Tue 30 Mar 2010, Rusty Russell wrote:
> The real fix here is to drop the lock, like Brandon suggested, but
> we need to do it more carefully: when we re-acquire the lock we need
> to re-lookup the symbol in case the module has vanished or changed.
> 
> Brandon, I can't see how libcrc32c's module_init calls
> crypto_alloc_shash, but the problem is reproducible with simple
> example modules.  Does it fix your problem?

Did you see my email yesterday explaining how this came about? Does
that answer your question?

http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg04363.html

> module: drop the lock while waiting for module to complete initialization.
> 
> This fixes "gave up waiting for init of module libcrc32c." which
> happened at boot time due to multiple parallel module loads.

Since the root cause is known a bit more detail can be added:

This fixes "gave up waiting for init of module libcrc32c." which is
triggered by parallel modprobes by udev for the same driver. For
example on a machine with two network interfaces:

02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)
02:00.1 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)

udev will fork off two modprobe processes:

/sbin/modprobe pci:v14E4d1639sv1014sd037Cbc02sc00i00
/sbin/modprobe pci:v14E4d1639sv1014sd037Cbc02sc00i00

> The problem was a deadlock: we wait for a module to finish
> initializing, but we keep the module_lock mutex so it can't
> complete.  In particular, this could reasonably happen if libcrc32c
> does a request_module() in its initialization routine.
> 
> So we change use_module() to return an errno rather than a bool, and if
> it's -EBUSY we drop the lock and wait in the caller, then reaquire the
> lock.
> 
> Reported-by: Brandon Philips 
> Signed-off-by: Rusty Russell 

Reviewed the patch and it looks good and I tested it on the machine
and it works. A couple of trivial things inline below if you care.

Signed-off-by: Brandon Philips 
Cc: sta...@kernel.org

Thanks Rusty.

Cheers,

Brandon


> - if (b == NULL || already_uses(a, b)) return 1;
> + if (b == NULL || already_uses(a, b)) return 0;

if (b == NULL || already_uses(a, b))
return 0;

>   /* If we're interrupted or time out, we fail. */
> - if (wait_event_interruptible_timeout(
> - module_wq, (err = strong_try_module_get(b)) != -EBUSY,
> - 30 * HZ) <= 0) {
> - printk("%s: gave up waiting for init of module %s.\n",
> -a->name, b->name);
> - return 0;
> - }
> -
> - /* If strong_try_module_get() returned a different error, we fail. */
> + err = strong_try_module_get(b);
>   if (err)
> - return 0;
> + return err;
>  
>   DEBUGP("Allocating new usage for %s.\n", a->name);
>   use = kmalloc(sizeof(*use), GFP_ATOMIC);
>   if (!use) {
>   printk("%s: out of memory loading\n", a->name);

printk(KERN_ERR "%s: out of memory loading\n", a->name);

>   module_put(b);
> - return 0;
> + return -ENOMEM;
>   }
>  
>   use->module_which_uses = a;
>   list_add(&use->list, &b->modules_which_use_me);
>   no_warn = sysfs_create_link(b->holders_dir, &a->mkobj.kobj, a->name);
> - return 1;
> + return 0;
>  }
>  EXPORT_SYMBOL_GPL(use_module);
>  
> @@ -823,7 +815,7 @@ static inline void module_unload_free(st
>  
>  int use_module(struct module *a, struct module *b)
>  {
> - return strong_try_module_get(b) == 0;
> + return strong_try_module_get(b);
>  }
>  EXPORT_SYMBOL_GPL(use_module);
>  
> @@ -994,17 +986,39 @@ static const struct kernel_symbol *resol
>   struct module *owner;
>   const struct kernel_symbol *sym;
>   const unsigned long *crc;
> + DEFINE_WAIT(wait);
> + int err;
> + long timeleft = 30 * HZ;
>  
> +again:
>   sym = find_symbol(name, &owner, &crc,
> !(mod->taints & (1 << TAINT_PROPRIETARY_MODULE)), 
> true);
> - /* use_module can fail due to OOM,
> -or module initialization or unloading */
> - if (sym) {
> - if (!check_version(sechdrs, versindex, name, mod, crc, owner)
> - || !use_module(mod, owner))
> - sym = NULL;
> + if (!sym)
> + return NULL;
> +
> + if (!check_version(sechdrs, versindex, name, mod, crc, owner))
> + return NULL;
> +
> + prepare_to_wait(&module_wq, &wait, TASK_INTERRUPTIBLE);
> + err = use_module(mod, owner);
> + if (likely(!err) || err != -EBUSY || signal_pending(current)) {
> + finish_wait(&module_wq, &wait);
> + return err ? NULL : sym;
>   }
> - return sym;
> +
> + /* Module is still loading.  Drop lock and wait. */
> + mutex_unlo

Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-30 Thread Kay Sievers
On Tue, Mar 30, 2010 at 18:50, Brandon Philips  wrote:
> Side note: Intially I thought that modprobe was forking for each
> module dependency but that was incorrect after looking at the
> code. udev is forking off two modprobes almost at the same time for
> each bnx2x device alias when udevadm trigger is called during the boot
> sequence.

Right, that's the standard behavior, and applies to all usual systems.

> FWIW, I don't think this is a SUSE thing as our udev is pretty lightly
> patched.

Yes, there is no SUSE patch at all. All usual systems ship the same stuff here.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-30 Thread Brandon Philips
On 20:29 Sat 20 Mar 2010, Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
> >
> > I hear what you're saying Herbert, but thinking about this a bit I
> > really think we should make this situation work instead of fail.
> 
> I think the initial report perhaps painted this in a slight
> different fashion than what it really is.  The code that was
> looping in module.c is not trying to load libcrc32c, but rather
> it is trying to get a reference on the already-loaded libcrc32c
> module.

You are correct. Sorry for the confusion.

> AFAICS the only way to make it "work" would be to reload the
> module in question when we can't get a reference on it.  But
> that would entail recursively loading a module during the process
> of loading another module.
> 
> Rusty can chime in on whether this is doable.
> 
> I think I have a good guess as to why this problem is occuring
> for Brandon.  It is probably the result of two near-simultaneous
> modprobes, one issued against libcrc32c and one against bnx2x.
> 
> So if Suse were doing some kind of parallel booting where multiple
> modules may be loaded together then this could occur.

It is actually due to two nearly simultaneous modprobes of bnx2x from
udev since there are two devices:

02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)
02:00.1 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)

Output from my modprobe strace wrappers:

Process 4589 (first to get to libcrc32c)


4589  execve("/sbin/modprobe.bin", ["/sbin/modprobe.bin", 
"pci:v14E4d1639sv1014"...], [/* 14 vars */]) = 0

...

4589  open("/lib/modules/2.6.32.9-debug/kernel/lib/libcrc32c.ko", O_RDONLY) = 3
4589  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4589  stat("/sys/module/libcrc32c", 0x7fff3d03bef0) = -1 ENOENT (No such file 
or directory)
4589  read(3, "\177E", 2)   = 2
4589  lseek(3, 0, SEEK_SET) = 0
4589  fstat(3, {st_mode=S_IFREG|0644, st_size=90482, ...}) = 0
4589  mmap(NULL, 90482, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f7963d49000
4589  init_module(0x7f7963d49000, 90482, "") = 0
4589  munmap(0x7f7963d49000, 90482) = 0
4589  close(3)  = 0
4589  open("/lib/modules/2.6.32.9-debug/kernel/drivers/net/bnx2.ko", O_RDONLY) 
= 3
4589  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4589  stat("/sys/module/bnx2", 0x7fff3d03bf90) = -1 ENOENT (No such file or 
directory)
4589  read(3, "\177E", 2)   = 2
4589  lseek(3, 0, SEEK_SET) = 0
4589  fstat(3, {st_mode=S_IFREG|0644, st_size=950325, ...}) = 0
4589  mmap(NULL, 950325, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f7963c4c000
4589  init_module(0x7f7963c4c000, 950325, "") = 0

Process 4590 (loses out on libcrc32c and gets stuck)


4590  execve("/sbin/modprobe.bin", ["/sbin/modprobe.bin", 
"pci:v14E4d1639sv1014"...], [/* 14 vars */]) = 0



4590  open("/lib/modules/2.6.32.9-debug/kernel/lib/libcrc32c.ko",O_RDONLY) = 3
4590  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4590  stat("/sys/module/libcrc32c", 0x752a0ef0) = -1 ENOENT (No such file 
or directory)
4590  read(3, "\177E", 2)   = 2
4590  lseek(3, 0, SEEK_SET) = 0
4590  fstat(3, {st_mode=S_IFREG|0644, st_size=90482, ...}) = 0
4590  mmap(NULL, 90482, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f1397a1b000
4590  init_module(0x7f1397a1b000, 90482, "") = -1 EEXIST (File exists)
4590  close(3)  = 0
4590  open("/lib/modules/2.6.32.9-debug/kernel/drivers/net/bnx2.ko", O_RDONLY) 
= 3
4590  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4590  stat("/sys/module/bnx2", 0x752a0f90) = -1 ENOENT (No such file or 
directory)
4590  read(3, "\177E", 2)   = 2
4590  lseek(3, 0, SEEK_SET) = 0
4590  fstat(3, {st_mode=S_IFREG|0644, st_size=950325, ...}) = 0
4590  mmap(NULL, 950325, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 
0x7f139791e000
4590  init_module(0x7f139791e000, 950325, "") = -1 ENOENT (No such file or 
directory)
4590  write(2, "FATAL: Error inserting bnx2 (/li"..., 145) = 145

As you can see 4589 inserts libcrc32c and the other starts inserting
bnx2 after seeing libcrc32c was loaded. Creating our situation. Phew,
took me a bit to see what was really going on :)

Side note: Intially I thought that modprobe was forking for each
module dependency but that was incorrect after looking at the
code. udev is forking off two modprobes almost at the same time for
each bnx2x device alias when udevadm trigger is called during the boot
sequence.

FWIW, I don't think this is a SUSE thing as our udev is pretty lightly
patched.

Cheers,

Brandon
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majo

Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-29 Thread Rusty Russell
On Sat, 20 Mar 2010 10:59:59 pm Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
> >
> > I hear what you're saying Herbert, but thinking about this a bit I
> > really think we should make this situation work instead of fail.
> 
> I think the initial report perhaps painted this in a slight
> different fashion than what it really is.  The code that was
> looping in module.c is not trying to load libcrc32c, but rather
> it is trying to get a reference on the already-loaded libcrc32c
> module.
> 
> AFAICS the only way to make it "work" would be to reload the
> module in question when we can't get a reference on it.  But
> that would entail recursively loading a module during the process
> of loading another module.
> 
> Rusty can chime in on whether this is doable.

Not really.  The kernel code is pretty simple: if we can't find a symbol
we need, fail the load.  To avoid nasty spurious races, we *do* wait if the
symbol we need is in a module which is still initializing.  But we time out
after 30 seconds to avoid a module trainwreck.

The real fix here is to drop the lock, like Brandon suggested, but we need
to do it more carefully: when we re-acquire the lock we need to re-lookup
the symbol in case the module has vanished or changed.

Brandon, I can't see how libcrc32c's module_init calls crypto_alloc_shash,
but the problem is reproducible with simple example modules.  Does it fix
your problem?

Tim: note that use_module() return value changes.

Thanks,
Rusty.

module: drop the lock while waiting for module to complete initialization.

This fixes "gave up waiting for init of module libcrc32c." which
happened at boot time due to multiple parallel module loads.

The problem was a deadlock: we wait for a module to finish
initializing, but we keep the module_lock mutex so it can't complete.
In particular, this could reasonably happen if libcrc32c does a
request_module() in its initialization routine.

So we change use_module() to return an errno rather than a bool, and if
it's -EBUSY we drop the lock and wait in the caller, then reaquire the
lock.

Reported-by: Brandon Philips 
Signed-off-by: Rusty Russell 

diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -510,33 +510,25 @@ int use_module(struct module *a, struct 
struct module_use *use;
int no_warn, err;
 
-   if (b == NULL || already_uses(a, b)) return 1;
+   if (b == NULL || already_uses(a, b)) return 0;
 
/* If we're interrupted or time out, we fail. */
-   if (wait_event_interruptible_timeout(
-   module_wq, (err = strong_try_module_get(b)) != -EBUSY,
-   30 * HZ) <= 0) {
-   printk("%s: gave up waiting for init of module %s.\n",
-  a->name, b->name);
-   return 0;
-   }
-
-   /* If strong_try_module_get() returned a different error, we fail. */
+   err = strong_try_module_get(b);
if (err)
-   return 0;
+   return err;
 
DEBUGP("Allocating new usage for %s.\n", a->name);
use = kmalloc(sizeof(*use), GFP_ATOMIC);
if (!use) {
printk("%s: out of memory loading\n", a->name);
module_put(b);
-   return 0;
+   return -ENOMEM;
}
 
use->module_which_uses = a;
list_add(&use->list, &b->modules_which_use_me);
no_warn = sysfs_create_link(b->holders_dir, &a->mkobj.kobj, a->name);
-   return 1;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(use_module);
 
@@ -823,7 +815,7 @@ static inline void module_unload_free(st
 
 int use_module(struct module *a, struct module *b)
 {
-   return strong_try_module_get(b) == 0;
+   return strong_try_module_get(b);
 }
 EXPORT_SYMBOL_GPL(use_module);
 
@@ -994,17 +986,39 @@ static const struct kernel_symbol *resol
struct module *owner;
const struct kernel_symbol *sym;
const unsigned long *crc;
+   DEFINE_WAIT(wait);
+   int err;
+   long timeleft = 30 * HZ;
 
+again:
sym = find_symbol(name, &owner, &crc,
  !(mod->taints & (1 << TAINT_PROPRIETARY_MODULE)), 
true);
-   /* use_module can fail due to OOM,
-  or module initialization or unloading */
-   if (sym) {
-   if (!check_version(sechdrs, versindex, name, mod, crc, owner)
-   || !use_module(mod, owner))
-   sym = NULL;
+   if (!sym)
+   return NULL;
+
+   if (!check_version(sechdrs, versindex, name, mod, crc, owner))
+   return NULL;
+
+   prepare_to_wait(&module_wq, &wait, TASK_INTERRUPTIBLE);
+   err = use_module(mod, owner);
+   if (likely(!err) || err != -EBUSY || signal_pending(current)) {
+   finish_wait(&module_wq, &wait);
+   return err ? NULL : sym;
}
-   return sym;
+
+   /* Module is still loading.  Drop lock and wait. */

Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-20 Thread Neil Horman
On Sat, Mar 20, 2010 at 08:29:59PM +0800, Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
> >
> > I hear what you're saying Herbert, but thinking about this a bit I
> > really think we should make this situation work instead of fail.
> 
> I think the initial report perhaps painted this in a slight
> different fashion than what it really is.  The code that was
> looping in module.c is not trying to load libcrc32c, but rather
> it is trying to get a reference on the already-loaded libcrc32c
> module.
> 
> AFAICS the only way to make it "work" would be to reload the
> module in question when we can't get a reference on it.  But
> that would entail recursively loading a module during the process
> of loading another module.
> 
> Rusty can chime in on whether this is doable.
> 
> I think I have a good guess as to why this problem is occuring
> for Brandon.  It is probably the result of two near-simultaneous
> modprobes, one issued against libcrc32c and one against bnx2x.
> 
> The libcrc32c module is partially loaded to the point of invoking
> its init function, which then tries to modprobe crc32c.
> 
> However, before this starts the modprobe on bnx2x is already in
> progression.  When bnx2x's loading tries to acquire a reference
> on libcrc32c which it depends on, we hit the dead-lock.
> 
> So if Suse were doing some kind of parallel booting where multiple
> modules may be loaded together then this could occur.
> 
> The easiest solution again would be for modprobe(8) to block the
> loading of bnx2x because the module that it depends on libcrc32c
> hasn't yet finished loading.
> 
> I'm open to a kernel solution too if anyone has suggestions.
> 

FWIW, this sounds like a regression in modprobe to me.  A few years ago I fixed
a deadlock condition in the netfilter conntrack code that was tickled by
parallel rmmod's and modprobes.  modprobe would take file locks on modules, and
if the same module was getting rmmodded and modprobed in parallel we'd wind up
with a deadlock.  I fixed it by making the default modprobe -r behavior to be
non-blocking (which is the same as rmmod).  That commit is here:
http://git.kernel.org/?p=utils/kernel/module-init-tools/module-init-tools.git;a=commit;h=b45a24e9c89a14baf63bffe0a9ff04c1c1bffb29

Later, in late 2009, That behavior was reverted:
http://git.kernel.org/?p=utils/kernel/module-init-tools/module-init-tools.git;a=commit;h=b45a24e9c89a14baf63bffe0a9ff04c1c1bffb29

withuot consideration of the consequences, of which this sounds like one.

JCM I think is working on fixing the problem in a sane way.  I'd suggested that
he reapply the patch, but IIRC he told me that hes planning on trying to fix it
by removing the file locking on the modules in userspace entirely, which I think
is also reasonable.

As a test, you might try massaging my old patch above into the latest
module-init-tools to see if it makes the problem go away.  Note, the result of
this will be that either the modprobe or rmmod will fail and will need to be
retried, but its non-fatal, and a retry is usually successful, as it moves the
rmmod and modprobe further apart in time.

Regards
Neil

> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-20 Thread Herbert Xu
On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
>
> I hear what you're saying Herbert, but thinking about this a bit I
> really think we should make this situation work instead of fail.

I think the initial report perhaps painted this in a slight
different fashion than what it really is.  The code that was
looping in module.c is not trying to load libcrc32c, but rather
it is trying to get a reference on the already-loaded libcrc32c
module.

AFAICS the only way to make it "work" would be to reload the
module in question when we can't get a reference on it.  But
that would entail recursively loading a module during the process
of loading another module.

Rusty can chime in on whether this is doable.

I think I have a good guess as to why this problem is occuring
for Brandon.  It is probably the result of two near-simultaneous
modprobes, one issued against libcrc32c and one against bnx2x.

The libcrc32c module is partially loaded to the point of invoking
its init function, which then tries to modprobe crc32c.

However, before this starts the modprobe on bnx2x is already in
progression.  When bnx2x's loading tries to acquire a reference
on libcrc32c which it depends on, we hit the dead-lock.

So if Suse were doing some kind of parallel booting where multiple
modules may be loaded together then this could occur.

The easiest solution again would be for modprobe(8) to block the
loading of bnx2x because the module that it depends on libcrc32c
hasn't yet finished loading.

I'm open to a kernel solution too if anyone has suggestions.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread David Miller
From: Herbert Xu 
Date: Sat, 20 Mar 2010 12:24:34 +0800

> On Fri, Mar 19, 2010 at 09:21:03PM -0700, Brandon Philips wrote:
>>
>> Do you see how it is getting stuck though with modprobe bnx2 and
>> modprobe libcrc32c going at the same time though?
> 
> Yes I agree that's a bug.  Something in those paths should simply
> fail instead of waiting around.

I hear what you're saying Herbert, but thinking about this a bit I
really think we should make this situation work instead of fail.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread Herbert Xu
On Fri, Mar 19, 2010 at 09:21:03PM -0700, Brandon Philips wrote:
>
> Do you see how it is getting stuck though with modprobe bnx2 and
> modprobe libcrc32c going at the same time though?

Yes I agree that's a bug.  Something in those paths should simply
fail instead of waiting around.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread Brandon Philips
On 09:08 Sat 20 Mar 2010, Herbert Xu wrote:
> On Sat, Mar 20, 2010 at 09:01:15AM +0800, Herbert Xu wrote:
> >
> > IIRC modprobe(8) in user-space is supposed to load all dependencies
> > prior to loading the main module.  So something must've gone wrong
> > before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
> > wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.
> 
> In fact I just tried it on my machine and modprobe bnx2x correctly
> loaded libcrc32c + crc32c before loading bnx2x (you don't need the
> hardware, it'll stay loaded).  My kernel is 2.6.33-rc8.
> 
> I suspect it's a user-space problem.  I'm using module-init-tools
> 3.4-1 from Debian.

This is module-init-tools 3.11.1 and SUSE Linux Enterprise SP1 RC1. I
can only reproduce on boot (thus why I changed bnx2 which I have) so
perhaps SUSE is doing something odd in how it loads modules at boot.

I will trace the boot process and get more information on exactly what
happens before it gets stuck.

Do you see how it is getting stuck though with modprobe bnx2 and
modprobe libcrc32c going at the same time though?

Cheers,

Brandon
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread Herbert Xu
On Fri, Mar 19, 2010 at 07:46:12PM -0700, David Miller wrote:
>
> Even if userland "does" this, nothing stops another thread
> from rmmod'ing the dependency in between the modloads.
> 
> We have to handle this case properly in the kernel, no matter
> what.

Right, if that had happened I think the reasonable response would
be to fail the subsequent load for lack of dependency rather than
looping around forever.

However, I don't think that's what Brandon is seeing here, unless
there is something performing the rmmod?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread David Miller
From: Herbert Xu 
Date: Sat, 20 Mar 2010 09:08:49 +0800

> On Sat, Mar 20, 2010 at 09:01:15AM +0800, Herbert Xu wrote:
>>
>> IIRC modprobe(8) in user-space is supposed to load all dependencies
>> prior to loading the main module.  So something must've gone wrong
>> before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
>> wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.
> 
> In fact I just tried it on my machine and modprobe bnx2x correctly
> loaded libcrc32c + crc32c before loading bnx2x (you don't need the
> hardware, it'll stay loaded).  My kernel is 2.6.33-rc8.
> 
> I suspect it's a user-space problem.  I'm using module-init-tools
> 3.4-1 from Debian.

Even if userland "does" this, nothing stops another thread
from rmmod'ing the dependency in between the modloads.

We have to handle this case properly in the kernel, no matter
what.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread Herbert Xu
On Sat, Mar 20, 2010 at 09:01:15AM +0800, Herbert Xu wrote:
>
> IIRC modprobe(8) in user-space is supposed to load all dependencies
> prior to loading the main module.  So something must've gone wrong
> before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
> wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.

In fact I just tried it on my machine and modprobe bnx2x correctly
loaded libcrc32c + crc32c before loading bnx2x (you don't need the
hardware, it'll stay loaded).  My kernel is 2.6.33-rc8.

I suspect it's a user-space problem.  I'm using module-init-tools
3.4-1 from Debian.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread Herbert Xu
On Fri, Mar 19, 2010 at 04:40:45PM -0700, Brandon Philips wrote:
>
> Possible solutions:
> 
> 1) Figure out the right place to drop the module_lock while waiting on
> other modules to load. This boots and seems to work but who knows what
> it will break. :) This might be the right fix after investigating
> possible issues.

I'm not sure that this is the problem.

IIRC modprobe(8) in user-space is supposed to load all dependencies
prior to loading the main module.  So something must've gone wrong
before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fixing gave up waiting for init of module libcrc32c.

2010-03-19 Thread Brandon Philips
Hello Herbert, Dave-

I have a little "lockup" in libcrc32c on boot and I don't know how to
solve it correctly but there are a few ideas.

First, note this bug is happening with bnx2x. But, since I didn't have
the hardware I added usage of crc32c to bnx2 to reproduce.

Here are backtraces of the two processes and the relevant dmesg
output:

[   61.816131] ehci_hcd :00:1d.7: suspend root hub
[   91.512113] bnx2: gave up waiting for init of module libcrc32c.
[   91.524105] bnx2: Unknown symbol crc32c
[   92.215003] bp: lib/libcrc32c.c:66 libcrc32c_mod_init crypto_alloc_shash 
SUCCESS
[   92.401131] Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.4 (Mar 
03, 2010)
[   92.401262]   alloc irq_desc for 16 on node 0
[   92.401266]   alloc kstat_irqs on node 0
[   92.401282] bnx2 :02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16

The issue is that bnx2 is waiting for crc32c on use_module while
libcrc32c is trying to insert crc32-generic to resolve the symbol. You
can see it here via the process stacks for modprobe:

[0]kdb> btp 1661  # modprobe for bnx2
Stack traceback for pid 1661
0x88051209ddc0 1661 1657  0   13   S  0x88051209e300  modprobe
[   75.797063]  8805131e7c78 0046  

[   75.797063]  e480 8805131e7fd8 88051209ddc0 
88051209e1c0
[   75.797063]  1edb 8109517d 8800283ae480 
8800283b3e18
[   75.797063] Call Trace:
[   75.797063]  [] schedule_timeout+0x1e5/0x2f0
[   75.797063]  [] use_module+0x2eb/0x330
[   75.797063]  [] load_module+0x1545/0x1e10
[   75.797063]  [] sys_init_module+0x81/0x2b0
[   75.797063]  [] system_call_fastpath+0x16/0x1b
[   75.797063]  [<7f4bf994dfba>] 0x7f4bf994dfba
[0]kdb> 
[0]kdb> 
[0]kdb> 
[0]kdb> btp 1662 #modprobe for libcrc32c
Stack traceback for pid 1662  # modprobe for libcrc32c blocking on 
0x880513278980 1662 1658  0   13   D  0x880513278ec0  modprobe
[   75.797063]  88051254bb28 0046 88051254baf8 
88051254baf4
[   75.797063]  e480 88051254bfd8 880513278980 
880513278d80
[   75.797063]  1e51 8800282d3e00 8800283ae480 
8800283b3e18
[   75.797063] Call Trace:
[   75.797063]  [] schedule_timeout+0x25d/0x2f0
[   75.797063]  [] wait_for_common+0xdb/0x190
[   75.797063]  [] wait_for_completion+0x1d/0x20
[   75.797063]  [] call_usermodehelper_exec+0xd2/0xe0
[   75.797063]  [] __request_module+0x19c/0x1f0
[   75.797063]  [] crypto_larval_lookup+0x10f/0x210
[   75.797063]  [] crypto_alg_mod_lookup+0x63/0xf0
[   75.797063]  [] crypto_find_alg+0xae/0xd0
[   75.797063]  [] crypto_alloc_tfm+0x84/0x100
[   75.797063]  [] crypto_alloc_shash+0x19/0x20
[   75.797063]  [] libcrc32c_mod_init+0x35/0x72 [libcrc32c]
[   75.797063]  [] do_one_initcall+0x3d/0x1c0
[   75.797063]  [] sys_init_module+0xed/0x2b0
[   75.797063]  [] system_call_fastpath+0x16/0x1b
[   75.797063]  [<7f4aeefaffba>] 0x7f4aeefaffba

Possible solutions:

1) Figure out the right place to drop the module_lock while waiting on
other modules to load. This boots and seems to work but who knows what
it will break. :) This might be the right fix after investigating
possible issues.

Index: linux-2.6.32/kernel/module.c
===
--- linux-2.6.32.orig/kernel/module.c
+++ linux-2.6.32/kernel/module.c
@@ -678,6 +678,8 @@ int use_module(struct module *a, struct
 
if (b == NULL || already_uses(a, b)) return 1;
 
+   mutex_unlock(&module_mutex);
+
/* If we're interrupted or time out, we fail. */
if (wait_event_interruptible_timeout(
module_wq, (err = strong_try_module_get(b)) != -EBUSY,
@@ -687,6 +689,8 @@ int use_module(struct module *a, struct
return 0;
}
 
+   mutex_lock(&module_mutex);
+
/* If strong_try_module_get() returned a different error, we fail. */
if (err)
return 0;

2) Force libcrc32c to be built in?

3) Tell everyone to load it first thing in initrd? This seems hacky.

What are you thoughts on the right way to fix this?

Cheers,

Brandon
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html