Re: Module and device configuration locking [was Re: Modules loading modules?]
Updating the status on these changes: One comment questioned whether or not a version bump was required, and I've more-or-less convinced myself it is at least desired. While properly-working modules from the pre-update will continue to work on a post-update kernel, the reverse is not necessarily true. A module written for a post-update kernel and which takes advantage of the changed locking protocol will fail on a pre-update kernel. Another comment suggested that the name of newly-created file kern/kern_cfg.c should be changed to more closely match its contents, while an earlier comment had suggested generalizing the filename! I've taken the more-specific path and the file is now called kern_cfglock.c If other stuff gets added to this file later, we can come up with a more generic name at that time. Some additional changes have been included in this latest set of diffs. Mostly, there were some KASSERT()s sprinkled throughout that tried to ensure that the code had not been called recursively. Since recursion is now explicitly supported, all of the "module_active" stuff has been reworked. Additionally, there was a statically-allocated list of "pending" modules that needed to have their initialization completed. This has been changed to keep multiple stack-allocated lists, one for each depth. I'm planning to commit these changes next weekend, unless there are any significant objections. If anyone else needs to coordinate changes in order to "ride-the-version-bump", let me know. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | - ML.diff.tgz Description: Binary data
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Sun, 8 Aug 2010, Mindaugas Rasiukevicius wrote: Paul Goyette wrote: Following up on all the various comments from jmcneil@, pooka@, and rmind@, I've attached a revised set of diffs. The most significant change between this and the previous revision is the separation of the kernconfig_lock_*() stuff into a separate kern_cfg.c source file. This allows inclusion of the kernel code directly into rump, rather than having to re-implement it in rump/klock.c Please use kern_cflock.c or something like that, as kern_cfg.c is too generic for such specific facility. My original plan was to use kern_cfglock.c but pooka@ suggested using a more-generic name "in case we have more stuff in the future". :) I personally prefer the more-specific name myself, but I cannot make you both happy! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module and device configuration locking [was Re: Modules loading modules?]
Paul Goyette wrote: > Following up on all the various comments from jmcneil@, pooka@, and > rmind@, I've attached a revised set of diffs. The most significant > change between this and the previous revision is the separation of the > kernconfig_lock_*() stuff into a separate kern_cfg.c source file. This > allows inclusion of the kernel code directly into rump, rather than > having to re-implement it in rump/klock.c Please use kern_cflock.c or something like that, as kern_cfg.c is too generic for such specific facility. -- Mindaugas
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Thu, 5 Aug 2010, Paul Goyette wrote: On Thu, 5 Aug 2010, Paul Goyette wrote: OK, got it. I'll have another set of patches in a few days. Well, it was a slow day at the office today, so I found some time to work on this! :) Attached is the latest version of this change. For simplicity, I have broken the patches up into five separate groups: Part 1 defines a set of new kernel locking primitives in file kern/kern_lock.c to implement the recursive kernconfig_mutex. (I'm not really stuck on the name, so open to alternative suggestions!) All of the locking in sys/kern_module.c has been updated to use this instead of the internal module_lock mutex. Additionally, the module_autoload() routine now does its own locking, rather than requiring its caller to do it. And the description of locking in the module(9) man page is updated. Part 2 removes all of the explicit module_{,un}lock() calls from the various xxx_verbose modules that I'd previous worked on. Part 3 removes all of the explicit module_{,un}lock() calls from other bits and pieces of the kernel. In a couple of places, new calls to kernconfig_{,un}lock() are inserted. Part 4 updates rump to provide equivalent locking routines. (pooka, I would appreciate you looking at this.) Part 5 adds a new atf test-case to the existing module tests to verify that recursion actually works! Following up on all the various comments from jmcneil@, pooka@, and rmind@, I've attached a revised set of diffs. The most significant change between this and the previous revision is the separation of the kernconfig_lock_*() stuff into a separate kern_cfg.c source file. This allows inclusion of the kernel code directly into rump, rather than having to re-implement it in rump/klock.c - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | - ML.diff.tgz Description: Binary data
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Fri, 6 Aug 2010, Mindaugas Rasiukevicius wrote: Hello, Paul Goyette wrote: Well, it was a slow day at the office today, so I found some time to work on this! :) Attached is the latest version of this change. For simplicity, I have broken the patches up into five separate groups: <...> Few comments on the patch: -#define__NetBSD_Version__ 599003800 /* NetBSD 5.99.38 */ +#define__NetBSD_Version__ 599003900 /* NetBSD 5.99.39 */ Why bumping? Seemed like the right thing to do, since the locking protocol is changing, as well as the global module_lock mutex disappearing. But I really can't hink of any ill effects, unless someone has a module that suddenly fails link-and-load because it references module_lock. It's not very likely. I'll defer, or I'll wait to ride someone else's bump just to be safe. + * of of kernel functionality, such as device configuration and loading One "of" too much. Fixed. +void +kernconfig_lock(void) +{ + lwp_t *l; + + /* +* OK to check this unlocked, since it could only be set to +* curlwp by the current thread itself, and not by an interrupt +* or any other LWP. +*/ Please add KASSERT(!cpu_intr_p()); Done. I was worried about this, and forgot to ask. Thanks for picking it up. + l = curlwp; + if (kernconfig_lwp == l) { + kernconfig_recurse++; + KASSERT(kernconfig_recurse != 0); Such (++ then != 0) assert is a bit useless. How about: KASSERT(kernconfig_recurse > 1); The intent here was to ensure that we didn't overflow the counter back to zero. It might have made sense when the counter was a uint8_t, but probably not with an int! It can probably be removed completely. + } else { + mutex_enter(&kernconfig_mutex); + kernconfig_lwp = l; + kernconfig_recurse = 1; + } +} Somebody will confuse l with 1. :) This came straight from ad@'s suggestions, and there seems to be a lot of precedent in the sys/kern/ code to use l. But I will select a better variable name. + if (--kernconfig_recurse == 0) { + kernconfig_lwp = 0; + mutex_exit(&kernconfig_mutex); kernconfig_lwp = NULL; Ooops! :) +bool +kernconfig_is_held(void) +{ + + return (mutex_owned(&kernconfig_mutex)); No need for brackets around function. Got it. + KASSERT(kernconfig_is_held); But missing brackets in few cases. Yeah, you're the second one to point that out. I fixed them once, not sure how they crept back in. But they've been fixed again. Otherwise seems OK to me. Thanks. Thank you for the detailed look. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module and device configuration locking [was Re: Modules loading modules?]
Hello, Paul Goyette wrote: > > Well, it was a slow day at the office today, so I found some time to > work on this! :) > > Attached is the latest version of this change. For simplicity, I have > broken the patches up into five separate groups: > > <...> Few comments on the patch: > -#define __NetBSD_Version__ 599003800 /* NetBSD 5.99.38 */ > +#define __NetBSD_Version__ 599003900 /* NetBSD 5.99.39 */ Why bumping? > + * of of kernel functionality, such as device configuration and loading One "of" too much. > +void > +kernconfig_lock(void) > +{ > + lwp_t *l; > + > + /* > + * OK to check this unlocked, since it could only be set to > + * curlwp by the current thread itself, and not by an interrupt > + * or any other LWP. > + */ Please add KASSERT(!cpu_intr_p()); > + l = curlwp; > + if (kernconfig_lwp == l) { > + kernconfig_recurse++; > + KASSERT(kernconfig_recurse != 0); Such (++ then != 0) assert is a bit useless. How about: KASSERT(kernconfig_recurse > 1); > + } else { > + mutex_enter(&kernconfig_mutex); > + kernconfig_lwp = l; > + kernconfig_recurse = 1; > + } > +} Somebody will confuse l with 1. :) > + if (--kernconfig_recurse == 0) { > + kernconfig_lwp = 0; > + mutex_exit(&kernconfig_mutex); kernconfig_lwp = NULL; > +bool > +kernconfig_is_held(void) > +{ > + > + return (mutex_owned(&kernconfig_mutex)); No need for brackets around function. > + KASSERT(kernconfig_is_held); But missing brackets in few cases. Otherwise seems OK to me. Thanks. -- Mindaugas
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Thu, 5 Aug 2010, Paul Goyette wrote: OK, got it. I'll have another set of patches in a few days. Well, it was a slow day at the office today, so I found some time to work on this! :) Attached is the latest version of this change. For simplicity, I have broken the patches up into five separate groups: Part 1 defines a set of new kernel locking primitives in file kern/kern_lock.c to implement the recursive kernconfig_mutex. (I'm not really stuck on the name, so open to alternative suggestions!) All of the locking in sys/kern_module.c has been updated to use this instead of the internal module_lock mutex. Additionally, the module_autoload() routine now does its own locking, rather than requiring its caller to do it. And the description of locking in the module(9) man page is updated. Part 2 removes all of the explicit module_{,un}lock() calls from the various xxx_verbose modules that I'd previous worked on. Part 3 removes all of the explicit module_{,un}lock() calls from other bits and pieces of the kernel. In a couple of places, new calls to kernconfig_{,un}lock() are inserted. Part 4 updates rump to provide equivalent locking routines. (pooka, I would appreciate you looking at this.) Part 5 adds a new atf test-case to the existing module tests to verify that recursion actually works! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -Index: sys/sys/param.h === RCS file: /cvsroot/src/sys/sys/param.h,v retrieving revision 1.373 diff -u -p -r1.373 param.h --- sys/sys/param.h 28 Jul 2010 11:03:47 - 1.373 +++ sys/sys/param.h 6 Aug 2010 00:46:05 - @@ -63,7 +63,7 @@ * 2.99.9 (299000900) */ -#define__NetBSD_Version__ 599003800 /* NetBSD 5.99.38 */ +#define__NetBSD_Version__ 599003900 /* NetBSD 5.99.39 */ #define __NetBSD_Prereq__(M,m,p) (M) * 1) + \ (m) * 100) + (p) * 100) <= __NetBSD_Version__) Index: sys/sys/systm.h === RCS file: /cvsroot/src/sys/sys/systm.h,v retrieving revision 1.239 diff -u -p -r1.239 systm.h --- sys/sys/systm.h 31 Jan 2010 02:04:43 - 1.239 +++ sys/sys/systm.h 6 Aug 2010 00:46:06 - @@ -492,6 +492,11 @@ void kernel_lock_init(void); void _kernel_lock(int); void _kernel_unlock(int, int *); +void kernconfig_lock_init(void); +void kernconfig_lock(void); +void kernconfig_unlock(void); +bool kernconfig_is_held(void); + #if defined(MULTIPROCESSOR) || defined(_MODULE) #defineKERNEL_LOCK(count, lwp) \ do { \ Index: sys/sys/module.h === RCS file: /cvsroot/src/sys/sys/module.h,v retrieving revision 1.24 diff -u -p -r1.24 module.h --- sys/sys/module.h26 Jun 2010 07:23:57 - 1.24 +++ sys/sys/module.h6 Aug 2010 00:46:07 - @@ -115,7 +115,6 @@ __link_set_add_rodata(modules, name##_mo TAILQ_HEAD(modlist, module); extern struct vm_map *module_map; -extern kmutex_tmodule_lock; extern u_int module_count; extern u_int module_builtinlist; extern struct modlist module_list; Index: sys/kern/kern_lock.c === RCS file: /cvsroot/src/sys/kern/kern_lock.c,v retrieving revision 1.150 diff -u -p -r1.150 kern_lock.c --- sys/kern/kern_lock.c20 Dec 2009 20:42:23 - 1.150 +++ sys/kern/kern_lock.c6 Aug 2010 00:46:07 - @@ -39,6 +39,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_lock.c, #include #include #include +#include #include #include #include @@ -56,6 +57,10 @@ bool kernel_lock_dodebug; __cpu_simple_lock_t kernel_lock[CACHE_LINE_SIZE / sizeof(__cpu_simple_lock_t)] __aligned(CACHE_LINE_SIZE); +static kmutex_t kernconfig_mutex; +static lwp_t *kernconfig_lwp; +static int kernconfig_recurse; + void assert_sleepable(void) { @@ -306,3 +311,59 @@ _kernel_unlock(int nlocks, int *countp) if (countp != NULL) *countp = olocks; } + +/* + * Functions for manipulating the kernel configuration lock. This + * recursive lock should be used to protect all additions and removals + * of of kernel functionality, such as device configuration and loading + * of modular kernel components. + */ + +void +kernconfig_lock_init(void) +{ + + mutex_init(&kernconfig_mutex, MUTEX_DEFAULT, IPL_NONE); + kernconfig_lwp = NULL;
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Thu, 5 Aug 2010, Paul Goyette wrote: Attached is the latest version of this change. For simplicity, I have broken the patches up into five separate groups: Part 1 defines a set of new kernel locking primitives in file kern/kern_lock.c to implement the recursive kernconfig_mutex. (I'm not really stuck on the name, so open to alternative suggestions!) All of the locking in sys/kern_module.c has been updated to use this instead of the internal module_lock mutex. Additionally, the module_autoload() routine now does its own locking, rather than requiring its caller to do it. And the description of locking in the module(9) man page is updated. Part 2 removes all of the explicit module_{,un}lock() calls from the various xxx_verbose modules that I'd previous worked on. Part 3 removes all of the explicit module_{,un}lock() calls from other bits and pieces of the kernel. In a couple of places, new calls to kernconfig_{,un}lock() are inserted. Part 4 updates rump to provide equivalent locking routines. (pooka, I would appreciate you looking at this.) Part 5 adds a new atf test-case to the existing module tests to verify that recursion actually works! I should add that I'm actually running with these changes on one of my local machines with no visible problems, and the above-added test case works. The machine has a "AMD Phenom(tm) 9600 Quad-Core Processor" at 2.3MHz. {103} modstat NAME CLASS SOURCE REFS SIZE REQUIRES bpf driver builtin0 -- compat misc builtin0 -- coredump misc filesys1 3402 - exec_elf64 misc filesys0 7960 coredump exec_script misc filesys0 1256 - ffs vfsboot 0 153316 - kernfs vfsfilesys0 12914- nfs vfsfilesys0 159471 - ptyfsvfsfilesys0 10118- secmodel_bsd44 secmodel builtin0 -suser,securelevel securelevel secmodel builtin1 -- susersecmodel builtin1 -- tmpfsvfsfilesys0 19513- wapblvfsbuiltin0 -- {104} - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: Module and device configuration locking [was Re: Modules loading modules?]
OK, got it. I'll have another set of patches in a few days. On Thu, 5 Aug 2010, Andrew Doran wrote: On Wed, Aug 04, 2010 at 10:28:56AM -0700, Paul Goyette wrote: On Wed, 4 Aug 2010, Andrew Doran wrote: Sorry for not replying sooner. Please don't add a generic recursive lock facility, it would be abused. Yeah! My current patches have no new generic facilities at all. I'd like something roughly like the following. I think this should also cover major configuration tasks such as device attach and detach, so it wouldn't really be module specific. The naming is a poor suggestion. void sysconfig_lock(void) { if (mutex_owned(&sysconfig_mutex)) { /* * At this point it's OK for the thread to hold other * locks, since the system configuration lock was the * first taken on the way in. */ sysconfig_recurse++; return; } /* * I don't remember the correct argument sequence to the * following, but basically we need to assert that NO locks, * sleep or spin, other than kernel_lock are held, * as the system configuration lock would be the outermost * lock in the system. */ LOCKDEBUG_BARRIER(...); mutex_enter(&sysconfig_mutex); KASSERT(sysconfig_recurse == 0); } For the boot autoconfig path where you have things being driven by the kernel, this would work OK. Situations where there's a kthread or something attaching and detaching devices on the fly are a bit different, since they likely have local atomicity requirements (need to take device private locks). There you'd probably need the caller to take the lock. USB comes to mind (along with some hardware RAID drivers that do hotplug attach/detach). Thoughts? Well, at first glance, this proposal makes sense. But it is certainly a much larger task than dealing with the immediate problem - modules which want to load other modules. So, I really have three questions: 1. In keeping with earlier concerns about holding locks for long periods of time (eeh and mrg both commented on this), the approach described For module lock the wait time isn't really a big concern. Module load and unload is a heavyweight operation so the wait time is expected. There is nothing intrinsically wrong with holding a mutex for "a long time" provided that you've got a good handle on the potential side effects. It's a different story for other locks in the system like say proc_lock, which can't be held for a long time without disrupting the user experience and/or deadlocking the system. above would seem to have an even longer hold-time than the current module_lock. Would not something similar to haad's condvar approach be appropriate for this proposal, as well as for the more-limited-in- scope recursive module lock? condvar gives you another lock in a roundabout way, without priority inheritance to help move things along if something has clogged the pipework. :-). So I don't see benefit over a mutex. 2. Would it be reasonable to solve the immediate problem as previously proposed, rather than waiting for this "ultimate" solution? I think it would be a long time before we could find and resolve all of the "issues" that might be created in the various "threaded" situations that may exist. I know I'm certainly not sufficiently qualified to identify all those "situations", and I also know that I don't have sufficient available work-hours to do this in any reasonable time- frame. I'd still like to move forward with the most recent solution to the module_lock problem. I'm not suggesting that you need to tackle autoconfiguration locking at the same time.. What I'm suggesting is to put the primitives in place (say in kern_lock.c or something) and then use these for the benefit of the modules code. It would provide a statment of direction and avoid re-hashing things later when somebody decides to tackle autoconf. Sorry for being unclear. 3. Since we're still technically abusing the mutex(9) man-page caveat about using mutex_owned() for making locking decisions, it would be very much appreciated (at least by myself) to have an explanation of WHY it is OK in this case to do it here, but not somewhere else. Ok well I think the simplest course of action would be to replace the mutex_owned() with a global variable. It would do much the same thing but for the price of 4 or 8 bytes there is no question of contradicting the mantra. So like: /* * Ok to check this unlocked, as for the value to equal curlwp it must * have been set by the current thread of execution (i.e. not interrupt * context or another LWP). */ l = curlwp; if (sysconfig_lwp == l) { /* recurse */ } else { mutex_enter(&sysconfig_lock); sysconfig_lwp = l; /* etc etc */ } --
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Wed, Aug 04, 2010 at 10:28:56AM -0700, Paul Goyette wrote: > On Wed, 4 Aug 2010, Andrew Doran wrote: > > >Sorry for not replying sooner. > > > >Please don't add a generic recursive lock facility, it would be abused. > > Yeah! My current patches have no new generic facilities at all. > > >I'd like something roughly like the following. I think this should > >also cover major configuration tasks such as device attach and detach, > >so it wouldn't really be module specific. The naming is a poor suggestion. > > > >void > >sysconfig_lock(void) > >{ > > > > if (mutex_owned(&sysconfig_mutex)) { > > /* > > * At this point it's OK for the thread to hold other > > * locks, since the system configuration lock was the > > * first taken on the way in. > > */ > > sysconfig_recurse++; > > return; > > } > > /* > > * I don't remember the correct argument sequence to the > > * following, but basically we need to assert that NO locks, > > * sleep or spin, other than kernel_lock are held, > >* as the system configuration lock would be the outermost > > * lock in the system. > > */ > > LOCKDEBUG_BARRIER(...); > > mutex_enter(&sysconfig_mutex); > > KASSERT(sysconfig_recurse == 0); > >} > > > >For the boot autoconfig path where you have things being driven by the > >kernel, this would work OK. Situations where there's a kthread or > >something attaching and detaching devices on the fly are a bit different, > >since they likely have local atomicity requirements (need to take device > >private locks). There you'd probably need the caller to take the lock. > >USB comes to mind (along with some hardware RAID drivers that do hotplug > >attach/detach). > > > >Thoughts? > > Well, at first glance, this proposal makes sense. But it is > certainly a much larger task than dealing with the immediate problem > - modules which want to load other modules. > > So, I really have three questions: > > 1. In keeping with earlier concerns about holding locks for long periods >of time (eeh and mrg both commented on this), the approach described For module lock the wait time isn't really a big concern. Module load and unload is a heavyweight operation so the wait time is expected. There is nothing intrinsically wrong with holding a mutex for "a long time" provided that you've got a good handle on the potential side effects. It's a different story for other locks in the system like say proc_lock, which can't be held for a long time without disrupting the user experience and/or deadlocking the system. >above would seem to have an even longer hold-time than the current >module_lock. Would not something similar to haad's condvar approach >be appropriate for this proposal, as well as for the more-limited-in- >scope recursive module lock? condvar gives you another lock in a roundabout way, without priority inheritance to help move things along if something has clogged the pipework. :-). So I don't see benefit over a mutex. > 2. Would it be reasonable to solve the immediate problem as previously >proposed, rather than waiting for this "ultimate" solution? I think >it would be a long time before we could find and resolve all of the >"issues" that might be created in the various "threaded" situations >that may exist. I know I'm certainly not sufficiently qualified to >identify all those "situations", and I also know that I don't have >sufficient available work-hours to do this in any reasonable time- >frame. > >I'd still like to move forward with the most recent solution to the >module_lock problem. I'm not suggesting that you need to tackle autoconfiguration locking at the same time.. What I'm suggesting is to put the primitives in place (say in kern_lock.c or something) and then use these for the benefit of the modules code. It would provide a statment of direction and avoid re-hashing things later when somebody decides to tackle autoconf. Sorry for being unclear. > 3. Since we're still technically abusing the mutex(9) man-page caveat >about using mutex_owned() for making locking decisions, it would >be very much appreciated (at least by myself) to have an explanation >of WHY it is OK in this case to do it here, but not somewhere else. Ok well I think the simplest course of action would be to replace the mutex_owned() with a global variable. It would do much the same thing but for the price of 4 or 8 bytes there is no question of contradicting the mantra. So like: /* * Ok to check this unlocked, as for the value to equal curlwp it must * have been set by the current thread of execution (i.e. not interrupt * context or another LWP). */ l = curlwp; if (sysconfig_lwp == l) { /* recurse */ } else { mutex_enter(&sysconfig_lock); sysconfig_lwp = l; /* etc etc */ }
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Wed, 4 Aug 2010, Andrew Doran wrote: Sorry for not replying sooner. Please don't add a generic recursive lock facility, it would be abused. Yeah! My current patches have no new generic facilities at all. I'd like something roughly like the following. I think this should also cover major configuration tasks such as device attach and detach, so it wouldn't really be module specific. The naming is a poor suggestion. void sysconfig_lock(void) { if (mutex_owned(&sysconfig_mutex)) { /* * At this point it's OK for the thread to hold other * locks, since the system configuration lock was the * first taken on the way in. */ sysconfig_recurse++; return; } /* * I don't remember the correct argument sequence to the * following, but basically we need to assert that NO locks, * sleep or spin, other than kernel_lock are held, * as the system configuration lock would be the outermost * lock in the system. */ LOCKDEBUG_BARRIER(...); mutex_enter(&sysconfig_mutex); KASSERT(sysconfig_recurse == 0); } For the boot autoconfig path where you have things being driven by the kernel, this would work OK. Situations where there's a kthread or something attaching and detaching devices on the fly are a bit different, since they likely have local atomicity requirements (need to take device private locks). There you'd probably need the caller to take the lock. USB comes to mind (along with some hardware RAID drivers that do hotplug attach/detach). Thoughts? Well, at first glance, this proposal makes sense. But it is certainly a much larger task than dealing with the immediate problem - modules which want to load other modules. So, I really have three questions: 1. In keeping with earlier concerns about holding locks for long periods of time (eeh and mrg both commented on this), the approach described above would seem to have an even longer hold-time than the current module_lock. Would not something similar to haad's condvar approach be appropriate for this proposal, as well as for the more-limited-in- scope recursive module lock? 2. Would it be reasonable to solve the immediate problem as previously proposed, rather than waiting for this "ultimate" solution? I think it would be a long time before we could find and resolve all of the "issues" that might be created in the various "threaded" situations that may exist. I know I'm certainly not sufficiently qualified to identify all those "situations", and I also know that I don't have sufficient available work-hours to do this in any reasonable time- frame. I'd still like to move forward with the most recent solution to the module_lock problem. 3. Since we're still technically abusing the mutex(9) man-page caveat about using mutex_owned() for making locking decisions, it would be very much appreciated (at least by myself) to have an explanation of WHY it is OK in this case to do it here, but not somewhere else. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -