Re: kernel fault -current of 23/24 Apr 2015
Can't you defer attachment of your failing device?
Re: kernel fault -current of 23/24 Apr 2015
On Sat, 25 Apr 2015, Masao Uebayashi wrote: Can't you defer attachment of your failing device? That was my initial thought. But there are lots of potential failing devices. A quick scan shows that there are about 125-130 sources files which attempt to register with sysmon_{pswitch,wdog,envsys}_register. Some of those are already modularized, so we could simply refuse to attach during autoconfig, since the modules' modcmd(INIT) routines would be called later, during module initialization, and in an appropriate order to obey inter-module dependencies. But there are a lot of others which are not yet modularized, so they won't get called later during module initialization. An alternative might be to modify all of these 125-130 files to use config_defer() (or some new, similar, config_defer_module() routine) to delay the sysmon_*_register() calls until after module initialization. The big problem with either of these approaches is the scope of the work. There's a lot of work to do. And of course, the other big problem with either of these approaches is the amount of testing (including some for non-tier-1 platforms). :) I'm willing to entertain other suggestions! :) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | -
Re: kernel fault -current of 23/24 Apr 2015
On Sat, Apr 25, 2015 at 03:41:46PM +0800, Paul Goyette wrote: A quick scan shows that there are about 125-130 sources files which attempt to register with sysmon_{pswitch,wdog,envsys}_register. Since this is only about early access to the register functions, can't we just add a static boolean for each of sysmon_envsys_init, sysmon_wdog_init and sysmon_power_init, and change the functions to just ignore multiple calls? Something like: void sysmon_wdog_init(void) { static bool passed = false; if (passed) return; passed = true; mutex_init(sysmon_wdog_list_mtx, MUTEX_DEFAULT, IPL_NONE); mutex_init(sysmon_wdog_mtx, MUTEX_DEFAULT, IPL_SOFTCLOCK); ... And then add sysmon_wdog_init() calls to the *_register() functions? Martin
Re: kernel fault -current of 23/24 Apr 2015
On Sat, 25 Apr 2015, Martin Husemann wrote: On Sat, Apr 25, 2015 at 03:41:46PM +0800, Paul Goyette wrote: A quick scan shows that there are about 125-130 sources files which attempt to register with sysmon_{pswitch,wdog,envsys}_register. Since this is only about early access to the register functions, can't we just add a static boolean for each of sysmon_envsys_init, sysmon_wdog_init and sysmon_power_init, and change the functions to just ignore multiple calls? Yes, that might work. But remember that sysmon itself is now a pseudo-device and its initialization needs to interact with the auto-config subsystem. Most of this should be handled by reverting kern/init_main.c rather than adding initialization calls to each of the ~130 callers! - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | -
Re: kernel fault -current of 23/24 Apr 2015
I do not like the init_main change - the attached patch makes my system boot with a LOCKDEBUG kernel. Not sure if this is complete. Martin Index: sysmon_envsys.c === RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys.c,v retrieving revision 1.135 diff -u -p -r1.135 sysmon_envsys.c --- sysmon_envsys.c 25 Apr 2015 02:41:42 - 1.135 +++ sysmon_envsys.c 25 Apr 2015 11:34:57 - @@ -101,6 +101,7 @@ static sme_event_drv_t * sme_add_sensor_ static void sme_initial_refresh(void *); static uint32_t sme_get_max_value(struct sysmon_envsys *, bool (*)(const envsys_data_t*), bool); +static void sme_preinit(void); MODULE(MODULE_CLASS_MISC, sysmon_envsys, sysmon,sysmon_taskq,sysmon_power); @@ -109,6 +110,20 @@ static struct sysmon_opvec sysmon_envsys NULL, NULL, NULL }; +static void +sme_preinit(void) +{ + static bool passed = false; + + if (passed) + return; + + passed = true; + LIST_INIT(sysmon_envsys_list); + mutex_init(sme_global_mtx, MUTEX_DEFAULT, IPL_NONE); + sme_propd = prop_dictionary_create(); +} + /* * sysmon_envsys_init: * @@ -119,9 +134,7 @@ sysmon_envsys_init(void) { int error; - LIST_INIT(sysmon_envsys_list); - mutex_init(sme_global_mtx, MUTEX_DEFAULT, IPL_NONE); - sme_propd = prop_dictionary_create(); + sme_preinit(); error = sysmon_attach_minor(SYSMON_MINOR_ENVSYS, sysmon_envsys_opvec); @@ -680,6 +693,8 @@ sysmon_envsys_register(struct sysmon_env KASSERT(sme != NULL); KASSERT(sme-sme_name != NULL); + sme_preinit(); + /* * Check if requested sysmon_envsys device is valid * and does not exist already in the list.
Re: kernel fault -current of 23/24 Apr 2015
The init_main change is really just reverting to the way things were prior to my diving in and breaking things. On Sat, 25 Apr 2015, Martin Husemann wrote: I do not like the init_main change - the attached patch makes my system boot with a LOCKDEBUG kernel. Not sure if this is complete. I'll take a look at the rest of your patch tomorrow morning. I'm not feeling well tonight and I can barely keep my eyes open. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | -
Re: kernel fault -current of 23/24 Apr 2015
On Sat, 25 Apr 2015, Paul Goyette wrote: On Sat, 25 Apr 2015, Martin Husemann wrote: On Sat, Apr 25, 2015 at 03:41:46PM +0800, Paul Goyette wrote: A quick scan shows that there are about 125-130 sources files which attempt to register with sysmon_{pswitch,wdog,envsys}_register. Since this is only about early access to the register functions, can't we just add a static boolean for each of sysmon_envsys_init, sysmon_wdog_init and sysmon_power_init, and change the functions to just ignore multiple calls? Yes, that might work. But remember that sysmon itself is now a pseudo-device and its initialization needs to interact with the auto-config subsystem. Most of this should be handled by reverting kern/init_main.c rather than adding initialization calls to each of the ~130 callers! Attached is a patch which should initialize sysmon sufficiently early to make it available for drivers who want to register. Compile-tested but _not_ booted or otherwise execution-tested. I would appreciate your feedback. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | -Index: dev/sysmon/sysmon.c === RCS file: /cvsroot/src/sys/dev/sysmon/sysmon.c,v retrieving revision 1.21 diff -u -p -r1.21 sysmon.c --- dev/sysmon/sysmon.c 24 Apr 2015 03:35:49 - 1.21 +++ dev/sysmon/sysmon.c 25 Apr 2015 08:54:42 - @@ -348,9 +348,14 @@ sysmon_init(void) #ifdef _MODULE devmajor_t bmajor, cmajor; #endif + static bool init_done = false; static struct cfdata cf; int error = 0; + if (init_done) + return 0; + init_done = true; + if (sysmon_dev != NULL) { return EEXIST; } Index: dev/sysmon/sysmon_envsys.c === RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys.c,v retrieving revision 1.135 diff -u -p -r1.135 sysmon_envsys.c --- dev/sysmon/sysmon_envsys.c 25 Apr 2015 02:41:42 - 1.135 +++ dev/sysmon/sysmon_envsys.c 25 Apr 2015 08:54:42 - @@ -118,6 +118,11 @@ int sysmon_envsys_init(void) { int error; + static bool init_done = false; + + if (init_done) + return 0; + init_done = true; LIST_INIT(sysmon_envsys_list); mutex_init(sme_global_mtx, MUTEX_DEFAULT, IPL_NONE); Index: dev/sysmon/sysmon_power.c === RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_power.c,v retrieving revision 1.54 diff -u -p -r1.54 sysmon_power.c --- dev/sysmon/sysmon_power.c 23 Apr 2015 23:22:03 - 1.54 +++ dev/sysmon/sysmon_power.c 25 Apr 2015 08:54:42 - @@ -209,6 +209,11 @@ int sysmon_power_init(void) { int error; + static bool init_done = false; + + if (init_done) + return 0; + init_done = true; mutex_init(sysmon_power_event_queue_mtx, MUTEX_DEFAULT, IPL_NONE); cv_init(sysmon_power_event_queue_cv, smpower); Index: dev/sysmon/sysmon_wdog.c === RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_wdog.c,v retrieving revision 1.26 diff -u -p -r1.26 sysmon_wdog.c --- dev/sysmon/sysmon_wdog.c23 Apr 2015 23:22:03 - 1.26 +++ dev/sysmon/sysmon_wdog.c25 Apr 2015 08:54:42 - @@ -86,6 +86,11 @@ int sysmon_wdog_init(void) { int error; + static bool init_done = false; + + if (init_done) + return 0; + init_done = true; mutex_init(sysmon_wdog_list_mtx, MUTEX_DEFAULT, IPL_NONE); mutex_init(sysmon_wdog_mtx, MUTEX_DEFAULT, IPL_SOFTCLOCK); Index: kern/init_main.c === RCS file: /cvsroot/src/sys/kern/init_main.c,v retrieving revision 1.463 diff -u -p -r1.463 init_main.c --- kern/init_main.c23 Apr 2015 23:23:08 - 1.463 +++ kern/init_main.c25 Apr 2015 08:54:42 - @@ -118,6 +118,10 @@ __KERNEL_RCSID(0, $NetBSD: init_main.c, #include ksyms.h #include sysmon_taskq.h +#include sysmon_wdog.h +#include sysmon_power.h +#include sysmon_envsys.h + #include veriexec.h #include sys/param.h @@ -222,6 +226,10 @@ __KERNEL_RCSID(0, $NetBSD: init_main.c, #include dev/sysmon/sysmon_taskq.h #endif +#if NSYSMON_ENVSYS 0 || NSYSMON_POWER 0 || NSYSMON_WDOG 0 +#include dev/sysmon/sysmonvar.h +#endif + #include dev/cons.h #include net/bpf.h @@ -461,15 +469,29 @@ main(void) kqueue_init(); /* -* Initialize sysmon's task queue. It is used by at -* least one non-modularized component
Re: kernel fault -current of 23/24 Apr 2015
Fyi, running bt in the debugger yields (transcribed BY HAND): db{0} bt breakpoint() at netbsd:breakpoint+0x5 vpanic() at netbsd:vpanic+0x13c snprintf() at netbsd:snprintf lockdebug_abort() at netbsd:locdebug_abort+0x63 mutex_vector_enter() at netbsd:mutex_vector_enter+0x531 sysmon_evnsys_register() at netbsd:sysmon_evnsys_register+0x41 acpiacad_attach() at netbsd:acpiacad_attach+0xcd config_attach_loc() at netbsd:config_attach_loc+0x17a acpi_rescan() at netbsd:acpi_rescan+0x20d acpi_attach() at netbsd:acpi_attach+0x2f1 config_attach_loc() at netbsd:config_attach_loc+0x17a mainbus_attach() at netbsd:mainbus_attach+0x2a6 config_attach_loc() at netbsd:config_attach_loc+0x17a cpu_configure() atnetbsd:cpu_configure+0x26 main() at netbsd:main+0x29e On 4/24/15, bch brad.har...@gmail.com wrote: Transcribed boot msgs: Mutex error: mutex_vector_enter: locking against myself lock address : 0x811bc040 current cpu : 0 current lwp : 0x810cebc0 owner field : 0x810cebc0 wait/spin:0/0 panic: lock error: Mutex: mutex_vector_enter: locking against myself: lock 0x811bc040 cpu 0 lwp 0x810cebc0 fatal breakpoint trap in supervisor mode trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8 rsp 8137fab0 curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0 Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave
kernel fault -current of 23/24 Apr 2015
Transcribed boot msgs: Mutex error: mutex_vector_enter: locking against myself lock address : 0x811bc040 current cpu : 0 current lwp : 0x810cebc0 owner field : 0x810cebc0 wait/spin:0/0 panic: lock error: Mutex: mutex_vector_enter: locking against myself: lock 0x811bc040 cpu 0 lwp 0x810cebc0 fatal breakpoint trap in supervisor mode trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8 rsp 8137fab0 curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0 Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave
Re: kernel fault -current of 23/24 Apr 2015
Thanks for the back-trace. Wiz had alerted me to this problem some hours ago (in a different thread). Since this one has the most info available, let's follow through on this thread. I'm looking into this, but I don't see anything obvious yet. On Fri, 24 Apr 2015, bch wrote: Fyi, running bt in the debugger yields (transcribed BY HAND): db{0} bt breakpoint() at netbsd:breakpoint+0x5 vpanic() at netbsd:vpanic+0x13c snprintf() at netbsd:snprintf lockdebug_abort() at netbsd:locdebug_abort+0x63 mutex_vector_enter() at netbsd:mutex_vector_enter+0x531 sysmon_evnsys_register() at netbsd:sysmon_evnsys_register+0x41 acpiacad_attach() at netbsd:acpiacad_attach+0xcd config_attach_loc() at netbsd:config_attach_loc+0x17a acpi_rescan() at netbsd:acpi_rescan+0x20d acpi_attach() at netbsd:acpi_attach+0x2f1 config_attach_loc() at netbsd:config_attach_loc+0x17a mainbus_attach() at netbsd:mainbus_attach+0x2a6 config_attach_loc() at netbsd:config_attach_loc+0x17a cpu_configure() atnetbsd:cpu_configure+0x26 main() at netbsd:main+0x29e On 4/24/15, bch brad.har...@gmail.com wrote: Transcribed boot msgs: Mutex error: mutex_vector_enter: locking against myself lock address : 0x811bc040 current cpu : 0 current lwp : 0x810cebc0 owner field : 0x810cebc0 wait/spin:0/0 panic: lock error: Mutex: mutex_vector_enter: locking against myself: lock 0x811bc040 cpu 0 lwp 0x810cebc0 fatal breakpoint trap in supervisor mode trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8 rsp 8137fab0 curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0 Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | -
Re: kernel fault -current of 23/24 Apr 2015
On Fri, 24 Apr 2015, bch wrote: Fyi, running bt in the debugger yields (transcribed BY HAND): db{0} bt breakpoint() at netbsd:breakpoint+0x5 vpanic() at netbsd:vpanic+0x13c snprintf() at netbsd:snprintf lockdebug_abort() at netbsd:locdebug_abort+0x63 mutex_vector_enter() at netbsd:mutex_vector_enter+0x531 sysmon_evnsys_register() at netbsd:sysmon_evnsys_register+0x41 acpiacad_attach() at netbsd:acpiacad_attach+0xcd config_attach_loc() at netbsd:config_attach_loc+0x17a acpi_rescan() at netbsd:acpi_rescan+0x20d acpi_attach() at netbsd:acpi_attach+0x2f1 config_attach_loc() at netbsd:config_attach_loc+0x17a mainbus_attach() at netbsd:mainbus_attach+0x2a6 config_attach_loc() at netbsd:config_attach_loc+0x17a cpu_configure() atnetbsd:cpu_configure+0x26 main() at netbsd:main+0x29e Very odd For some reason, the sme_global_mtx seems to already be locked when sysmon_envsys_register tries to get it. Code examination shows that there is one error path where the mutex is not released, and I just committed a fix for that. It is unlikely, though, that this is why you two are having problems, as this code path would not (or, at least, should not) have changed from the modularization changes. Brad, do you have the dmesg output leading up to the panic? Wiz already provided his. On 4/24/15, bch brad.har...@gmail.com wrote: Transcribed boot msgs: Mutex error: mutex_vector_enter: locking against myself lock address : 0x811bc040 current cpu : 0 current lwp : 0x810cebc0 owner field : 0x810cebc0 wait/spin:0/0 panic: lock error: Mutex: mutex_vector_enter: locking against myself: lock 0x811bc040 cpu 0 lwp 0x810cebc0 fatal breakpoint trap in supervisor mode trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8 rsp 8137fab0 curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0 Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | -
Re: kernel fault -current of 23/24 Apr 2015
On Sat, 25 Apr 2015, Martin Husemann wrote: On Sat, Apr 25, 2015 at 08:59:45AM +0800, Paul Goyette wrote: For some reason, the sme_global_mtx seems to already be locked when sysmon_envsys_register tries to get it. No, it is not initialized: That is more in line with my expectations. I was surprised that the lock info from Wiz and bch both showed that the owner was curlwp. The basic problem is that the device configuration is being done long before the module subsystem is fully initialized. And we cannot simply initialize the module subsystem earlier, as many of the modules are devices or pseudodevices which require the configuration system to be available! Chicken vs egg... I've got a couple of ideas, but they'll take a few days to gestate, and then a bit more time to discuss on tech-kern before implementing. As much as I don't want to do it, I might have to revert that series of commits. And due to a current illness, even the revert might take a couple days. snip Additionally there seems to be a missing mutex_exit() in an error branch: I think I already caught that one. :) - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org | -
Re: kernel fault -current of 23/24 Apr 2015
On Sat, Apr 25, 2015 at 08:59:45AM +0800, Paul Goyette wrote: For some reason, the sme_global_mtx seems to already be locked when sysmon_envsys_register tries to get it. No, it is not initialized: aibs0 at acpi0 (ASOC, ATK0110-16843024): ASUSTeK AI Booster panic: lockdebug_lookup: uninitialized lock (lock=0x811aa9a0, from=808f320f) fatal breakpoint trap in supervisor mode trap type 1 code 0 rip 8028b8bd cs 8 rflags 246 cr2 0 ilevel 8 rsp 8135ea60 curlwp 0x810e4940 pid 0.1 lowest kstack 0x8135b2c0 Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave db{0} bt breakpoint() at netbsd:breakpoint+0x5 vpanic() at netbsd:vpanic+0x13c snprintf() at netbsd:snprintf lockdebug_locked() at netbsd:lockdebug_locked mutex_enter() at netbsd:mutex_enter+0x43f sysmon_envsys_register() at netbsd:sysmon_envsys_register+0x41 aibs_attach() at netbsd:aibs_attach+0x19a config_attach_loc() at netbsd:config_attach_loc+0x17a acpi_rescan() at netbsd:acpi_rescan+0x20d acpi_attach() at netbsd:acpi_attach+0x2f1 config_attach_loc() at netbsd:config_attach_loc+0x17a mainbus_attach() at netbsd:mainbus_attach+0x2a6 config_attach_loc() at netbsd:config_attach_loc+0x17a cpu_configure() at netbsd:cpu_configure+0x26 main() at netbsd:main+0x29e Additionally there seems to be a missing mutex_exit() in an error branch: Index: sysmon_envsys.c === RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys.c,v retrieving revision 1.132 diff -u -p -r1.132 sysmon_envsys.c --- sysmon_envsys.c 24 Apr 2015 03:32:25 - 1.132 +++ sysmon_envsys.c 25 Apr 2015 05:28:20 - @@ -779,6 +779,7 @@ sysmon_envsys_register(struct sysmon_env mutex_enter(sme_global_mtx); if (!prop_dictionary_set(sme_propd, sme-sme_name, array)) { error = EINVAL; + mutex_exit(sme_global_mtx); DPRINTF((%s: prop_dictionary_set for '%s'\n, __func__, sme-sme_name)); goto out; Martin