Re: kernel fault -current of 23/24 Apr 2015

2015-04-25 Thread Masao Uebayashi
Can't you defer attachment of your failing device?


Re: kernel fault -current of 23/24 Apr 2015

2015-04-25 Thread Paul Goyette

On Sat, 25 Apr 2015, Masao Uebayashi wrote:


Can't you defer attachment of your failing device?


That was my initial thought.  But there are lots of potential failing 
devices.


A quick scan shows that there are about 125-130 sources files which 
attempt to register with sysmon_{pswitch,wdog,envsys}_register.  Some of 
those are already modularized, so we could simply refuse to attach 
during autoconfig, since the modules' modcmd(INIT) routines would be 
called later, during module initialization, and in an appropriate order 
to obey inter-module dependencies.


But there are a lot of others which are not yet modularized, so they 
won't get called later during module initialization.



An alternative might be to modify all of these 125-130 files to use 
config_defer() (or some new, similar, config_defer_module() routine) to 
delay the sysmon_*_register() calls until after module initialization.



The big problem with either of these approaches is the scope of the 
work.  There's a lot of work to do.


And of course, the other big problem with either of these approaches is 
the amount of testing (including some for non-tier-1 platforms).


:)


I'm willing to entertain other suggestions!  :)


-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
-


Re: kernel fault -current of 23/24 Apr 2015

2015-04-25 Thread Martin Husemann
On Sat, Apr 25, 2015 at 03:41:46PM +0800, Paul Goyette wrote:
 A quick scan shows that there are about 125-130 sources files which 
 attempt to register with sysmon_{pswitch,wdog,envsys}_register.

Since this is only about early access to the register functions, can't we
just add a static boolean for each of sysmon_envsys_init, sysmon_wdog_init
and sysmon_power_init, and change the functions to just ignore multiple
calls?

Something like:

void
sysmon_wdog_init(void)
{
static bool passed = false;

if (passed)
return;
passed = true;

mutex_init(sysmon_wdog_list_mtx, MUTEX_DEFAULT, IPL_NONE);
mutex_init(sysmon_wdog_mtx, MUTEX_DEFAULT, IPL_SOFTCLOCK);
...


And then add sysmon_wdog_init() calls to the *_register() functions?

Martin


Re: kernel fault -current of 23/24 Apr 2015

2015-04-25 Thread Paul Goyette



On Sat, 25 Apr 2015, Martin Husemann wrote:


On Sat, Apr 25, 2015 at 03:41:46PM +0800, Paul Goyette wrote:

A quick scan shows that there are about 125-130 sources files which
attempt to register with sysmon_{pswitch,wdog,envsys}_register.


Since this is only about early access to the register functions, can't we
just add a static boolean for each of sysmon_envsys_init, sysmon_wdog_init
and sysmon_power_init, and change the functions to just ignore multiple
calls?


Yes, that might work.  But remember that sysmon itself is now a 
pseudo-device and its initialization needs to interact with the 
auto-config subsystem.


Most of this should be handled by reverting kern/init_main.c rather than 
adding initialization calls to each of the ~130 callers!



-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
-


Re: kernel fault -current of 23/24 Apr 2015

2015-04-25 Thread Martin Husemann
I do not like the init_main change - the attached patch makes my system boot
with a LOCKDEBUG kernel.

Not sure if this is complete.

Martin
Index: sysmon_envsys.c
===
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys.c,v
retrieving revision 1.135
diff -u -p -r1.135 sysmon_envsys.c
--- sysmon_envsys.c 25 Apr 2015 02:41:42 -  1.135
+++ sysmon_envsys.c 25 Apr 2015 11:34:57 -
@@ -101,6 +101,7 @@ static sme_event_drv_t * sme_add_sensor_
 static void sme_initial_refresh(void *);
 static uint32_t sme_get_max_value(struct sysmon_envsys *,
  bool (*)(const envsys_data_t*), bool);
+static void sme_preinit(void);
 
 MODULE(MODULE_CLASS_MISC, sysmon_envsys, sysmon,sysmon_taskq,sysmon_power);
 
@@ -109,6 +110,20 @@ static struct sysmon_opvec sysmon_envsys
 NULL, NULL, NULL
 };
 
+static void
+sme_preinit(void)
+{
+   static bool passed = false;
+
+   if (passed)
+   return;
+
+   passed = true;
+   LIST_INIT(sysmon_envsys_list);
+   mutex_init(sme_global_mtx, MUTEX_DEFAULT, IPL_NONE);
+   sme_propd = prop_dictionary_create();
+}
+
 /*
  * sysmon_envsys_init:
  *
@@ -119,9 +134,7 @@ sysmon_envsys_init(void)
 {
int error;
 
-   LIST_INIT(sysmon_envsys_list);
-   mutex_init(sme_global_mtx, MUTEX_DEFAULT, IPL_NONE);
-   sme_propd = prop_dictionary_create();
+   sme_preinit();
 
error = sysmon_attach_minor(SYSMON_MINOR_ENVSYS, sysmon_envsys_opvec);
 
@@ -680,6 +693,8 @@ sysmon_envsys_register(struct sysmon_env
KASSERT(sme != NULL);
KASSERT(sme-sme_name != NULL);
 
+   sme_preinit();
+
/*
 * Check if requested sysmon_envsys device is valid
 * and does not exist already in the list.


Re: kernel fault -current of 23/24 Apr 2015

2015-04-25 Thread Paul Goyette
The init_main change is really just reverting to the way things were 
prior to my diving in and breaking things.


On Sat, 25 Apr 2015, Martin Husemann wrote:


I do not like the init_main change - the attached patch makes my system boot
with a LOCKDEBUG kernel.

Not sure if this is complete.


I'll take a look at the rest of your patch tomorrow morning.  I'm not 
feeling well tonight and I can barely keep my eyes open.



-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
-


Re: kernel fault -current of 23/24 Apr 2015

2015-04-25 Thread Paul Goyette

On Sat, 25 Apr 2015, Paul Goyette wrote:




On Sat, 25 Apr 2015, Martin Husemann wrote:


On Sat, Apr 25, 2015 at 03:41:46PM +0800, Paul Goyette wrote:

A quick scan shows that there are about 125-130 sources files which
attempt to register with sysmon_{pswitch,wdog,envsys}_register.


Since this is only about early access to the register functions, can't we
just add a static boolean for each of sysmon_envsys_init, sysmon_wdog_init
and sysmon_power_init, and change the functions to just ignore multiple
calls?


Yes, that might work.  But remember that sysmon itself is now a pseudo-device 
and its initialization needs to interact with the auto-config subsystem.


Most of this should be handled by reverting kern/init_main.c rather than 
adding initialization calls to each of the ~130 callers!


Attached is a patch which should initialize sysmon sufficiently early to 
make it available for drivers who want to register.  Compile-tested but 
_not_ booted or otherwise execution-tested.  I would appreciate your 
feedback.



-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
-Index: dev/sysmon/sysmon.c
===
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon.c,v
retrieving revision 1.21
diff -u -p -r1.21 sysmon.c
--- dev/sysmon/sysmon.c 24 Apr 2015 03:35:49 -  1.21
+++ dev/sysmon/sysmon.c 25 Apr 2015 08:54:42 -
@@ -348,9 +348,14 @@ sysmon_init(void)
 #ifdef _MODULE
devmajor_t bmajor, cmajor;
 #endif
+   static bool init_done = false;
static struct cfdata cf;
int error = 0;
 
+   if (init_done)
+   return 0;
+   init_done = true;
+
if (sysmon_dev != NULL) {
return EEXIST;
}
Index: dev/sysmon/sysmon_envsys.c
===
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys.c,v
retrieving revision 1.135
diff -u -p -r1.135 sysmon_envsys.c
--- dev/sysmon/sysmon_envsys.c  25 Apr 2015 02:41:42 -  1.135
+++ dev/sysmon/sysmon_envsys.c  25 Apr 2015 08:54:42 -
@@ -118,6 +118,11 @@ int
 sysmon_envsys_init(void)
 {
int error;
+   static bool init_done = false;
+
+   if (init_done)
+   return 0;
+   init_done = true;
 
LIST_INIT(sysmon_envsys_list);
mutex_init(sme_global_mtx, MUTEX_DEFAULT, IPL_NONE);
Index: dev/sysmon/sysmon_power.c
===
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_power.c,v
retrieving revision 1.54
diff -u -p -r1.54 sysmon_power.c
--- dev/sysmon/sysmon_power.c   23 Apr 2015 23:22:03 -  1.54
+++ dev/sysmon/sysmon_power.c   25 Apr 2015 08:54:42 -
@@ -209,6 +209,11 @@ int
 sysmon_power_init(void)
 {
int error;
+   static bool init_done = false;
+
+   if (init_done)
+   return 0;
+   init_done = true;
 
mutex_init(sysmon_power_event_queue_mtx, MUTEX_DEFAULT, IPL_NONE);
cv_init(sysmon_power_event_queue_cv, smpower);
Index: dev/sysmon/sysmon_wdog.c
===
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_wdog.c,v
retrieving revision 1.26
diff -u -p -r1.26 sysmon_wdog.c
--- dev/sysmon/sysmon_wdog.c23 Apr 2015 23:22:03 -  1.26
+++ dev/sysmon/sysmon_wdog.c25 Apr 2015 08:54:42 -
@@ -86,6 +86,11 @@ int
 sysmon_wdog_init(void)
 {
int error;
+   static bool init_done = false;
+
+   if (init_done)
+   return 0;
+   init_done = true;
 
mutex_init(sysmon_wdog_list_mtx, MUTEX_DEFAULT, IPL_NONE);
mutex_init(sysmon_wdog_mtx, MUTEX_DEFAULT, IPL_SOFTCLOCK);
Index: kern/init_main.c
===
RCS file: /cvsroot/src/sys/kern/init_main.c,v
retrieving revision 1.463
diff -u -p -r1.463 init_main.c
--- kern/init_main.c23 Apr 2015 23:23:08 -  1.463
+++ kern/init_main.c25 Apr 2015 08:54:42 -
@@ -118,6 +118,10 @@ __KERNEL_RCSID(0, $NetBSD: init_main.c,
 #include ksyms.h
 
 #include sysmon_taskq.h
+#include sysmon_wdog.h
+#include sysmon_power.h
+#include sysmon_envsys.h
+
 #include veriexec.h
 
 #include sys/param.h
@@ -222,6 +226,10 @@ __KERNEL_RCSID(0, $NetBSD: init_main.c,
 #include dev/sysmon/sysmon_taskq.h
 #endif
 
+#if NSYSMON_ENVSYS  0 || NSYSMON_POWER  0 || NSYSMON_WDOG  0
+#include dev/sysmon/sysmonvar.h
+#endif
+
 #include dev/cons.h
 
 #include net/bpf.h
@@ -461,15 +469,29 @@ main(void)
kqueue_init();
 
/*
-* Initialize sysmon's task queue.  It is used by at
-* least one non-modularized component 

Re: kernel fault -current of 23/24 Apr 2015

2015-04-24 Thread bch
Fyi, running bt in the debugger yields (transcribed BY HAND):

db{0} bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_abort() at netbsd:locdebug_abort+0x63
mutex_vector_enter() at netbsd:mutex_vector_enter+0x531
sysmon_evnsys_register() at netbsd:sysmon_evnsys_register+0x41
acpiacad_attach() at netbsd:acpiacad_attach+0xcd
config_attach_loc() at netbsd:config_attach_loc+0x17a
acpi_rescan() at netbsd:acpi_rescan+0x20d
acpi_attach() at netbsd:acpi_attach+0x2f1
config_attach_loc() at netbsd:config_attach_loc+0x17a
mainbus_attach() at netbsd:mainbus_attach+0x2a6
config_attach_loc() at netbsd:config_attach_loc+0x17a
cpu_configure() atnetbsd:cpu_configure+0x26
main() at netbsd:main+0x29e




On 4/24/15, bch brad.har...@gmail.com wrote:
 Transcribed boot msgs:

 Mutex error: mutex_vector_enter: locking against myself

 lock address : 0x811bc040
 current cpu   :   0
 current lwp   : 0x810cebc0
 owner field   : 0x810cebc0 wait/spin:0/0

 panic: lock error: Mutex: mutex_vector_enter: locking against myself:
 lock 0x811bc040 cpu 0 lwp 0x810cebc0
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8
 rsp 8137fab0
 curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0
 Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave



kernel fault -current of 23/24 Apr 2015

2015-04-24 Thread bch
Transcribed boot msgs:

Mutex error: mutex_vector_enter: locking against myself

lock address : 0x811bc040
current cpu   :   0
current lwp   : 0x810cebc0
owner field   : 0x810cebc0 wait/spin:0/0

panic: lock error: Mutex: mutex_vector_enter: locking against myself:
lock 0x811bc040 cpu 0 lwp 0x810cebc0
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8
rsp 8137fab0
curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0
Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave


Re: kernel fault -current of 23/24 Apr 2015

2015-04-24 Thread Paul Goyette
Thanks for the back-trace.  Wiz had alerted me to this problem some 
hours ago (in a different thread).  Since this one has the most info 
available, let's follow through on this thread.


I'm looking into this, but I don't see anything obvious yet.


On Fri, 24 Apr 2015, bch wrote:


Fyi, running bt in the debugger yields (transcribed BY HAND):

db{0} bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_abort() at netbsd:locdebug_abort+0x63
mutex_vector_enter() at netbsd:mutex_vector_enter+0x531
sysmon_evnsys_register() at netbsd:sysmon_evnsys_register+0x41
acpiacad_attach() at netbsd:acpiacad_attach+0xcd
config_attach_loc() at netbsd:config_attach_loc+0x17a
acpi_rescan() at netbsd:acpi_rescan+0x20d
acpi_attach() at netbsd:acpi_attach+0x2f1
config_attach_loc() at netbsd:config_attach_loc+0x17a
mainbus_attach() at netbsd:mainbus_attach+0x2a6
config_attach_loc() at netbsd:config_attach_loc+0x17a
cpu_configure() atnetbsd:cpu_configure+0x26
main() at netbsd:main+0x29e




On 4/24/15, bch brad.har...@gmail.com wrote:

Transcribed boot msgs:

Mutex error: mutex_vector_enter: locking against myself

lock address : 0x811bc040
current cpu   :   0
current lwp   : 0x810cebc0
owner field   : 0x810cebc0 wait/spin:0/0

panic: lock error: Mutex: mutex_vector_enter: locking against myself:
lock 0x811bc040 cpu 0 lwp 0x810cebc0
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8
rsp 8137fab0
curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0
Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave





-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
-


Re: kernel fault -current of 23/24 Apr 2015

2015-04-24 Thread Paul Goyette

On Fri, 24 Apr 2015, bch wrote:


Fyi, running bt in the debugger yields (transcribed BY HAND):

db{0} bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_abort() at netbsd:locdebug_abort+0x63
mutex_vector_enter() at netbsd:mutex_vector_enter+0x531
sysmon_evnsys_register() at netbsd:sysmon_evnsys_register+0x41
acpiacad_attach() at netbsd:acpiacad_attach+0xcd
config_attach_loc() at netbsd:config_attach_loc+0x17a
acpi_rescan() at netbsd:acpi_rescan+0x20d
acpi_attach() at netbsd:acpi_attach+0x2f1
config_attach_loc() at netbsd:config_attach_loc+0x17a
mainbus_attach() at netbsd:mainbus_attach+0x2a6
config_attach_loc() at netbsd:config_attach_loc+0x17a
cpu_configure() atnetbsd:cpu_configure+0x26
main() at netbsd:main+0x29e


Very odd

For some reason, the sme_global_mtx seems to already be locked when 
sysmon_envsys_register tries to get it.  Code examination shows that 
there is one error path where the mutex is not released, and I just 
committed a fix for that.  It is unlikely, though, that this is why you 
two are having problems, as this code path would not (or, at least, 
should not) have changed from the modularization changes.


Brad, do you have the dmesg output leading up to the panic?  Wiz already 
provided his.







On 4/24/15, bch brad.har...@gmail.com wrote:

Transcribed boot msgs:

Mutex error: mutex_vector_enter: locking against myself

lock address : 0x811bc040
current cpu   :   0
current lwp   : 0x810cebc0
owner field   : 0x810cebc0 wait/spin:0/0

panic: lock error: Mutex: mutex_vector_enter: locking against myself:
lock 0x811bc040 cpu 0 lwp 0x810cebc0
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 80289eed cs 8 rflags 246 cr2 0 ilevel 8
rsp 8137fab0
curlwp 0x810cebc0 pid 0.1 lowest kstack 0x8137c2c0
Stopped in pid 0.1 (system) at netbsd:breakpoint+0x5: leave





-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
-


Re: kernel fault -current of 23/24 Apr 2015

2015-04-24 Thread Paul Goyette

On Sat, 25 Apr 2015, Martin Husemann wrote:


On Sat, Apr 25, 2015 at 08:59:45AM +0800, Paul Goyette wrote:

For some reason, the sme_global_mtx seems to already be locked when
sysmon_envsys_register tries to get it.


No, it is not initialized:


That is more in line with my expectations.  I was surprised that the 
lock info from Wiz and bch both showed that the owner was curlwp.


The basic problem is that the device configuration is being done long 
before the module subsystem is fully initialized.  And we cannot simply 
initialize the module subsystem earlier, as many of the modules are 
devices or pseudodevices which require the configuration system to be 
available!  Chicken vs egg...


I've got a couple of ideas, but they'll take a few days to gestate, and 
then a bit more time to discuss on tech-kern before implementing.  As 
much as I don't want to do it, I might have to revert that series of 
commits.  And due to a current illness, even the revert might take a 
couple days.



 snip

Additionally there seems to be a missing mutex_exit() in an error branch:


I think I already caught that one.  :)


-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
-


Re: kernel fault -current of 23/24 Apr 2015

2015-04-24 Thread Martin Husemann
On Sat, Apr 25, 2015 at 08:59:45AM +0800, Paul Goyette wrote:
 For some reason, the sme_global_mtx seems to already be locked when 
 sysmon_envsys_register tries to get it.

No, it is not initialized:

aibs0 at acpi0 (ASOC, ATK0110-16843024): ASUSTeK AI Booster
panic: lockdebug_lookup: uninitialized lock (lock=0x811aa9a0, 
from=808f320f)
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8028b8bd cs 8 rflags 246 cr2 0 ilevel 8 rsp 
8135ea60
curlwp 0x810e4940 pid 0.1 lowest kstack 0x8135b2c0
Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x5:  leave
db{0} bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_locked() at netbsd:lockdebug_locked
mutex_enter() at netbsd:mutex_enter+0x43f
sysmon_envsys_register() at netbsd:sysmon_envsys_register+0x41
aibs_attach() at netbsd:aibs_attach+0x19a
config_attach_loc() at netbsd:config_attach_loc+0x17a
acpi_rescan() at netbsd:acpi_rescan+0x20d
acpi_attach() at netbsd:acpi_attach+0x2f1
config_attach_loc() at netbsd:config_attach_loc+0x17a
mainbus_attach() at netbsd:mainbus_attach+0x2a6
config_attach_loc() at netbsd:config_attach_loc+0x17a
cpu_configure() at netbsd:cpu_configure+0x26
main() at netbsd:main+0x29e


Additionally there seems to be a missing mutex_exit() in an error branch:


Index: sysmon_envsys.c
===
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys.c,v
retrieving revision 1.132
diff -u -p -r1.132 sysmon_envsys.c
--- sysmon_envsys.c 24 Apr 2015 03:32:25 -  1.132
+++ sysmon_envsys.c 25 Apr 2015 05:28:20 -
@@ -779,6 +779,7 @@ sysmon_envsys_register(struct sysmon_env
mutex_enter(sme_global_mtx);
if (!prop_dictionary_set(sme_propd, sme-sme_name, array)) {
error = EINVAL;
+   mutex_exit(sme_global_mtx);
DPRINTF((%s: prop_dictionary_set for '%s'\n, __func__,
sme-sme_name));
goto out;



Martin