Re: Kernel panic: gpioctl list + odroid-c1
> On May 10, 2022, at 8:46 PM, Brook Milligan wrote:
>
> One more piece of information.
>
> src/sys/arch/arm/amlogic/meson8b_pinctrl.c includes the following code:
>
> static const struct meson_pinctrl_gpio meson8b_cbus_gpios[] = {
>
> … < deleted sections > …
>
>	/* GPIODV */
>	CBUS_GPIO(GPIODV_24, 6, 24, 0, 24),
>	CBUS_GPIO(GPIODV_25, 6, 25, 0, 25),
>	CBUS_GPIO(GPIODV_26, 6, 26, 0, 26),
>	CBUS_GPIO(GPIODV_27, 6, 27, 0, 27),
>	CBUS_GPIO(GPIODV_28, 6, 28, 0, 28),
>	CBUS_GPIO(GPIODV_29, 6, 29, 0, 29),
>
> It seems that GPIODV_9 does not occur in the second list; I would have
> expected it to be the first entry. Is there a reason for it to be missing?
>
> Could this be the cause of the panic?

Further along: the short answer is yes.

The following patch fixes the immediate problem of the panic, although I
have no idea if the data here are correct; I’m just following the pattern
of the other entries.

Index: meson8b_pinctrl.c
===
RCS file: /cvsroot/src/sys/arch/arm/amlogic/meson8b_pinctrl.c,v
retrieving revision 1.2
diff -u -r1.2 meson8b_pinctrl.c
--- meson8b_pinctrl.c	14 Aug 2019 09:50:20 - 1.2
+++ meson8b_pinctrl.c	11 May 2022 13:08:29 -
@@ -226,6 +226,7 @@
 	CBUS_GPIO(GPIOY_14, 3, 14, 3, 14),
 
 	/* GPIODV */
+	CBUS_GPIO(GPIODV_9, 6, 9, 0, 9),
 	CBUS_GPIO(GPIODV_24, 6, 24, 0, 24),
 	CBUS_GPIO(GPIODV_25, 6, 25, 0, 25),
 	CBUS_GPIO(GPIODV_26, 6, 26, 0, 26),

I would appreciate confirmation that the data this patch adds to the
lookup table is correct.

Clearly, however, there is a hidden problem somewhere else in the code.
This is a lookup table; the pin number (or potentially name) is the key.
Almost certainly, the problem was a missing entry causing the lookup to
not match anything. The presumably bogus information returned led to the
panic. This means that somewhere else in the code is lookup logic that is
not detecting the “missing key” case, which means that there are
potential panics lurking in the future whenever a table like this is
incomplete.

Unfortunately, I have no idea where that lookup code is; ideas?

Unless I hear otherwise, I will commit this patch sometime soon. In the
meantime, I would appreciate feedback.

Thanks a lot.

Cheers,
Brook
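To make the "missing key" concern concrete, here is a minimal user-space sketch of a table lookup that reports an absent key to its caller instead of handing back unusable data. It is not the NetBSD code: struct pin_entry, pin_table, pin_lookup() and pin_read() are hypothetical names invented for illustration, and the real struct meson_pinctrl_gpio has different fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry; NOT the real struct meson_pinctrl_gpio. */
struct pin_entry {
	unsigned int id;	/* key: pin number */
	unsigned int reg;	/* illustrative payload */
	unsigned int bit;	/* illustrative payload */
};

/* Deliberately incomplete table: there is no entry for pin 30. */
static const struct pin_entry pin_table[] = {
	{ 29, 3, 14 },
	{ 31, 6, 24 },
	{ 32, 6, 25 },
};

/* Return the matching entry, or NULL when the key is absent. */
static const struct pin_entry *
pin_lookup(unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(pin_table) / sizeof(pin_table[0]); i++) {
		if (pin_table[i].id == id)
			return &pin_table[i];
	}
	return NULL;
}

/* Read path that fails cleanly on a missing key. */
static int
pin_read(unsigned int id, unsigned int *bitp)
{
	const struct pin_entry *pe = pin_lookup(id);

	if (pe == NULL)
		return -1;	/* the caller sees an error, not bogus data */
	*bitp = pe->bit;
	return 0;
}
```

With this shape, asking for pin 30 yields an error return that the caller can propagate, rather than arithmetic on fields that were never filled in.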
Re: Kernel panic: gpioctl list + odroid-c1
> On May 10, 2022, at 8:06 PM, Brook Milligan wrote:
>
> I have encountered a totally repeatable kernel panic by running "gpioctl
> list" on an odroid-c1 board.
>
> # uname -a
> NetBSD armv7 9.99.96 NetBSD 9.99.96 (GENERIC) #0: Mon May 2 10:50:02 UTC
> 2022 mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/evbarm/compile/GENERIC
> evbarm
>
> To investigate, I added some printf() to the gpiolist() function to see what
> was happening in the loop through the pins. Here is a bit of the output:
>
> # ./gpioctl2 gpio0 list
> gpioctl.c::gpiolist()
> gpioctl.c::gpiolist(): gpio_npins=71
> gpioctl.c::gpiolist(): gpio_pin 0
> 0: gp_pin=0
> 0: gp_value=1
> 0: gp_name=GPIOX_0
> gpioctl.c::gpiolist(): gpio_pin 1
> 1: gp_pin=1
> 1: gp_value=1
> 1: gp_name=GPIOX_1
>
> … < lots of pin output deleted > …
>
> gpioctl.c::gpiolist(): gpio_pin 29
> 29: gp_pin=29
> 29: gp_value=1
> 29: gp_name=GPIOY_14
> gpioctl.c::gpiolist(): gpio_pin 30
> [ 33.9588550] panic: divide by 0
> [ 33.9588550] cpu0: Begin traceback...
> [ 33.9588550] 0xbd7cdbd4: netbsd:db_panic+0x14
> [ 33.9677710] 0xbd7cdbf4: netbsd:vpanic+0x114
> [ 33.9677710] 0xbd7cdc0c: netbsd:panic+0x24
> [ 33.9761750] 0xbd7cdc2c: netbsd:__aeabi_idiv0+0x18
> [ 33.9822960] 0xbd7cdc4c: netbsd:meson_pinctrl_pin_read+0x88
> [ 33.9822960] 0xbd7cdcec: netbsd:gpioioctl+0x4f4
> [ 33.9902860] 0xbd7cdd24: netbsd:spec_ioctl+0x60
> [ 33.9902860] 0xbd7cdd54: netbsd:VOP_IOCTL+0x50
> [ 33.9991180] 0xbd7cde24: netbsd:vn_ioctl+0xd8
> [ 34.0057320] 0xbd7cdeec: netbsd:sys_ioctl+0x47c
> [ 34.0057320] 0xbd7cdfac: netbsd:syscall+0x188
> [ 34.0135450] cpu0: End traceback...
> Stopped in pid 214.214 (gpioctl2) at netbsd:cpu_Debugger+0x4: bx r14
> db{0}>

One more piece of information.

src/sys/arch/arm/amlogic/meson8b_pinctrl.c includes the following code:

/*
 * GPIO banks. The values must match those in dt-bindings/gpio/meson8b-gpio.h
 */
enum {

… < deleted sections > …

	GPIODV_9 = 30,
	GPIODV_24,
	GPIODV_25,
	GPIODV_26,
	GPIODV_27,
	GPIODV_28,
	GPIODV_29,

… < more deleted sections > …

};

… < deleted sections > …

static const struct meson_pinctrl_gpio meson8b_cbus_gpios[] = {

… < deleted sections > …

	/* GPIODV */
	CBUS_GPIO(GPIODV_24, 6, 24, 0, 24),
	CBUS_GPIO(GPIODV_25, 6, 25, 0, 25),
	CBUS_GPIO(GPIODV_26, 6, 26, 0, 26),
	CBUS_GPIO(GPIODV_27, 6, 27, 0, 27),
	CBUS_GPIO(GPIODV_28, 6, 28, 0, 28),
	CBUS_GPIO(GPIODV_29, 6, 29, 0, 29),

It seems that GPIODV_9 does not occur in the second list; I would have
expected it to be the first entry. Is there a reason for it to be missing?

Could this be the cause of the panic?

Cheers,
Brook
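The mismatch described in this message (an enum value with no corresponding table entry) is the kind of thing a small consistency check can catch. The sketch below is only an illustration under assumptions: the enum values are copied from the excerpt above, but table_ids and first_missing_id() are hypothetical stand-ins, not anything from meson8b_pinctrl.c.

```c
#include <assert.h>
#include <stddef.h>

/* Values as in the enum excerpt quoted in the message. */
enum {
	GPIODV_9 = 30,
	GPIODV_24,
	GPIODV_25,
	GPIODV_26,
	GPIODV_27,
	GPIODV_28,
	GPIODV_29,
};

/* Hypothetical stand-in for the ids listed in the CBUS_GPIO() table. */
static const int table_ids[] = {
	GPIODV_24, GPIODV_25, GPIODV_26, GPIODV_27, GPIODV_28, GPIODV_29,
};

/*
 * Return the first id in [lo, hi] that has no table entry,
 * or -1 when every id in the range is present.
 */
static int
first_missing_id(const int *ids, size_t n, int lo, int hi)
{
	int id;

	for (id = lo; id <= hi; id++) {
		size_t i;

		for (i = 0; i < n; i++) {
			if (ids[i] == id)
				break;
		}
		if (i == n)
			return id;
	}
	return -1;
}
```

Run over the range GPIODV_9..GPIODV_29, the check reports 30, which is the pin number at which the gpioctl loop stopped.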
Kernel panic: gpioctl list + odroid-c1
I have encountered a totally repeatable kernel panic by running "gpioctl
list" on an odroid-c1 board.

# uname -a
NetBSD armv7 9.99.96 NetBSD 9.99.96 (GENERIC) #0: Mon May 2 10:50:02 UTC 2022 mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/evbarm/compile/GENERIC evbarm

To investigate, I added some printf() to the gpiolist() function to see
what was happening in the loop through the pins. Here is a bit of the
output:

# ./gpioctl2 gpio0 list
gpioctl.c::gpiolist()
gpioctl.c::gpiolist(): gpio_npins=71
gpioctl.c::gpiolist(): gpio_pin 0
0: gp_pin=0
0: gp_value=1
0: gp_name=GPIOX_0
gpioctl.c::gpiolist(): gpio_pin 1
1: gp_pin=1
1: gp_value=1
1: gp_name=GPIOX_1

… < lots of pin output deleted > …

gpioctl.c::gpiolist(): gpio_pin 29
29: gp_pin=29
29: gp_value=1
29: gp_name=GPIOY_14
gpioctl.c::gpiolist(): gpio_pin 30
[ 33.9588550] panic: divide by 0
[ 33.9588550] cpu0: Begin traceback...
[ 33.9588550] 0xbd7cdbd4: netbsd:db_panic+0x14
[ 33.9677710] 0xbd7cdbf4: netbsd:vpanic+0x114
[ 33.9677710] 0xbd7cdc0c: netbsd:panic+0x24
[ 33.9761750] 0xbd7cdc2c: netbsd:__aeabi_idiv0+0x18
[ 33.9822960] 0xbd7cdc4c: netbsd:meson_pinctrl_pin_read+0x88
[ 33.9822960] 0xbd7cdcec: netbsd:gpioioctl+0x4f4
[ 33.9902860] 0xbd7cdd24: netbsd:spec_ioctl+0x60
[ 33.9902860] 0xbd7cdd54: netbsd:VOP_IOCTL+0x50
[ 33.9991180] 0xbd7cde24: netbsd:vn_ioctl+0xd8
[ 34.0057320] 0xbd7cdeec: netbsd:sys_ioctl+0x47c
[ 34.0057320] 0xbd7cdfac: netbsd:syscall+0x188
[ 34.0135450] cpu0: End traceback...
Stopped in pid 214.214 (gpioctl2) at netbsd:cpu_Debugger+0x4: bx r14
db{0}>

I’m guessing this is a device tree problem, given the reference to
meson_pinctrl_pin_read(), but I have no idea how the kernel data
structure is created or what to do about this.

For reference, u-boot loads the following device tree before booting the
kernel: meson8b-odroidc1.dtb.

Any thoughts would be greatly appreciated.

Thanks a lot.

Cheers,
Brook
Re: kernel panic in NetBSD-9.1-amd64-install.img (exiting unheld spin mutex)
r...@reedmedia.net ("Jeremy C. Reed") writes:

>panic: lock error: Mutex error: mutex_vector_exit,742: exiting unheld
>spin mutex: lock 0x8699588015c0 cpu 0 lwp 0xff... (my photo was
>cropped)

Index: athn.c
===
RCS file: /cvsroot/src/sys/dev/ic/athn.c,v
retrieving revision 1.23
diff -p -u -r1.23 athn.c
--- athn.c	29 Jan 2020 14:09:58 - 1.23
+++ athn.c	15 Nov 2020 07:04:38 -
@@ -2734,7 +2734,7 @@ athn_set_multi(struct athn_softc *sc)
 	if ((ifp->if_flags & (IFF_ALLMULTI | IFF_PROMISC)) != 0) {
 		lo = hi = 0x;
-		goto done;
+		goto done2;
 	}
 	lo = hi = 0;
 	ETHER_LOCK(ec);
@@ -2760,6 +2760,7 @@ athn_set_multi(struct athn_softc *sc)
 	}
  done:
 	ETHER_UNLOCK(ec);
+ done2:
 	AR_WRITE(sc, AR_MCAST_FIL0, lo);
 	AR_WRITE(sc, AR_MCAST_FIL1, hi);
 	AR_WRITE_BARRIER(sc);

--
--
                                Michael van Elst
Internet: mlel...@serpens.de
                                "A potential Snark may lurk in every tree."
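The patch above changes where the early-exit path lands, so that a path which never took the lock no longer runs the unlock. A user-space sketch of that control-flow pattern follows; it uses a toy lock that only records whether it is held, and the names toy_lock, set_multi_buggy() and set_multi_fixed() are invented for illustration, not taken from athn.c.

```c
#include <assert.h>

/* Toy lock that only tracks its state; not a real mutex. */
struct toy_lock {
	int held;
	int errors;	/* counts exits while not held */
};

static void
toy_enter(struct toy_lock *l)
{
	l->held = 1;
}

static void
toy_exit(struct toy_lock *l)
{
	if (!l->held)
		l->errors++;	/* analogous to "exiting unheld spin mutex" */
	l->held = 0;
}

/* Early exit jumps to a label that sits before the unlock. */
static unsigned int
set_multi_buggy(struct toy_lock *l, int promisc, int overflow)
{
	unsigned int filter;

	if (promisc) {
		filter = ~0u;
		goto done;		/* lock was never taken */
	}
	filter = 0;
	toy_enter(l);
	if (overflow) {
		filter = ~0u;
		goto done;
	}
 done:
	toy_exit(l);
	return filter;
}

/* Same logic, with a second label placed after the unlock. */
static unsigned int
set_multi_fixed(struct toy_lock *l, int promisc, int overflow)
{
	unsigned int filter;

	if (promisc) {
		filter = ~0u;
		goto done2;		/* lock not taken: skip the unlock */
	}
	filter = 0;
	toy_enter(l);
	if (overflow) {
		filter = ~0u;
		goto done;		/* lock taken: unlock on the way out */
	}
 done:
	toy_exit(l);
 done2:
	return filter;
}
```

In the buggy variant the promiscuous path records one lock error; in the fixed variant it records none, while the paths that did take the lock still release it.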
kernel panic in NetBSD-9.1-amd64-install.img (exiting unheld spin mutex)
I booted NetBSD-9.1-amd64-install.img. I didn't install.

For about 30 minutes I attempted to get my athn device working as an
access point, plus using dhcpd and hostapd (they are on the installer
system, which I didn't know before).

My Android phone could get to a "connecting ..." state but never
connected to it.

I ran multiple times:

ifconfig athn0 inet 172.16.1.1 media autoselect mediaopt hostap chan 5 nwkey

(in between using dhcpd and hostapd and tcpdump)

I had dhcpd and tcpdump running in the background the final time, when I
got a kernel panic:

Mutex error: mutex_vector_exit,742: exiting unheld spin mutex
...
panic: lock error: Mutex error: mutex_vector_exit,742: exiting unheld spin mutex: lock 0x8699588015c0 cpu 0 lwp 0xff... (my photo was cropped)
...
vpanic() at netbsd:vpanic+0x160
snprintf() at netbsd:snprintf
...
athn_ioctl() at netbsd:athn_ioctl+0x18b
if_mcast_op() at netbsd:if_mcast_op+0x4b
(sorry, I didn't type it all in)
in_delmulti
in_scrubaddr
in_purgeaddr
in_control0
udp_ioctl_wrapper
compat_ifioctl
doifioctl
sys_ioctl
syscall
...
--- syscall (number 54) ---
74df3936822a:
cpu0: End traceback...

I can type in the rest if needed.
kernel panic in genfs_deadunlock
hi folks,

while testing a very recent kernel, and waiting for it to reboot, I got
this:

Crash version 8.99.37, image version 8.99.37.
System panicked: lock error: Reader / writer lock: rw_vector_exit,454: assertion failed: RW_COUNT(rw) != 0: lock 0xed5bd50116b0 cpu 3 lwp 0xed5f70a20ae0
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
ostype() at ostype+0xb7290
vpanic() at vpanic+0x169
snprintf() at snprintf
lockdebug_abort() at lockdebug_abort+0xe7
rw_vector_exit() at rw_vector_exit+0xce
genfs_deadunlock() at genfs_deadunlock+0x14
VOP_UNLOCK() at VOP_UNLOCK+0x51
cnclose() at cnclose+0x7e
cdev_close() at cdev_close+0xbc
spec_close() at spec_close+0x199
VOP_CLOSE() at VOP_CLOSE+0x4c
vn_close() at vn_close+0x34
closef() at closef+0x6d
fd_close() at fd_close+0x1f4
sys_close() at sys_close+0x20
syscall() at syscall+0x173
--- syscall (number 6) ---
755abfa42bea:

If it helps anyone. I have no idea :-)
Re: Recent NetBSD/amd64 7.99.54 kernel panic
Hi,

Sorry. This is the same as PR kern/51767.

Thank you.

From: Ryo ONODERA, Date: Wed, 04 Jan 2017 02:40:14 +0900 (JST)

> Hi,
>
> Recent NetBSD/amd64 kernel panics with the following message
> (manually transcribed).
> Could anyone investigate this?
> Thank you.
>
> stack overflow detected: terminated
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip 80115455 cs 8 rflag 246 cr2 79b90fe688f0
> ilevel 4 rsp fe810e688a70
> curlwp 0xfe8220f32460 pid 0.3 lowest kstack 0xfe810e6852c0
> Stopped in pid 0.3 (system) at netbsd:breakpoint+0x05: leave
> db{0}> bt
> breakpoint() at netbsd:breakpoint+0x05
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ssp_init() at netbsd:ssp_init
> tcp_output() at netbsd:tcp_output+0x246e
> tcp_input() at netbsd:tcp_input+0x111e
> tcp6_input() at netbsd:tcp6_input+0x49
> ip6_input() at netbsd:ip6_input+0x724
> ip6intr() at netbsd:ip6intr+0x71
> softint_dispatch() at netbsd:softint_dispatch+0xda
> db{0}>
>
> --
> Ryo ONODERA // ryo...@yk.rim.or.jp
> PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
Re: Recent NetBSD/amd64 7.99.54 kernel panic
On 03.01.2017 18:40, Ryo ONODERA wrote:
> Hi,
>
> Recent NetBSD/amd64 kernel panics with the following message
> (manually transcribed).
> Could anyone investigate this?
> Thank you.
>
> stack overflow detected: terminated
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip 80115455 cs 8 rflag 246 cr2 79b90fe688f0
> ilevel 4 rsp fe810e688a70
> curlwp 0xfe8220f32460 pid 0.3 lowest kstack 0xfe810e6852c0
> Stopped in pid 0.3 (system) at netbsd:breakpoint+0x05: leave
> db{0}> bt
> breakpoint() at netbsd:breakpoint+0x05
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ssp_init() at netbsd:ssp_init
> tcp_output() at netbsd:tcp_output+0x246e
> tcp_input() at netbsd:tcp_input+0x111e
> tcp6_input() at netbsd:tcp6_input+0x49
> ip6_input() at netbsd:ip6_input+0x724
> ip6intr() at netbsd:ip6intr+0x71
> softint_dispatch() at netbsd:softint_dispatch+0xda
> db{0}>
>
> --
> Ryo ONODERA // ryo...@yk.rim.or.jp
> PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
>

I've just reproduced it locally:

panic: stack overflow detected; terminated
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
ssp_init() at netbsd:ssp_init
tcp_output() at netbsd:tcp_output+0x246e
tcp_input() at netbsd:tcp_input+0x111e
ipintr() at netbsd:ipintr+0xa46
softint_dispatch() at netbsd:softint_dispatch+0xda
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe813a2c7ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt --

It happened when fetching pkgsrc distfiles.
Recent NetBSD/amd64 7.99.54 kernel panic
Hi,

Recent NetBSD/amd64 kernel panics with the following message
(manually transcribed).
Could anyone investigate this?
Thank you.

stack overflow detected: terminated
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 80115455 cs 8 rflag 246 cr2 79b90fe688f0
ilevel 4 rsp fe810e688a70
curlwp 0xfe8220f32460 pid 0.3 lowest kstack 0xfe810e6852c0
Stopped in pid 0.3 (system) at netbsd:breakpoint+0x05: leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x05
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
ssp_init() at netbsd:ssp_init
tcp_output() at netbsd:tcp_output+0x246e
tcp_input() at netbsd:tcp_input+0x111e
tcp6_input() at netbsd:tcp6_input+0x49
ip6_input() at netbsd:ip6_input+0x724
ip6intr() at netbsd:ip6intr+0x71
softint_dispatch() at netbsd:softint_dispatch+0xda
db{0}>

--
Ryo ONODERA // ryo...@yk.rim.or.jp
PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB FD1B F404 27FA C7D1 15F3
Re: kernel panic
On Sun, Jun 19, 2016 at 9:23 PM, Michael van Elst wrote:
> brad.har...@gmail.com (bch) writes:
>
>>kernel (adjusted from GENERIC to allow dtrace support) from latest src
>>panics:
>
>>(transcription):
>
>>reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
>>struct ieee80211_node *) == NULL)" failed: file
>>"/usr/src/sys/80211/ieee80211_output.c", line 1347
>
>
> That assertion seems to be bogus. It checks a field in an mbuf
> that was just allocated in ieee80211_getmgtframe using m_getcl
> and that may contain random data in the ctx pointer.

Indeed.

>
> Another similar assertion in the same file is #ifdef __FreeBSD__.
>
> Looking at the current FreeBSD code, it still abuses the rcvif
> pointer for local data. But there are no such assertions, which
> would be bogus in FreeBSD either.

Thanks. I think we can remove the assertion(s) safely.

(I'm not sure why the assertion had never failed before. I guess my
changes broke some implicit zeroing of rcvif somewhere.)

  ozaki-r
Re: kernel panic
On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA" wrote:
>
> Hi,
>
> On 2016/06/16 8:15, bch wrote:
> > I am now at 1.414, and it seems stable.
>
> Thank you for your checking and reporting.

My pleasure. Question, were my wm(4) and iwm(4) faults related (maybe
some luck macro-ization, rejigging)? Can anybody point me to the commits
that apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?

> If it seems there is still
> problems, please tell us.

Will do. I'd like to have the commit(s) identified and
re-witness/characterize the issue. Otherwise, things currently seem
stable. Thanks.

>
> Thanks,
>
> --
> //
> Internet Initiative Japan Inc.
>
> Device Engineering Section,
> IoT Platform Development Department,
> Network Division,
> Technology Unit
>
> Kengo NAKAHARA
Re: kernel panic
brad.har...@gmail.com (bch) writes:

>kernel (adjusted from GENERIC to allow dtrace support) from latest src panics:

>(transcription):

>reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
>struct ieee80211_node *) == NULL)" failed: file
>"/usr/src/sys/80211/ieee80211_output.c", line 1347

That assertion seems to be bogus. It checks a field in an mbuf
that was just allocated in ieee80211_getmgtframe using m_getcl
and that may contain random data in the ctx pointer.

Another similar assertion in the same file is #ifdef __FreeBSD__.

Looking at the current FreeBSD code, it still abuses the rcvif
pointer for local data. But there are no such assertions, which
would be bogus in FreeBSD as well.

--
--
                                Michael van Elst
Internet: mlel...@serpens.de
                                "A potential Snark may lurk in every tree."
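The point about asserting on a field of a freshly allocated buffer can be illustrated outside the kernel. The sketch below is an assumption-laden toy, not net80211 code: toy_buf, toy_alloc_dirty() and toy_alloc_clean() are invented names, and the stale contents are simulated with memset() so that the example does not read indeterminate memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffer with a context pointer; NOT the real struct mbuf. */
struct toy_buf {
	void *ctx;
	char data[32];
};

/* Simulates an allocator that hands back non-zeroed memory. */
static struct toy_buf *
toy_alloc_dirty(void)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (b != NULL)
		memset(b, 0xA5, sizeof(*b));	/* stand-in for stale data */
	return b;
}

/* Same allocation, but the field is initialized before any check. */
static struct toy_buf *
toy_alloc_clean(void)
{
	struct toy_buf *b = toy_alloc_dirty();

	if (b != NULL)
		b->ctx = NULL;
	return b;
}
```

A check of the form "ctx == NULL" only means something after the second kind of allocation. On the first kind it passes or fails depending on whatever happened to be in memory, which matches the description of the assertion as bogus.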
Re: kernel panic
On Thu, Jun 16, 2016 at 3:04 PM, Ryota Ozaki wrote:
> On Thu, Jun 16, 2016 at 1:56 PM, bch wrote:
>>
>> On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA" wrote:
>>>
>>> Hi,
>>>
>>> On 2016/06/16 8:15, bch wrote:
>>> > I am now at 1.414, and it seems stable.
>>>
>>> Thank you for your checking and reporting.
>>
>> My pleasure. Question, were my wm(4) and iwm(4) faults related (maybe some
>> luck macro-ization, rejigging)?
>
> Not related.
>
>> Can anybody point me to the commits that
>> apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?
>
> For iwm:
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h
>
> Commit 1.164 broke iwm (and I guess all other wifi drivers)
> and commit 1.165 fixed it.
>
> For wm:
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c
>
> Commit 1.413 broke wm and commit 1.414 fixed it.
>
>>
>>> If it seems there is still
>>> problems, please tell us.
>>
>> Will do. I'd like to have the commit(s) identified and
>> re-witness/characterize the issue. Otherwise, things currently seem stable.
>> Thanks.
>
> [Timeline]
>
> - Jun 10 13:31:45: mbuf.h r1.164
> - Jun 11 ??:??:??: you encountered the first panic
> - Jun 12 10:14:12: mbuf.h r1.165

oops

> - Jun 14 09:07:22: if_wm.c r1.164
                             ^^ r1.413
> - Jun 14 ??:??:??: you encountered the second panic
> - Jun 14 17:09:20: if_wm.c r1.165
                             ^^ r1.414
> - Jun 16 ??:??:??: you are here
>
> And I noticed that I forgot to bump the kernel version; my mbuf.h
> change required it. (I already bumped.) If you run a kernel between
> my mbuf.h change and the bump with network device driver modules
> of 7.99.30, something bad will happen. (I guess the issues you saw
> aren't related to this though.)
>
> Thanks,
>   ozaki-r
Re: kernel panic
On Thu, Jun 16, 2016 at 1:56 PM, bch wrote:
>
> On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA" wrote:
>>
>> Hi,
>>
>> On 2016/06/16 8:15, bch wrote:
>> > I am now at 1.414, and it seems stable.
>>
>> Thank you for your checking and reporting.
>
> My pleasure. Question, were my wm(4) and iwm(4) faults related (maybe some
> luck macro-ization, rejigging)?

Not related.

> Can anybody point me to the commits that
> apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?

For iwm:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h

Commit 1.164 broke iwm (and I guess all other wifi drivers)
and commit 1.165 fixed it.

For wm:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c

Commit 1.413 broke wm and commit 1.414 fixed it.

>
>> If it seems there is still
>> problems, please tell us.
>
> Will do. I'd like to have the commit(s) identified and
> re-witness/characterize the issue. Otherwise, things currently seem stable.
> Thanks.

[Timeline]

- Jun 10 13:31:45: mbuf.h r1.164
- Jun 11 ??:??:??: you encountered the first panic
- Jun 12 10:14:12: mbuf.h r1.165
- Jun 14 09:07:22: if_wm.c r1.164
- Jun 14 ??:??:??: you encountered the second panic
- Jun 14 17:09:20: if_wm.c r1.165
- Jun 16 ??:??:??: you are here

And I noticed that I forgot to bump the kernel version; my mbuf.h
change required it. (I already bumped.) If you run a kernel between
my mbuf.h change and the bump with network device driver modules
of 7.99.30, something bad will happen. (I guess the issues you saw
aren't related to this though.)

Thanks,
  ozaki-r
Re: kernel panic
I am now at 1.414, and it seems stable.

On Jun 15, 2016 4:04 PM, "Kengo NAKAHARA" wrote:

> Hi,
>
> On 2016/06/16 1:44, bch wrote:
> > On 6/12/16, bch wrote:
> >> On 6/11/16, bch wrote:
> snip
> > And now, on wm(4):
> > -rwxr-xr-x 1 root wheel 18218304 Jun 14 10:20 /netbsd
> >
> > strathcona# crash -M ./netbsd.8.core /netbsd
> > Crash version 7.99.30, image version /amd64/compile/G.
> > WARNING: versions differ, you may not be able to examine this image.
> > System panicked: trap
> > Backtrace from time of crash is available.
> > crash> bt
> > _KERNEL_OPT_NARCNET() at 0
> > _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x5
> > aprint_verbose() at aprint_verbose+0x2f
> > aprint_naive_internal.part.0() at aprint_naive_internal.part.0+0x14
> > trap() at trap+0xc4b
> > --- trap (number 6) ---
> > mutex_enter() at mutex_enter+0xc
> > fddi_output() at fddi_output+0x47c
> > wm_tick() at wm_tick+0x230
> > in6_update_ifa1() at in6_update_ifa1+0x766
> > in6ifa_ifpforlinklocal() at in6ifa_ifpforlinklocal+0x4a
> > in6_control1() at in6_control1+0x521
> > in6_control() at in6_control+0x10d
> > udp6_connect_wrapper() at udp6_connect_wrapper+0x83
> > compat_43_sa_put() at compat_43_sa_put+0x14
> > if_flags_set() at if_flags_set+0xb5
> > sysctl_kern_sysvipc() at sysctl_kern_sysvipc+0x37d
> > handle_modctl_load() at handle_modctl_load+0x108
> > syscall() at syscall+0x14b
> > --- syscall (number 54) ---
> > 7f7ff74e89fa:
> > crash>
>
> May your if_wm.c ident be r1.413? If so, could you try r1.414?
>
>
> Thanks,
>
> --
> //
> Internet Initiative Japan Inc.
>
> Device Engineering Section,
> IoT Platform Development Department,
> Network Division,
> Technology Unit
>
> Kengo NAKAHARA
>
Re: kernel panic
Hi,

On 2016/06/16 1:44, bch wrote:
> On 6/12/16, bch wrote:
>> On 6/11/16, bch wrote:
snip
> And now, on wm(4):
> -rwxr-xr-x 1 root wheel 18218304 Jun 14 10:20 /netbsd
>
> strathcona# crash -M ./netbsd.8.core /netbsd
> Crash version 7.99.30, image version /amd64/compile/G.
> WARNING: versions differ, you may not be able to examine this image.
> System panicked: trap
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x5
> aprint_verbose() at aprint_verbose+0x2f
> aprint_naive_internal.part.0() at aprint_naive_internal.part.0+0x14
> trap() at trap+0xc4b
> --- trap (number 6) ---
> mutex_enter() at mutex_enter+0xc
> fddi_output() at fddi_output+0x47c
> wm_tick() at wm_tick+0x230
> in6_update_ifa1() at in6_update_ifa1+0x766
> in6ifa_ifpforlinklocal() at in6ifa_ifpforlinklocal+0x4a
> in6_control1() at in6_control1+0x521
> in6_control() at in6_control+0x10d
> udp6_connect_wrapper() at udp6_connect_wrapper+0x83
> compat_43_sa_put() at compat_43_sa_put+0x14
> if_flags_set() at if_flags_set+0xb5
> sysctl_kern_sysvipc() at sysctl_kern_sysvipc+0x37d
> handle_modctl_load() at handle_modctl_load+0x108
> syscall() at syscall+0x14b
> --- syscall (number 54) ---
> 7f7ff74e89fa:
> crash>

May your if_wm.c ident be r1.413? If so, could you try r1.414?


Thanks,

--
//
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA
Re: kernel panic
On 6/12/16, bch wrote:
> On 6/11/16, bch wrote:
> previously reported bt on core from iwm(4) crash...
> strathcona# crash -M ./netbsd.6.core
> Crash version 7.99.30, image version /amd64/compile/G.
> WARNING: versions differ, you may not be able to examine this image.
> System panicked: kernel diagnostic assertion "M_GETCTX(m, struct
> ieee80211_node *) == NULL" failed: file
> "/usr/src/sys/net80211/ieee80211_output.c", line 1347
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> ?() at fe810f8b7c00
> aprint_error() at aprint_error+0xe
> tcp_reass() at tcp_reass+0x2dc
> ieee80211_send_probereq() at ieee80211_send_probereq+0xc0
> ieee80211_match_bss() at ieee80211_match_bss+0x2b8
> ieee80211_newstate() at ieee80211_newstate+0xb1
> iwm_newstate_cb() at iwm_newstate_cb+0x11d
> xc_init_cpu() at xc_init_cpu+0x13a

And now, on wm(4):
-rwxr-xr-x 1 root wheel 18218304 Jun 14 10:20 /netbsd

strathcona# crash -M ./netbsd.8.core /netbsd
Crash version 7.99.30, image version /amd64/compile/G.
WARNING: versions differ, you may not be able to examine this image.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x5
aprint_verbose() at aprint_verbose+0x2f
aprint_naive_internal.part.0() at aprint_naive_internal.part.0+0x14
trap() at trap+0xc4b
--- trap (number 6) ---
mutex_enter() at mutex_enter+0xc
fddi_output() at fddi_output+0x47c
wm_tick() at wm_tick+0x230
in6_update_ifa1() at in6_update_ifa1+0x766
in6ifa_ifpforlinklocal() at in6ifa_ifpforlinklocal+0x4a
in6_control1() at in6_control1+0x521
in6_control() at in6_control+0x10d
udp6_connect_wrapper() at udp6_connect_wrapper+0x83
compat_43_sa_put() at compat_43_sa_put+0x14
if_flags_set() at if_flags_set+0xb5
sysctl_kern_sysvipc() at sysctl_kern_sysvipc+0x37d
handle_modctl_load() at handle_modctl_load+0x108
syscall() at syscall+0x14b
--- syscall (number 54) ---
7f7ff74e89fa:
crash>

>> On Jun 11, 2016 2:01 AM, "Ryota Ozaki" wrote:
>>
>>> Hi,
>>>
>>> On Sat, Jun 11, 2016 at 3:58 AM, bch wrote:
>>> > kernel (adjusted from GENERIC to allow dtrace support) from latest
>>> > src panics:
>>> >
>>> > (transcription):
>>> >
>>> > reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
>>> > struct ieee80211_node *) == NULL)" failed: file
>>> > "/usr/src/sys/80211/ieee80211_output.c", line 1347
>>>
>>> Can you show me a backtrace?
>>>
>>> And let me know the latest version (date) of the kernel that worked for
>>> you.
>>>
>>> ozaki-r
>>>
>>
>
Re: kernel panic
Hi,

On Sat, Jun 11, 2016 at 3:58 AM, bch wrote:
> kernel (adjusted from GENERIC to allow dtrace support) from latest src
> panics:
>
> (transcription):
>
> reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
> struct ieee80211_node *) == NULL)" failed: file
> "/usr/src/sys/80211/ieee80211_output.c", line 1347

Can you show me a backtrace?

And let me know the latest version (date) of the kernel that worked for
you.

  ozaki-r
kernel panic
kernel (adjusted from GENERIC to allow dtrace support) from latest src
panics:

(transcription):

reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
struct ieee80211_node *) == NULL)" failed: file
"/usr/src/sys/80211/ieee80211_output.c", line 1347
Re: amd64-7.99.29 - Another kernel panic - ffs?
On 29/05/2016 17:43, Robert Swindells wrote:
> One thing that could help would be if you could make an image of the
> CF card after the panic has happened.
>
> The filesystem that you were creating is probably a fair bit smaller
> than the ones where other people had the same problem.

I baked a new hpcarm release from HEAD this evening; I'm going to give
it a try now and report back.

Sevan
Re: amd64-7.99.29 - Another kernel panic - ffs?
Sevan Janiyan wrote:
>On 29/05/2016 11:34, Paul Goyette wrote:
>> Hmmm. Sevan opened PR port-hpcarm/50840 but perhaps we should
>> recategorize the PR?
>
>Done. I'm still running a prebuilt image which Jun published back in
>February but happy to do some test if that's required.

One thing that could help would be if you could make an image of the
CF card after the panic has happened.

The filesystem that you were creating is probably a fair bit smaller
than the ones where other people had the same problem.

Robert Swindells
Re: amd64-7.99.29 - Another kernel panic - ffs?
On 29/05/2016 11:34, Paul Goyette wrote:
> Hmmm. Sevan opened PR port-hpcarm/50840 but perhaps we should
> recategorize the PR?

Done. I'm still running a prebuilt image which Jun published back in
February, but I'm happy to do some tests if that's required.

Sevan
Re: amd64-7.99.29 - Another kernel panic - ffs?
On Sun, 29 May 2016, Robert Swindells wrote:

> Paul Goyette wrote:
>> Well, today I just had another crash, this time in ffs_newvnode(). The
>> traceback (manually transcribed) is:
>
> [snip]
>
> Was the panic message "ffs_init_vnode: dup alloc" ?

I missed copying down the panic message, but the backtrace seems to
think we were in the printf() calls leading up to that panic message.

> I had this on a filesystem a couple of months ago, I was using wapbl
> but didn't have QUOTA or QUOTA2 in the kernel, I confess I just copied
> everything off and ran newfs(8) on it.

Well the only thing that seems to have been trashed is one subdirectory
which was being cvs updated, and that is easily recovered. So, no loss
of data, just the inconvenience of having to reboot and clean up.

> There has also been a fairly recent report of it happening when
> installing NetBSD/hpcarm [1].
>
> [1] https://mail-index.netbsd.org/port-hpcarm/2016/02/27/msg000196.html

Hmmm. Sevan opened PR port-hpcarm/50840 but perhaps we should
recategorize the PR?

+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+
Re: NetBSD-current/i386 install kernel panic (sysv_ipc related)
(cc-ing current-users as a heads-up)

Yes, I got another report of this as well. I am looking into it and
will fix as quickly as possible.

On Fri, 4 Dec 2015, Andreas Gustafsson wrote:

> Hi Paul,
>
> NetBSD-current/i386 panics during the install since yesterday. Since
> the panic message mentions sysv_ipc and you made some commits in that
> area between the last successful install and the first unsuccessful
> one, I'm reporting this to you :)
>
> The panic message is:
>
> cd0 at atapibus0 drive 1: cdrom removable
> wd0 at atabus0 drive 0
> wd0:
> wd0: 1024 MB, 2080 cyl, 16 head, 63 sec, 512 bytes/sect x 2097152 sectors
> syscall 171 is busy
> WARNING: module error: builtin module `sysv_ipc' failed to init, error 16
> panic: kernel diagnostic assertion "sysvipc_listener == NULL" failed: file "/tmp/bracket/build/2015.12.03.03.03.58-i386/src/sys/kern/sysv_ipc.c", line 365
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 eip c010e424 cs 8 eflags 246 cr2 0 ilevel 0 esp c13e8e98
> curlwp 0xc1367ba0 pid 0 lid 1 lowest kstack 0xc13e62c0
> Stopped in pid 0.1 (system) at c010e424: popl %ebp
> db{0}>
>
> More logs are at:
>
> http://releng.netbsd.org/b5reports/i386/commits-2015.12.html#2015.12.03.02.57.47
>
> --
> Andreas Gustafsson, g...@gson.org

+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+
Re: NetBSD-current/i386 install kernel panic (sysv_ipc related)
This should be fixed now, although I am still testing a few more
combinations.

On Sat, 5 Dec 2015, Paul Goyette wrote:

> (cc-ing current-users as a heads-up)
>
> Yes, I got another report of this as well. I am looking into it and
> will fix as quickly as possible.
>
> On Fri, 4 Dec 2015, Andreas Gustafsson wrote:
>
>> Hi Paul,
>>
>> NetBSD-current/i386 panics during the install since yesterday. Since
>> the panic message mentions sysv_ipc and you made some commits in that
>> area between the last successful install and the first unsuccessful
>> one, I'm reporting this to you :)
>>
>> The panic message is:
>>
>> cd0 at atapibus0 drive 1: cdrom removable
>> wd0 at atabus0 drive 0
>> wd0:
>> wd0: 1024 MB, 2080 cyl, 16 head, 63 sec, 512 bytes/sect x 2097152 sectors
>> syscall 171 is busy
>> WARNING: module error: builtin module `sysv_ipc' failed to init, error 16
>> panic: kernel diagnostic assertion "sysvipc_listener == NULL" failed: file "/tmp/bracket/build/2015.12.03.03.03.58-i386/src/sys/kern/sysv_ipc.c", line 365
>> fatal breakpoint trap in supervisor mode
>> trap type 1 code 0 eip c010e424 cs 8 eflags 246 cr2 0 ilevel 0 esp c13e8e98
>> curlwp 0xc1367ba0 pid 0 lid 1 lowest kstack 0xc13e62c0
>> Stopped in pid 0.1 (system) at c010e424: popl %ebp
>> db{0}>
>>
>> More logs are at:
>>
>> http://releng.netbsd.org/b5reports/i386/commits-2015.12.html#2015.12.03.02.57.47
>>
>> --
>> Andreas Gustafsson, g...@gson.org
>
> +------------------+--------------------------+------------------------+
> | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
> | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
> +------------------+--------------------------+------------------------+

+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+
Re: Kernel panic from network traffic
Hi,

It's probably due to my recent change to refcnt. I'm investigating
that defect.

Thanks,
  ozaki-r

On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka <ht...@twofifty.com> wrote:
> Being a moron, I plugged ports of my switch together. The big surprise
> is two ports away is my -current box and it kept panicking. This is all
> I got so far.
>
> Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
> Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
> Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
> Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
> Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
> Jul 23 21:46:11 mara /netbsd: rtcache_clear() at netbsd:rtcache_clear+0x41
> Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
> Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at netbsd:in6_pcbdetach+0xcb
> Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at netbsd:udp6_detach_wrapper+0x3f
> Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
> Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
> Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
> Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
> Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
> Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
> Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
> Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
> Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...
>
> --
> Hisashi T Fujinaka - ht...@twofifty.com
> BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
Re: Kernel panic from network traffic
Hi,

I just fixed one bug related to refcnt. The fix may shut up the panic. Could you try again with a latest kernel?

Thanks,
ozaki-r

On Fri, Jul 24, 2015 at 3:38 PM, Ryota Ozaki ozak...@netbsd.org wrote:
> On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
>> Hi,
>>
>> It's probably due to my recent change to refcnt. I'm investigating that defect.
>
> Hmm, I cannot reproduce it. Could you tell me the kernel config, network setups and apps running on the box?
>
> Thanks,
> ozaki-r
>
>> Thanks,
>> ozaki-r
>>
>> On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
>>> Being a moron, I plugged ports of my switch together. The big surprise is two ports away is my -current box and it kept panicking.
>>>
>>> This is all I got so far.
>>>
>>> Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion rt->rt_refcnt > 0 failed: file /usr/src/sys/net/route.c, line 418
>>> Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
>>> Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
>>> Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
>>> Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
>>> Jul 23 21:46:11 mara /netbsd: rtcache_clear() at netbsd:rtcache_clear+0x41
>>> Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
>>> Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at netbsd:in6_pcbdetach+0xcb
>>> Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at netbsd:udp6_detach_wrapper+0x3f
>>> Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
>>> Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
>>> Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
>>> Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
>>> Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
>>> Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
>>> Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
>>> Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
>>> Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...
>>>
>>> --
>>> Hisashi T Fujinaka - ht...@twofifty.com
>>> BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
Re: Kernel panic from network traffic
On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
> Hi,
>
> It's probably due to my recent change to refcnt. I'm investigating that defect.

Hmm, I cannot reproduce it. Could you tell me the kernel config, network setups and apps running on the box?

Thanks,
ozaki-r

> Thanks,
> ozaki-r
>
> On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
>> Being a moron, I plugged ports of my switch together. The big surprise is two ports away is my -current box and it kept panicking.
>>
>> This is all I got so far.
>>
>> Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion rt->rt_refcnt > 0 failed: file /usr/src/sys/net/route.c, line 418
>> Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
>> Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
>> Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
>> Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
>> Jul 23 21:46:11 mara /netbsd: rtcache_clear() at netbsd:rtcache_clear+0x41
>> Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
>> Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at netbsd:in6_pcbdetach+0xcb
>> Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at netbsd:udp6_detach_wrapper+0x3f
>> Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
>> Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
>> Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
>> Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
>> Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
>> Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
>> Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
>> Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
>> Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...
>>
>> --
>> Hisashi T Fujinaka - ht...@twofifty.com
>> BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
Re: Kernel panic from network traffic
On Thu, Jul 23, 2015 at 10:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
> Being a moron, I plugged ports of my switch together. The big surprise is two ports away is my -current box and it kept panicking.

Hello fellow moron :)

I have a general question: I see some comments around unifying route caches, but in this particular case it seems related to ipv6. Is this an ipv6 problem or a general problem? I have a -current machine and it's not likely to encounter this particular scenario (sorry, heh), but wondering anyway.

Thanks!

Andy
Re: Kernel panic from network traffic
On Fri, 24 Jul 2015, Andy Ruhl wrote:
> On Thu, Jul 23, 2015 at 10:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
>> Being a moron, I plugged ports of my switch together. The big surprise is two ports away is my -current box and it kept panicking.
>
> Hello fellow moron :)
>
> I have a general question: I see some comments around unifying route caches, but in this particular case it seems related to ipv6. Is this an ipv6 problem or a general problem? I have a -current machine and it's not likely to encounter this particular scenario (sorry, heh), but wondering anyway.

I'm not sure of what kind of flood of traffic was seen at the -current box (tcpdump wasn't being helpful) but I kind of doubt it was IPv6 only. But who knows? My guess is that there is IPv6 routing traffic mixed in with a whole lot of garbage from my switch.

--
Hisashi T Fujinaka - ht...@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
Kernel panic from network traffic
Being a moron, I plugged ports of my switch together. The big surprise is two ports away is my -current box and it kept panicking.

This is all I got so far.

Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion rt->rt_refcnt > 0 failed: file /usr/src/sys/net/route.c, line 418
Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
Jul 23 21:46:11 mara /netbsd: rtcache_clear() at netbsd:rtcache_clear+0x41
Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at netbsd:in6_pcbdetach+0xcb
Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at netbsd:udp6_detach_wrapper+0x3f
Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

--
Hisashi T Fujinaka - ht...@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
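The assertion in this traceback guards a reference count (the field is rt_refcnt, and the check in rtfree() presumably requires it to be greater than zero before a reference is dropped). As a rough illustration of why such a check fires, here is a minimal user-space sketch of a counted object with an unbalanced release. The struct and function names are hypothetical and are not taken from route.c; where the kernel would panic, the sketch returns false so the condition can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a routing entry; not NetBSD's struct rtentry. */
struct route_entry {
    int refcnt;   /* number of outstanding references */
    bool freed;   /* set once the last reference has been dropped */
};

/* Take a reference on an entry that is still alive. */
void route_ref(struct route_entry *rt)
{
    assert(!rt->freed);
    rt->refcnt++;
}

/*
 * Drop a reference. When the count is already zero the release is
 * unbalanced; a diagnostic kernel would assert here, this sketch
 * reports it by returning false instead.
 */
bool route_unref(struct route_entry *rt)
{
    if (rt->refcnt <= 0)
        return false;          /* more releases than references */
    if (--rt->refcnt == 0)
        rt->freed = true;      /* last reference gone */
    return true;
}
```

Calling route_ref() once and route_unref() twice makes the second release return false, which is the situation the kernel assertion is there to catch: some path released a route it no longer held a reference to.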
Kernel panic when entering ACPI sleep state S3
Hi all,

I'm running the latest snapshot from nyftp.netbsd.org (201504151050Z) on a Thinkpad X120e. I seem to be encountering some issues when attempting to enter ACPI sleep state S3, using:

sysctl -w hw.acpi.sleep.state=3

Upon invoking the above command, my system seems to attempt to sleep (blanks the screen, the speakers click) but then halts and reboots. Figured I'd report this, and would appreciate any input. Please see the relevant output from /var/log/messages, and a dmesg following this.

/var/log/messages:
--
Apr 15 19:47:40 bmo /netbsd: acpi0: entering state S3
Apr 15 19:48:29 bmo syslogd[684]: restart
Apr 15 19:48:29 bmo /netbsd: panic: kernel diagnostic assertion (bo->mem.bus.base & (PAGE_SIZE - 1)) == 0 failed: file /home/source/ab/HEAD/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c, line 1618 bo bus base addr not page-aligned: fe82125c69b0
Apr 15 19:48:29 bmo /netbsd: cpu0: Begin traceback...
Apr 15 19:48:29 bmo /netbsd: vpanic() at netbsd:vpanic+0x13c
Apr 15 19:48:29 bmo /netbsd: kern_assert() at netbsd:kern_assert+0x4f
Apr 15 19:48:29 bmo /netbsd: ttm_bo_unmap_virtual_locked() at netbsd:ttm_bo_unmap_virtual_locked+0x17b
Apr 15 19:48:29 bmo /netbsd: ttm_bo_handle_move_mem() at netbsd:ttm_bo_handle_move_mem+0x22f
Apr 15 19:48:29 bmo /netbsd: ttm_mem_evict_first() at netbsd:ttm_mem_evict_first+0x4e0
Apr 15 19:48:29 bmo /netbsd: ttm_bo_force_list_clean() at netbsd:ttm_bo_force_list_clean+0x5a
Apr 15 19:48:29 bmo /netbsd: radeon_suspend_kms() at netbsd:radeon_suspend_kms+0x13f
Apr 15 19:48:29 bmo /netbsd: radeon_do_suspend() at netbsd:radeon_do_suspend+0x21
Apr 15 19:48:29 bmo /netbsd: device_pmf_driver_suspend() at netbsd:device_pmf_driver_suspend+0x35
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend_locked() at netbsd:pmf_device_suspend_locked+0xe3
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend() at netbsd:pmf_device_suspend+0x41
Apr 15 19:48:29 bmo /netbsd: pmf_system_suspend() at netbsd:pmf_system_suspend+0xc1
Apr 15 19:48:29 bmo /netbsd: acpi_enter_sleep_state() at netbsd:acpi_enter_sleep_state+0x115
Apr 15 19:48:29 bmo /netbsd: sysctl_hw_acpi_sleepstate() at netbsd:sysctl_hw_acpi_sleepstate+0xfe
Apr 15 19:48:29 bmo /netbsd: sysctl_dispatch() at netbsd:sysctl_dispatch+0xc4
Apr 15 19:48:29 bmo /netbsd: sys___sysctl() at netbsd:sys___sysctl+0xd0
Apr 15 19:48:29 bmo /netbsd: syscall() at netbsd:syscall+0x9c
Apr 15 19:48:29 bmo /netbsd: --- syscall (number 202) ---
Apr 15 19:48:29 bmo /netbsd: 7f7ff7501d3a:
Apr 15 19:48:29 bmo /netbsd: cpu0: End traceback...
Apr 15 19:48:29 bmo /netbsd:
Apr 15 19:48:29 bmo /netbsd: dumping to dev 0,1 (offset=3496, size=1992749):
Apr 15 19:48:29 bmo /netbsd: dump Skipping crash dump on recursive panic
Apr 15 19:48:29 bmo /netbsd: panic: wddump: polled command has been queued
Apr 15 19:48:29 bmo /netbsd: cpu0: Begin traceback...
Apr 15 19:48:29 bmo /netbsd: vpanic() at netbsd:vpanic+0x13c
Apr 15 19:48:29 bmo /netbsd: snprintf() at netbsd:snprintf
Apr 15 19:48:29 bmo /netbsd: wddump() at netbsd:wddump+0x282
Apr 15 19:48:29 bmo /netbsd: dump_header_flush() at netbsd:dump_header_flush+0x4f
Apr 15 19:48:29 bmo /netbsd: dump_header_addbytes() at netbsd:dump_header_addbytes+0x46
Apr 15 19:48:29 bmo /netbsd: dump_header_addseg() at netbsd:dump_header_addseg+0x1e
Apr 15 19:48:29 bmo /netbsd: dump_seg_iter() at netbsd:dump_seg_iter+0xce
Apr 15 19:48:29 bmo /netbsd: cpu_dump() at netbsd:cpu_dump+0x6a
Apr 15 19:48:29 bmo /netbsd: dodumpsys() at netbsd:dodumpsys+0xfb
Apr 15 19:48:29 bmo /netbsd: dumpsys() at netbsd:dumpsys+0x1d
Apr 15 19:48:29 bmo /netbsd: vpanic() at netbsd:vpanic+0x145
Apr 15 19:48:29 bmo /netbsd: kern_assert() at netbsd:kern_assert+0x4f
Apr 15 19:48:29 bmo /netbsd: ttm_bo_unmap_virtual_locked() at netbsd:ttm_bo_unmap_virtual_locked+0x17b
Apr 15 19:48:29 bmo /netbsd: ttm_bo_handle_move_mem() at netbsd:ttm_bo_handle_move_mem+0x22f
Apr 15 19:48:29 bmo /netbsd: ttm_mem_evict_first() at netbsd:ttm_mem_evict_first+0x4e0
Apr 15 19:48:29 bmo /netbsd: ttm_bo_force_list_clean() at netbsd:ttm_bo_force_list_clean+0x5a
Apr 15 19:48:29 bmo /netbsd: radeon_suspend_kms() at netbsd:radeon_suspend_kms+0x13f
Apr 15 19:48:29 bmo /netbsd: radeon_do_suspend() at netbsd:radeon_do_suspend+0x21
Apr 15 19:48:29 bmo /netbsd: device_pmf_driver_suspend() at netbsd:device_pmf_driver_suspend+0x35
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend_locked() at netbsd:pmf_device_suspend_locked+0xe3
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend() at netbsd:pmf_device_suspend+0x41
Apr 15 19:48:29 bmo /netbsd: pmf_system_suspend() at netbsd:pmf_system_suspend+0xc1
Apr 15 19:48:29 bmo /netbsd: acpi_enter_sleep_state() at netbsd:acpi_enter_sleep_state+0x115
Apr 15 19:48:29 bmo /netbsd: sysctl_hw_acpi_sleepstate() at netbsd:sysctl_hw_acpi_sleepstate+0xfe
Apr 15 19:48:29 bmo /netbsd: sysctl_dispatch() at netbsd:sysctl_dispatch+0xc4
Apr 15 19:48:29 bmo /netbsd: sys___sysctl() at netbsd:sys___sysctl+0xd0
Apr 15 19:48:29 bmo /netbsd: syscall() at
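The panic text says the buffer object's bus base address is not page-aligned and prints the value fe82125c69b0. A small sketch of that kind of alignment test follows, assuming 4 KiB pages (an assumption on my part; the macro and function names are made up for illustration and are not the ttm code).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed 4 KiB pages */

/* True when addr sits on a page boundary: the low 12 bits are all zero. */
bool is_page_aligned(uint64_t addr)
{
    return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Offset of addr within its page, i.e. the bits the check looks at. */
uint64_t page_offset(uint64_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}
```

With the address from the log, page_offset(0xfe82125c69b0) is 0x9b0, so the low bits are not zero and a check of this form fails. That value looks more like a pointer into some structure than a bus base address, but that is only a guess from the number itself.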
Re: kernel panic: uvm_fault
On Mon, Dec 22, 2014 at 03:56:43PM +, Robert Swindells wrote: Thomas Klausner wrote: On Mon, Dec 22, 2014 at 03:29:58PM +, Robert Swindells wrote: Thomas Klausner wrote: I had a kernel panic today with a amd64/7.99.3 kernel from Dec 20. The last activity I had started was downloading a file from network to an NFS directory mounted from a Synology. It looks the same as the panic you had back in September to me: http://mail-index.netbsd.org/current-users/2014/09/13/msg025777.html Can you turn on HW checksumming on this machine ? I've just done that (ifconfig wm0 tcp4csum ip4csum udp4csum): wm0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 capabilities=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx capabilities=7ff80TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx capabilities=7ff80TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6 enabled=3f00IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx enabled=3f00UDP4CSUM_Rx,UDP4CSUM_Tx ec_capabilities=7VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU ec_enabled=0 ... I had in the back of my memory that hardware checksumming was usually the cause of bugs, not when it's turned off. Am I misremembering? Depends on the network controller, wm works well for me with everything enabled. wm0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 capabilities=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx capabilities=7ff80TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx capabilities=7ff80TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6 enabled=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx enabled=7ff80TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx enabled=7ff80TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6 ec_capabilities=7VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU ec_enabled=0 I have had similar crashes to you when using sw checksumming on amd64, never seen it on i386 or arm. There was also this: http://mail-index.netbsd.org/port-sparc64/2014/11/29/msg002298.html I guess we need to add some more KASSERT() checks. 
Ok, the Synology installed an opsys update again last night, and a couple minutes ago I tried writing to a still-mounted file system from it. And got a panic.

From dmesg:

192.168.1.2:/volume1/roms: re-enabling wcc
192.168.1.2:/volume1/video: re-enabling wcc
panic: _bus_virt_to_bus
cpu1: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
_bus_dma_alloc_bouncebuf() at netbsd:_bus_dma_alloc_bouncebuf
bus_dmamap_load_mbuf() at netbsd:bus_dmamap_load_mbuf+0xf0
wm_nq_start() at netbsd:wm_nq_start+0x1c5
ifq_enqueue() at netbsd:ifq_enqueue+0xae
ether_output() at netbsd:ether_output+0x579
ip_output() at netbsd:ip_output+0xdeb
tcp_output() at netbsd:tcp_output+0x15cf
tcp_send_wrapper() at netbsd:tcp_send_wrapper+0xa2
sosend() at netbsd:sosend+0x712
nfs_send() at netbsd:nfs_send+0x8e
nfs_request() at netbsd:nfs_request+0x39d
nfs_writerpc() at netbsd:nfs_writerpc+0x3b0
nfs_doio() at netbsd:nfs_doio+0x250
nfssvc_iod() at netbsd:nfssvc_iod+0x1a1
cpu1: End traceback...

I had and still have all the checksum options turned on as you suggested.

wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
        capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
        capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
        enabled=3ff00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
        enabled=3ff00<UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx>
        enabled=3ff00<UDP6CSUM_Rx,UDP6CSUM_Tx>
        ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
        ec_enabled=0

At least the backtrace looks nfs related this time :)
 Thomas
Re: kernel panic: uvm_fault
On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote: alloc_bouncebus? On amd64? I think you've got a trashed pointer somewhere. I have makeoptions DEBUG=-g # compile full symbol table # grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/ Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o matches Thomas On Mon, Jan 26, 2015 at 02:47:55PM +0100, Thomas Klausner wrote: On Mon, Dec 22, 2014 at 03:56:43PM +, Robert Swindells wrote: Thomas Klausner wrote: On Mon, Dec 22, 2014 at 03:29:58PM +, Robert Swindells wrote: Thomas Klausner wrote: I had a kernel panic today with a amd64/7.99.3 kernel from Dec 20. The last activity I had started was downloading a file from network to an NFS directory mounted from a Synology. It looks the same as the panic you had back in September to me: http://mail-index.netbsd.org/current-users/2014/09/13/msg025777.html Can you turn on HW checksumming on this machine ? I've just done that (ifconfig wm0 tcp4csum ip4csum udp4csum): wm0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 capabilities=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx capabilities=7ff80TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx capabilities=7ff80TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6 enabled=3f00IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx enabled=3f00UDP4CSUM_Rx,UDP4CSUM_Tx ec_capabilities=7VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU ec_enabled=0 ... I had in the back of my memory that hardware checksumming was usually the cause of bugs, not when it's turned off. Am I misremembering? Depends on the network controller, wm works well for me with everything enabled. 
wm0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 capabilities=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx capabilities=7ff80TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx capabilities=7ff80TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6 enabled=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx enabled=7ff80TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx enabled=7ff80TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6 ec_capabilities=7VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU ec_enabled=0 I have had similar crashes to you when using sw checksumming on amd64, never seen it on i386 or arm. There was also this: http://mail-index.netbsd.org/port-sparc64/2014/11/29/msg002298.html I guess we need to add some more KASSERT() checks. Ok, the Synology installed an opsys update again last night, and a couple minutes ago I tried writing to a still-mounted file system from it. And got a panic. From dmesg: 192.168.1.2:/volume1/roms: re-enabling wcc 192.168.1.2:/volume1/video: re-enabling wcc panic: _bus_virt_to_bus cpu1: Begin traceback... vpanic() at netbsd:vpanic+0x13c snprintf() at netbsd:snprintf _bus_dma_alloc_bouncebuf() at netbsd:_bus_dma_alloc_bouncebuf bus_dmamap_load_mbuf() at netbsd:bus_dmamap_load_mbuf+0xf0 wm_nq_start() at netbsd:wm_nq_start+0x1c5 ifq_enqueue() at netbsd:ifq_enqueue+0xae ether_output() at netbsd:ether_output+0x579 ip_output() at netbsd:ip_output+0xdeb tcp_output() at netbsd:tcp_output+0x15cf tcp_send_wrapper() at netbsd:tcp_send_wrapper+0xa2 sosend() at netbsd:sosend+0x712 nfs_send() at netbsd:nfs_send+0x8e nfs_request() at netbsd:nfs_request+0x39d nfs_writerpc() at netbsd:nfs_writerpc+0x3b0 nfs_doio() at netbsd:nfs_doio+0x250 nfssvc_iod() at netbsd:nfssvc_iod+0x1a1 cpu1: End traceback... I had and still have all the checksum options turned on as you suggested. 
wm0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 capabilities=7ff80TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx capabilities=7ff80TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx capabilities=7ff80TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6 enabled=3ff00IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx enabled=3ff00UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx enabled=3ff00UDP6CSUM_Rx,UDP6CSUM_Tx ec_capabilities=7VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU ec_enabled=0 At least the backtrace looks nfs related this time :) Thomas -- Thor Lancelot Simont...@panix.com From the tooth paste you use in the morning to the salt on your evening meal, it's easy to take for granted the many products brought to us with explosives. - Institute of Manufacturers of Explosives, Explosives Make It Possible
Re: kernel panic: uvm_fault
On Mon, Jan 26, 2015 at 01:07:40PM -0500, Thor Lancelot Simon wrote:
> On Mon, Jan 26, 2015 at 04:28:22PM +0100, Thomas Klausner wrote:
>> On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote:
>>> alloc_bouncebus? On amd64? I think you've got a trashed pointer somewhere.
>>
>> I have
>>
>> makeoptions DEBUG=-g # compile full symbol table
>>
>> # grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/
>> Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o matches
>
> Yes, but why would you be trying to bounce PCI DMA on a 64-bit platform?

Because the device doesn't support 64bit DMA?

Joerg
Re: kernel panic: uvm_fault
On Mon, Jan 26, 2015 at 04:28:22PM +0100, Thomas Klausner wrote:
> On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote:
>> alloc_bouncebus? On amd64? I think you've got a trashed pointer somewhere.
>
> I have
>
> makeoptions DEBUG=-g # compile full symbol table
>
> # grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/
> Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o matches

Yes, but why would you be trying to bounce PCI DMA on a 64-bit platform?

Thor
Re: kernel panic: uvm_fault
Thor Lancelot Simon wrote:
> On Mon, Jan 26, 2015 at 08:03:26PM +0100, Joerg Sonnenberger wrote:
>> On Mon, Jan 26, 2015 at 01:07:40PM -0500, Thor Lancelot Simon wrote:
>>> On Mon, Jan 26, 2015 at 04:28:22PM +0100, Thomas Klausner wrote:
>>>> On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote:
>>>>> alloc_bouncebus? On amd64? I think you've got a trashed pointer somewhere.
>>>>
>>>> I have
>>>>
>>>> makeoptions DEBUG=-g # compile full symbol table
>>>>
>>>> # grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/
>>>> Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o matches
>>>
>>> Yes, but why would you be trying to bounce PCI DMA on a 64-bit platform?
>>
>> Because the device doesn't support 64bit DMA?
>
> That doesn't sound right for this device.

If there is an error in the normal route through bus_dmamap_load_mbuf then it tries to use a bounce buffer. The code is shared with i386.

It looks to me as if it could be triggered by a mbuf with an invalid size.
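For readers unfamiliar with bounce buffers, the question raised above (a device that cannot address all of memory) can be modelled with a few lines of C. This is only a sketch of the address-limit condition, with a hypothetical helper name and parameters; it is not the logic in the x86 bus_dma code, which, as noted in the thread, also falls back to bouncing when the normal load path reports an error.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does the physical range [paddr, paddr + len)
 * reach past dma_limit, the highest address the device can access?
 * If so, the data would have to be copied ("bounced") through a
 * buffer located below the limit.
 */
bool needs_bounce(uint64_t paddr, uint64_t len, uint64_t dma_limit)
{
    if (len == 0)
        return false;
    uint64_t last = paddr + (len - 1);   /* address of the final byte */
    if (last < paddr)
        return true;                     /* range wrapped around */
    return last > dma_limit;
}
```

For a device limited to 32-bit addresses, needs_bounce(0x100000000, 4096, 0xffffffff) is true while needs_bounce(0x1000, 4096, 0xffffffff) is false. A length that is corrupted to something enormous would also make the range exceed the limit, which fits the guess about an mbuf with an invalid size, though the sketch cannot confirm that this is what happened here.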
Re: kernel panic: uvm_fault
On Mon, Dec 22, 2014 at 03:49:20PM +0100, Thomas Klausner wrote:
> I had a kernel panic today with a amd64/7.99.3 kernel from Dec 20. The last activity I had started was downloading a file from network to an NFS directory mounted from a Synology.

I just saw that the Synology had installed an operating system upgrade again (on its own) on this day.

So I guess this is related to the NFS mount I have from Synology (as server) to NetBSD (as client). The mount flags from my /etc/fstab are currently intr,nodev,nosuid,rw,tcp.
 Thomas
kernel panic: uvm_fault
Hi!

I had a kernel panic today with a amd64/7.99.3 kernel from Dec 20. The last activity I had started was downloading a file from network to an NFS directory mounted from a Synology. I don't know if that's related, but there was no particular load on the machine.

From dmesg after reboot:

uvm_fault(0x811cf2c0, 0x8003393b8000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 8028b965 cs 8 rflags 10202 cr2 8003393b8000 ilevel 4 rsp fe813bcb8728
curlwp 0xfe8825ee1220 pid 0.143 lowest kstack 0xfe813bcb52c0
panic: trap
cpu8: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
cpu8: End traceback...

Kernel backtrace:

(gdb) bt
#0 0x80677a85 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0) at /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
#1 0x808ccb54 in vpanic (fmt=fmt@entry=0x80ddb08d trap, ap=ap@entry=0xfe813bcb8510) at /archive/foreign/src/sys/kern/subr_prf.c:340
#2 0x808ccc0f in panic (fmt=fmt@entry=0x80ddb08d trap) at /archive/foreign/src/sys/kern/subr_prf.c:256
#3 0x8091bd87 in trap (frame=0xfe813bcb8630) at /archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
#4 0x8010108e in alltraps ()
#5 0x8028b965 in .Mmbuf_inner_loop ()
#6 0xfe8349294000 in ?? ()
#7 0xfe813bcb8758 in ?? ()
#8 0x8058733e in in_delayed_cksum (m=0x8003393b8000) at /archive/foreign/src/sys/netinet/ip_output.c:793
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

Is this a valid backtrace? Does it give any useful hints?
 Thomas
Re: kernel panic: uvm_fault
On Mon, Dec 22, 2014 at 03:29:58PM +, Robert Swindells wrote:
> Thomas Klausner wrote:
>> I had a kernel panic today with a amd64/7.99.3 kernel from Dec 20. The last activity I had started was downloading a file from network to an NFS directory mounted from a Synology.
>>
>> [snip]
>>
>> #1 0x808ccb54 in vpanic (fmt=fmt@entry=0x80ddb08d trap, ap=ap@entry=0xfe813bcb8510) at /archive/foreign/src/sys/kern/subr_prf.c:340
>> #2 0x808ccc0f in panic (fmt=fmt@entry=0x80ddb08d trap) at /archive/foreign/src/sys/kern/subr_prf.c:256
>> #3 0x8091bd87 in trap (frame=0xfe813bcb8630) at /archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
>> #4 0x8010108e in alltraps ()
>> #5 0x8028b965 in .Mmbuf_inner_loop ()
>> #6 0xfe8349294000 in ?? ()
>> #7 0xfe813bcb8758 in ?? ()
>> #8 0x8058733e in in_delayed_cksum (m=0x8003393b8000) at /archive/foreign/src/sys/netinet/ip_output.c:793
>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>> (gdb)
>>
>> Is this a valid backtrace? Does it give any useful hints?
>
> It looks the same as the panic you had back in September to me:
>
> http://mail-index.netbsd.org/current-users/2014/09/13/msg025777.html
>
> Can you turn on HW checksumming on this machine ?

I've just done that (ifconfig wm0 tcp4csum ip4csum udp4csum):

wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
        capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
        capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
        enabled=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
        enabled=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
        ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
        ec_enabled=0
...

I had in the back of my memory that hardware checksumming was usually the cause of bugs, not when it's turned off. Am I misremembering?
 Thomas
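For context on what the software path has to do when checksum offload is not enabled: as I understand it, in_delayed_cksum() in the backtrace is where the kernel fills in a deferred TCP/UDP checksum itself, walking the mbuf chain. The arithmetic is the standard Internet checksum from RFC 1071. The sketch below computes it over one contiguous buffer; the function name is hypothetical, and the kernel's version is considerably more involved (mbuf chains, optimised inner loops).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over a contiguous buffer: sum the data as
 * big-endian 16-bit words, fold the carries back into the low 16 bits,
 * and return the one's complement of the result.
 */
uint16_t inet_cksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;

    while (sum >> 16)                  /* fold carries */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The loop reads exactly len bytes and nothing more, so a length that does not match the real buffer is enough to make a routine like this walk off into unmapped memory. Whether that is what happened in this panic is not something the sketch can show.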
Re: kernel panic on a cold start amd64.(msk0 and bridge problem).
On 27.05.2014 16:09, Ilia Zykov wrote:
> Now I can reproduce it persistently.
> The kernel panics on a network bridge when an msk interface has no link.
> Do I need to open a new bug, or can it be fixed easily?
> The main reason is:
>
> msk0: watchdog timeout
>
> from source:
>
> void
> msk_watchdog(struct ifnet *ifp)
> {
> [...]
> /* XXX Resets both ports; we shouldn't do that. */
> msk_reset(sc_if->sk_softc);
> msk_init(ifp);
> [...]
> }

Hello.

Now it happens when msk0 (Marvell Yukon 88E8056 (ethernet network, revision 0x12)) is a member of the bridge and has no patch cord connected. Maybe I configured the bridge wrong?

NetBSD 6.99.47 NetBSD 6.99.47 (GENERIC.201407150020Z)

cat /etc/ifconfig.bridge0
create
!brconfig $int add msk0 add re0 up

cat /etc/ifconfig.msk0
up

cat /etc/ifconfig.re0
up

Dhcpcd works normally on the re0 interface.

cat /etc/rc.conf
...
dhcpcd=YES
dhcpcd_flags=-4
...

cat /etc/dhcpcd.conf
...
allowinterfaces re*

crash -M work/core
Crash version 6.99.47, image version 6.99.47.
System panicked: kernel diagnostic assertion (!cpu_intr_p() && !cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != NULL) failed: file /home/source/ab/HEAD/src/sys/kern/subr_pool.c, line 2211 pool 'vmmpepl' is IPL_NONE, but called from interrup
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at fe822fbc8510
vpanic() at vpanic+0x145
kern_assert() at kern_assert+0x4f
pool_cache_get_paddr() at pool_cache_get_paddr+0x12c
uvm_mapent_alloc.isra.2() at uvm_mapent_alloc.isra.2+0x20
uvm_map_clip_start() at uvm_map_clip_start+0x1b
uvm_unmap_remove() at uvm_unmap_remove+0x2fe
uvm_unmap1() at uvm_unmap1+0x35
_bus_dmamap_destroy.isra.11() at _bus_dmamap_destroy.isra.11+0x41
msk_stop() at msk_stop+0x3f4
msk_init() at msk_init+0x35
if_slowtimo() at if_slowtimo+0x46
callout_softclock() at callout_softclock+0x392
softint_dispatch() at softint_dispatch+0xd3
DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e930ff0
Xsoftintr() at Xsoftintr+0x4f
--- interrupt ---
0:

Ilia.
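The assertion quoted in the panic can be read as a rule about who may allocate from which pool. Assuming the operators lost in archiving were `&&` and `->`, the condition is: either we are in thread context, or the pool was created for use at an elevated IPL, or the system is cold or already panicking. The sketch below models that predicate in plain C so the failing case from the backtrace can be checked by hand; the struct and function are hypothetical and only mirror the shape of the condition, they are not subr_pool.c.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the assertion inspects. */
struct alloc_ctx {
    bool in_hard_intr;   /* models cpu_intr_p() */
    bool in_soft_intr;   /* models cpu_softintr_p() */
    bool pool_ipl_none;  /* models pr_ipl == IPL_NONE */
    bool cold;           /* early boot */
    bool panicking;      /* models panicstr != NULL */
};

/* Same shape as the quoted assertion: true means the allocation is allowed. */
bool pool_get_allowed(const struct alloc_ctx *c)
{
    bool thread_context = !c->in_hard_intr && !c->in_soft_intr;
    return thread_context || !c->pool_ipl_none || c->cold || c->panicking;
}
```

In the backtrace, msk_init() is reached from if_slowtimo() via callout_softclock() and softint_dispatch(), so it runs in soft interrupt context, and the pool named in the message ('vmmpepl') is IPL_NONE. Feeding in_soft_intr = true and pool_ipl_none = true into the predicate yields false, matching the panic. The same call made from thread context would pass.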
Re: kernel panic on a cold start amd64.
Now I can reproduce it persistently.
The kernel panics on a network bridge when an msk interface has no link.
Do I need to open a new bug, or can it be fixed easily?
The main reason is:

msk0: watchdog timeout

from source:

void
msk_watchdog(struct ifnet *ifp)
{
[...]
/* XXX Resets both ports; we shouldn't do that. */
msk_reset(sc_if->sk_softc);
msk_init(ifp);
[...]
}

From /var/log/messages:

May 27 13:59:19 bmoy /netbsd: msk0: watchdog timeout
May 27 13:59:19 bmoy /netbsd: panic: kernel diagnostic assertion (!cpu_intr_p() && !cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != NULL) failed: file /home/builds/ab/HEAD/src/sys/kern/subr_pool.c, line 2210 pool 'vmmpepl' is IPL_NONE, but called from interrup
May 27 13:59:19 bmoy /netbsd: cpu0: Begin traceback...
May 27 13:59:19 bmoy /netbsd: vpanic() at netbsd:vpanic+0x13c
May 27 13:59:19 bmoy /netbsd: kern_assert() at netbsd:kern_assert+0x4f
May 27 13:59:19 bmoy /netbsd: pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x12c
May 27 13:59:19 bmoy /netbsd: uvm_mapent_alloc.isra.2() at netbsd:uvm_mapent_alloc.isra.2+0x20
May 27 13:59:19 bmoy /netbsd: uvm_map_clip_start() at netbsd:uvm_map_clip_start+0x1b
May 27 13:59:19 bmoy /netbsd: uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x2fe
May 27 13:59:19 bmoy /netbsd: uvm_unmap1() at netbsd:uvm_unmap1+0x35
May 27 13:59:19 bmoy /netbsd: _bus_dmamap_destroy.isra.11() at netbsd:_bus_dmamap_destroy.isra.11+0x41
May 27 13:59:19 bmoy /netbsd: msk_stop() at netbsd:msk_stop+0x3f4
May 27 13:59:19 bmoy /netbsd: msk_init() at netbsd:msk_init+0x35
May 27 13:59:19 bmoy /netbsd: if_slowtimo() at netbsd:if_slowtimo+0x46
May 27 13:59:19 bmoy /netbsd: callout_softclock() at netbsd:callout_softclock+0x392
May 27 13:59:19 bmoy /netbsd: softint_dispatch() at netbsd:softint_dispatch+0xd3
May 27 13:59:19 bmoy /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe810e932ff0
May 27 13:59:19 bmoy /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f
May 27 13:59:19 bmoy /netbsd: --- interrupt ---
May 27 13:59:19 bmoy /netbsd: 0:
May 27 13:59:19 bmoy /netbsd: cpu0: End traceback...
May 27 13:59:19 bmoy /netbsd:
May 27 13:59:19 bmoy /netbsd: dumping to dev 0,1 (offset=4197039, size=2096926):
May 27 13:59:19 bmoy /netbsd: dump Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
May 27 13:59:19 bmoy /netbsd: 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014
May 27 13:59:19 bmoy /netbsd: The NetBSD Foundation, Inc. All rights reserved.
May 27 13:59:19 bmoy /netbsd: Copyright (c) 1982, 1986, 1989, 1991, 1993
May 27 13:59:19 bmoy /netbsd: The Regents of the University of California. All rights reserved.
May 27 13:59:19 bmoy /netbsd:

On 25.05.2014 23:55, Ilia Zykov wrote:
> The test machine is five years old, though, and may have hardware degradation.
>
> Crash version 6.99.42, image version 6.99.42.
> System panicked: kernel diagnostic assertion (!cpu_intr_p() && !cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != NULL) failed: file /home/builds/ab/HEAD/src/sys/kern/subr_pool.c, line 2210 pool 'vmmpepl' is IPL_NONE, but called from interrup
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> ?() at fe822c8c4cf8
> vpanic() at vpanic+0x145
> kern_assert() at kern_assert+0x4f
> pool_cache_get_paddr() at pool_cache_get_paddr+0x12c
> uvm_mapent_alloc.isra.2() at uvm_mapent_alloc.isra.2+0x20
> uvm_map_clip_start() at uvm_map_clip_start+0x1b
> uvm_unmap_remove() at uvm_unmap_remove+0x2fe
> uvm_unmap1() at uvm_unmap1+0x35
> _bus_dmamap_destroy.isra.11() at _bus_dmamap_destroy.isra.11+0x41
> msk_stop() at msk_stop+0x3f4
> msk_init() at msk_init+0x35
> if_slowtimo() at if_slowtimo+0x46
> callout_softclock() at callout_softclock+0x392
> softint_dispatch() at softint_dispatch+0xd3
> DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e932ff0
> Xsoftintr() at Xsoftintr+0x4f
> --- interrupt ---
> 0:
> crash> exit
>
> 6.99.42 NetBSD 6.99.42 (GENERIC) #0: Thu May 22 20:16:12 UTC 2014 bui...@b2.netbsd.org:/home/builds/ab/HEAD/amd64/201405221850Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64
>
> Ilia.
kernel panic on a cold start amd64.
But the testing machine is five years old and may have hardware degradation.

Crash version 6.99.42, image version 6.99.42.
System panicked: kernel diagnostic assertion (!cpu_intr_p() && !cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != NULL) failed: file /home/builds/ab/HEAD/src/sys/kern/subr_pool.c, line 2210 pool 'vmmpepl' is IPL_NONE, but called from interrup
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at fe822c8c4cf8
vpanic() at vpanic+0x145
kern_assert() at kern_assert+0x4f
pool_cache_get_paddr() at pool_cache_get_paddr+0x12c
uvm_mapent_alloc.isra.2() at uvm_mapent_alloc.isra.2+0x20
uvm_map_clip_start() at uvm_map_clip_start+0x1b
uvm_unmap_remove() at uvm_unmap_remove+0x2fe
uvm_unmap1() at uvm_unmap1+0x35
_bus_dmamap_destroy.isra.11() at _bus_dmamap_destroy.isra.11+0x41
msk_stop() at msk_stop+0x3f4
msk_init() at msk_init+0x35
if_slowtimo() at if_slowtimo+0x46
callout_softclock() at callout_softclock+0x392
softint_dispatch() at softint_dispatch+0xd3
DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e932ff0
Xsoftintr() at Xsoftintr+0x4f
--- interrupt ---
0:
crash> exit

6.99.42 NetBSD 6.99.42 (GENERIC) #0: Thu May 22 20:16:12 UTC 2014 bui...@b2.netbsd.org:/home/builds/ab/HEAD/amd64/201405221850Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC amd64

Ilia.
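For readers following the backtrace: the allocation is reached from if_slowtimo via callout_softclock and softint_dispatch, i.e. from soft interrupt context, and that is what the assertion objects to. The following is only a userland sketch of the assertion's predicate, assuming the first two tests are joined by a logical AND (the operator seems to have been lost in the archived text). The struct and function names are made up for illustration and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the assertion inspects. */
struct alloc_ctx {
	bool in_hard_intr;	/* stands in for cpu_intr_p() */
	bool in_soft_intr;	/* stands in for cpu_softintr_p() */
	bool pool_ipl_none;	/* stands in for pr_ipl == IPL_NONE */
	bool cold;		/* stands in for the kernel's "cold" flag */
	bool panicking;		/* stands in for panicstr != NULL */
};

/* True when the modelled assertion would hold for this context. */
static bool
pool_get_allowed(const struct alloc_ctx *c)
{
	return (!c->in_hard_intr && !c->in_soft_intr) ||
	    (!c->pool_ipl_none || c->cold || c->panicking);
}

/* Model of the diagnostic check: aborts where the kernel would panic. */
static void
model_pool_get(const struct alloc_ctx *c)
{
	assert(pool_get_allowed(c));
}
```

With in_soft_intr and pool_ipl_none both set and the other fields clear, which is the situation the backtrace suggests, pool_get_allowed() returns false and model_pool_get() aborts. Clearing in_soft_intr, or setting cold, makes it pass.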
kernel panic in xhci on boot
On my desktop machine I am pretty much guaranteed a kernel panic when I reboot from Windows 8 into NetBSD. The traceback is:

softint_schedule()
usb_schedsoftintr()
xhci_intr1()

The assert in softint_schedule is firing because offset is 0. That value is passed in by usb_schedsoftintr and should be the soft interrupt cookie. It looks to me like there is a race: an attempt can be made to queue a softintr before the softintr queue has been set up. Usually, if I reboot again I don't see the assertion and things work fine from that point on.

This is on an amd64 system with sources from a few days ago.

--
Brett Lymn
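To make the suspected race concrete, here is a minimal userland sketch, not the actual xhci or usb code. It assumes the hard interrupt handler can run before the soft interrupt cookie has been stored, and shows one possible guard: defer the request until the cookie exists. All names (model_softc, model_intr, model_establish) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a controller softc; not the real xhci_softc. */
struct model_softc {
	void *sc_soft_cookie;	/* NULL until the soft interrupt is established */
	int   sc_scheduled;	/* number of soft interrupts handed off so far */
	bool  sc_pending;	/* a hard interrupt arrived before the cookie */
};

/*
 * Model of the hard interrupt path. Scheduling with a NULL cookie is the
 * case the report describes, so remember the request instead.
 */
static bool
model_intr(struct model_softc *sc)
{
	if (sc->sc_soft_cookie == NULL) {
		sc->sc_pending = true;
		return false;
	}
	sc->sc_scheduled++;
	return true;
}

/* Model of attach completing: publish the cookie, replay deferred work. */
static void
model_establish(struct model_softc *sc, void *cookie)
{
	assert(cookie != NULL);
	sc->sc_soft_cookie = cookie;
	if (sc->sc_pending) {
		sc->sc_pending = false;
		sc->sc_scheduled++;
	}
}
```

Calling model_intr() on a zeroed model_softc returns false and only sets sc_pending; after model_establish() the deferred request is counted once, and later calls to model_intr() schedule normally. Whether the real fix belongs in the driver or in the ordering of attach is a separate question that this sketch does not answer.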
Re: Kernel panic when trying to mount non-existing file-system
On Sat, Mar 29, 2014 at 03:19:48PM -0700, Andy Ruhl wrote:
> I didn't get the memo either.

Sorry about the unhelpful answer - of course the crash is not intended, but I couldn't reproduce it on the first try last night. It might depend on the architecture and the specific kernel (e.g. whether trying to load modules fails) or something.

Adam, could you please file a PR?

Thanks,

Martin
Re: Kernel panic when trying to mount non-existing file-system
On Sat, Mar 29, 2014 at 08:54:36PM +0100, Adam Ciarciński wrote:
> Is that intentional?

Yes, didn't you get the memo?

Martin
Re: Kernel panic when trying to mount non-existing file-system
>> Is that intentional?
> Yes, didn't you get the memo?
>
> Martin

No. I guess the postman stole it again. 8-)

Adam
Re: Kernel panic when trying to mount non-existing file-system
On Sat, Mar 29, 2014 at 1:01 PM, Adam Ciarciński a...@netbsd.org wrote:
>>> Is that intentional?
>> Yes, didn't you get the memo?
>>
>> Martin
> No. I guess the postman stole it again. 8-)

I didn't get the memo either.

The panic does not happen on a relatively recent build of 6 when I try to mount NTFS with no kernel support for it. I didn't see a recent PR. I'm wondering what is going on, because I'm considering putting current on one of my machines.

Andy