Re: Kernel panic: gpioctl list + odroid-c1

2022-05-11 Thread Brook Milligan


> On May 10, 2022, at 8:46 PM, Brook Milligan  wrote:
> 
> One more piece of information.
> 
> src/sys/arch/arm/amlogic/meson8b_pinctrl.c includes the following code:
> 
> static const struct meson_pinctrl_gpio meson8b_cbus_gpios[] = {
> 
>   … < deleted sections > …
> 
>   /* GPIODV */
>   CBUS_GPIO(GPIODV_24, 6, 24, 0, 24),
>   CBUS_GPIO(GPIODV_25, 6, 25, 0, 25),
>   CBUS_GPIO(GPIODV_26, 6, 26, 0, 26),
>   CBUS_GPIO(GPIODV_27, 6, 27, 0, 27),
>   CBUS_GPIO(GPIODV_28, 6, 28, 0, 28),
>   CBUS_GPIO(GPIODV_29, 6, 29, 0, 29),
> 
> It seems that GPIODV_9 does not occur in the second list; I would have 
> expected it to be the first entry.  Is there a reason for it to be missing?
> 
> Could this be the cause of the panic?

Further along: the short answer is yes.  The following patch fixes the 
immediate problem of the panic, although I have no idea if the data here are 
correct; I’m just following the pattern of the other entries.

Index: meson8b_pinctrl.c
===
RCS file: /cvsroot/src/sys/arch/arm/amlogic/meson8b_pinctrl.c,v
retrieving revision 1.2
diff -u -r1.2 meson8b_pinctrl.c
--- meson8b_pinctrl.c   14 Aug 2019 09:50:20 -  1.2
+++ meson8b_pinctrl.c   11 May 2022 13:08:29 -
@@ -226,6 +226,7 @@
CBUS_GPIO(GPIOY_14, 3, 14, 3, 14),
 
/* GPIODV */
+   CBUS_GPIO(GPIODV_9, 6, 9, 0, 9),
CBUS_GPIO(GPIODV_24, 6, 24, 0, 24),
CBUS_GPIO(GPIODV_25, 6, 25, 0, 25),
CBUS_GPIO(GPIODV_26, 6, 26, 0, 26),

I would appreciate confirmation that the data this patch adds to the lookup 
table is correct.

Clearly, however, there is a hidden problem somewhere else in the code.  This 
is a lookup table; the pin number (or potentially name) is the key.  Almost 
certainly, the problem was a missing entry causing the lookup to not match 
anything.  The presumably bogus information returned led to the panic.  This 
means that somewhere else in the code is lookup logic that is not detecting the 
“missing key” case, which means that there are potential panics lurking in the 
future whenever a table like this is incomplete.  Unfortunately, I have no idea 
where that lookup code is; ideas?

Unless I hear otherwise, I will commit this patch sometime soon.  In the 
meantime, I would appreciate feedback.

Thanks a lot.

Cheers,
Brook



Re: Kernel panic: gpioctl list + odroid-c1

2022-05-10 Thread Brook Milligan


> On May 10, 2022, at 8:06 PM, Brook Milligan  wrote:
> 
> I have encountered a totally repeatable kernel panic by running "gpioctl 
> list" on an odroid-c1 board.
> 
> # uname -a
> NetBSD armv7 9.99.96 NetBSD 9.99.96 (GENERIC) #0: Mon May  2 10:50:02 UTC 
> 2022  mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/evbarm/compile/GENERIC 
> evbarm
> 
> To investigate, I added some printf() to the gpiolist() function to see what 
> was happening in the loop through the pins.  Here is a bit of the output:
> 
> # ./gpioctl2 gpio0 list
> gpioctl.c::gpiolist()
> gpioctl.c::gpiolist(): gpio_npins=71
> gpioctl.c::gpiolist(): gpio_pin 0
>  0: gp_pin=0
>  0: gp_value=1
>  0: gp_name=GPIOX_0
> gpioctl.c::gpiolist(): gpio_pin 1
>  1: gp_pin=1
>  1: gp_value=1
>  1: gp_name=GPIOX_1
> 
> … < lots of pin output deleted > …
> 
> gpioctl.c::gpiolist(): gpio_pin 29
>  29: gp_pin=29
>  29: gp_value=1
>  29: gp_name=GPIOY_14
> gpioctl.c::gpiolist(): gpio_pin 30
> [  33.9588550] panic: divide by 0
> [  33.9588550] cpu0: Begin traceback...
> [  33.9588550] 0xbd7cdbd4: netbsd:db_panic+0x14
> [  33.9677710] 0xbd7cdbf4: netbsd:vpanic+0x114
> [  33.9677710] 0xbd7cdc0c: netbsd:panic+0x24
> [  33.9761750] 0xbd7cdc2c: netbsd:__aeabi_idiv0+0x18
> [  33.9822960] 0xbd7cdc4c: netbsd:meson_pinctrl_pin_read+0x88
> [  33.9822960] 0xbd7cdcec: netbsd:gpioioctl+0x4f4
> [  33.9902860] 0xbd7cdd24: netbsd:spec_ioctl+0x60
> [  33.9902860] 0xbd7cdd54: netbsd:VOP_IOCTL+0x50
> [  33.9991180] 0xbd7cde24: netbsd:vn_ioctl+0xd8
> [  34.0057320] 0xbd7cdeec: netbsd:sys_ioctl+0x47c
> [  34.0057320] 0xbd7cdfac: netbsd:syscall+0x188
> [  34.0135450] cpu0: End traceback...
> Stopped in pid 214.214 (gpioctl2) at netbsd:cpu_Debugger+0x4: bx r14
> db{0}>

One more piece of information.

src/sys/arch/arm/amlogic/meson8b_pinctrl.c includes the following code:

/*
 * GPIO banks. The values must match those in dt-bindings/gpio/meson8b-gpio.h
 */
enum {

… < deleted sections > …

GPIODV_9 = 30,
GPIODV_24,
GPIODV_25,
GPIODV_26,
GPIODV_27,
GPIODV_28,
GPIODV_29,

… < more deleted sections > …

};

… < deleted sections > …

static const struct meson_pinctrl_gpio meson8b_cbus_gpios[] = {

… < deleted sections > …

/* GPIODV */
CBUS_GPIO(GPIODV_24, 6, 24, 0, 24),
CBUS_GPIO(GPIODV_25, 6, 25, 0, 25),
CBUS_GPIO(GPIODV_26, 6, 26, 0, 26),
CBUS_GPIO(GPIODV_27, 6, 27, 0, 27),
CBUS_GPIO(GPIODV_28, 6, 28, 0, 28),
CBUS_GPIO(GPIODV_29, 6, 29, 0, 29),

It seems that GPIODV_9 does not occur in the second list; I would have expected 
it to be the first entry.  Is there a reason for it to be missing?

Could this be the cause of the panic?
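A gap like this can also be found mechanically by checking that every enum value has a table entry. This is a minimal sketch, assuming each entry exposes its enum id as a field. The names gpio_entry and find_missing_ids are hypothetical and stand in for the CBUS_GPIO() table. Run over the meson8b table, a check like this should report id 30 (GPIODV_9).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one CBUS_GPIO() table entry. */
struct gpio_entry {
	int id;		/* enum value, e.g. GPIODV_9 == 30 */
};

/*
 * Store in missing[] every id in [0, npins) that has no entry in tbl.
 * Returns how many ids were missing (at most maxmissing are stored).
 */
static size_t
find_missing_ids(const struct gpio_entry *tbl, size_t n, int npins,
    int *missing, size_t maxmissing)
{
	size_t nmissing = 0;

	for (int id = 0; id < npins; id++) {
		bool found = false;

		for (size_t i = 0; i < n && !found; i++)
			found = (tbl[i].id == id);
		if (!found) {
			if (nmissing < maxmissing)
				missing[nmissing] = id;
			nmissing++;
		}
	}
	return nmissing;
}
```

Where such a check would live (once at attach time, or only in a DIAGNOSTIC kernel) is a design choice; nothing in the thread says NetBSD does this today.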

Cheers,
Brook



Kernel panic: gpioctl list + odroid-c1

2022-05-10 Thread Brook Milligan
I have encountered a totally repeatable kernel panic by running "gpioctl list" 
on an odroid-c1 board.

# uname -a
NetBSD armv7 9.99.96 NetBSD 9.99.96 (GENERIC) #0: Mon May  2 10:50:02 UTC 2022  
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/evbarm/compile/GENERIC evbarm

To investigate, I added some printf() to the gpiolist() function to see what 
was happening in the loop through the pins.  Here is a bit of the output:

# ./gpioctl2 gpio0 list
gpioctl.c::gpiolist()
gpioctl.c::gpiolist(): gpio_npins=71
gpioctl.c::gpiolist(): gpio_pin 0
  0: gp_pin=0
  0: gp_value=1
  0: gp_name=GPIOX_0
gpioctl.c::gpiolist(): gpio_pin 1
  1: gp_pin=1
  1: gp_value=1
  1: gp_name=GPIOX_1

… < lots of pin output deleted > …

gpioctl.c::gpiolist(): gpio_pin 29
  29: gp_pin=29
  29: gp_value=1
  29: gp_name=GPIOY_14
gpioctl.c::gpiolist(): gpio_pin 30
[  33.9588550] panic: divide by 0
[  33.9588550] cpu0: Begin traceback...
[  33.9588550] 0xbd7cdbd4: netbsd:db_panic+0x14
[  33.9677710] 0xbd7cdbf4: netbsd:vpanic+0x114
[  33.9677710] 0xbd7cdc0c: netbsd:panic+0x24
[  33.9761750] 0xbd7cdc2c: netbsd:__aeabi_idiv0+0x18
[  33.9822960] 0xbd7cdc4c: netbsd:meson_pinctrl_pin_read+0x88
[  33.9822960] 0xbd7cdcec: netbsd:gpioioctl+0x4f4
[  33.9902860] 0xbd7cdd24: netbsd:spec_ioctl+0x60
[  33.9902860] 0xbd7cdd54: netbsd:VOP_IOCTL+0x50
[  33.9991180] 0xbd7cde24: netbsd:vn_ioctl+0xd8
[  34.0057320] 0xbd7cdeec: netbsd:sys_ioctl+0x47c
[  34.0057320] 0xbd7cdfac: netbsd:syscall+0x188
[  34.0135450] cpu0: End traceback...
Stopped in pid 214.214 (gpioctl2) at netbsd:cpu_Debugger+0x4: bx r14
db{0}>

I’m guessing this is a device tree problem, given the reference to 
meson_pinctrl_pin_read(), but I have no idea how the kernel data structure is 
created or what to do about this.

For reference, u-boot loads the following device tree before booting the 
kernel: meson8b-odroidc1.dtb.

Any thoughts would be greatly appreciated.

Thanks a lot.

Cheers,
Brook

Re: kernel panic in NetBSD-9.1-amd64-install.img (exiting unheld spin mutex)

2020-11-14 Thread Michael van Elst
r...@reedmedia.net ("Jeremy C. Reed") writes:

>panic: lock error: Mutex error: mutex_vector_exit,742: exiting unheld 
>spin mutex: lock 0x8699588015c0 cpu 0 lwp 0xff... (my photo was 
>cropped)

Index: athn.c
===
RCS file: /cvsroot/src/sys/dev/ic/athn.c,v
retrieving revision 1.23
diff -p -u -r1.23 athn.c
--- athn.c  29 Jan 2020 14:09:58 -  1.23
+++ athn.c  15 Nov 2020 07:04:38 -
@@ -2734,7 +2734,7 @@ athn_set_multi(struct athn_softc *sc)
 
if ((ifp->if_flags & (IFF_ALLMULTI | IFF_PROMISC)) != 0) {
lo = hi = 0xffffffff;
-   goto done;
+   goto done2;
}
lo = hi = 0;
ETHER_LOCK(ec);
@@ -2760,6 +2760,7 @@ athn_set_multi(struct athn_softc *sc)
}
  done:
ETHER_UNLOCK(ec);
+ done2:
AR_WRITE(sc, AR_MCAST_FIL0, lo);
AR_WRITE(sc, AR_MCAST_FIL1, hi);
AR_WRITE_BARRIER(sc);
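In the unpatched code the early `goto done` is taken before ETHER_LOCK(ec) and lands on `done:`, which calls ETHER_UNLOCK(ec). That releases a lock that was never taken, matching the "exiting unheld spin mutex" panic. The sketch below is a minimal userland model of the patched control flow. It assumes a hypothetical counting lock (lockstate, lk_acquire, lk_release), which is not NetBSD's mutex(9) API. The multicast hash and the "accept all" trigger inside the loop are toy placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ethercom lock: it only counts holds. */
struct lockstate {
	int held;
	int bad_releases;	/* releases with no matching acquire */
};

static void
lk_acquire(struct lockstate *l)
{
	l->held++;
}

static void
lk_release(struct lockstate *l)
{
	if (l->held == 0)
		l->bad_releases++;	/* where the kernel reported the panic */
	else
		l->held--;
}

/*
 * Same shape as the patched athn_set_multi(): the exit taken before the
 * lock is acquired jumps past the release (done2), while exits taken
 * with the lock held still go through done.
 */
static void
set_multi(struct lockstate *l, int allmulti, const uint32_t *addrs,
    size_t naddrs, uint32_t *lo, uint32_t *hi)
{
	if (allmulti) {
		*lo = *hi = 0xffffffff;
		goto done2;
	}
	*lo = *hi = 0;
	lk_acquire(l);
	for (size_t i = 0; i < naddrs; i++) {
		if (addrs[i] == 0) {	/* hypothetical "accept all" case */
			*lo = *hi = 0xffffffff;
			goto done;
		}
		/* toy hash: one bit per address in a 64-bit filter */
		uint32_t bit = addrs[i] % 64;
		if (bit < 32)
			*lo |= 1u << bit;
		else
			*hi |= 1u << (bit - 32);
	}
done:
	lk_release(l);
done2:
	return;	/* the driver writes lo/hi to the filter registers here */
}
```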

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


kernel panic in NetBSD-9.1-amd64-install.img (exiting unheld spin mutex)

2020-11-14 Thread Jeremy C. Reed
I booted NetBSD-9.1-amd64-install.img

I didn't install.

For about 30 minutes I attempted to get my athn device working as an 
access point plus using dhcpd and hostapd (they are on the installer 
system which I didn't know before). My android phone could get to a 
"connecting ..." state but never connected to it.

I ran multiple times:

ifconfig athn0 inet 172.16.1.1 media autoselect mediaopt hostap chan 5 nwkey 


(in between using dhcpd and hostap and tcpdump)

I had dhcpd and tcpdump running in background

the final time I got a kernel panic

Mutex error: mutex_vector_exit,742: exiting unheld spin mutex
...

panic: lock error: Mutex error: mutex_vector_exit,742: exiting unheld 
spin mutex: lock 0x8699588015c0 cpu 0 lwp 0xff... (my photo was 
cropped)

...

vpanic() at netbsd:vpanic+0x160
snprintf() at netbsd:snprintf
...

athn_ioctl() at netbsd:athn_ioctl+0x18b
if_mcast_op() at netbsd:if_mcast_op+0x4b

(sorry I don't type all in)

in_delmulti
in_scrubaddr
in_purgeaddr
in_control0
udp_ioctl_wrapper
compat_ifioctl
doifioctl
sys_ioctl
syscall
...
--- syscall (number 54) ---
74df3936822a:
cpu0: End traceback...

I can type in the rest if needed.


kernel panic in genfs_deadunlock

2019-04-18 Thread coypu
hi folks,
while testing a very recent kernel, and waiting for it to reboot, I got this:

Crash version 8.99.37, image version 8.99.37.
System panicked: lock error: Reader / writer lock: rw_vector_exit,454: 
assertion failed: RW_COUNT(rw) != 0: lock 0xed5bd50116b0 cpu 3 lwp 
0xed5f70a20ae0
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
ostype() at ostype+0xb7290
vpanic() at vpanic+0x169
snprintf() at snprintf
lockdebug_abort() at lockdebug_abort+0xe7
rw_vector_exit() at rw_vector_exit+0xce
genfs_deadunlock() at genfs_deadunlock+0x14
VOP_UNLOCK() at VOP_UNLOCK+0x51
cnclose() at cnclose+0x7e
cdev_close() at cdev_close+0xbc
spec_close() at spec_close+0x199
VOP_CLOSE() at VOP_CLOSE+0x4c
vn_close() at vn_close+0x34
closef() at closef+0x6d
fd_close() at fd_close+0x1f4
sys_close() at sys_close+0x20
syscall() at syscall+0x173
--- syscall (number 6) ---
755abfa42bea:


If it helps anyone. I have no idea :-)


Re: Recent NetBSD/amd64 7.99.54 kernel panic

2017-01-03 Thread Ryo ONODERA
Hi,

Sorry. This is the same as PR kern/51767.

Thank you.

From: Ryo ONODERA , Date: Wed, 04 Jan 2017 02:40:14 +0900 
(JST)

> Hi,
> 
> Recent NetBSD/amd64 kernel panics with the following message
> (manually transcribed).
> Could anyone investigate this?
> Thank you.
> 
> stack overflow detected: terminated
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip 80115455 cs 8 rflag 246 cr2 79b90fe688f0 
> ilevel 4 rsp fe810e688a70
> curlwp 0xfe8220f32460 pid 0.3 lowest kstack 0xfe810e6852c0
> Stopped in pid 0.3 (system) at  netbsd:breakpoint+0x05:  leave
> db{0}> bt
> breakpoint() at netbsd:breakpoint+0x05
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ssp_init() at netbsd:ssp_init
> tcp_output() at netbsd:tcp_output+0x246e
> tcp_input() at netbsd:tcp_input+0x111e
> tcp6_input() at netbsd:tcp6_input+0x49
> ip6_input() at netbsd:ip6_input+0x724
> ip6intr() at netbsd:ip6intr+0x71
> softint_dispatch() at netbsd:softint_dispatch+0xda
> db{0}> 
> 
> --
> Ryo ONODERA // ryo...@yk.rim.or.jp
> PGP fingerprint = 82A2 DC91 76E0 A10A 8ABB  FD1B F404 27FA C7D1 15F3


Re: Recent NetBSD/amd64 7.99.54 kernel panic

2017-01-03 Thread Kamil Rytarowski
On 03.01.2017 18:40, Ryo ONODERA wrote:
> Hi,
> 
> Recent NetBSD/amd64 kernel panics with the following message
> (manually transcribed).
> Could anyone investigate this?
> Thank you.
> 
> stack overflow detected: terminated
> fatal breakpoint trap in supervisor mode
> trap type 1 code 0 rip 80115455 cs 8 rflag 246 cr2 79b90fe688f0 
> ilevel 4 rsp fe810e688a70
> curlwp 0xfe8220f32460 pid 0.3 lowest kstack 0xfe810e6852c0
> Stopped in pid 0.3 (system) at  netbsd:breakpoint+0x05:  leave
> db{0}> bt
> breakpoint() at netbsd:breakpoint+0x05
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> ssp_init() at netbsd:ssp_init
> tcp_output() at netbsd:tcp_output+0x246e
> tcp_input() at netbsd:tcp_input+0x111e
> tcp6_input() at netbsd:tcp6_input+0x49
> ip6_input() at netbsd:ip6_input+0x724
> ip6intr() at netbsd:ip6intr+0x71
> softint_dispatch() at netbsd:softint_dispatch+0xda
> db{0}> 
> 

I've just reproduced it locally:

panic: stack overflow detected; terminated
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
ssp_init() at netbsd:ssp_init
tcp_output() at netbsd:tcp_output+0x246e
tcp_input() at netbsd:tcp_input+0x111e
ipintr() at netbsd:ipintr+0xa46
softint_dispatch() at netbsd:softint_dispatch+0xda
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe813a2c7ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt --

It happened when fetching pkgsrc distfiles.





Recent NetBSD/amd64 7.99.54 kernel panic

2017-01-03 Thread Ryo ONODERA
Hi,

Recent NetBSD/amd64 kernel panics with the following message
(manually transcribed).
Could anyone investigate this?
Thank you.

stack overflow detected: terminated
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 80115455 cs 8 rflag 246 cr2 79b90fe688f0 ilevel 
4 rsp fe810e688a70
curlwp 0xfe8220f32460 pid 0.3 lowest kstack 0xfe810e6852c0
Stopped in pid 0.3 (system) at  netbsd:breakpoint+0x05:  leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x05
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
ssp_init() at netbsd:ssp_init
tcp_output() at netbsd:tcp_output+0x246e
tcp_input() at netbsd:tcp_input+0x111e
tcp6_input() at netbsd:tcp6_input+0x49
ip6_input() at netbsd:ip6_input+0x724
ip6intr() at netbsd:ip6intr+0x71
softint_dispatch() at netbsd:softint_dispatch+0xda
db{0}> 



Re: kernel panic

2016-06-19 Thread Ryota Ozaki
On Sun, Jun 19, 2016 at 9:23 PM, Michael van Elst  wrote:
> brad.har...@gmail.com (bch) writes:
>
>>kernel (adjusted from GENERIC to allow dtrace support) from latest src 
>>panics:
>
>>(transcription):
>
>>reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
>>struct ieee80211_node *) == NULL)" failed: file
>>"/usr/src/sys/80211/ieee80211_output.c", line 1347
>
>
> That assertion seems to be bogus. It checks a field in an mbuf
> that was just allocated in ieee80211_getmgtframe using m_getcl
> and that may contain random data in the ctx pointer.

Indeed.

>
> Another similar assertion in the same file is #ifdef __FreeBSD__.
>
> Looking at the current FreeBSD code, it still abuses the rcvif
> pointer for local data. But there are no such assertions, which
> would be bogus in FreeBSD too.

Thanks. I think we can remove the assertion(s) safely.

(I'm not sure why the assertion had never failed before. I guess my changes
broke some implicit zeroing of rcvif somewhere.)

  ozaki-r


Re: kernel panic

2016-06-19 Thread bch
On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA"  wrote:
>
> Hi,
>
> On 2016/06/16 8:15, bch wrote:
> > I am now at 1.414, and it seems stable.
>
> Thank you for your checking and reporting.

My pleasure. Question: were my wm(4) and iwm(4) faults related (maybe some
lock macro-ization, rejigging)? Can anybody point me to the commits that
apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?

> If it seems there are still
> problems, please tell us.

Will do. I'd like to have the commit(s) identified and
re-witness/characterize the issue. Otherwise, things currently seem stable.
Thanks.

>
> Thanks,
>
> --
> //
> Internet Initiative Japan Inc.
>
> Device Engineering Section,
> IoT Platform Development Department,
> Network Division,
> Technology Unit
>
> Kengo NAKAHARA 


Re: kernel panic

2016-06-19 Thread Michael van Elst
brad.har...@gmail.com (bch) writes:

>kernel (adjusted from GENERIC to allow dtrace support) from latest src panics:

>(transcription):

>reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
>struct ieee80211_node *) == NULL)" failed: file
>"/usr/src/sys/80211/ieee80211_output.c", line 1347


That assertion seems to be bogus. It checks a field in an mbuf
that was just allocated in ieee80211_getmgtframe using m_getcl
and that may contain random data in the ctx pointer.

Another similar assertion in the same file is #ifdef __FreeBSD__.

Looking at the current FreeBSD code, it still abuses the rcvif
pointer for local data. But there are no such assertions, which
would be bogus in FreeBSD too.
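The point is that memory from an allocator that does not zero it has an indeterminate context field. An assertion that the field is NULL therefore only holds if someone set it explicitly. A minimal userland sketch, assuming a hypothetical buffer header (buf_hdr, buf_get) with malloc() standing in for m_getcl(); this is not the mbuf(9) API. The thread's own resolution is simply to remove the assertion.

```c
#include <stdlib.h>

/* Hypothetical buffer header; ctx mimics the mbuf context pointer. */
struct buf_hdr {
	void  *ctx;
	size_t len;
};

/*
 * malloc() does not zero memory, so ctx starts out indeterminate.
 * Initialize it here; only after this is "ctx == NULL" a valid check.
 */
static struct buf_hdr *
buf_get(size_t len)
{
	struct buf_hdr *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->ctx = NULL;
	b->len = len;
	return b;
}
```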




Re: kernel panic

2016-06-16 Thread Ryota Ozaki
On Thu, Jun 16, 2016 at 3:04 PM, Ryota Ozaki  wrote:
> On Thu, Jun 16, 2016 at 1:56 PM, bch  wrote:
>>
>> On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA"  wrote:
>>>
>>> Hi,
>>>
>>> On 2016/06/16 8:15, bch wrote:
>>> > I am now at 1.414, and it seems stable.
>>>
>>> Thank you for your checking and reporting.
>>
>> My pleasure. Question: were my wm(4) and iwm(4) faults related (maybe some
>> lock macro-ization, rejigging)?
>
> Not related.
>
>> Can anybody point me to the commits that
>> apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?
>
> For iwm:
>   http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h
>
> Commit 1.164 broke iwm (and I guess all other wifi drivers)
> and commit 1.165 fixed it.
>
> For wm:
>   http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c
>
> Commit 1.413 broke wm and commit 1.414 fixed it.
>
>>
>>> If it seems there are still
>>> problems, please tell us.
>>
>> Will do. I'd like to have the commit(s) identified and
>> re-witness/characterize the issue. Otherwise, things currently seem stable.
>> Thanks.
>
> [Timeline]
>
> - Jun 10 13:31:45: mbuf.h r1.164
> - Jun 11 ??:??:??: you encountered the first panic
> - Jun 12 10:14:12: mbuf.h r1.165

oops

> - Jun 14 09:07:22: if_wm.c r1.164
                             ^^^^^^
                             r1.413
> - Jun 14 ??:??:??: you encountered the second panic
> - Jun 14 17:09:20: if_wm.c r1.165
                             ^^^^^^
                             r1.414

> - Jun 16 ??:??:??: you are here
>
> And I noticed that I forgot to bump the kernel version; my mbuf.h
> change required it. (I already bumped.) If you run a kernel between
> my mbuf.h change and the bump with network device driver modules
> of 7.99.30, something bad will happen. (I guess the issues you saw
> aren't related to this though.)
>
> Thanks,
>   ozaki-r


Re: kernel panic

2016-06-16 Thread Ryota Ozaki
On Thu, Jun 16, 2016 at 1:56 PM, bch  wrote:
>
> On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA"  wrote:
>>
>> Hi,
>>
>> On 2016/06/16 8:15, bch wrote:
>> > I am now at 1.414, and it seems stable.
>>
>> Thank you for your checking and reporting.
>
> My pleasure. Question: were my wm(4) and iwm(4) faults related (maybe some
> lock macro-ization, rejigging)?

Not related.

> Can anybody point me to the commits that
> apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)?

For iwm:
  http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h

Commit 1.164 broke iwm (and I guess all other wifi drivers)
and commit 1.165 fixed it.

For wm:
  http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c

Commit 1.413 broke wm and commit 1.414 fixed it.

>
>> If it seems there are still
>> problems, please tell us.
>
> Will do. I'd like to have the commit(s) identified and
> re-witness/characterize the issue. Otherwise, things currently seem stable.
> Thanks.

[Timeline]

- Jun 10 13:31:45: mbuf.h r1.164
- Jun 11 ??:??:??: you encountered the first panic
- Jun 12 10:14:12: mbuf.h r1.165
- Jun 14 09:07:22: if_wm.c r1.164
- Jun 14 ??:??:??: you encountered the second panic
- Jun 14 17:09:20: if_wm.c r1.165
- Jun 16 ??:??:??: you are here

And I noticed that I forgot to bump the kernel version; my mbuf.h
change required it. (I already bumped.) If you run a kernel between
my mbuf.h change and the bump with network device driver modules
of 7.99.30, something bad will happen. (I guess the issues you saw
aren't related to this though.)

Thanks,
  ozaki-r


Re: kernel panic

2016-06-15 Thread bch
I am now at 1.414, and it seems stable.
On Jun 15, 2016 4:04 PM, "Kengo NAKAHARA"  wrote:

> Hi,
>
> On 2016/06/16 1:44, bch wrote:
> > On 6/12/16, bch  wrote:
> >> On 6/11/16, bch  wrote:
> snip
> > And now, on wm(4):
> > -rwxr-xr-x  1 root  wheel  18218304 Jun 14 10:20 /netbsd
> >
> > strathcona# crash  -M ./netbsd.8.core /netbsd
> > Crash version 7.99.30, image version /amd64/compile/G.
> > WARNING: versions differ, you may not be able to examine this image.
> > System panicked: trap
> > Backtrace from time of crash is available.
> > crash> bt
> > _KERNEL_OPT_NARCNET() at 0
> > _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x5
> > aprint_verbose() at aprint_verbose+0x2f
> > aprint_naive_internal.part.0() at aprint_naive_internal.part.0+0x14
> > trap() at trap+0xc4b
> > --- trap (number 6) ---
> > mutex_enter() at mutex_enter+0xc
> > fddi_output() at fddi_output+0x47c
> > wm_tick() at wm_tick+0x230
> > in6_update_ifa1() at in6_update_ifa1+0x766
> > in6ifa_ifpforlinklocal() at in6ifa_ifpforlinklocal+0x4a
> > in6_control1() at in6_control1+0x521
> > in6_control() at in6_control+0x10d
> > udp6_connect_wrapper() at udp6_connect_wrapper+0x83
> > compat_43_sa_put() at compat_43_sa_put+0x14
> > if_flags_set() at if_flags_set+0xb5
> > sysctl_kern_sysvipc() at sysctl_kern_sysvipc+0x37d
> > handle_modctl_load() at handle_modctl_load+0x108
> > syscall() at syscall+0x14b
> > --- syscall (number 54) ---
> > 7f7ff74e89fa:
> > crash>
>
> May your if_wm.c ident be r1.413? If so, could you try r1.414?
>
>
> Thanks,
>
>


Re: kernel panic

2016-06-15 Thread Kengo NAKAHARA
Hi,

On 2016/06/16 1:44, bch wrote:
> On 6/12/16, bch  wrote:
>> On 6/11/16, bch  wrote:
snip
> And now, on wm(4):
> -rwxr-xr-x  1 root  wheel  18218304 Jun 14 10:20 /netbsd
> 
> strathcona# crash  -M ./netbsd.8.core /netbsd
> Crash version 7.99.30, image version /amd64/compile/G.
> WARNING: versions differ, you may not be able to examine this image.
> System panicked: trap
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x5
> aprint_verbose() at aprint_verbose+0x2f
> aprint_naive_internal.part.0() at aprint_naive_internal.part.0+0x14
> trap() at trap+0xc4b
> --- trap (number 6) ---
> mutex_enter() at mutex_enter+0xc
> fddi_output() at fddi_output+0x47c
> wm_tick() at wm_tick+0x230
> in6_update_ifa1() at in6_update_ifa1+0x766
> in6ifa_ifpforlinklocal() at in6ifa_ifpforlinklocal+0x4a
> in6_control1() at in6_control1+0x521
> in6_control() at in6_control+0x10d
> udp6_connect_wrapper() at udp6_connect_wrapper+0x83
> compat_43_sa_put() at compat_43_sa_put+0x14
> if_flags_set() at if_flags_set+0xb5
> sysctl_kern_sysvipc() at sysctl_kern_sysvipc+0x37d
> handle_modctl_load() at handle_modctl_load+0x108
> syscall() at syscall+0x14b
> --- syscall (number 54) ---
> 7f7ff74e89fa:
> crash>

May your if_wm.c ident be r1.413? If so, could you try r1.414?


Thanks,



Re: kernel panic

2016-06-15 Thread bch
On 6/12/16, bch  wrote:
> On 6/11/16, bch  wrote:
>

previously reported bt on core from iwm(4) crash...

> strathcona# crash -M ./netbsd.6.core
> Crash version 7.99.30, image version /amd64/compile/G.
> WARNING: versions differ, you may not be able to examine this image.
> System panicked: kernel diagnostic assertion "M_GETCTX(m, struct
> ieee80211_node *) == NULL" failed: file
> "/usr/src/sys/net80211/ieee80211_output.c", line 1347
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> ?() at fe810f8b7c00
> aprint_error() at aprint_error+0xe
> tcp_reass() at tcp_reass+0x2dc
> ieee80211_send_probereq() at ieee80211_send_probereq+0xc0
> ieee80211_match_bss() at ieee80211_match_bss+0x2b8
> ieee80211_newstate() at ieee80211_newstate+0xb1
> iwm_newstate_cb() at iwm_newstate_cb+0x11d
> xc_init_cpu() at xc_init_cpu+0x13a

And now, on wm(4):
-rwxr-xr-x  1 root  wheel  18218304 Jun 14 10:20 /netbsd

strathcona# crash  -M ./netbsd.8.core /netbsd
Crash version 7.99.30, image version /amd64/compile/G.
WARNING: versions differ, you may not be able to examine this image.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x5
aprint_verbose() at aprint_verbose+0x2f
aprint_naive_internal.part.0() at aprint_naive_internal.part.0+0x14
trap() at trap+0xc4b
--- trap (number 6) ---
mutex_enter() at mutex_enter+0xc
fddi_output() at fddi_output+0x47c
wm_tick() at wm_tick+0x230
in6_update_ifa1() at in6_update_ifa1+0x766
in6ifa_ifpforlinklocal() at in6ifa_ifpforlinklocal+0x4a
in6_control1() at in6_control1+0x521
in6_control() at in6_control+0x10d
udp6_connect_wrapper() at udp6_connect_wrapper+0x83
compat_43_sa_put() at compat_43_sa_put+0x14
if_flags_set() at if_flags_set+0xb5
sysctl_kern_sysvipc() at sysctl_kern_sysvipc+0x37d
handle_modctl_load() at handle_modctl_load+0x108
syscall() at syscall+0x14b
--- syscall (number 54) ---
7f7ff74e89fa:
crash>

>> On Jun 11, 2016 2:01 AM, "Ryota Ozaki"  wrote:
>>
>>> Hi,
>>>
>>> On Sat, Jun 11, 2016 at 3:58 AM, bch  wrote:
>>> > kernel (adjusted from GENERIC to allow dtrace support) from latest
>>> > src panics:
>>> >
>>> > (transcription):
>>> >
>>> > reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
>>> > struct ieee80211_node *) == NULL)" failed: file
>>> > "/usr/src/sys/80211/ieee80211_output.c", line 1347
>>>
>>> Can you show me a backtrace?
>>>
>>> And let me know the latest version (date) of the kernel that worked for
>>> you.
>>>
>>>   ozaki-r
>>>
>>
>


Re: kernel panic

2016-06-11 Thread Ryota Ozaki
Hi,

On Sat, Jun 11, 2016 at 3:58 AM, bch  wrote:
> kernel (adjusted from GENERIC to allow dtrace support) from latest src 
> panics:
>
> (transcription):
>
> reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
> struct ieee80211_node *) == NULL)" failed: file
> "/usr/src/sys/80211/ieee80211_output.c", line 1347

Can you show me a backtrace?

And let me know the latest version (date) of the kernel that worked for you.

  ozaki-r


kernel panic

2016-06-10 Thread bch
kernel (adjusted from GENERIC to allow dtrace support) from latest src panics:

(transcription):

reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m,
struct ieee80211_node *) == NULL)" failed: file
"/usr/src/sys/80211/ieee80211_output.c", line 1347


Re: amd64-7.99.29 - Another kernel panic - ffs?

2016-05-29 Thread Sevan Janiyan


On 29/05/2016 17:43, Robert Swindells wrote:
> One thing that could help would be if you could make an image of the
> CF card after the panic has happened.
> 
> The filesystem that you were creating is probably a fair bit smaller
> than the ones where other people had the same problem.

I baked a new hpcarm release from HEAD this evening, going to give it a
try now and report back.



Sevan


Re: amd64-7.99.29 - Another kernel panic - ffs?

2016-05-29 Thread Robert Swindells

Sevan Janiyan  wrote:
>On 29/05/2016 11:34, Paul Goyette wrote:
>> Hmmm.  Sevan opened PR port-hpcarm/50840 but perhaps we should
>> recategorize the PR?
>
>Done. I'm still running a prebuilt image which Jun published back in
>February but happy to do some tests if that's required.

One thing that could help would be if you could make an image of the
CF card after the panic has happened.

The filesystem that you were creating is probably a fair bit smaller
than the ones where other people had the same problem.

Robert Swindells


Re: amd64-7.99.29 - Another kernel panic - ffs?

2016-05-29 Thread Sevan Janiyan


On 29/05/2016 11:34, Paul Goyette wrote:
> Hmmm.  Sevan opened PR port-hpcarm/50840 but perhaps we should
> recategorize the PR?

Done. I'm still running a prebuilt image which Jun published back in
February but happy to do some tests if that's required.


Sevan


Re: amd64-7.99.29 - Another kernel panic - ffs?

2016-05-29 Thread Paul Goyette

On Sun, 29 May 2016, Robert Swindells wrote:

> Paul Goyette  wrote:
>> Well, today I just had another crash, this time in ffs_newvnode().  The
>> traceback (manually transcribed) is:
>>
>> [snip]
>
> Was the panic message "ffs_init_vnode: dup alloc" ?

I missed copying down the panic message, but the backtrace seems to 
think we were in the printf() calls leading up to that panic message.

> I had this on a filesystem a couple of months ago, I was using wapbl
> but didn't have QUOTA or QUOTA2 in the kernel, I confess I just copied
> everything off and ran newfs(8) on it.

Well the only thing that seems to have been trashed is one subdirectory 
which was being cvs updated, and that is easily recovered.  So, no loss 
of data, just the inconvenience of having to reboot and clean up.

> There has also been a fairly recent report of it happening when
> installing NetBSD/hpcarm [1].
>
> [1] https://mail-index.netbsd.org/port-hpcarm/2016/02/27/msg000196.html

Hmmm.  Sevan opened PR port-hpcarm/50840 but perhaps we should 
recategorize the PR?




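+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+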


Re: NetBSD-current/i386 install kernel panic (sysv_ipc related)

2015-12-04 Thread Paul Goyette

(cc-ing current-users as a heads-up)

Yes, I got another report of this as well.  I am looking into it and
will fix as quickly as possible.


On Fri, 4 Dec 2015, Andreas Gustafsson wrote:


> Hi Paul,
>
> NetBSD-current/i386 panics during the install since yesterday.  Since
> the panic message mentions sysv_ipc and you made some commits in that
> area between the last successful install and the first unsuccessful
> one, I'm reporting this to you :)
>
> The panic message is:
>
>   cd0 at atapibus0 drive 1:  cdrom removable
>   wd0 at atabus0 drive 0
>   wd0:
>   wd0: 1024 MB, 2080 cyl, 16 head, 63 sec, 512 bytes/sect x 2097152 sectors
>   syscall 171 is busy
>   WARNING: module error: builtin module `sysv_ipc' failed to init, error 16
>   panic: kernel diagnostic assertion "sysvipc_listener == NULL" failed: file "/tmp/bracket/build/2015.12.03.03.03.58-i386/src/sys/kern/sysv_ipc.c", line 365
>   fatal breakpoint trap in supervisor mode
>   trap type 1 code 0 eip c010e424 cs 8 eflags 246 cr2 0 ilevel 0 esp c13e8e98
>   curlwp 0xc1367ba0 pid 0 lid 1 lowest kstack 0xc13e62c0
>   Stopped in pid 0.1 (system) at c010e424:  popl %ebp
>   db{0}>
>
> More logs are at:
>
>   http://releng.netbsd.org/b5reports/i386/commits-2015.12.html#2015.12.03.02.57.47
>
> --
> Andreas Gustafsson, g...@gson.org





Re: NetBSD-current/i386 install kernel panic (sysv_ipc related)

2015-12-04 Thread Paul Goyette

This should be fixed now, although I am still testing a few more
combinations.


On Sat, 5 Dec 2015, Paul Goyette wrote:


(cc-ing current-users as a heads-up)

Yes, I got another report of this as well.  I am looking into it and
will fix as quickly as possible.


On Fri, 4 Dec 2015, Andreas Gustafsson wrote:


Hi Paul,

NetBSD-current/i386 panics during the install since yesterday.  Since
the panic message mentions sysv_ipc and you made some commits in that
area between the last successful install and the first unsuccessful
one, I'm reporting this to you :)

The panic message is:

  cd0 at atapibus0 drive 1:  cdrom removable
  wd0 at atabus0 drive 0
  wd0: 
  wd0: 1024 MB, 2080 cyl, 16 head, 63 sec, 512 bytes/sect x 2097152 sectors
  syscall 171 is busy
  WARNING: module error: builtin module `sysv_ipc' failed to init, error 16
  panic: kernel diagnostic assertion "sysvipc_listener == NULL" failed: file 
"/tmp/bracket/build/2015.12.03.03.03.58-i386/src/sys/kern/sysv_ipc.c", line 365
  fatal breakpoint trap in supervisor mode
  trap type 1 code 0 eip c010e424 cs 8 eflags 246 cr2 0 ilevel 0 esp c13e8e98
  curlwp 0xc1367ba0 pid 0 lid 1 lowest kstack 0xc13e62c0
  Stopped in pid 0.1 (system) at c010e424:  popl    %ebp
  db{0}>

More logs are at:

 
http://releng.netbsd.org/b5reports/i386/commits-2015.12.html#2015.12.03.02.57.47

--
Andreas Gustafsson, g...@gson.org








Re: Kernel panic from network traffic

2015-07-24 Thread Ryota Ozaki
Hi,

It's probably due to my recent change to refcnt. I'm investigating
that defect.

Thanks,
  ozaki-r

On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
 Being a moron, I plugged ports of my switch together. The big surprise
 is two ports away is my -current box and it kept panicking.

 This is all I got so far.

 Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion
 "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
 Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
 Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
 Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
 Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
 Jul 23 21:46:11 mara /netbsd: rtcache_clear() at
 netbsd:rtcache_clear+0x41
 Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
 Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at
 netbsd:in6_pcbdetach+0xcb
 Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at
 netbsd:udp6_detach_wrapper+0x3f
 Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
 Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
 Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
 Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
 Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
 Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
 Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
 Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
 Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

 --
 Hisashi T Fujinaka - ht...@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee


Re: Kernel panic from network traffic

2015-07-24 Thread Ryota Ozaki
Hi,

I just fixed one bug related to refcnt. The fix may shut up the panic.
Could you try again with the latest kernel?

Thanks,
  ozaki-r

On Fri, Jul 24, 2015 at 3:38 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 Hi,

 It's probably due to my recent change to refcnt. I'm investigating
 that defect.

 Hmm, I cannot reproduce it. Could you tell me the kernel config,
 network setups and apps running on the box?

 Thanks,
   ozaki-r


 Thanks,
   ozaki-r

 On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com 
 wrote:
 Being a moron, I plugged ports of my switch together. The big surprise
 is two ports away is my -current box and it kept panicking.

 This is all I got so far.

 Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion
 "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
 Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
 Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
 Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
 Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
 Jul 23 21:46:11 mara /netbsd: rtcache_clear() at
 netbsd:rtcache_clear+0x41
 Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
 Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at
 netbsd:in6_pcbdetach+0xcb
 Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at
 netbsd:udp6_detach_wrapper+0x3f
 Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
 Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
 Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
 Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
 Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
 Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
 Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
 Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
 Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

 --
 Hisashi T Fujinaka - ht...@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee


Re: Kernel panic from network traffic

2015-07-24 Thread Ryota Ozaki
On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 Hi,

 It's probably due to my recent change to refcnt. I'm investigating
 that defect.

Hmm, I cannot reproduce it. Could you tell me the kernel config,
network setups and apps running on the box?

Thanks,
  ozaki-r


 Thanks,
   ozaki-r

 On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com 
 wrote:
 Being a moron, I plugged ports of my switch together. The big surprise
 is two ports away is my -current box and it kept panicking.

 This is all I got so far.

 Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion
 "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
 Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
 Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
 Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
 Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
 Jul 23 21:46:11 mara /netbsd: rtcache_clear() at
 netbsd:rtcache_clear+0x41
 Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
 Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at
 netbsd:in6_pcbdetach+0xcb
 Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at
 netbsd:udp6_detach_wrapper+0x3f
 Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
 Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
 Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
 Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
 Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
 Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
 Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
 Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
 Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

 --
 Hisashi T Fujinaka - ht...@twofifty.com
 BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee


Re: Kernel panic from network traffic

2015-07-24 Thread Andy Ruhl
On Thu, Jul 23, 2015 at 10:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
 Being a moron, I plugged ports of my switch together. The big surprise
 is two ports away is my -current box and it kept panicking.

Hello fellow moron :)

I have a general question:

I see some comments around unifying route caches, but in this
particular case it seems related to ipv6. Is this an ipv6 problem or a
general problem? I have a -current machine and it's not likely to
encounter this particular scenario (sorry, heh), but wondering anyway.

Thanks!

Andy


Re: Kernel panic from network traffic

2015-07-24 Thread Hisashi T Fujinaka

On Fri, 24 Jul 2015, Andy Ruhl wrote:


On Thu, Jul 23, 2015 at 10:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:

Being a moron, I plugged ports of my switch together. The big surprise
is two ports away is my -current box and it kept panicking.


Hello fellow moron :)

I have a general question:

I see some comments around unifying route caches, but in this
particular case it seems related to ipv6. Is this an ipv6 problem or a
general problem? I have a -current machine and it's not likely to
encounter this particular scenario (sorry, heh), but wondering anyway.


I'm not sure of what kind of flood of traffic was seen at the -current
box (tcpdump wasn't being helpful) but I kind of doubt it was IPv6 only.
But who knows? My guess is that there is IPv6 routing traffic mixed in
with a whole lot of garbage from my switch.

--
Hisashi T Fujinaka - ht...@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee


Kernel panic from network traffic

2015-07-23 Thread Hisashi T Fujinaka

Being a moron, I plugged ports of my switch together. The big surprise
is two ports away is my -current box and it kept panicking.

This is all I got so far.

Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
Jul 23 21:46:11 mara /netbsd: rtcache_clear() at netbsd:rtcache_clear+0x41
Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at netbsd:in6_pcbdetach+0xcb
Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at netbsd:udp6_detach_wrapper+0x3f
Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...

--
Hisashi T Fujinaka - ht...@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee


Kernel panic when entering ACPI sleep state S3

2015-04-15 Thread Calum MacRae
Hi all,

I'm running the latest snapshot from nyftp.netbsd.org (201504151050Z) on
a Thinkpad X120e.

I seem to be encountering some issues when attempting to enter ACPI
sleep state S3, using: sysctl -w hw.acpi.sleep.state=3

Upon invoking the above command, my system seems to attempt to sleep
(blanks the screen, the speakers click) but then halts and reboots.

Figured I'd report this, and would appreciate any input.

Please see the relevant output from /var/log/messages, and a dmesg
following this.

/var/log/messages:
--
Apr 15 19:47:40 bmo /netbsd: acpi0: entering state S3
Apr 15 19:48:29 bmo syslogd[684]: restart
Apr 15 19:48:29 bmo /netbsd: panic: kernel diagnostic assertion 
"(bo->mem.bus.base & (PAGE_SIZE - 1)) == 0" failed: file 
"/home/source/ab/HEAD/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c", line 
1618 bo bus base addr not page-aligned: fe82125c69b0
Apr 15 19:48:29 bmo /netbsd: cpu0: Begin traceback...
Apr 15 19:48:29 bmo /netbsd: vpanic() at netbsd:vpanic+0x13c
Apr 15 19:48:29 bmo /netbsd: kern_assert() at netbsd:kern_assert+0x4f
Apr 15 19:48:29 bmo /netbsd: ttm_bo_unmap_virtual_locked() at 
netbsd:ttm_bo_unmap_virtual_locked+0x17b
Apr 15 19:48:29 bmo /netbsd: ttm_bo_handle_move_mem() at 
netbsd:ttm_bo_handle_move_mem+0x22f
Apr 15 19:48:29 bmo /netbsd: ttm_mem_evict_first() at 
netbsd:ttm_mem_evict_first+0x4e0
Apr 15 19:48:29 bmo /netbsd: ttm_bo_force_list_clean() at 
netbsd:ttm_bo_force_list_clean+0x5a
Apr 15 19:48:29 bmo /netbsd: radeon_suspend_kms() at 
netbsd:radeon_suspend_kms+0x13f
Apr 15 19:48:29 bmo /netbsd: radeon_do_suspend() at 
netbsd:radeon_do_suspend+0x21
Apr 15 19:48:29 bmo /netbsd: device_pmf_driver_suspend() at 
netbsd:device_pmf_driver_suspend+0x35
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend_locked() at 
netbsd:pmf_device_suspend_locked+0xe3
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend() at 
netbsd:pmf_device_suspend+0x41
Apr 15 19:48:29 bmo /netbsd: pmf_system_suspend() at 
netbsd:pmf_system_suspend+0xc1
Apr 15 19:48:29 bmo /netbsd: acpi_enter_sleep_state() at 
netbsd:acpi_enter_sleep_state+0x115
Apr 15 19:48:29 bmo /netbsd: sysctl_hw_acpi_sleepstate() at 
netbsd:sysctl_hw_acpi_sleepstate+0xfe
Apr 15 19:48:29 bmo /netbsd: sysctl_dispatch() at netbsd:sysctl_dispatch+0xc4
Apr 15 19:48:29 bmo /netbsd: sys___sysctl() at netbsd:sys___sysctl+0xd0
Apr 15 19:48:29 bmo /netbsd: syscall() at netbsd:syscall+0x9c
Apr 15 19:48:29 bmo /netbsd: --- syscall (number 202) ---
Apr 15 19:48:29 bmo /netbsd: 7f7ff7501d3a:
Apr 15 19:48:29 bmo /netbsd: cpu0: End traceback...
Apr 15 19:48:29 bmo /netbsd: 
Apr 15 19:48:29 bmo /netbsd: dumping to dev 0,1 (offset=3496, size=1992749):
Apr 15 19:48:29 bmo /netbsd: dump Skipping crash dump on recursive panic
Apr 15 19:48:29 bmo /netbsd: panic: wddump: polled command has been queued
Apr 15 19:48:29 bmo /netbsd: cpu0: Begin traceback...
Apr 15 19:48:29 bmo /netbsd: vpanic() at netbsd:vpanic+0x13c
Apr 15 19:48:29 bmo /netbsd: snprintf() at netbsd:snprintf
Apr 15 19:48:29 bmo /netbsd: wddump() at netbsd:wddump+0x282
Apr 15 19:48:29 bmo /netbsd: dump_header_flush() at 
netbsd:dump_header_flush+0x4f
Apr 15 19:48:29 bmo /netbsd: dump_header_addbytes() at 
netbsd:dump_header_addbytes+0x46
Apr 15 19:48:29 bmo /netbsd: dump_header_addseg() at 
netbsd:dump_header_addseg+0x1e
Apr 15 19:48:29 bmo /netbsd: dump_seg_iter() at netbsd:dump_seg_iter+0xce
Apr 15 19:48:29 bmo /netbsd: cpu_dump() at netbsd:cpu_dump+0x6a
Apr 15 19:48:29 bmo /netbsd: dodumpsys() at netbsd:dodumpsys+0xfb
Apr 15 19:48:29 bmo /netbsd: dumpsys() at netbsd:dumpsys+0x1d
Apr 15 19:48:29 bmo /netbsd: vpanic() at netbsd:vpanic+0x145
Apr 15 19:48:29 bmo /netbsd: kern_assert() at netbsd:kern_assert+0x4f
Apr 15 19:48:29 bmo /netbsd: ttm_bo_unmap_virtual_locked() at 
netbsd:ttm_bo_unmap_virtual_locked+0x17b
Apr 15 19:48:29 bmo /netbsd: ttm_bo_handle_move_mem() at 
netbsd:ttm_bo_handle_move_mem+0x22f
Apr 15 19:48:29 bmo /netbsd: ttm_mem_evict_first() at 
netbsd:ttm_mem_evict_first+0x4e0
Apr 15 19:48:29 bmo /netbsd: ttm_bo_force_list_clean() at 
netbsd:ttm_bo_force_list_clean+0x5a
Apr 15 19:48:29 bmo /netbsd: radeon_suspend_kms() at 
netbsd:radeon_suspend_kms+0x13f
Apr 15 19:48:29 bmo /netbsd: radeon_do_suspend() at 
netbsd:radeon_do_suspend+0x21
Apr 15 19:48:29 bmo /netbsd: device_pmf_driver_suspend() at 
netbsd:device_pmf_driver_suspend+0x35
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend_locked() at 
netbsd:pmf_device_suspend_locked+0xe3
Apr 15 19:48:29 bmo /netbsd: pmf_device_suspend() at 
netbsd:pmf_device_suspend+0x41
Apr 15 19:48:29 bmo /netbsd: pmf_system_suspend() at 
netbsd:pmf_system_suspend+0xc1
Apr 15 19:48:29 bmo /netbsd: acpi_enter_sleep_state() at 
netbsd:acpi_enter_sleep_state+0x115
Apr 15 19:48:29 bmo /netbsd: sysctl_hw_acpi_sleepstate() at 
netbsd:sysctl_hw_acpi_sleepstate+0xfe
Apr 15 19:48:29 bmo /netbsd: sysctl_dispatch() at netbsd:sysctl_dispatch+0xc4
Apr 15 19:48:29 bmo /netbsd: sys___sysctl() at netbsd:sys___sysctl+0xd0
Apr 15 19:48:29 bmo /netbsd: syscall() at 

Re: kernel panic: uvm_fault

2015-01-26 Thread Thomas Klausner
On Mon, Dec 22, 2014 at 03:56:43PM +, Robert Swindells wrote:
 
 Thomas Klausner wrote:
 On Mon, Dec 22, 2014 at 03:29:58PM +, Robert Swindells wrote:
  
  Thomas Klausner wrote:
   I had a kernel panic today with an amd64/7.99.3 kernel from Dec 20. The
  last activity I had started was downloading a file from network to an
  NFS directory mounted from a Synology.
 
  It looks the same as the panic you had back in September to me:
  
  http://mail-index.netbsd.org/current-users/2014/09/13/msg025777.html
  
  Can you turn on HW checksumming on this machine ?
 
 I've just done that (ifconfig wm0 tcp4csum ip4csum udp4csum):
 
 wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
  capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
  capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
  capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
  enabled=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
  enabled=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
  ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
  ec_enabled=0
 ...
 
 I had it in the back of my mind that hardware checksumming was usually
 the cause of bugs, not turning it off. Am I misremembering?
 
 It depends on the network controller; wm works well for me with
 everything enabled.
 
 wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
 capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
 capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
 enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
 enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
 enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
 ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
 ec_enabled=0
 
 I have had crashes similar to yours when using sw checksumming on
 amd64, but have never seen them on i386 or arm.
 
 There was also this:
 
 http://mail-index.netbsd.org/port-sparc64/2014/11/29/msg002298.html
 
 I guess we need to add some more KASSERT() checks.

OK, the Synology installed an opsys update again last night, and a
couple of minutes ago I tried writing to a still-mounted file system from
it, and got a panic. From dmesg:

192.168.1.2:/volume1/roms: re-enabling wcc
192.168.1.2:/volume1/video: re-enabling wcc
panic: _bus_virt_to_bus
cpu1: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
_bus_dma_alloc_bouncebuf() at netbsd:_bus_dma_alloc_bouncebuf
bus_dmamap_load_mbuf() at netbsd:bus_dmamap_load_mbuf+0xf0
wm_nq_start() at netbsd:wm_nq_start+0x1c5
ifq_enqueue() at netbsd:ifq_enqueue+0xae
ether_output() at netbsd:ether_output+0x579
ip_output() at netbsd:ip_output+0xdeb
tcp_output() at netbsd:tcp_output+0x15cf
tcp_send_wrapper() at netbsd:tcp_send_wrapper+0xa2
sosend() at netbsd:sosend+0x712
nfs_send() at netbsd:nfs_send+0x8e
nfs_request() at netbsd:nfs_request+0x39d
nfs_writerpc() at netbsd:nfs_writerpc+0x3b0
nfs_doio() at netbsd:nfs_doio+0x250
nfssvc_iod() at netbsd:nfssvc_iod+0x1a1
cpu1: End traceback...


I had and still have all the checksum options turned on as you
suggested.

wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=3ff00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
enabled=3ff00<UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx>
enabled=3ff00<UDP6CSUM_Rx,UDP6CSUM_Tx>
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
 
At least the backtrace looks nfs related this time :)
 Thomas


Re: kernel panic: uvm_fault

2015-01-26 Thread Thomas Klausner
On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote:
 alloc_bouncebus?  On amd64?  I think you've got a trashed pointer
 somewhere.

I have
makeoptions   DEBUG=-g  # compile full symbol table

# grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/
Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o matches

 Thomas

 On Mon, Jan 26, 2015 at 02:47:55PM +0100, Thomas Klausner wrote:
  On Mon, Dec 22, 2014 at 03:56:43PM +, Robert Swindells wrote:
   
   Thomas Klausner wrote:
   On Mon, Dec 22, 2014 at 03:29:58PM +, Robert Swindells wrote:

Thomas Klausner wrote:
I had a kernel panic today with an amd64/7.99.3 kernel from Dec 20. The
last activity I had started was downloading a file from network to an
NFS directory mounted from a Synology.
   
It looks the same as the panic you had back in September to me:

http://mail-index.netbsd.org/current-users/2014/09/13/msg025777.html

Can you turn on HW checksumming on this machine ?
   
   I've just done that (ifconfig wm0 tcp4csum ip4csum udp4csum):
   
   wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
enabled=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
   ...
   
   I had it in the back of my mind that hardware checksumming was usually
   the cause of bugs, not turning it off. Am I misremembering?
   
   It depends on the network controller; wm works well for me with
   everything enabled.
   
   wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
   capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
   capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
   enabled=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
   enabled=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
   enabled=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
   ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
   ec_enabled=0
   
   I have had crashes similar to yours when using sw checksumming on
   amd64, but have never seen them on i386 or arm.
   
   There was also this:
   
   http://mail-index.netbsd.org/port-sparc64/2014/11/29/msg002298.html
   
   I guess we need to add some more KASSERT() checks.
  
  OK, the Synology installed an opsys update again last night, and a
  couple of minutes ago I tried writing to a still-mounted file system from
  it, and got a panic. From dmesg:
  
  192.168.1.2:/volume1/roms: re-enabling wcc
  192.168.1.2:/volume1/video: re-enabling wcc
  panic: _bus_virt_to_bus
  cpu1: Begin traceback...
  vpanic() at netbsd:vpanic+0x13c
  snprintf() at netbsd:snprintf
  _bus_dma_alloc_bouncebuf() at netbsd:_bus_dma_alloc_bouncebuf
  bus_dmamap_load_mbuf() at netbsd:bus_dmamap_load_mbuf+0xf0
  wm_nq_start() at netbsd:wm_nq_start+0x1c5
  ifq_enqueue() at netbsd:ifq_enqueue+0xae
  ether_output() at netbsd:ether_output+0x579
  ip_output() at netbsd:ip_output+0xdeb
  tcp_output() at netbsd:tcp_output+0x15cf
  tcp_send_wrapper() at netbsd:tcp_send_wrapper+0xa2
  sosend() at netbsd:sosend+0x712
  nfs_send() at netbsd:nfs_send+0x8e
  nfs_request() at netbsd:nfs_request+0x39d
  nfs_writerpc() at netbsd:nfs_writerpc+0x3b0
  nfs_doio() at netbsd:nfs_doio+0x250
  nfssvc_iod() at netbsd:nfssvc_iod+0x1a1
  cpu1: End traceback...
  
  
  I had and still have all the checksum options turned on as you
  suggested.
  
  wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
  capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
  capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
  capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
  enabled=3ff00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
  enabled=3ff00<UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx>
  enabled=3ff00<UDP6CSUM_Rx,UDP6CSUM_Tx>
  ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
  ec_enabled=0
   
  At least the backtrace looks nfs related this time :)
   Thomas
 
 -- 
  Thor Lancelot Simon    t...@panix.com
 From the tooth paste you use in the morning to the salt on your evening meal,
 it's easy to take for granted the many products brought to us with 
 explosives.
 - Institute of Manufacturers of Explosives, Explosives Make It Possible 
 


Re: kernel panic: uvm_fault

2015-01-26 Thread Joerg Sonnenberger
On Mon, Jan 26, 2015 at 01:07:40PM -0500, Thor Lancelot Simon wrote:
 On Mon, Jan 26, 2015 at 04:28:22PM +0100, Thomas Klausner wrote:
  On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote:
   alloc_bouncebus?  On amd64?  I think you've got a trashed pointer
   somewhere.
  
  I have
  makeoptions   DEBUG=-g  # compile full symbol table
  
  # grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/
  Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o matches
 
 Yes, but why would you be trying to bounce PCI DMA on a 64-bit platform?

Because the device doesn't support 64bit DMA?

Joerg


Re: kernel panic: uvm_fault

2015-01-26 Thread Thor Lancelot Simon
On Mon, Jan 26, 2015 at 04:28:22PM +0100, Thomas Klausner wrote:
 On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote:
  alloc_bouncebus?  On amd64?  I think you've got a trashed pointer
  somewhere.
 
 I have
 makeoptions   DEBUG=-g  # compile full symbol table
 
 # grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/
 Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o matches

Yes, but why would you be trying to bounce PCI DMA on a 64-bit platform?

Thor


Re: kernel panic: uvm_fault

2015-01-26 Thread Robert Swindells

Thor Lancelot Simon wrote:
On Mon, Jan 26, 2015 at 08:03:26PM +0100, Joerg Sonnenberger wrote:
 On Mon, Jan 26, 2015 at 01:07:40PM -0500, Thor Lancelot Simon wrote:
  On Mon, Jan 26, 2015 at 04:28:22PM +0100, Thomas Klausner wrote:
   On Mon, Jan 26, 2015 at 10:01:41AM -0500, Thor Lancelot Simon wrote:
alloc_bouncebus?  On amd64?  I think you've got a trashed pointer
somewhere.
   
   I have
   makeoptions   DEBUG=-g  # compile full symbol table
   
   # grep -r _bus_dma_alloc_bouncebuf /usr/src/sys/arch/amd64/
   Binary file /usr/src/sys/arch/amd64/compile/obj/KERNELNAME/bus_dma.o 
   matches
  
  Yes, but why would you be trying to bounce PCI DMA on a 64-bit platform?
 
 Because the device doesn't support 64bit DMA?

That doesn't sound right for this device.

If there is an error in the normal route through bus_dmamap_load_mbuf
then it tries to use a bounce buffer. The code is shared with i386.

It looks to me as if it could be triggered by an mbuf with an invalid
size.



Re: kernel panic: uvm_fault

2014-12-25 Thread Thomas Klausner
On Mon, Dec 22, 2014 at 03:49:20PM +0100, Thomas Klausner wrote:
 I had a kernel panic today with an amd64/7.99.3 kernel from Dec 20. The
 last activity I had started was downloading a file from network to an
 NFS directory mounted from a Synology.

I just saw that the Synology had installed an operating system upgrade
again (on its own) on this day. So I guess this is related to the NFS
mount I have from Synology (as server) to NetBSD (as client).

The mount flags from my /etc/fstab are currently
intr,nodev,nosuid,rw,tcp.
 Thomas


kernel panic: uvm_fault

2014-12-22 Thread Thomas Klausner
Hi!

I had a kernel panic today with an amd64/7.99.3 kernel from Dec 20. The
last activity I had started was downloading a file from network to an
NFS directory mounted from a Synology.

I don't know if that's related, but there was no particular load on
the machine.

From dmesg after reboot:
uvm_fault(0x811cf2c0, 0x8003393b8000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 8028b965 cs 8 rflags 10202 cr2 8003393b8000 
ilevel 4 rsp fe813bcb8728
curlwp 0xfe8825ee1220 pid 0.143 lowest kstack 0xfe813bcb52c0
panic: trap
cpu8: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
cpu8: End traceback...


Kernel backtrace:

(gdb) bt
#0  0x80677a85 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at 
/archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
#1  0x808ccb54 in vpanic (fmt=fmt@entry=0x80ddb08d trap, 
ap=ap@entry=0xfe813bcb8510) at /archive/foreign/src/sys/kern/subr_prf.c:340
#2  0x808ccc0f in panic (fmt=fmt@entry=0x80ddb08d trap) at 
/archive/foreign/src/sys/kern/subr_prf.c:256
#3  0x8091bd87 in trap (frame=0xfe813bcb8630) at 
/archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
#4  0x8010108e in alltraps ()
#5  0x8028b965 in .Mmbuf_inner_loop ()
#6  0xfe8349294000 in ?? ()
#7  0xfe813bcb8758 in ?? ()
#8  0x8058733e in in_delayed_cksum (m=0x8003393b8000) at 
/archive/foreign/src/sys/netinet/ip_output.c:793
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

Is this a valid backtrace? Does it give any useful hints?
 Thomas


Re: kernel panic: uvm_fault

2014-12-22 Thread Thomas Klausner
On Mon, Dec 22, 2014 at 03:29:58PM +, Robert Swindells wrote:
 
 Thomas Klausner wrote:
 I had a kernel panic today with an amd64/7.99.3 kernel from Dec 20. The
 last activity I had started was downloading a file from network to an
 NFS directory mounted from a Synology.
 
 
 [snip]
 
 #1  0x808ccb54 in vpanic (fmt=fmt@entry=0x80ddb08d trap, 
 ap=ap@entry=0xfe813bcb8510) at 
 /archive/foreign/src/sys/kern/subr_prf.c:340
 #2  0x808ccc0f in panic (fmt=fmt@entry=0x80ddb08d trap) at 
 /archive/foreign/src/sys/kern/subr_prf.c:256
 #3  0x8091bd87 in trap (frame=0xfe813bcb8630) at 
 /archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
 #4  0x8010108e in alltraps ()
 #5  0x8028b965 in .Mmbuf_inner_loop ()
 #6  0xfe8349294000 in ?? ()
 #7  0xfe813bcb8758 in ?? ()
 #8  0x8058733e in in_delayed_cksum (m=0x8003393b8000) at 
 /archive/foreign/src/sys/netinet/ip_output.c:793
 Backtrace stopped: previous frame inner to this frame (corrupt stack?)
 (gdb)
 
 Is this a valid backtrace? Does it give any useful hints?
 
 It looks the same as the panic you had back in September to me:
 
 http://mail-index.netbsd.org/current-users/2014/09/13/msg025777.html
 
 Can you turn on HW checksumming on this machine ?

I've just done that (ifconfig wm0 tcp4csum ip4csum udp4csum):

wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
enabled=3f00<UDP4CSUM_Rx,UDP4CSUM_Tx>
ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
ec_enabled=0
...

I had it in the back of my mind that hardware checksumming was usually
the cause of bugs, not turning it off. Am I misremembering?
 Thomas


Re: kernel panic on a cold start amd64.(msk0 and bridge problem).

2014-07-15 Thread Ilia Zykov
On 27.05.2014 16:09, Ilia Zykov wrote:
 Now I can reproduce it reliably.
 The kernel panics on a network bridge when a msk interface has no link.
 
 Do I need to open a new bug, or can it be fixed easily?
 The main reason is:
 msk0: watchdog timeout
 from source:
 
 void
 msk_watchdog(struct ifnet *ifp)
 {
 [...]
   /* XXX Resets both ports; we shouldn't do that. */
   msk_reset(sc_if->sk_softc);
   msk_init(ifp);
 [...]
 }
 
Hello.
Now it happens when msk0 (Marvell Yukon 88E8056 (ethernet network, 
revision 0x12)) is a member of the bridge and has no patch cord 
connected. Maybe I configured the bridge wrong?

NetBSD 6.99.47 NetBSD 6.99.47 (GENERIC.201407150020Z)

cat /etc/ifconfig.bridge0 
 create
 !brconfig $int add msk0 add re0 up
cat /etc/ifconfig.msk0
 up
cat /etc/ifconfig.re0
 up

dhcpcd works normally on the re0 interface.
cat /etc/rc.conf
 ...
 dhcpcd=YES
 dhcpcd_flags=-4
 ...
cat /etc/dhcpcd.conf
 ...
 allowinterfaces re*


crash -M work/core
Crash version 6.99.47, image version 6.99.47.
System panicked: kernel diagnostic assertion "(!cpu_intr_p() && 
!cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != 
NULL)" failed: file "/home/source/ab/HEAD/src/sys/kern/subr_pool.c", line 2211 
pool 'vmmpepl' is IPL_NONE, but called from interrup
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at fe822fbc8510
vpanic() at vpanic+0x145
kern_assert() at kern_assert+0x4f
pool_cache_get_paddr() at pool_cache_get_paddr+0x12c
uvm_mapent_alloc.isra.2() at uvm_mapent_alloc.isra.2+0x20
uvm_map_clip_start() at uvm_map_clip_start+0x1b
uvm_unmap_remove() at uvm_unmap_remove+0x2fe
uvm_unmap1() at uvm_unmap1+0x35
_bus_dmamap_destroy.isra.11() at _bus_dmamap_destroy.isra.11+0x41
msk_stop() at msk_stop+0x3f4
msk_init() at msk_init+0x35
if_slowtimo() at if_slowtimo+0x46
callout_softclock() at callout_softclock+0x392
softint_dispatch() at softint_dispatch+0xd3
DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e930ff0
Xsoftintr() at Xsoftintr+0x4f
--- interrupt ---
0:

Ilia.
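The assertion in the panic message is a plain boolean condition, and evaluating it by hand shows why the msk watchdog path trips it. The following is a userland model only, not kernel code: `struct ctx`, `enum ipl`, `pool_get_allowed()` and `model_pool_cache_get()` are hypothetical names, and the booleans stand in for `cpu_intr_p()`, `cpu_softintr_p()`, `cold` and `panicstr != NULL`. Per the backtrace, `if_slowtimo()` is run by `callout_softclock()` from a soft interrupt, so a pool created at IPL_NONE ('vmmpepl' here) may not be used on that path unless the system is still cold or already panicking.

```c
#include <assert.h>
#include <stdbool.h>

/* Model priority levels only; not the kernel's ipl definitions. */
enum ipl { IPL_NONE, IPL_VM };

/* Hypothetical snapshot of the state the kernel assertion inspects. */
struct ctx {
	bool in_intr;     /* stands in for cpu_intr_p() */
	bool in_softintr; /* stands in for cpu_softintr_p() */
	bool cold;        /* system still autoconfiguring */
	bool panicking;   /* stands in for panicstr != NULL */
};

/*
 * Same shape as the condition in the panic message:
 *   (!cpu_intr_p() && !cpu_softintr_p()) ||
 *   (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != NULL)
 */
bool
pool_get_allowed(const struct ctx *c, enum ipl pool_ipl)
{
	bool thread_context = !c->in_intr && !c->in_softintr;
	bool exempt = pool_ipl != IPL_NONE || c->cold || c->panicking;

	return thread_context || exempt;
}

/* Models the assertion that fired in pool_cache_get_paddr(). */
void
model_pool_cache_get(const struct ctx *c, enum ipl pool_ipl)
{
	assert(pool_get_allowed(c, pool_ipl));
}
```

With `in_softintr` set and the pool at IPL_NONE, `pool_get_allowed()` returns false, which corresponds to the failed assertion in the backtrace; the same call with every flag clear (thread context) returns true, as does the softint case with `cold` set.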



Re: kernel panic on a cold start amd64.

2014-05-27 Thread Ilia Zykov
Now I can reproduce it persistently.
The kernel panics on a network bridge when an msk interface has no connection.

Do I need to open a new bug, or can it be fixed easily?
The main reason is:
msk0: watchdog timeout
from source:

void
msk_watchdog(struct ifnet *ifp)
{
[...]
	/* XXX Resets both ports; we shouldn't do that. */
	msk_reset(sc_if->sk_softc);
	msk_init(ifp);
[...]
}


From /var/log/messages:
May 27 13:59:19 bmoy /netbsd: msk0: watchdog timeout
May 27 13:59:19 bmoy /netbsd: panic: kernel diagnostic assertion (!cpu_intr_p() && !cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr != NULL) failed: file /home/builds/ab/HEAD/src/sys/kern/subr_pool.c, line 2210 pool 'vmmpepl' is IPL_NONE, but called from interrup
May 27 13:59:19 bmoy /netbsd: cpu0: Begin traceback...
May 27 13:59:19 bmoy /netbsd: vpanic() at netbsd:vpanic+0x13c
May 27 13:59:19 bmoy /netbsd: kern_assert() at netbsd:kern_assert+0x4f
May 27 13:59:19 bmoy /netbsd: pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x12c
May 27 13:59:19 bmoy /netbsd: uvm_mapent_alloc.isra.2() at netbsd:uvm_mapent_alloc.isra.2+0x20
May 27 13:59:19 bmoy /netbsd: uvm_map_clip_start() at netbsd:uvm_map_clip_start+0x1b
May 27 13:59:19 bmoy /netbsd: uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x2fe
May 27 13:59:19 bmoy /netbsd: uvm_unmap1() at netbsd:uvm_unmap1+0x35
May 27 13:59:19 bmoy /netbsd: _bus_dmamap_destroy.isra.11() at netbsd:_bus_dmamap_destroy.isra.11+0x41
May 27 13:59:19 bmoy /netbsd: msk_stop() at netbsd:msk_stop+0x3f4
May 27 13:59:19 bmoy /netbsd: msk_init() at netbsd:msk_init+0x35
May 27 13:59:19 bmoy /netbsd: if_slowtimo() at netbsd:if_slowtimo+0x46
May 27 13:59:19 bmoy /netbsd: callout_softclock() at netbsd:callout_softclock+0x392
May 27 13:59:19 bmoy /netbsd: softint_dispatch() at netbsd:softint_dispatch+0xd3
May 27 13:59:19 bmoy /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe810e932ff0
May 27 13:59:19 bmoy /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f
May 27 13:59:19 bmoy /netbsd: --- interrupt ---
May 27 13:59:19 bmoy /netbsd: 0:
May 27 13:59:19 bmoy /netbsd: cpu0: End traceback...
May 27 13:59:19 bmoy /netbsd:
May 27 13:59:19 bmoy /netbsd: dumping to dev 0,1 (offset=4197039, size=2096926):
May 27 13:59:19 bmoy /netbsd: dump Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
May 27 13:59:19 bmoy /netbsd:     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014
May 27 13:59:19 bmoy /netbsd:     The NetBSD Foundation, Inc.  All rights reserved.
May 27 13:59:19 bmoy /netbsd: Copyright (c) 1982, 1986, 1989, 1991, 1993
May 27 13:59:19 bmoy /netbsd:     The Regents of the University of California.  All rights reserved.
May 27 13:59:19 bmoy /netbsd:




On 25.05.2014 23:55, Ilia Zykov wrote:
> But the test machine is five years old and may have some hardware degradation.
>
> Crash version 6.99.42, image version 6.99.42.
> System panicked: kernel diagnostic assertion (!cpu_intr_p() &&
> !cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr !=
> NULL) failed: file /home/builds/ab/HEAD/src/sys/kern/subr_pool.c, line
> 2210 pool 'vmmpepl' is IPL_NONE, but called from interrup
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> ?() at fe822c8c4cf8
> vpanic() at vpanic+0x145
> kern_assert() at kern_assert+0x4f
> pool_cache_get_paddr() at pool_cache_get_paddr+0x12c
> uvm_mapent_alloc.isra.2() at uvm_mapent_alloc.isra.2+0x20
> uvm_map_clip_start() at uvm_map_clip_start+0x1b
> uvm_unmap_remove() at uvm_unmap_remove+0x2fe
> uvm_unmap1() at uvm_unmap1+0x35
> _bus_dmamap_destroy.isra.11() at _bus_dmamap_destroy.isra.11+0x41
> msk_stop() at msk_stop+0x3f4
> msk_init() at msk_init+0x35
> if_slowtimo() at if_slowtimo+0x46
> callout_softclock() at callout_softclock+0x392
> softint_dispatch() at softint_dispatch+0xd3
> DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e932ff0
> Xsoftintr() at Xsoftintr+0x4f
> --- interrupt ---
> 0:
> crash> exit
>
> 6.99.42 NetBSD 6.99.42 (GENERIC) #0: Thu May 22 20:16:12 UTC 2014
> bui...@b2.netbsd.org:/home/builds/ab/HEAD/amd64/201405221850Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC
>  amd64
>
> Ilia.
>



kernel panic on a cold start amd64.

2014-05-25 Thread Ilia Zykov
But the test machine is five years old and may have some hardware degradation.

Crash version 6.99.42, image version 6.99.42.
System panicked: kernel diagnostic assertion (!cpu_intr_p() &&
!cpu_softintr_p()) || (pc->pc_pool.pr_ipl != IPL_NONE || cold || panicstr !=
NULL) failed: file /home/builds/ab/HEAD/src/sys/kern/subr_pool.c, line 2210
pool 'vmmpepl' is IPL_NONE, but called from interrup
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at fe822c8c4cf8
vpanic() at vpanic+0x145
kern_assert() at kern_assert+0x4f
pool_cache_get_paddr() at pool_cache_get_paddr+0x12c
uvm_mapent_alloc.isra.2() at uvm_mapent_alloc.isra.2+0x20
uvm_map_clip_start() at uvm_map_clip_start+0x1b
uvm_unmap_remove() at uvm_unmap_remove+0x2fe
uvm_unmap1() at uvm_unmap1+0x35
_bus_dmamap_destroy.isra.11() at _bus_dmamap_destroy.isra.11+0x41
msk_stop() at msk_stop+0x3f4
msk_init() at msk_init+0x35
if_slowtimo() at if_slowtimo+0x46
callout_softclock() at callout_softclock+0x392
softint_dispatch() at softint_dispatch+0xd3
DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e932ff0
Xsoftintr() at Xsoftintr+0x4f
--- interrupt ---
0:
crash> exit

6.99.42 NetBSD 6.99.42 (GENERIC) #0: Thu May 22 20:16:12 UTC 2014  
bui...@b2.netbsd.org:/home/builds/ab/HEAD/amd64/201405221850Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC
 amd64

Ilia.


kernel panic in xhci on boot

2014-04-21 Thread Brett Lymn

On my desktop machine I am pretty much guaranteed a kernel panic when I
reboot from Windows 8 into NetBSD. The traceback is:

softint_schedule()
usb_schedsoftintr()
xhci_intr1()

The assertion in softint_schedule() fires because the offset is 0. The
offset is passed in by usb_schedsoftintr() and should be the soft
interrupt cookie. It looks to me like there is a race in which a softintr
can be queued before the softintr queue has been set up. Usually, if I
reboot again, I don't see the assertion and things work fine from then on.
This is on an amd64 system with sources from a few days ago.

-- 
Brett Lymn
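The race described above can be modelled in userland to show what a guard against the early-interrupt window could look like. This is a sketch under stated assumptions, not the NetBSD USB code: `struct hc_softc`, `hc_schedule_soft()`, `hc_establish_soft()` and `model_softint_schedule()` are hypothetical, and a NULL pointer stands in for a soft interrupt cookie that has not been established yet (the real cookie is an opaque value). Instead of asserting, the guarded path records a request that arrives early and schedules it once the cookie exists. In the kernel the check and the establishment would also need proper ordering against the hard interrupt handler; establishing the soft interrupt before the controller's interrupts are enabled would avoid the window altogether.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host controller state; not the real xhci/usb softc. */
struct hc_softc {
	void *soft_cookie;  /* NULL until the soft interrupt is established */
	unsigned pending;   /* requests that arrived before establishment */
	unsigned scheduled; /* times the soft interrupt was scheduled */
};

/* Stand-in for softint_schedule(), which asserts on an invalid cookie. */
static void
model_softint_schedule(struct hc_softc *sc)
{
	assert(sc->soft_cookie != NULL);
	sc->scheduled++;
}

/* Guarded scheduling step, as a hard interrupt handler might call it. */
bool
hc_schedule_soft(struct hc_softc *sc)
{
	if (sc->soft_cookie == NULL) {
		/* Too early: remember the request instead of tripping the assertion. */
		sc->pending++;
		return false;
	}
	model_softint_schedule(sc);
	return true;
}

/* Establish the soft interrupt; early requests are coalesced into one schedule. */
void
hc_establish_soft(struct hc_softc *sc, void *cookie)
{
	sc->soft_cookie = cookie;
	if (sc->pending > 0) {
		sc->pending = 0;
		model_softint_schedule(sc);
	}
}
```

Coalescing the early requests into a single schedule mirrors the way a soft interrupt that is already pending is not queued twice; the handler is expected to drain all outstanding work when it runs.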


Re: Kernel panic when trying to mount non-existing file-system

2014-03-30 Thread Martin Husemann
On Sat, Mar 29, 2014 at 03:19:48PM -0700, Andy Ruhl wrote:
> I didn't get the memo either.

Sorry about the unhelpful answer - of course the crash is not intended,
but I couldn't reproduce it on my first try last night. It might depend on
the architecture and the concrete kernel (e.g. whether trying to load
modules fails) or something.

Adam, could you please file a PR?

Thanks,

Martin


Re: Kernel panic when trying to mount non-existing file-system

2014-03-29 Thread Martin Husemann
On Sat, Mar 29, 2014 at 08:54:36PM +0100, Adam Ciarciński wrote:
> Is that intentional?

Yes, didn't you get the memo?

Martin


Re: Kernel panic when trying to mount non-existing file-system

2014-03-29 Thread Adam Ciarciński
>> Is that intentional?
>
> Yes, didn't you get the memo?
>
> Martin

No.
I guess the postman stole it again. 8-)

Adam


Re: Kernel panic when trying to mount non-existing file-system

2014-03-29 Thread Andy Ruhl
On Sat, Mar 29, 2014 at 1:01 PM, Adam Ciarciński a...@netbsd.org wrote:
>>> Is that intentional?
>>
>> Yes, didn't you get the memo?
>>
>> Martin
>
> No.
> I guess the postman stole it again. 8-)

I didn't get the memo either.

This does not happen on a relatively recent build of NetBSD 6 when I try
to mount NTFS without kernel support.

I didn't see a recent PR.

I'm wondering what is going on, because I'm considering putting -current
on one of my machines.

Andy