uvmfault (7.99.1/amd64)

2014-09-13 Thread Thomas Klausner
My main machine suddenly hung last night and then rebooted. There was
no big load on it at that time. dmesg contains:

uvm_fault(0x810157c0, 0x8003393c8000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 80264fc5 cs 8 rflags 10202 cr2 8003393c8000 ilevel 4 rsp fe813d81d720
curlwp 0xfe813dc10aa0 pid 0.143 lowest kstack 0xfe813d81a2c0
panic: trap
cpu7: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
cpu7: End traceback...

dumping to dev 168,3 (offset=8, size=8373576):
dump Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
(new kernel booting messages follow)

I did get a core dump, and I do have a kernel with symbols.
# gdb netbsd
GNU gdb (GDB) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64--netbsd.
Type show configuration for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type help.
Type apropos word to search for commands related to word...
Reading symbols from netbsd...done.
(gdb) target kvm netbsd.core
0x805b6ac5 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
    at /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
671     dumpsys();
(gdb) bt
#0  0x805b6ac5 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
    at /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
#1  0x807b0ae4 in vpanic (fmt=fmt@entry=0x80c51a95 "trap", ap=ap@entry=0xfe813d81d510)
    at /archive/foreign/src/sys/kern/subr_prf.c:340
#2  0x807b0b9f in panic (fmt=fmt@entry=0x80c51a95 "trap")
    at /archive/foreign/src/sys/kern/subr_prf.c:256
#3  0x807fc037 in trap (frame=0xfe813d81d630)
    at /archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
#4  0x8010108e in alltraps ()
#5  0x80264fc5 in .Mmbuf_inner_loop ()
#6  0xfe8692e23400 in ?? ()
#7  0xfe813d81d750 in ?? ()
#8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000)
    at /archive/foreign/src/sys/netinet/ip_output.c:791
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

This does not really look like useful information, does it?
 Thomas


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Robert Swindells

Thomas Klausner wrote:
> My main machine suddenly hung last night and then rebooted. There was
> no big load on it at that time. dmesg contains:

[snip]

> #8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at
> /archive/foreign/src/sys/netinet/ip_output.c:791
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> This does not really look like useful information, does it?

Can you tell which protocol family you were using at the time?

I was regularly getting a similar crash when using NFS over IPv6. This
was with a network controller that only offloads checksumming for IPv4.
The in_delayed_cksum() function is where the network stack does the
checksum in software.
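
For readers unfamiliar with what "the checksum in software" involves, here
is a minimal Python sketch of the standard Internet checksum (RFC 1071),
the 16-bit one's-complement sum used by IPv4, TCP and UDP. It is an
illustration only, not the kernel's in_delayed_cksum() code: the function
name internet_checksum is made up for this example, and the pseudo-header
that TCP and UDP add to the sum is left out.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    # Fold any carries back into the low 16 bits until none remain.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# A 20-byte IPv4 header with its checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # → 0xb861
```

Running the same function over a header that already carries the correct
checksum yields 0, which is how a receiver verifies it.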

I confess that the current way that I'm trying to fix it is by
switching to a network card with hardware checksumming for both IPv4
and IPv6.

Robert Swindells


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Nick Hudson

On 09/13/14 07:55, Thomas Klausner wrote:

[snip]

Try crash(8). It does a better job of stack traces through traps.

Nick


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Thomas Klausner
On Sat, Sep 13, 2014 at 09:40:35AM +0100, Robert Swindells wrote:
> > #8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at
> > /archive/foreign/src/sys/netinet/ip_output.c:791
> > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> > This does not really look like useful information, does it?
>
> Can you tell which protocol family you were using at the time?

I'm nfs-mounting via wm0:
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
        capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
        capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
        enabled=0
        ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
        ec_enabled=0
        address: ...
        media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,rxpause,txpause)
        status: active
        inet ...
        inet6 ...

My /etc/fstab has IPv4 addresses for the NFS mounts, like this:

192.168.1.2:/volume1/music  /disk/music nfs intr,nodev,nosuid,rw,soft,tcp

So it should be IPv4 only.
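
That reasoning can be checked mechanically. The Python sketch below is an
illustration, not part of any NetBSD tool: the function name
nfs_server_ip_version is made up, and the bracketed form for IPv6 literals
is an assumption about how such an entry would be written. It reads the
server part of an fstab NFS entry and reports the address family when the
server is given as an IP literal.

```python
import ipaddress


def nfs_server_ip_version(fstab_line: str):
    """Return 4 or 6 if the NFS server is an IP literal, otherwise None."""
    fields = fstab_line.split()
    if len(fields) < 3 or fields[2] != "nfs":
        return None  # not an NFS entry
    spec = fields[0]
    if spec.startswith("["):
        # Assumed syntax for IPv6 literals: [2001:db8::1]:/export
        host = spec[1:].partition("]")[0]
    else:
        host = spec.partition(":")[0]
    try:
        return ipaddress.ip_address(host).version
    except ValueError:
        # A hostname: the family then depends on name resolution.
        return None


line = "192.168.1.2:/volume1/music  /disk/music nfs intr,nodev,nosuid,rw,soft,tcp"
print(nfs_server_ip_version(line))  # → 4
```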

> I was regularly getting a similar crash when using NFS over IPv6. This
> was with a network controller that only offloads checksumming for IPv4.
> The in_delayed_cksum() function is where the network stack does the
> checksum in software.
>
> I confess that the current way that I'm trying to fix it is by
> switching to a network card with hardware checksumming for both IPv4
> and IPv6.

From the capabilities cited above, my card already should do that, right?
 Thomas


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Thomas Klausner
On Sat, Sep 13, 2014 at 07:57:20AM +0100, Nick Hudson wrote:
[snip]

> Try crash(8). It does a better job of stack traces through traps.

# crash -M netbsd.core -N netbsd 
Crash version 7.99.1, image version 7.99.1.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
_KERNEL_OPT_IPFILTER_COMPAT() at _KERNEL_OPT_IPFILTER_COMPAT+0x3
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
crash>

That looks weird.
 Thomas


Re: uvmfault (7.99.1/amd64)

2014-09-13 Thread Robert Swindells

Thomas Klausner wrote:
> On Sat, Sep 13, 2014 at 09:40:35AM +0100, Robert Swindells wrote:
> > > #8  0x804c3b5e in in_delayed_cksum (m=0x8003393c8000) at
> > > /archive/foreign/src/sys/netinet/ip_output.c:791
> > > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> > >
> > > This does not really look like useful information, does it?
> >
> > Can you tell which protocol family you were using at the time?
>
> I'm nfs-mounting via wm0:
> wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
>         capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
>         capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
>         enabled=0
>         ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
>         ec_enabled=0
>         address: ...
>         media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,rxpause,txpause)
>         status: active
>         inet ...
>         inet6 ...

I just added a wm card to my main system and it seems solid with all the
offload features turned on, even TSO.

Obviously it doesn't help with finding any problem in the kernel.

> My /etc/fstab has IPv4 addresses for the NFS mounts, like this:
>
> 192.168.1.2:/volume1/music  /disk/music nfs intr,nodev,nosuid,rw,soft,tcp
>
> So it should be IPv4 only.

And TCP; I was using UDP over IPv6.

A common factor is writing to NFS though.

> > I was regularly getting a similar crash when using NFS over IPv6. This
> > was with a network controller that only offloads checksumming for IPv4.
> > The in_delayed_cksum() function is where the network stack does the
> > checksum in software.
> >
> > I confess that the current way that I'm trying to fix it is by
> > switching to a network card with hardware checksumming for both IPv4
> > and IPv6.
>
> From the capabilities cited above, my card already should do that, right?

No, the enabled=0 means they are all turned off.

To turn on the checksumming you can run:

# ifconfig wm0 ip4csum udp4csum tcp4csum udp6csum tcp6csum

Or put the options in your /etc/ifconfig.wm0 file.

Don't do this if you are using bridge(4) on this machine.
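
To make the capabilities/enabled distinction concrete, here is a small
Python sketch that lists the offloads an interface supports but has not
turned on. It is only an illustration: the sample text is modelled on the
wm0 output quoted in this thread, the helper name offload_flags is made up,
and the parser assumes the "name=hex<FLAG,FLAG,...>" layout shown there.

```python
import re

# Sample modelled on the wm0 output quoted in this thread.
SAMPLE = """\
wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
        capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
        capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
        enabled=0
        ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
        ec_enabled=0
"""

# Matches "capabilities=..." and "enabled=..." lines, but not the ec_* ones.
FIELD = re.compile(r"^\s*(capabilities|enabled)=[0-9a-fA-F]+(?:<([^>]*)>)?\s*$")


def offload_flags(ifconfig_output: str):
    """Return (supported, enabled) sets of offload flag names."""
    flags = {"capabilities": set(), "enabled": set()}
    for line in ifconfig_output.splitlines():
        match = FIELD.match(line)
        if match and match.group(2):  # "enabled=0" carries no flag list
            flags[match.group(1)].update(match.group(2).split(","))
    return flags["capabilities"], flags["enabled"]


supported, enabled = offload_flags(SAMPLE)
print("supported but not enabled:", sorted(supported - enabled))
```

With the sample above, every supported flag is reported as not enabled,
which matches the enabled=0 line.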

Robert Swindells