uvmfault (7.99.1/amd64)
My main machine suddenly hung last night and then rebooted. There was
no big load on it at that time. dmesg contains:

uvm_fault(0x810157c0, 0x8003393c8000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 80264fc5 cs 8 rflags 10202 cr2 8003393c8000 ilevel 4 rsp fe813d81d720
curlwp 0xfe813dc10aa0 pid 0.143 lowest kstack 0xfe813d81a2c0
panic: trap
cpu7: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
cpu7: End traceback...

dumping to dev 168,3 (offset=8, size=8373576):
dump Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
(new kernel booting messages follow)

I did get a core dump, and I do have a kernel with symbols.

# gdb netbsd
GNU gdb (GDB) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from netbsd...done.
(gdb) target kvm netbsd.core
0x805b6ac5 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
    at /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
671 dumpsys();
(gdb) bt
#0 0x805b6ac5 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
    at /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
#1 0x807b0ae4 in vpanic (fmt=fmt@entry=0x80c51a95 "trap", ap=ap@entry=0xfe813d81d510)
    at /archive/foreign/src/sys/kern/subr_prf.c:340
#2 0x807b0b9f in panic (fmt=fmt@entry=0x80c51a95 "trap")
    at /archive/foreign/src/sys/kern/subr_prf.c:256
#3 0x807fc037 in trap (frame=0xfe813d81d630)
    at /archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
#4 0x8010108e in alltraps ()
#5 0x80264fc5 in .Mmbuf_inner_loop ()
#6 0xfe8692e23400 in ?? ()
#7 0xfe813d81d750 in ?? ()
#8 0x804c3b5e in in_delayed_cksum (m=0x8003393c8000)
    at /archive/foreign/src/sys/netinet/ip_output.c:791
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

This does not really look like useful information, does it?
 Thomas
Re: uvmfault (7.99.1/amd64)
Thomas Klausner wrote:
> My main machine suddenly hung last night and then rebooted. There was
> no big load on it at that time. dmesg contains:

[snip]

> #8 0x804c3b5e in in_delayed_cksum (m=0x8003393c8000)
>     at /archive/foreign/src/sys/netinet/ip_output.c:791
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> This does not really look like useful information, does it?

Can you tell which protocol family you were using at the time?

I was regularly getting a similar crash when using NFS over IPv6. This
was with a network controller that only offloads checksumming for IPv4;
the in_delayed_cksum() function is where the network stack does the
checksum in software.

I confess that the current way that I'm trying to fix it is by
switching to a network card with hardware checksumming for both IPv4
and IPv6.

Robert Swindells
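For readers unfamiliar with what "doing the checksum in software" involves, here is a minimal sketch of the Internet checksum (RFC 1071) over a contiguous buffer. It is not the NetBSD implementation: the kernel routine walks an mbuf chain (the trace above faults in `.Mmbuf_inner_loop`, called from in_delayed_cksum()), whereas this sketch assumes one flat byte array. The function name `inet_cksum` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative RFC 1071 checksum over a flat buffer (hypothetical helper,
 * not a kernel API). Bytes are summed as big-endian 16-bit words, carries
 * are folded back into the low 16 bits, and the one's complement of the
 * result is returned. The 32-bit accumulator is only safe for
 * packet-sized inputs (well under 128 KB).
 */
uint16_t
inet_cksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	/* Add up complete 16-bit words. */
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}

	/* A trailing odd byte is treated as if padded with a zero byte. */
	if (len == 1)
		sum += (uint32_t)data[0] << 8;

	/* Fold the carries back in (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A receiver can verify a packet by running the same routine over the data with the checksum field included; a result of 0 means the checksum matches. With offload enabled, the network card is expected to do this work instead of the stack.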
Re: uvmfault (7.99.1/amd64)
On 09/13/14 07:55, Thomas Klausner wrote:
> My main machine suddenly hung last night and then rebooted. There was
> no big load on it at that time. dmesg contains:

[snip]

> (gdb) bt
> #0 0x805b6ac5 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
>     at /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671
> #1 0x807b0ae4 in vpanic (fmt=fmt@entry=0x80c51a95 "trap", ap=ap@entry=0xfe813d81d510)
>     at /archive/foreign/src/sys/kern/subr_prf.c:340
> #2 0x807b0b9f in panic (fmt=fmt@entry=0x80c51a95 "trap")
>     at /archive/foreign/src/sys/kern/subr_prf.c:256
> #3 0x807fc037 in trap (frame=0xfe813d81d630)
>     at /archive/foreign/src/sys/arch/amd64/amd64/trap.c:298
> #4 0x8010108e in alltraps ()
> #5 0x80264fc5 in .Mmbuf_inner_loop ()
> #6 0xfe8692e23400 in ?? ()
> #7 0xfe813d81d750 in ?? ()
> #8 0x804c3b5e in in_delayed_cksum (m=0x8003393c8000)
>     at /archive/foreign/src/sys/netinet/ip_output.c:791
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> This does not really look like useful information, does it?
>  Thomas

Try crash(8). It does a better job of stack traces through traps.

Nick
Re: uvmfault (7.99.1/amd64)
On Sat, Sep 13, 2014 at 09:40:35AM +0100, Robert Swindells wrote:
> > #8 0x804c3b5e in in_delayed_cksum (m=0x8003393c8000)
> >     at /archive/foreign/src/sys/netinet/ip_output.c:791
> > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> > This does not really look like useful information, does it?
>
> Can you tell which protocol family you were using at the time?

I'm nfs-mounting via wm0:

wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
        capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
        capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
        enabled=0
        ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
        ec_enabled=0
        address: ...
        media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,rxpause,txpause)
        status: active
        inet ...
        inet6 ...

My /etc/fstab has IPv4 addresses for the NFS mounts, like this:

192.168.1.2:/volume1/music /disk/music nfs intr,nodev,nosuid,rw,soft,tcp

So it should be IPv4 only.

> I was regularly getting a similar crash when using NFS over IPv6. This
> was with a network controller that only offloads checksumming for IPv4;
> the in_delayed_cksum() function is where the network stack does the
> checksum in software.
>
> I confess that the current way that I'm trying to fix it is by
> switching to a network card with hardware checksumming for both IPv4
> and IPv6.

From the capabilities cited above, my card already should do that,
right?
 Thomas
Re: uvmfault (7.99.1/amd64)
On Sat, Sep 13, 2014 at 07:57:20AM +0100, Nick Hudson wrote:
> On 09/13/14 07:55, Thomas Klausner wrote:
> > My main machine suddenly hung last night and then rebooted. There was
> > no big load on it at that time. dmesg contains:

[snip]

> > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> > This does not really look like useful information, does it?
> >  Thomas
>
> Try crash(8). It does a better job of stack traces through traps.

# crash -M netbsd.core -N netbsd
Crash version 7.99.1, image version 7.99.1.
System panicked: trap
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
_KERNEL_OPT_IPFILTER_COMPAT() at _KERNEL_OPT_IPFILTER_COMPAT+0x3
vpanic() at vpanic+0x145
snprintf() at snprintf
startlwp() at startlwp
crash>

That looks weird.
 Thomas
Re: uvmfault (7.99.1/amd64)
Thomas Klausner wrote:
> On Sat, Sep 13, 2014 at 09:40:35AM +0100, Robert Swindells wrote:
> > > #8 0x804c3b5e in in_delayed_cksum (m=0x8003393c8000)
> > >     at /archive/foreign/src/sys/netinet/ip_output.c:791
> > > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> > >
> > > This does not really look like useful information, does it?
> >
> > Can you tell which protocol family you were using at the time?
>
> I'm nfs-mounting via wm0:
>
> wm0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
>         capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
>         capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
>         enabled=0
>         ec_capabilities=7<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU>
>         ec_enabled=0
>         address: ...
>         media: Ethernet autoselect (1000baseT full-duplex,flowcontrol,rxpause,txpause)
>         status: active
>         inet ...
>         inet6 ...

I just added a wm card to my main system and it seems solid with all
the offload features turned on, even TSO. Obviously that doesn't help
with finding any problem in the kernel.

> My /etc/fstab has IPv4 addresses for the NFS mounts, like this:
>
> 192.168.1.2:/volume1/music /disk/music nfs intr,nodev,nosuid,rw,soft,tcp
>
> So it should be IPv4 only.

And TCP; I was using UDP over IPv6. A common factor is writing to NFS,
though.

> > I was regularly getting a similar crash when using NFS over IPv6. This
> > was with a network controller that only offloads checksumming for IPv4;
> > the in_delayed_cksum() function is where the network stack does the
> > checksum in software.
> >
> > I confess that the current way that I'm trying to fix it is by
> > switching to a network card with hardware checksumming for both IPv4
> > and IPv6.
>
> From the capabilities cited above, my card already should do that,
> right?

No, the enabled=0 means they are all turned off.

To turn on the checksumming you can run:

# ifconfig wm0 ip4csum udp4csum tcp4csum udp6csum tcp6csum

Or put the options in your /etc/ifconfig.wm0 file. Don't do this if you
are using bridge(4) on this machine.

Robert Swindells