After about 3-4 days of uptime I start getting watchdog timeouts in my logs. Eventually dhcpd stops responding to requests coming in on the interface, and then connectivity drops.
I see the same dying behaviour on my uplink (bge0) connection as well. I went to report this via sendbug while it was happening, and got a kernel panic ;-/

Steps to reproduce: boot, provision the network and firewall config, bring up some services, send some traffic, wait a few days for the message to appear, then run sendbug. Crash.

Two outcomes: kernel panic/crash, or no network on the interface (a reboot solves the problem).

--
dmesg:
console is /ebus@1f,464000/serial@2,80
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2010 OpenBSD. All rights reserved. http://www.OpenBSD.org
OpenBSD 4.8 (GENERIC) #86: Mon Aug 16 09:09:34 MDT 2010
dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 1073741824 (1024MB)
avail mem = 1044054016 (995MB)
mainbus0 at root: Sun Fire V215
cpu0 at mainbus0: SUNW,UltraSPARC-IIIi (rev 3.4) @ 1504 MHz
cpu0: physical 32K instruction (32 b/l), 64K data (32 b/l), 1024K external (64 b/l)
"memory-controller" at mainbus0 not configured
pyro0 at mainbus0: "Fire", rev 3, ign 780, bus A 2 to 13
pyro0: dvma map c0000000-ffffffff
pci0 at pyro0
ppb0 at pci0 dev 0 function 0 "PLX PEX 8532" rev 0xbb
pci1 at ppb0 bus 3
ppb1 at pci1 dev 1 function 0 "PLX PEX 8532" rev 0xbb
pci2 at ppb1 bus 4
ppb2 at pci2 dev 0 function 0 "Acer Labs M5249 PCI-PCI" rev 0x00
pci3 at ppb2 bus 5
ohci0 at pci3 dev 28 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x780, version 1.0, legacy support
ohci1 at pci3 dev 28 function 1 "Acer Labs M5237 USB" rev 0x03: ivec 0x780, version 1.0, legacy support
ehci0 at pci3 dev 28 function 3 "Acer Labs M5239 USB2" rev 0x01: ivec 0x781
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Acer Labs EHCI root hub" rev 2.00/1.00 addr 1
ebus0 at pci3 dev 30 function 0 "Acer Labs M1575 ISA" rev 0x00
rtc0 at ebus0 addr 70-73: m5823
pciide0 at pci3 dev 31 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc8: DMA, channel 0 configured to native-PCI, channel 1 configured to
native-PCI
pciide0: using ivec 0x784 for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
usb2 at ohci1: USB revision 1.0
uhub2 at usb2 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
ppb3 at pci1 dev 2 function 0 "PLX PEX 8532" rev 0xbb: ivec 0x794
pci4 at ppb3 bus 6
ppb4 at pci1 dev 8 function 0 "PLX PEX 8532" rev 0xbb: ivec 0x794
pci5 at ppb4 bus 7
ppb5 at pci1 dev 9 function 0 "PLX PEX 8532" rev 0xbb
pci6 at ppb5 bus 8
ppb6 at pci6 dev 0 function 0 "ServerWorks PCIE-PCIX" rev 0xb5
pci7 at ppb6 bus 9
bge0 at pci7 dev 4 function 0 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003): ivec 0x795, address 00:14:4f:b1:b4:62
brgphy0 at bge0 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev. 0
bge1 at pci7 dev 4 function 1 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003): ivec 0x796, address 00:14:4f:b1:b4:63
brgphy1 at bge1 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev. 0
ppb7 at pci7 dev 8 function 0 "ServerWorks HT-1000 PCIX" rev 0xb4
pci8 at ppb7 bus 10
ppb8 at pci1 dev 10 function 0 "PLX PEX 8532" rev 0xbb
pci9 at ppb8 bus 11
ppb9 at pci9 dev 0 function 0 "ServerWorks PCIE-PCIX" rev 0xb5
pci10 at ppb9 bus 12
bge2 at pci10 dev 4 function 0 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003): ivec 0x796, address 00:14:4f:b1:b4:64
brgphy2 at bge2 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev. 0
bge3 at pci10 dev 4 function 1 "Broadcom BCM5714" rev 0xa3, BCM5715 A3 (0x9003): ivec 0x797, address 00:14:4f:b1:b4:65
brgphy3 at bge3 phy 1: BCM5714 10/100/1000baseT/SX PHY, rev.
0
ppb10 at pci10 dev 8 function 0 "ServerWorks HT-1000 PCIX" rev 0xb4
pci11 at ppb10 bus 13
mpi0 at pci11 dev 1 function 0 "Symbios Logic SAS1064" rev 0x02: ivec 0x78f
scsibus0 at mpi0: 63 targets
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST973402SSUN72G, 0603> SCSI3 0/direct fixed
sd0: 70007MB, 512 bytes/sec, 143374738 sec total
pyro1 at mainbus0: "Fire", rev 3, ign 7c0, bus B 2 to 255
pyro1: dvma map c0000000-ffffffff
pci12 at pyro1
ebus1 at mainbus0: ign 7c0
"flashprom" at ebus1 addr 0-1fffff not configured
com0 at ebus1 addr 80-87 ivec 0x8: ns16550a, 16 byte fifo
com0: console
com1 at ebus1 addr 40-47 ivec 0x9: ns16550a, 16 byte fifo
"rmc-comm" at ebus1 addr 0-7 ivec 0xa not configured
"gpio" at ebus1 addr c0-c0 not configured
led0 at ebus1 addr 0-80: rev 0x5a
power0 at ebus1 addr 40-c1 ivec 0x3
"i2c" at mainbus0 not configured
softraid0 at root
bootpath: /pci@1e,600000/pci@0,0/pci@a,0/pci@0,0/pci@8,0/scsi@1,0/disk@0,0
root on sd0a swap on sd0b dump on sd0b
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting
bge3: watchdog timeout -- resetting

---
ddb output:
bash-4.1# cat /root/crash_history
SC Alert: DHCP negotiation failed, perhaps misconfigured or no DHCP server available
data error type 32 sfsr=808004 sfva=426890 afsr=10080005000000 afva=7f800200050 tf=0x4001115f410
panic: data fault: pc=1423ee4 addr=426890 sfsr=808004<TM,W>
kdb breakpoint at 145bd40
Stopped at Debugger+0x4: nop
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb>
ddb> trace
data_access_error(4001115f410, 1423ee4, 7f800200050, 10080005000000, 426890, 808004) at data_access_error+0x214
trapbase_sun4v(40000f99000, 200000, 0, 14, 0, 80) at trapbase_sun4v+0x87a8
pci_conf_read(40000f99000, f009977c00020000, 58, 0, f800, 0) at pci_conf_read+0x20
pciioctl(0, c0107002, 4001115fc78, 40000f9b900, 40009db1200, 40) at pciioctl+0x440
spec_ioctl(4001115fa18, 0, 0, 58, 203f60, 20fa00) at spec_ioctl+0xb4
spec_vnoperate(4001115fa18, 0, 0, 0, 0, 0) at spec_vnoperate+0x24
VOP_IOCTL(40009ba68e0, c0107002, 4001115fc78, 1, 4000a0844b0, 40009db1200) at VOP_IOCTL+0x40
vn_ioctl(40009db8310, c0107002, 4001115fc78, 40009db1200, 0, 4200) at vn_ioctl+0x4c
sys_ioctl(0, 4001115fdc0, 4001115fe00, fffffffffffed158, 0, 180d258) at sys_ioctl+0x1b8
syscall(4001115fed0, 436, 20d76f008, 20d76f00c, 0, 80) at syscall+0x30c
softtrap(4, c0107002, fffffffffffed3d0, 2dc81, fffffffffffed3dc, 0) at softtrap+0x19c
PID PPID PGRP UID S FLAGS WAIT COMMAND
* 6256 2268 25248 0 7 0x4000 pcidump
2268 25248 25248 0 3 0x4080 pause sh
25248 24314 25248 0 3 0x4080 piperd sendbug
12333 1 12333 0 3 0x80 poll ntpd
26466 10985 26466 83 3 0x180 poll ntpd
10985 1 10985 83 3 0x180 poll ntpd
12773 26307 26307 0 3 0x80 piperd cron
18032 1 18032 0 3 0x80 poll ntpd
1005 13291 1005 83 3 0x180 poll ntpd
13291 1 13291 83 3 0x180 poll ntpd
24314 10843 24314 0 3 0x4080 wait bash
10843 28642 10843 0 3 0x4080 pause ksh
28642 27092 28642 1000 3 0x4080 wait bash
27092 24410 24410 1000 3 0x180 select sshd
24410 12595 24410 0 3 0x4180 netio sshd
11409 12947 11409 0 3 0x4080 ttyin ksh
251 1 251 0 3 0x40180 select sendmail
12947 1 12947 1000 3 0x4080 wait bash
26307 1 26307 0 3 0x80 select cron
25 1 25 0 3 0x180 select inetd
10236 1 10236
77 3 0x180 poll dhcpd
1 12595 0 3 0x80 select sshd
1808 15866 14172 83 3 0x180 poll ntpd
15866 14172 14172 83 3 0x180 poll ntpd
14172 1 14172 0 3 0x80 poll ntpd
5904 2076 2076 74 3 0x180 bpf pflogd
2076 1 2076 0 3 0x80 netio pflogd
15806 19796 19796 73 2 0x180 syslogd
19796 1 19796 0 3 0x88 netio syslogd
15 0 0 0 3 0x100200 bored crypto
14 0 0 0 3 0x100200 aiodoned aiodoned
13 0 0 0 3 0x100200 syncer update
12 0 0 0 3 0x100200 cleaner cleaner
11 0 0 0 3 0x100200 reaper reaper
10 0 0 0 3 0x100200 pgdaemon pagedaemon
9 0 0 0 3 0x100200 pftm pfpurge
8 0 0 0 3 0x100200 usbevt usb2
7 0 0 0 3 0x100200 usbevt usb1
6 0 0 0 3 0x100200 usbtsk usbtask
5 0 0 0 3 0x100200 usbevt usb0
4 0 0 0 3 0x100200 bored syswq
3 0 0 0 3 0x40100200 idle0
2 0 0 0 3 0x100200 kmalloc kmthread
1 0 1 0 3 0x4080 wait init
0 -1 0 0 3 0x80200 scheduler swapper
7606 12773 7606 0 5 0x6000 sh
ddb>next bnic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
panic: kernel data fault: pc=145d3e0 addr=145a000
kdb breakpoint at 145bd40
RED State Exception
Error enable reg: 0000.0001.00f0.001f
CPU: 0000.0000.0000.0000
TL=0000.0000.0000.0005 TT=0000.0000.0000.0030
TPC=0000.0000.0101.0638 TnPC=0000.0000.0101.063c TSTATE=0000.0000.5800.0503
TL=0000.0000.0000.0004 TT=0000.0000.0000.0030
TPC=0000.0000.0101.0638 TnPC=0000.0000.0101.063c TSTATE=0000.0000.5800.0503
TL=0000.0000.0000.0003
TT=0000.0000.0000.0030
TPC=0000.0000.0101.0638 TnPC=0000.0000.0101.063c TSTATE=0000.0000.5800.0503
TL=0000.0000.0000.0002 TT=0000.0000.0000.0030
TPC=0000.0000.0101.0638 TnPC=0000.0000.0101.063c TSTATE=0000.0000.5800.0503
TL=0000.0000.0000.0001 TT=0000.0000.0000.0030
TPC=0000.0000.0115.2214 TnPC=0000.0000.0115.2218 TSTATE=0000.0099.80
SC Alert: Host System has Reset
00.0603
Probing system devices

---
ifconfig:
bash-4.1# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33160
        priority: 0
        groups: lo
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
bge0: flags=8a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:14:4f:b1:b4:62
        description: on-net kordia link
        priority: 0
        groups: egress
        media: Ethernet autoselect (100baseTX half-duplex)
        status: active
        inet 124.XXX.XX.54 netmask 0xfffffffc broadcast 124.XXX.XX.55
        inet6 fe80::214:4fff:feb1:b462%bge0 prefixlen 64 scopeid 0x1
bge1: flags=8a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:14:4f:b1:b4:63
        description: traffic dhcpd
        priority: 0
        media: Ethernet autoselect (none)
        status: no carrier
        inet 10.1.0.1 netmask 0xffff0000 broadcast 10.1.255.255
        inet6 fe80::214:4fff:feb1:b463%bge1 prefixlen 64 scopeid 0x2
        inet6 fd9a:b7a6:d6db:d00d::1 prefixlen 64
bge2: flags=8a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:14:4f:b1:b4:64
        description: routable dmz
        priority: 0
        media: Ethernet autoselect (none)
        status: no carrier
        inet 124.XXX.XX.129 netmask 0xffffffe0 broadcast 124.XXX.XX.159
        inet6 fe80::214:4fff:feb1:b464%bge2 prefixlen 64 scopeid 0x3
bge3: flags=8a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:14:4f:b1:b4:65
        description: management port dhcpd
        priority: 0
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 192.168.254.254 netmask 0xffffff00 broadcast 192.168.254.255
        inet6 fe80::214:4fff:feb1:b465%bge3 prefixlen 64 scopeid 0x4
        inet6
fd9a:b7a6:d6db:c001::1 prefixlen 64
enc0: flags=0<>
        priority: 0
        groups: enc
        status: active
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
        priority: 0
        groups: gif egress
        physical address inet 124.XXX.XX.129 --> 202.21.136.122
        inet6 fe80::214:4fff:feb1:b462%gif0 -> prefixlen 64 scopeid 0x7
        inet6 2001:4428:200:b6::2 -> 2001:4428:200:b6::1 prefixlen 128
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33160
        priority: 0
        groups: pflog

---
pfctl -sr:
bash-4.1# pfctl -sr
match out all scrub (random-id)
match out on bge0 inet proto tcp from 192.168.254.0/24 to ! <tbl.r0> nat-to 124.XXX.XX.54
match out on bge0 inet proto udp from 192.168.254.0/24 to ! <tbl.r0> nat-to 124.XXX.XX.54
match out on bge0 inet proto icmp from 192.168.254.0/24 to ! <tbl.r0> nat-to 124.XXX.XX.54
pass in quick inet proto tcp from 192.168.254.0/24 to <tbl.r9998.d> port = ssh flags any keep state label "RULE 9998 -- ACCEPT "
block drop in log quick on bge0 inet from <tbl.r0.s> to any label "RULE 0 -- DROP "
pass quick on bge0 inet from 124.XXX.XX.52/30 to any flags S/SA keep state label "RULE 1 -- ACCEPT "
pass log quick on bge0 inet from any to 124.XXX.XX.128/27 flags S/SA keep state label "RULE 2 -- ACCEPT "
pass log quick on bge2 inet from 124.XXX.XX.128/27 to any flags S/SA keep state label "RULE 3 -- ACCEPT "
pass log quick on lo0 inet from <tbl.r9998.d> to <tbl.r9998.d> flags S/SA keep state label "RULE 4 -- ACCEPT "
pass out log quick on lo0 inet from <tbl.r9998.d> to 127.0.0.1 flags S/SA keep state label "RULE 4 -- ACCEPT "
pass in log quick on lo0 inet from 127.0.0.1 to <tbl.r9998.d> flags S/SA keep state label "RULE 4 -- ACCEPT "
pass log quick on lo0 inet from 127.0.0.1 to 127.0.0.1 flags S/SA keep state label "RULE 4 -- ACCEPT "
pass log quick on gif0 inet all flags S/SA keep state label "RULE 5 -- ACCEPT "
pass log quick on bge1 inet all flags S/SA keep state label "RULE 6 -- ACCEPT "
pass log quick on bge3 inet all flags S/SA keep state label "RULE 6 -- ACCEPT "
pass in log quick on
bge0 inet proto icmp from any to <tbl.r9998.d> keep state label "RULE 7 -- ACCEPT "
pass in log quick on bge0 inet proto tcp from any to <tbl.r9998.d> port = ssh flags any keep state label "RULE 7 -- ACCEPT "
pass in log quick on bge0 inet proto tcp from any to <tbl.r9998.d> port = domain flags any keep state label "RULE 7 -- ACCEPT "
pass in log quick on bge0 inet proto udp from any to <tbl.r9998.d> port = domain keep state label "RULE 7 -- ACCEPT "
pass out log quick inet proto icmp from <tbl.r9998.d> to any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto tcp from <tbl.r9998.d> port = ftp-data to any port >= 1024 flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto tcp from <tbl.r9998.d> to any port = domain flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto tcp from <tbl.r9998.d> to any port = www flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto tcp from <tbl.r9998.d> to any port = https flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto tcp from <tbl.r9998.d> to any port = ssh flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto tcp from <tbl.r9998.d> to any port = ftp flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto tcp from <tbl.r9998.d> to any port = ftp-data flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto udp from <tbl.r9998.d> to any port = domain keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto udp from <tbl.r9998.d> to any port = ntp keep state label "RULE 8 -- ACCEPT "
pass out log quick inet proto ipv6 from <tbl.r9998.d> to any keep state allow-opts label "RULE 8 -- ACCEPT "
block drop in log quick inet from any to <tbl.r9998.d> label "RULE 9 -- DROP "
block drop log quick inet all label "RULE 10 -- DROP "
block drop quick inet all label "RULE 10000 -- DROP "
block drop in log quick on bge0 inet6 from <tbl.r0.sx> to any label "RULE 0
-- DROP "
pass log quick on lo0 inet6 from <tbl.r4.s> to <tbl.r4.s> flags S/SA keep state label "RULE 4 -- ACCEPT "
pass out log quick on lo0 inet6 from <tbl.r4.s> to ::1 flags S/SA keep state label "RULE 4 -- ACCEPT "
pass in log quick on lo0 inet6 from ::1 to <tbl.r4.s> flags S/SA keep state label "RULE 4 -- ACCEPT "
pass log quick on lo0 inet6 from ::1 to ::1 flags S/SA keep state label "RULE 4 -- ACCEPT "
pass log quick on gif0 inet6 all flags S/SA keep state label "RULE 5 -- ACCEPT "
pass log quick on bge1 inet6 all flags S/SA keep state label "RULE 6 -- ACCEPT "
pass log quick on bge3 inet6 all flags S/SA keep state label "RULE 6 -- ACCEPT "
pass in log quick on bge0 inet6 proto tcp from any to <tbl.r4.s> port = ssh flags any keep state label "RULE 7 -- ACCEPT "
pass in log quick on bge0 inet6 proto tcp from any to <tbl.r4.s> port = domain flags any keep state label "RULE 7 -- ACCEPT "
pass in log quick on bge0 inet6 proto udp from any to <tbl.r4.s> port = domain keep state label "RULE 7 -- ACCEPT "
pass in log quick on bge0 inet6 proto ipv6-icmp from any to <tbl.r4.s> keep state label "RULE 7 -- ACCEPT "
pass out log quick inet6 proto tcp from <tbl.r4.s> port = ftp-data to any port >= 1024 flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto tcp from <tbl.r4.s> to any port = domain flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto tcp from <tbl.r4.s> to any port = www flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto tcp from <tbl.r4.s> to any port = https flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto tcp from <tbl.r4.s> to any port = ssh flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto tcp from <tbl.r4.s> to any port = ftp flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto tcp from <tbl.r4.s> to any port = ftp-data flags any keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto
udp from <tbl.r4.s> to any port = domain keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto udp from <tbl.r4.s> to any port = ntp keep state label "RULE 8 -- ACCEPT "
pass out log quick inet6 proto ipv6 from <tbl.r4.s> to any keep state allow-opts label "RULE 8 -- ACCEPT "
block drop in log quick inet6 from any to <tbl.r4.s> label "RULE 9 -- DROP "
block drop log quick inet6 all label "RULE 10 -- DROP "
block drop quick inet6 all label "RULE 10000 -- DROP "