I have had serious problems for over 7 weeks now with the em driver,
specifically with any rev of if_em.c > 1.305. Starting with rev 1.306,
released on 2015/09/30, and continuing through -current, watchdog timeouts
rule the day. Unfortunately rev 1.305 no longer builds with -current, as
it appears the patch in rev 1.309 would be necessary.

The system in question is a NAT firewall, also running Unbound and DHCPD.
Timeouts occur randomly and can affect both internal and external
interfaces. But use of a BitTorrent app on an internal client system
will always trigger many such timeouts:
============================================
Nov 18 12:21:17 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:21:17 stargate /bsd: em1: watchdog timeout -- resetting
Nov 18 12:22:34 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:22:34 stargate unbound: [12687:1] notice: remote address is 172.27.12.11 port 55181
Nov 18 12:22:36 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:22:36 stargate unbound: [12687:1] notice: remote address is 172.27.12.253 port 54266
Nov 18 12:22:36 stargate unbound: [22477:0] notice: sendto failed: No buffer space available
Nov 18 12:22:36 stargate unbound: [22477:0] notice: remote address is 172.27.12.253 port 53257
Nov 18 12:22:37 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:23:42 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:28:11 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:11 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 56045
Nov 18 12:28:12 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:12 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 41975
Nov 18 12:28:12 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:12 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 48603
Nov 18 12:28:12 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:12 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 17834
Nov 18 12:28:13 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:13 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 1177
Nov 18 12:28:14 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:14 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 39013
Nov 18 12:28:15 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:29:42 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 14:00:01 stargate syslogd: restart
Nov 18 16:00:01 stargate syslogd: restart
Nov 19 12:00:01 stargate syslogd: restart
Nov 19 16:00:01 stargate syslogd: restart
Nov 19 16:08:36 stargate /bsd: em0: watchdog timeout -- resetting
Nov 19 16:10:34 stargate /bsd: em0: watchdog timeout -- resetting
Nov 19 16:15:04 stargate /bsd: em0: watchdog timeout -- resetting
Nov 19 16:19:55 stargate last message repeated 3 times
============================================
(one of the above is on the external interface em1)
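
For anyone wanting to compare how often each interface resets across
kernel builds, here is a minimal Python sketch that tallies the watchdog
lines from a syslog file. It assumes the exact message format shown in the
log above and that a "last message repeated N times" line directly
following a watchdog line refers to that same interface; the function name
count_watchdog_timeouts is made up for illustration and is not part of any
OpenBSD tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Nov 18 12:21:17 stargate /bsd: em0: watchdog timeout -- resetting
WATCHDOG_RE = re.compile(r"/bsd: (em\d+): watchdog timeout -- resetting")
# Matches syslogd's compression of consecutive duplicate lines.
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to number of watchdog resets."""
    counts = Counter()
    last_iface = None  # set only while the previous line was a watchdog line
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            last_iface = match.group(1)
            counts[last_iface] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and last_iface is not None:
            # Assumption: the repeat count belongs to the preceding watchdog line.
            counts[last_iface] += int(repeat.group(1))
        last_iface = None
    return counts
```

It could be run against the live log with something like
count_watchdog_timeouts(open("/var/log/messages")), once on a kernel built
with the old driver revision and once on -current, to put numbers on the
difference.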

The timeouts don't just shut down net access during the reset time;
other problems occur. Many times the SSH server no longer accepts
connections, so shelling into the system is not an option:
============================================
$ ssh stargate
write: Connection reset by peer
============================================

I've also had a system crash that I suspect (no proof at all and
thankfully it hasn't re-occurred, but timing is everything) was caused
by the faulty em driver:
============================================
Nov  1 22:23:55 stargate /bsd: uvm_fault(0xffffffff818f9920, 0xfffffff7818adf60, 0, 1) -> e
Nov  1 22:23:55 stargate /bsd: fatal page fault in supervisor mode
Nov  1 22:23:55 stargate /bsd: trap type 6 code 0 rip ffffffff81329e69 cs 8 rflags 10286 cr2  fffffff7818adf60 cpl 7 rsp ffff8000221df760
Nov  1 22:23:55 stargate /bsd: panic: trap type 6, code=0, pc=ffffffff81329e69
Nov  1 22:23:55 stargate /bsd: Starting stack trace...
Nov  1 22:23:55 stargate /bsd: panic() at panic+0x10b
Nov  1 22:23:55 stargate /bsd: trap() at trap+0x7b8
Nov  1 22:23:55 stargate /bsd: --- trap (number 6) ---
Nov  1 22:23:55 stargate /bsd: trap() at trap+0x709
Nov  1 22:23:55 stargate /bsd: --- trap (number 4) ---
Nov  1 22:23:55 stargate /bsd: trap() at trap+0x709
Nov  1 22:23:55 stargate /bsd: --- trap (number 4) ---
Nov  1 22:23:55 stargate /bsd: bpf_filter() at bpf_filter+0x19b
Nov  1 22:23:55 stargate /bsd: _bpf_mtap() at _bpf_mtap+0xf4
Nov  1 22:23:55 stargate /bsd: bpf_mtap_ether() at bpf_mtap_ether+0x39
Nov  1 22:23:55 stargate /bsd: em_start() at em_start+0xd6
Nov  1 22:23:55 stargate /bsd: nettxintr() at nettxintr+0x52
Nov  1 22:23:55 stargate /bsd: softintr_dispatch() at softintr_dispatch+0x8b
Nov  1 22:23:55 stargate /bsd: Xsoftnet() at Xsoftnet+0x1f
Nov  1 22:23:55 stargate /bsd: --- interrupt ---
Nov  1 22:23:55 stargate /bsd: end of kernel
Nov  1 22:23:55 stargate /bsd: end trace frame: 0xffffff8132b494, count: 246
Nov  1 22:23:55 stargate /bsd: 0x282:
Nov  1 22:23:55 stargate /bsd: End of stack trace.
============================================

Of course it is possible that several others and I have faulty
hardware - that would make 5 firewalls for me plus the count from the
others that have raised this issue. However, for years the Intel NICs
have been working great with OpenBSD for me. If there's other
hardware that's better supported I'd like to know about it.

Sorry for the added noise, as these problems have indeed been reported
before, but I neglected to send those reports to bugs@. That, and the
lack of any resolution so far, prompted this post.

Dmesg follows:

# dmesg
OpenBSD 5.8-current (GENERIC.MP) #13: Thu Nov 12 10:36:17 EST 2015
    r...@stargate.grizzly.bear:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277665792 (4079MB)
avail mem = 4143894528 (3951MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9f000 (19 entries)
bios0: vendor American Megatrends Inc. version "1.2b" date 07/19/13
bios0: Supermicro X7SPA-HF
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4) P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.26 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu0: 512KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, C-substates=0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu1: 512KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 3 pa 0xfec00000, version 20, 24 pins
ioapic0: misconfigured as apic 1, remapped to apid 3
acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 1 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P8)
acpiprt4 at acpi0: bus 3 (P0P9)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: PWRB
ipmi at mainbus0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Pineview DMI" rev 0x02
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 3 int 16
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 3 int 21
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 3 int 19
ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x02: apic 3 int 18
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb0 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02: msi
pci1 at ppb0 bus 1
ppb1 at pci0 dev 28 function 4 "Intel 82801I PCIE" rev 0x02: msi
pci2 at ppb1 bus 2
em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:25:90:92:d4:f8
ppb2 at pci0 dev 28 function 5 "Intel 82801I PCIE" rev 0x02: msi
pci3 at ppb2 bus 3
em1 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 00:25:90:92:d4:f9
uhci3 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x02: apic 3 int 23
uhci4 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x02: apic 3 int 19
uhci5 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x02: apic 3 int 18
ehci1 at pci0 dev 29 function 7 "Intel 82801I USB" rev 0x02: apic 3 int 23
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb3 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0x92
pci4 at ppb3 bus 4
vga1 at pci4 dev 4 function 0 "Matrox MGA G200eW" rev 0x0a
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcib0 at pci0 dev 31 function 0 "Intel 82801IR LPC" rev 0x02
ahci0 at pci0 dev 31 function 2 "Intel 82801I AHCI" rev 0x02: msi, AHCI 1.2
ahci0: port 0: 3.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, INTEL SSDSA2M080, 2CV1> SCSI3 0/direct fixed naa.5001517959323666
sd0: 76319MB, 512 bytes/sector, 156301488 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 82801I SMBus" rev 0x02: apic 3 int 18
iic0 at ichiic0
lm1 at iic0 addr 0x2d: W83627DHG
spdmem0 at iic0 addr 0x50: 2GB DDR3 SDRAM PC3-10600 SO-DIMM
spdmem1 at iic0 addr 0x51: 2GB DDR3 SDRAM PC3-10600 SO-DIMM
usb2 at uhci0: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci1: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb4 at uhci2: USB revision 1.0
uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb5 at uhci3: USB revision 1.0
uhub5 at usb5 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb6 at uhci4: USB revision 1.0
uhub6 at usb6 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb7 at uhci5: USB revision 1.0
uhub7 at usb7 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
wbsio0 at isa0 port 0x2e/2: W83627DHG rev 0x25
uhidev0 at uhub4 port 2 configuration 1 interface 0 "Winbond Electronics Corp Hermon USB hidmouse Device" rev 1.10/0.01 addr 2
uhidev0: iclass 3/1
ums0 at uhidev0: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
uhidev1 at uhub4 port 2 configuration 1 interface 1 "Winbond Electronics Corp Hermon USB hidmouse Device" rev 1.10/0.01 addr 2
uhidev1: iclass 3/1
ukbd0 at uhidev1: 8 variable keys, 6 key codes
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0
uhidev2 at uhub5 port 2 configuration 1 interface 0 "Logitech USB Keyboard" rev 1.10/64.00 addr 2
uhidev2: iclass 3/1
ukbd1 at uhidev2: 8 variable keys, 6 key codes
wskbd2 at ukbd1 mux 1
wskbd2: connecting to wsdisplay0
uhidev3 at uhub5 port 2 configuration 1 interface 1 "Logitech USB Keyboard" rev 1.10/64.00 addr 2
uhidev3: iclass 3/0, 3 report ids
uhid0 at uhidev3 reportid 1: input=1, output=0, feature=0
uhid1 at uhidev3 reportid 2: input=1, output=0, feature=0
uhid2 at uhidev3 reportid 3: input=3, output=0, feature=0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (67f54d511ac3222d.a) swap on sd0b dump on sd0b

Sorry, I'm not a dev or coder, but I will be happy to supply any
additional information within my capabilities in order to get this
resolved.

Thank you,

Chris
