I recently put a new server running 9.2 (with local patches for NFS)
into production, and it immediately started to fail in an odd way.
Since I pounded this server pretty heavily and never saw the error in
testing, I'm more than a little bit taken aback.  We have identical
hardware in production with 9.1, and I have the same kernel running
just peachy on a machine with Chelsio T4 NICs.  The problem machine has
ixgbe(4):

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9c00-0x9c1f mem 0xdef80000-0xdeffffff,0xdef7c000-0xdef7ffff irq 24 at device 0.0 on pci2
ix0: Using MSIX interrupts with 7 vectors
ix0: Ethernet address: 04:7d:7b:a5:87:32
ix0: PCI Express Bus: Speed 5.0GT/s Width x4
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 0x9880-0x989f mem 0xdee80000-0xdeefffff,0xdee7c000-0xdee7ffff irq 34 at device 0.1 on pci2
ix1: Using MSIX interrupts with 7 vectors
ix1: Ethernet address: 04:7d:7b:a5:87:33
ix1: PCI Express Bus: Speed 5.0GT/s Width x4

(pciconf tells me these are "82599EB 10-Gigabit SFI/SFP+ Network
Connection".  It's a bug that the driver doesn't tell me that.)

These are glued together in a lagg(4) using LACP.

Since we put this server into production, random network system calls
have started failing with [EFBIG] or maybe sometimes [EIO].  I've
observed this with a simple ping, but various daemons also log the
errors:
Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too large [preauth]
Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL handshake. 5

The machine eventually becomes unreachable and has to be rebooted from
the console.

So, can anyone tell me how this is possible, and what changed between
9.1 and 9.2 to cause it?

-GAWollman
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
