I have one machine, running 7-STABLE amd64 as of 2009.04.19, that is seeing watchdog timeouts on em0, along with some other, more perverse errors.
Twice now in the last 48 hours, this machine has become unreachable via the network, and connecting to the console shows an endless string of messages:

    [...]
    em0: watchdog timeout -- resetting
    em0: watchdog timeout -- resetting
    em0: watchdog timeout -- resetting

The machine is almost locked up. I can get a login prompt, but can go no further than typing in a username: after the username there is no password prompt, and nothing further happens. The only options are to hard-reset the machine or to drop to the debugger and reboot.

Now the "perverse" part: after restarting, the system partition is gone.

Background detail: the machine is a fileserver with a 3Ware 9650SE-16ML SATA controller connected to 16 1TB SATA drives, configured as a 14-drive RAID10 array (+ 2 hot spares), with a 50GB system partition and a 6.5TB data partition. The system partition is configured as da0, with one slice and more or less standard partitions for /, /var, /tmp, etc. (the data partition of the array is sliced with gpt).

The issue is that, upon restart, all partition information on da0 seems to have disappeared; restarting results in a "no operating system found" message and (obviously) a failure to boot. But all of the data is still present. If I boot into rescue mode, recreate da0s1, mark it bootable, and restore the bsdlabel, then everything works again: I can restart the machine and it comes back up normally (it requires an fsck of everything on da0, but after that everything is back to normal).

I don't know whether these are two unrelated problems, one problem with two symptoms, or something else. I think I can safely say that it is not a problem with the 3Ware controller itself, as I replaced the controller with a spare (identical model) and the problem recurred. Additionally, I have an almost identical configuration on four other machines, none of which are experiencing any problems. One thing that is different is that the other machines use Intel PRO/1000 PF (PCI-E) NICs, whereas this one has the Intel 2572 fibre NIC.
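Since the recovery depends on restoring a correct bsdlabel, it may help to sanity-check a saved copy of the label before writing it back. The following is a minimal sketch, assuming the label was saved earlier as plain text (for example with `bsdlabel da0s1 > file`). The helpers `parse_label` and `check_label`, and the two rules they apply (every partition lies inside the raw `c` partition; no two non-`c` partitions overlap), are hypothetical and not part of bsdlabel. The sample text reuses the sizes and offsets of the da0s1 label listed in this message.

```python
import re

# Matches partition lines of `bsdlabel` output, e.g. "  a:  2097152  0  4.2BSD ...".
# Captures the partition letter, size (sectors), offset (sectors) and fstype.
PART_RE = re.compile(r"^\s*([a-h]):\s+(\d+)\s+(\d+)\s+(\S+)")

SAMPLE = """\
# /dev/da0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:  2097152        0    4.2BSD        0     0     0
  b:  8388608  2097152      swap
  c: 104856192       0    unused        0     0   # "raw" part, don't edit
  d:  8388608 10485760    4.2BSD        0     0     0
  e:  2097152 18874368    4.2BSD        0     0     0
  f: 41943040 20971520    4.2BSD        0     0     0
  g: 41941632 62914560    4.2BSD        0     0     0
"""


def parse_label(text):
    """Return {letter: (size, offset, fstype)} for each partition line."""
    parts = {}
    for line in text.splitlines():
        m = PART_RE.match(line)
        if m:
            letter, size, offset, fstype = m.groups()
            parts[letter] = (int(size), int(offset), fstype)
    return parts


def check_label(parts):
    """Return a list of problems; an empty list means the label looks sane.

    Hypothetical sanity rules (not bsdlabel's own checks):
      * every partition lies inside the raw 'c' partition
      * no two partitions other than 'c' overlap
    """
    if "c" not in parts:
        return ["no raw 'c' partition"]
    raw_size, raw_offset, _ = parts["c"]
    # Sort the non-raw partitions by offset so the report order is stable.
    others = sorted(
        (offset, size, letter)
        for letter, (size, offset, _) in parts.items()
        if letter != "c"
    )
    problems = []
    for offset, size, letter in others:
        if offset < raw_offset or offset + size > raw_offset + raw_size:
            problems.append(f"{letter}: extends outside 'c'")
    # Compare every pair: half-open ranges [o, o + s) overlap if each starts
    # before the other ends.
    for i, (o1, s1, l1) in enumerate(others):
        for o2, s2, l2 in others[i + 1:]:
            if o1 < o2 + s2 and o2 < o1 + s1:
                problems.append(f"{l1} overlaps {l2}")
    return problems


if __name__ == "__main__":
    print(check_label(parse_label(SAMPLE)))
```

For the da0s1 label in this message the script prints an empty list: partitions a through g are contiguous, and g ends exactly at sector 104856192, the size of c.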
Is there some known problem with the Intel 2572 fibre NIC? Or some potential interaction between it and the 3ware RAID controller?

For the moment, I've set hw.pci.enable_msi=0 (as discussed in the threads on 7.2/bge) and am building a new kernel/world from sources csup'd one hour ago, but I'd really like to hear any ideas about this, particularly about the wiping of the label.

Some information about the system:

    # /dev/da0s1:
    8 partitions:
    #        size   offset    fstype   [fsize bsize bps/cpg]
      a:  2097152        0    4.2BSD        0     0     0
      b:  8388608  2097152      swap
      c: 104856192       0    unused        0     0   # "raw" part, don't edit
      d:  8388608 10485760    4.2BSD        0     0     0
      e:  2097152 18874368    4.2BSD        0     0     0
      f: 41943040 20971520    4.2BSD        0     0     0
      g: 41941632 62914560    4.2BSD        0     0     0

    e...@pci0:4:1:0:  class=0x020000 card=0x10038086 chip=0x10018086 rev=0x02 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = '2572 10/100/1000 Ethernet Controller (Fiber)'
        class      = network
        subclass   = ethernet
        bar   [10] = type Memory, range 32, base 0xda000000, size 131072, enabled
        bar   [14] = type Memory, range 32, base 0xda020000, size 65536, enabled
    t...@pci0:9:0:0:  class=0x010400 card=0x100413c1 chip=0x100413c1 rev=0x01 hdr=0x00
        device     = '9650SE Series PCI-Express SATA2 Raid Controller'
        class      = mass storage
        subclass   = RAID
        bar   [10] = type Prefetchable Memory, range 64, base 0xd8000000, size 33554432, enabled
        bar   [18] = type Memory, range 64, base 0xda300000, size 4096, enabled
        bar   [20] = type I/O Port, range 32, base 0x3000, size 256, enabled
        cap 01[40] = powerspec 2  supports D0 D1 D2 D3  current D0
        cap 05[50] = MSI supports 32 messages, 64 bit
        cap 10[70] = PCI-Express 1 legacy endpoint

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable