Re: em0 watchdog timeouts on 8-STABLE
If needed, I can reproduce this on demand. Just need to know what sort of statistics are needed when the problem is occurring. I've had to turn off my weekly scrubs until I can figure out how to fix this problem.

On Wed, Jun 15, 2011 at 8:37 PM, Joshua Boyd boy...@jbip.net wrote:
In the kernel. Here's my kernel configuration: http://pastebin.com/raw.php?i=4JL814m3
Re: em0 watchdog timeouts on 8-STABLE
I cannot repro this, I used your kernel config, this is on a Dell 1850 btw, I ran netperf stress from 3 clients, and have seen no watchdogs :(

Jack

On Tue, Jun 21, 2011 at 7:59 PM, Joshua Boyd boy...@jbip.net wrote:
If needed, I can reproduce this on demand. Just need to know what sort of statistics are needed when the problem is occurring. I've had to turn off my weekly scrubs until I can figure out how to fix this problem.
em0 watchdog timeouts on 8-STABLE
I recently updated my server to the latest 8-STABLE, and upgraded to v28 ZFS. I have not had these problems on any other version of 8-STABLE or 7-STABLE, which this box was upgraded from some time ago.

Now, during my weekly scrub, I get the following messages and em0 is unresponsive:

Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting
Jun 12 03:07:58 foghornleghorn kernel: em0: link state changed to DOWN
Jun 12 03:08:01 foghornleghorn kernel: em0: link state changed to UP
Jun 12 03:08:47 foghornleghorn kernel: em0: Watchdog timeout -- resetting
Jun 12 03:08:47 foghornleghorn kernel: em0: link state changed to DOWN
Jun 12 03:08:50 foghornleghorn kernel: em0: link state changed to UP

My scrub is scheduled to start at 03:00:00, so it looks like watchdog timeouts start occurring pretty quickly once I/O ramps up.

Here's some possibly relevant information, let me know if anything else would be helpful to troubleshoot.

FreeBSD foghornleghorn.res.openband.net 8.2-STABLE FreeBSD 8.2-STABLE #17: Mon Jun 6 19:40:19 EDT 2011 r...@foghornleghorn.res.openband.net:/usr/obj/usr/src/sys/FOGHORNLEGHORN amd64

em0: Intel(R) PRO/1000 Legacy Network Connection 1.0.3 port 0xe800-0xe83f mem 0xfebe-0xfebf,0xfebc-0xfebd irq 20 at device 5.0 on pci7

em0@pci0:7:5:0: class=0x02 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Gigabit Ethernet Controller (Copper) rev 5 (82541PI)'
    class = network
    subclass = ethernet

And, the SAS cards:

dev.mpt.0.%desc: LSILogic SAS/SATA Adapter
dev.mpt.0.%driver: mpt
dev.mpt.0.%location: slot=0 function=0
dev.mpt.0.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9 subdevice=0xa580 class=0x01
dev.mpt.0.%parent: pci1
dev.mpt.0.debug: 3
dev.mpt.0.role: 1
dev.mpt.1.%desc: LSILogic SAS/SATA Adapter
dev.mpt.1.%driver: mpt
dev.mpt.1.%location: slot=0 function=0
dev.mpt.1.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9 subdevice=0xa580 class=0x01
dev.mpt.1.%parent: pci2
dev.mpt.1.debug: 3
dev.mpt.1.role: 1
dev.mpt.2.%desc: LSILogic SAS/SATA Adapter
dev.mpt.2.%driver: mpt
dev.mpt.2.%location: slot=0 function=0
dev.mpt.2.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x1000 subdevice=0x30a0 class=0x01
dev.mpt.2.%parent: pci6
dev.mpt.2.debug: 3
dev.mpt.2.role: 1

--
Joshua Boyd
JBipNet
E-mail: boy...@jbip.net
Cell: (513) 375-0157
http://www.jbip.net
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
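For anyone wanting to line the resets up against the 03:00:00 scrub start, here is a minimal sketch that counts the watchdog resets for one interface and prints when each happened. The sample lines are copied from the report above; the helper names count_resets and reset_times are made up for illustration, and on a real host the input would presumably be the system log (path not given in the report) rather than a here-document.

```shell
#!/bin/sh
# Sample lines copied from the report; replace with the real syslog file as needed.
log=$(cat <<'EOF'
Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting
Jun 12 03:07:58 foghornleghorn kernel: em0: link state changed to DOWN
Jun 12 03:08:01 foghornleghorn kernel: em0: link state changed to UP
Jun 12 03:08:47 foghornleghorn kernel: em0: Watchdog timeout -- resetting
Jun 12 03:08:47 foghornleghorn kernel: em0: link state changed to DOWN
Jun 12 03:08:50 foghornleghorn kernel: em0: link state changed to UP
EOF
)

# count_resets IFACE: count watchdog reset lines for IFACE read from stdin.
# (Hypothetical helper, not part of any FreeBSD tool.)
count_resets() {
    grep -c "kernel: $1: Watchdog timeout -- resetting"
}

# reset_times IFACE: print the HH:MM:SS field (third field) of each reset line.
reset_times() {
    awk -v pat="kernel: $1: Watchdog timeout" 'index($0, pat) { print $3 }'
}

printf '%s\n' "$log" | count_resets em0
printf '%s\n' "$log" | reset_times em0
```

With the six lines above this reports two resets, roughly eight minutes after the scheduled scrub start and 49 seconds apart.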
Re: em0 watchdog timeouts on 8-STABLE
Please provide output from the following commands (as root):

# pciconf -lvcb
# vmstat -i
# sysctl -a | grep msi
# dmesg

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
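If it helps, the requested output could be gathered into a single file to attach to a reply. The following is only a sketch: collect_diag is a hypothetical helper (not an existing utility), and it simply runs each command line it is given and labels the output.

```shell
#!/bin/sh
# collect_diag OUTFILE CMD...: run each command line and append its output,
# under a header naming the command, to OUTFILE. (Hypothetical helper.)
collect_diag() {
    out=$1
    shift
    : > "$out"    # start with an empty report file
    for cmd in "$@"; do
        printf '### %s\n' "$cmd" >> "$out"
        # A failing or missing command is noted in the report rather than aborting the run.
        sh -c "$cmd" >> "$out" 2>&1 || printf '(exit status non-zero)\n' >> "$out"
    done
}
```

Run as root on the affected box, it might be called as: collect_diag /tmp/em0-diag.txt 'pciconf -lvcb' 'vmstat -i' 'sysctl -a | grep msi' 'dmesg' (the output path is arbitrary). Note that a grep with no matches also exits non-zero, so the marker line does not necessarily mean the command is missing.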
Re: em0 watchdog timeouts on 8-STABLE
On Wed, Jun 15, 2011 at 3:57 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
Please provide output from the following commands (as root):
# pciconf -lvcb

hostb0@pci0:0:0:0: class=0x06 card=0x59561002 chip=0x59561002 rev=0x00 hdr=0x00
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 GFX Dual Slot'
    class = bridge
    subclass = HOST-PCI
pcib1@pci0:0:2:0: class=0x060400 card=0x59561002 chip=0x59781002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (external gfx0 port A)'
    class = bridge
    subclass = PCI-PCI
pcib2@pci0:0:3:0: class=0x060400 card=0x59561002 chip=0x59791002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (external gfx0 port B)'
    class = bridge
    subclass = PCI-PCI
pcib3@pci0:0:4:0: class=0x060400 card=0x59561002 chip=0x597a1002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (PCIe gpp port A)'
    class = bridge
    subclass = PCI-PCI
pcib4@pci0:0:6:0: class=0x060400 card=0x59561002 chip=0x597c1002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (PCIe gpp port C)'
    class = bridge
    subclass = PCI-PCI
pcib5@pci0:0:7:0: class=0x060400 card=0x59561002 chip=0x597d1002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (PCIe gpp port D)'
    class = bridge
    subclass = PCI-PCI
pcib6@pci0:0:11:0: class=0x060400 card=0x59561002 chip=0x59801002 rev=0x00 hdr=0x01
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'RD790 PCI to PCI bridge (external gfx1 port A)'
    class = bridge
    subclass = PCI-PCI
atapci4@pci0:0:18:0: class=0x01018f card=0x81ef1043 chip=0x43801002 rev=0x00 hdr=0x00
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'IXP SB600 Serial ATA Controller'
    class = mass storage
    subclass = ATA
ohci0@pci0:0:19:0: class=0x0c0310 card=0x82881043 chip=0x43871002 rev=0x00 hdr=0x00
    vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device = 'IXP SB600 USB Controller
Re: em0 watchdog timeouts on 8-STABLE
In the kernel. Here's my kernel configuration: http://pastebin.com/raw.php?i=4JL814m3

On Wed, Jun 15, 2011 at 8:20 PM, Jack Vogel jfvo...@gmail.com wrote:
I have hardware now, am working on reproducing this. Just curious, do you have the em driver defined in the kernel, or as a module?
Jack
Re: em0 watchdog timeouts on 8-STABLE
I have hardware now, am working on reproducing this. Just curious, do you have the em driver defined in the kernel, or as a module?

Jack
Re: em0 watchdog timeouts
Hi,

I am also searching for the dcgdis.zip file to prevent watchdog timeouts on the em0 device. Where can I get it?

Thanks,
David
Re: em0 watchdog timeouts
On Wed, Aug 11, 2010 at 02:26:01PM +0200, Vonarburg David wrote:
Hi i am also searching for the dcgdis.zip file to prevent watchdog timeout on em0 device Where can i get it Thanks David

Which watchdog issue are you referring to? There have been many reported watchdog timeout issues with em(4) in recent days.

Are you referring to the power saving bit in the EEPROM, specific to certain Intel 82573 NICs? It's discussed here (see "Networking (hardware and drivers)"):

http://wiki.freebsd.org/BugBusting/Commonly_reported_issues

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
Re: em0 watchdog timeouts
Hi,

I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the past 6 months too. It looks related.

I've tried replacing the hardware 3 times (2 different IBM x3755 chassis, one IBM x3650 chassis). I first tried the onboard Broadcom NICs (bce-based, PCI-X), until I had issues with watchdog timeouts. I tried replacing them with a 4-port PCI-X Intel NIC, which gave me the same problems. I was told that the 4-port Intel NICs had an onboard bus controller that could cause trouble, so I replaced this with a 2-port PCI-e Intel, which I was told by a Sepherosa Ziehau was the best performing gig-e NIC (rx/tx).

Still getting watchdog timeouts, I tried changing all sorts of sysctls I found in mailing-list threads (disable msi/msix interrupts, adjust rx/tx processing, etc.). I tried upgrading the BIOS and the firmware on all kinds of stuff (disks, BMC, etc.) to the newest versions. I also tried using a different QLogic isp(4) FC controller (PCI-e).

No matter what I tried, I could not diagnose this problem, let alone fix it. It also happened rarely enough that it was not easy to debug. I would get a series of "watchdog timeout -- resetting" messages until the NIC went completely offline, at which point I'd reboot the machine from the console. This happened about once every 1-10 days, usually between about 11:00 and 13:00.

This machine has now been replaced with Linux, unfortunately, just to avoid more customer complaints and downtime. The IBM x3755 with FreeBSD 7.2 that was replaced is still online, and can be put at the disposal of any developers who would like to debug this further.

Like Stefan Krueger mentioned, this machine is also running as an NFS server, with a mix of BSD and Linux clients, and it's getting hit pretty hard by clients.

Hope we can iron this bug out in the future.

Best regards,
Daniel Bond.

On Oct 2, 2009, at 10:36 PM, Rudy wrote:
Ah, I'll stop messing with them. I just set them all to 0 to see if that will help and noticed the card was leaving tx_int_delay=1.

# sysctl dev.em.4.debug=1
Oct 2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
Oct 2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0
# sysctl dev.em.4
dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
dev.em.4.rx_int_delay: 0
dev.em.4.tx_int_delay: 0
dev.em.4.rx_abs_int_delay: 0
dev.em.4.tx_abs_int_delay: 0

Splitting traffic to different ports has brought down the watchdog events to once a day. ... essentially, I have a quad 30Mbps (not quad 1Gbps) card. heheh.

Would turning off net.inet.ip.fastforwarding or any other setting help? Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps. I have a feeling that isn't related to the NIC at all, but I'm not sure what else to try.

Rudy

Jack Vogel wrote:
Watchdog resets the adapter. Messing with these values is of dubious value anyway.
Jack

On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:
I noticed something interesting. I set the rx_int_delay to 0:

sysctl dev.em.5.rx_int_delay=0

Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:

Oct 1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is now 32:

Oct 2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

However, running sysctl dev.em.5 shows it as 0:

dev.em.5.rx_int_delay: 0
dev.em.5.tx_int_delay: 66

Seems like the adapter and the kernel don't agree on the rx_int_delay value.

Rudy
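The disagreement Rudy describes between the value in the debug dump and the value shown by sysctl can be checked mechanically. This is a rough sketch only: the two input lines are copied from his message, the helper names are invented for illustration, and nothing here talks to a real adapter.

```shell
#!/bin/sh
# Sample values copied from Rudy's message; nothing here queries a real NIC.
debug_line='Oct 2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66'
sysctl_line='dev.em.5.rx_int_delay: 0'

# driver_value LINE: extract the rx_int_delay number from a debug dump line.
driver_value() {
    printf '%s\n' "$1" | sed -n 's/.*rx_int_delay = \([0-9][0-9]*\).*/\1/p'
}

# sysctl_value LINE: extract the number after the colon in "name: value" output.
sysctl_value() {
    printf '%s\n' "$1" | sed -n 's/^[^:]*: *\([0-9][0-9]*\)$/\1/p'
}

# compare_delay DEBUG_LINE SYSCTL_LINE: report whether the two values agree.
compare_delay() {
    drv=$(driver_value "$1")
    ctl=$(sysctl_value "$2")
    if [ "$drv" = "$ctl" ]; then
        echo "match: $drv"
    else
        echo "mismatch: driver=$drv sysctl=$ctl"
    fi
}

compare_delay "$debug_line" "$sysctl_line"
```

For the post-watchdog values quoted above this reports a mismatch (32 in the debug dump versus 0 from sysctl), which is the same observation made by hand in the message.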
Re: em0 watchdog timeouts
On Oct 2, 2009, at 4:36 PM, Rudy wrote:
> Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps. I have a feeling that isn't related to the NIC at all, but I'm not sure what else to try.

Just curious, have you tried (or are you using) device polling?

--
Robert Blayzor, BOFH
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/
Re: em0 watchdog timeouts
This posting just muddies the issue, first you talk about having a problem that involves Broadcom, ok, so post about that on something other than em :)

Then you make some references to hardware that you might have bought but didn't, I'm not about debugging 'possible worlds problems' though so can't help you there either :)

Finally you never say what the actual hardware is, other than a person who I do not know told you it was the best performer... so, what exactly is it?

You have a problem once every 10 days, and at a specific time no less, this almost always means something in your environment, a cron job run amok, a piece of hardware that resets, I dunno, but the last thing I would suspect given this description is the driver.

You need a good sysadmin for this debugging I would venture, not a driver developer.

Jack

On Mon, Oct 5, 2009 at 7:19 AM, Daniel Bond d...@danielbond.org wrote:
> Hi, I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the past 6months too. It looks related.
> [...]
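One way to check the "something in your environment" theory against the logs is to bucket the watchdog resets by hour of day and compare the busy hours with cron schedules. The following is only a minimal sketch, not something posted in this thread: the function name `watchdog_by_hour` is made up for illustration, and it assumes syslog lines in the "Mon DD HH:MM:SS host kernel: emN: Watchdog timeout -- resetting" layout quoted in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): count watchdog resets per hour of day.
# Reads syslog-format lines on stdin; assumes field 3 is HH:MM:SS.
watchdog_by_hour() {
    # Match only the reset message itself, not the "watchdog timeouts = N"
    # counter line printed by the driver statistics dump.
    awk '/[Ww]atchdog timeout --/ {
        split($3, t, ":")      # t[1] is the hour
        count[t[1]]++
    }
    END {
        for (h in count) printf "%s %d\n", h, count[h]
    }' | sort
}

# Usage (assumed log location):
#   watchdog_by_hour < /var/log/messages
```

A cluster in one or two hours would point toward a scheduled job or a periodic load peak; an even spread would not.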
Re: em0 watchdog timeouts
Hi Jack,

I'll comment on your mail inline:

On Oct 5, 2009, at 6:57 PM, Jack Vogel wrote:
> This posting just muddies the issue, first you talk about having a problem that involves Broadcom, ok, so post about that on something other than em :)

I only meant to indicate that the problem might exist outside the Intel driver. I'm also indicating that it happens with several drivers (bge, bce and em) on several different machines, on both PCI-X and PCI-e. I'm sorry if this is confusing to you, but I still think it's relevant to mention.

> Then you make some references to hardware that you might have bought but didn't, I'm not about debugging 'possible worlds problems' though so can't help you there either :)

No. I only made references to hardware I actually used, and had real-world issues with.

> Finally you never say what the actual hardware is, other than a person who I do not know told you it was the best performer... so, what exactly is it?

Sepherosa is a guy who writes drivers for BSD-based operating systems, including FreeBSD. He has a lot of knowledge in this area. http://people.freebsd.org/~sephe/

The NIC you are referring to, the one sephe recommended to me, is an 82571EB. I didn't mention specific hardware, as I think it's more important to note that this is an issue I'm experiencing across different sets of hardware and drivers.

> You have a problem once every 10 days, and at a specific time no less, this almost always means something in your environment, a cron job run amok, a piece of hardware that resets, I dunno, but the last thing I would suspect given this description is the driver.

This is not what I wrote. I wrote that I had a problem every 1-10 days, but it would usually happen once every 3-4 days. At worst, every day for periods. It's not at any specific time. If you read my email correctly, I say it *usually* happens around 11-13:00, but it has happened at random times too.

This is my point exactly. I don't think it's the Intel driver, I think the problem is elsewhere. I had a suspicion it had to do with the combination of NIC + QLogic FC controller, but I have no evidence of this.

> You need a good sysadmin for this debugging I would venture, not a driver developer.

What I need is useful advice/help. I never stated I needed a driver developer. I'd like to be able to run my favorite OS on cool hardware, in the future, for a high-performing NFS server, without problems like those I've experienced over the past 6 months on a production system. Please note that I'm managing a server park almost completely based on FreeBSD, and I'm running many NFS servers on other hardware, for other services, without issues.

I've seen several other FreeBSD users having problems with this too, so I think it's of importance for the project. As I mentioned originally, I'm happy to make the hardware available to any FreeBSD developer who might want to look further into this. Debugging it further is beyond my skill set; I don't even know where to begin looking, especially since I can't produce any panics.

I'm sorry to say, but your reply was 0% useful, Jack.

> Jack

- Daniel
Re: em0 watchdog timeouts
Sorry, it's a Monday morning, I was being kinda facetious, guess it didn't work very well :) I apologize.

I know it must be annoying for you; it's as much so for me when it's something I can't just fix because it's not reproducible. So, I feel your pain. Will try to restrain my Monday blues in the future.

Jack

On Mon, Oct 5, 2009 at 11:32 AM, Daniel Bond d...@danielbond.org wrote:
> Hi Jack, I'll comment your mail inline:
> [...]
Re: em0 watchdog timeouts
On Mon, Oct 05, 2009 at 08:32:14PM +0200, Daniel Bond wrote:
> What I need is useful advice/help. I never stated I needed a driver developer. I'd like to be able to run my favorite OS on cool hardware, in the future, for a high-performing NFS-server, without problems like I've experienced the past 6months, on a production system. Please note that I'm managing a server-park almost completely based on FreeBSD, and I'm running many NFS servers on other hardware, for other services, without issues.
>
> I've seen several other FreeBSD-users having problems with this too, so I think it's of importance for the project. As I mentioned originally, I'm happy to dispose the hardware to any FreeBSD developer that might want to look further into this. Debugging it further is above my skill-set, I don't even know where to begin looking, especially since I can't produce any panics.

I can give one bit of advice that helped me in a similar situation: check your motherboards.

I run about a dozen fileservers on FreeBSD, and have always been very happy with their performance, but some months ago I began to experience problems with one of them. These problems were 'watchdog timeout' errors. Tried all manner of things, different NICs of different types, changing settings, etc., but nothing helped over the long term. At some point, when very heavy i/o was going on to our Beowulf cluster, the 'watchdog timeouts' would begin. What was strange is that other (supposedly identical) machines handled _more_ i/o without a problem.

Finally, while doing some comparisons, I realized that the motherboard having the problem was _not_ the same as the others; it was similar, but not identical. I changed the motherboard and all the problems went away, never to reappear.

I don't know if it was a specific problem with that particular motherboard, or something about that model, but for whatever reason, it appears that the buses just couldn't handle a RAID card and three active NICs.

--
greg byshenk - gbysh...@byshenk.net - Leiden, NL
Re: em0 watchdog timeouts
> Finally, while doing some comparisons, I realized that the motherboard having the problem was _not_ the same as the others; it was similar, but not identical.

This is a good piece of info. I can try swapping out the MB and see what happens.

I do want to add: thank you Jack for all your help, and if it does turn out to be the MB, then double thanks. Viva Monday! :)

What would be nice would be MORE info for a watchdog timeout... maybe a sysctl dev.watchdog.debug=1 or something where, when a watchdog event happened --- for whatever driver --- a bunch of stats were dumped relating to the event.

Rudy
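Until a driver prints more at watchdog time, something similar can be approximated from userland. The sketch below is an assumption-laden illustration, not an existing FreeBSD facility: the function names `watchdog_iface` and `dump_watchdog_context` are hypothetical, it only handles em(4) interfaces, and it relies on the `dev.em.N.stats=1` sysctl shown elsewhere in this thread for driver 6.9.12 (which writes its counters to the kernel log rather than to stdout).

```shell
#!/bin/sh
# Hypothetical helpers for capturing context when a watchdog reset is logged.

# Print the interface name (e.g. em0) if the given log line is a watchdog
# reset message; print nothing for any other line.
watchdog_iface() {
    printf '%s\n' "$1" |
        sed -n 's/.*kernel: \([a-z][a-z]*[0-9][0-9]*\): [Ww]atchdog timeout.*/\1/p'
}

# Dump driver and system counters for one em(4) interface.
# The stats sysctl makes the driver log its counters via the kernel log.
dump_watchdog_context() {
    iface=$1
    unit=${iface#em}                 # em4 -> 4 (em-only assumption)
    date
    sysctl "dev.em.${unit}.stats=1"
    netstat -m                       # mbuf usage and denied requests
    vmstat -i                        # interrupt totals and rates
}

# Usage sketch (assumed paths), run as root:
#   tail -F /var/log/messages | while read -r line; do
#       ifc=$(watchdog_iface "$line")
#       case "$ifc" in
#           em*) dump_watchdog_context "$ifc" >> /var/tmp/watchdog-context.log ;;
#       esac
#   done
```

The snapshot is taken after the reset has already happened, so it shows the aftermath rather than the state that triggered the timeout; it is a stopgap, not a replacement for in-driver debugging.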
Re: em0 watchdog timeouts
Hmmm, I did have one of the drivers print more info at watchdog time, but I just looked and that's not em, time to add that I guess. Since you're in the driver there isn't a huge amount of info that you can print, so it still may not be enough to help.

BTW, I've always been somewhat dissatisfied with the watchdog design and think it's kinda flawed. I could try and make you an experimental driver with debug and some changes that you can try if you'd like.

Jack

On Mon, Oct 5, 2009 at 1:54 PM, Rudy cra...@monkeybrains.net wrote:
> What would be nice would be MORE info for a watchdog timeout... maybe a sysctl dev.watchdog.debug=1 or something where when a watchdog event happened --- for whatever driver --- a bunch of stats were dumped relating to the event.
Re: em0 watchdog timeouts
> BTW, I've always been somewhat dissatisfied with the watchdog design and think it's kinda flawed, I could try and make you an experimental with debug and some changes that you can try if you'd like.

I'm game -- it would be nice if the machine still reset the watchdog in 3 seconds and didn't cause any more damage from the debug code (e.g. a panic). :)

My frequency of watchdog events is about 2 or 3 times per day. I am running: Intel(R) PRO/1000 Network Connection 6.9.12

Rudy
Re: em0 watchdog timeouts
I noticed something interesting. I set rx_int_delay to 0:

    sysctl dev.em.5.rx_int_delay=0

Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:

    Oct 1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is now 32:

    Oct 2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

However, running sysctl dev.em.5 shows it as 0:

    dev.em.5.rx_int_delay: 0
    dev.em.5.tx_int_delay: 66

Seems like the adapter and the kernel don't agree on the rx_int_delay value.

Rudy
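The manual comparison above can be scripted so the drift is easy to spot after each reset. This is a rough sketch under stated assumptions: `debug_value` and `compare_delay` are invented names, and the parsing assumes the "emN: key = value, key = value" debug line format quoted in this message.

```shell
#!/bin/sh
# Hypothetical helpers: compare a delay value logged by dev.em.N.debug=1
# with the value that `sysctl dev.em.N.<key>` reports.

# Print the numeric value of key $2 found in debug log line $1.
debug_value() {
    printf '%s\n' "$1" | sed -n "s/.*[ ,]$2 = \([0-9][0-9]*\).*/\1/p"
}

# $1 = debug log line, $2 = key, $3 = value reported by sysctl.
compare_delay() {
    logged=$(debug_value "$1" "$2")
    if [ "$logged" = "$3" ]; then
        echo "match: $2=$3"
    else
        echo "mismatch: log=$logged sysctl=$3"
    fi
}

# Usage sketch:
#   line=$(grep 'em5: rx_int_delay' /var/log/messages | tail -1)
#   compare_delay "$line" rx_int_delay "$(sysctl -n dev.em.5.rx_int_delay)"
```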
Re: em0 watchdog timeouts
Watchdog resets the adapter. Messing with these values is of dubious value anyway.

Jack

On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:
> [...]
> Seems like the adapter and the kernel don't agree on the rx_int_delay value.
Re: em0 watchdog timeouts
Ah, I'll stop messing with them. I just set them all to 0 to see if that will help and noticed the card was leaving tx_int_delay=1.

    # sysctl dev.em.4.debug=1
    Oct 2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
    Oct 2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0
    # sysctl dev.em.4
    dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
    dev.em.4.rx_int_delay: 0
    dev.em.4.tx_int_delay: 0
    dev.em.4.rx_abs_int_delay: 0
    dev.em.4.tx_abs_int_delay: 0

Splitting traffic to different ports has brought down the watchdog events to once a day. ... essentially, I have a quad 30Mbps (not quad 1Gbps) card. heheh.

Would turning off net.inet.ip.fastforwarding or any other setting help? Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps. I have a feeling that isn't related to the NIC at all, but I'm not sure what else to try.

Rudy

Jack Vogel wrote:
> Watchdog resets the adapter. Messing with these values is of dubious value anyway.
Re: em0 watchdog timeouts
I have rxd and txd set to 1024. How high can I safely go?

    # add more descriptors to em devices.
    hw.em.rxd=1024
    hw.em.txd=1024

### other settings...

I have tried rx_int_delay=0 and 32 ... doesn't seem to make the watchdogs go away.

    dev.em.4.rx_int_delay: 32
    dev.em.4.tx_int_delay: 66
    dev.em.4.rx_abs_int_delay: 66
    dev.em.4.tx_abs_int_delay: 66
    dev.em.4.rx_processing_limit: 300

I am using a PCI-Express (x8) slot according to the motherboard specs: http://supermicro.com/products/motherboard/Xeon3000/3210/X7SBi.cfm

Rudy

Jack Vogel wrote:
> Increase the size of your TX ring, meaning the number of TX descriptors.
>
> You said this is a quad port card, what size PCI E slot are you in? On some motherboards slot connectors might suggest it's of a certain size but it's not really wired fully. If you are not in a x8 lane slot move it to one.
>
> What about system tuning?
>
> Some ideas, let me know how it goes.
>
> Jack
>
> On Wed, Sep 30, 2009 at 3:28 PM, Rudy cra...@monkeybrains.net wrote:
>> Rudy wrote:
>>> I am having watchdog timeout issues
>>
>> Oh, here is some more info from 'pciconf -lcv'. I offloaded half the traffic from em0 to em5 and there has only been one watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday. We do streaming out of our network and the 3 second outage really messes things up...
>>
>> e...@pci0:5:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82571EB Gigabit Ethernet Controller'
>>     class = network
>>     subclass = ethernet
>>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
>> e...@pci0:5:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82571EB Gigabit Ethernet Controller'
>>     class = network
>>     subclass = ethernet
>>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
>> e...@pci0:6:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82571EB Gigabit Ethernet Controller'
>>     class = network
>>     subclass = ethernet
>>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
>> e...@pci0:6:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82571EB Gigabit Ethernet Controller'
>>     class = network
>>     subclass = ethernet
>>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
>> e...@pci0:13:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
>>     class = network
>>     subclass = ethernet
>>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>> e...@pci0:15:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82573L Intel PRO/1000 PL Network Adaptor'
>>     class = network
>>     subclass = ethernet
>>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>> vgap...@pci0:17:3:0: class=0x03 card=0xd18015d9 chip=0x515e1002 rev=0x02
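Jack's point about slots that are not fully wired can be checked from the `pciconf -lcv` output itself. The sketch below is illustrative only: the function name `pcie_narrow_links` is made up, and it assumes that the `link xN(xM)` field means "currently negotiated width N, maximum width M", which is how the output above is being read here.

```shell
#!/bin/sh
# Hypothetical helper: read `pciconf -lcv` output on stdin and list devices
# whose negotiated PCIe link width is below the width the device supports.
# Assumes "link xN(xM)" means current width N, maximum width M.
pcie_narrow_links() {
    awk '
        /^[a-z]+[0-9]+@pci/ {            # device header, e.g. em0@pci0:5:0:0:
            dev = $1
            sub(/:$/, "", dev)
        }
        /PCI-Express/ && match($0, /link x[0-9]+\(x[0-9]+\)/) {
            s = substr($0, RSTART + 6, RLENGTH - 6)   # e.g. "4(x4)"
            split(s, w, /\(x|\)/)                     # w[1]=current, w[2]=max
            if (w[1] + 0 < w[2] + 0)
                printf "%s negotiated x%d of x%d\n", dev, w[1], w[2]
        }'
}

# Usage sketch:
#   pciconf -lcv | pcie_narrow_links
```

For the listing quoted above every port reports matching values (x4(x4) or x1(x1)), so this check would print nothing; that narrows the question to whether an x4 link per 82571EB is enough, not whether the slot is miswired.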
Re: em0 watchdog timeouts
> You said this is a quad port card, what size PCI E slot are you in?

I have a quad card in a PCIe 8x port, and there are 2 ports on the motherboard. I just read the manual and see that the onboard ports are PCIe 1x. I have been seeing watchdog events on the onboard ports as well as on the PCIe card.

The router is doing roughly 50Mbps on em0, em4, and em5.

Does i386 vs amd64 make any difference to the em0 driver?

Bumping TX ring to 2048.

    grep em /boot/loader.conf
    if_em_load=YES
    hw.em.rxd=2048
    hw.em.txd=2048

Rudy
Re: em0 watchdog timeouts
I would say that 1024 should be enough, I thought maybe you were at 256.

amd64 kernels just perform better at a lot of things, however I/O is not necessarily one of them, so I wouldn't claim it for sure, still I'd always default to 64 bit these days unless there's some other reason not to.

What about system load, perhaps something is bogging the thing down so that it cannot adequately service the network interrupts??

The specs of the motherboard are respectable, how much memory does it have? Another thought, are you using the out-of-band management features (like IPMI)? If you are not then go into the BIOS and disable that stuff.

Have you run netstat or some other resource monitor to see if you run out of anything that might coincide with the watchdogs...

Jack

On Thu, Oct 1, 2009 at 2:12 PM, Rudy (bulk) cra...@monkeybrains.net wrote:
> I have a quad card in a PCIe 8x port, and there are 2 ports on the motherboard.
> [...]
Re: em0 watchdog timeouts
> What about system load, perhaps something is bogging the thing down so that it cannot adequately service the network interrupts??

Hardly anything is running on the box... Only things on the box:

    zebra
    bgpd (3 peers...)
    sshd
    snmpd

Here is the top of 'top':

    load averages: 0.06, 0.08, 0.07 up 7+01:08:16 17:26:39
    15 processes: 1 running, 14 sleeping
    CPU: 0.0% user, 0.0% nice, 4.5% system, 0.0% interrupt, 95.5% idle
    Mem: 193M Active, 42M Inact, 156M Wired, 196K Cache, 83M Buf, 1610M Free

> The specs of the motherboard are respectable, how much memory does it have? Another thought, are you using the out-of-band management features (like IPMI)? If you are not then go into the BIOS and disable that stuff.

No IPMI card added to that motherboard (you have to add a daughter card).

> Have you run netstat or some other resource monitor to see if you run out of anything that might coincide with the watchdogs...

What should I look for?

    # netstat -s
    4105/4610/8715 mbufs in use (current/cache/total)
    4103/2303/6406/25600 mbuf clusters in use (current/cache/total/max)
    4103/2297 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/44/44/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
    9232K/5934K/15166K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0/6/6656 sfbufs in use (current/peak/max)
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
    0 calls to protocol drain routines

Are there specific router-only tunings that may help? Here are my sysctl settings:

    kern.ipc.somaxconn=256
    kern.random.sys.harvest.interrupt=0
    kern.random.sys.harvest.ethernet=0
    kern.ipc.nmbcluster=32768
    net.inet.icmp.icmplim=1000
    net.inet.ip.fastforwarding=1
    net.inet.ip.intr_queue_maxlen=92
    net.inet.icmp.drop_redirect=1
    dev.em.0.rx_processing_limit=200
    dev.em.1.rx_processing_limit=200
    dev.em.2.rx_processing_limit=200
    #dev.em.4.rx_processing_limit=200
    # test setting processing limit up to 300
    dev.em.4.rx_processing_limit=300
    dev.em.5.rx_processing_limit=200

Thanks,
Rudy
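On "What should I look for?": in the mbuf statistics quoted above, the lines that would indicate resource exhaustion are the "requests for ... denied" counters. A small filter can flag them automatically. This is a sketch only; `mbuf_denied` is a hypothetical name, and it assumes the slash-separated counter layout shown in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read mbuf statistics on stdin and print every
# "requests for ... denied" line in which at least one counter is non-zero.
# Prints nothing when no request has been denied.
mbuf_denied() {
    awk '/requests for .* denied/ {
        n = split($1, c, "/")          # first field is e.g. 0/0/0 or 0
        for (i = 1; i <= n; i++)
            if (c[i] + 0 > 0) { print; break }
    }'
}

# Usage sketch: run from cron next to the watchdog timestamps, e.g.
#   netstat -m | mbuf_denied
```

With the numbers posted above all denied counters are zero, so mbuf starvation does not look like the cause at the time that snapshot was taken.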
Re: em0 watchdog timeouts
Rudy wrote:
> I am having watchdog timeout issues with my Intel 82573 Pro/1000
> ...
> http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html
> link to dcgdis.zip didn't work. Do you have a copy?

Thanks, Jack. Got the file and flashed -- no upgrade needed.

So, while the router was offline, I flashed the motherboard's BIOS (Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12 version of the em driver. Still, watchdog timeouts. Sigh.

Will the Intel Gigabit ET Quad Port Adapter make the timeouts go away??? Should I be using amd64??? Should tx_int_delay=0?

Summary: 2 NICs on motherboard + quad card in PCIe slot. Watchdog timeouts on motherboard NICs and on quad card NIC when bandwidth 10Mbps
There is minimal (bgp session) TCP to the box... it only forwards packets between interfaces.

    # uname -r -m
    7.2-STABLE i386

    # dmesg | grep ^em
    em0: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x2000-0x201f mem 0xd022-0xd023,0xd020-0xd021 irq 16 at device 0.0 on pci5
    em0: Using MSI interrupt
    em0: [FILTER]
    em0: Ethernet address: 00:15:17:78:99:70
    em1: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x2020-0x203f mem 0xd026-0xd027,0xd024-0xd025 irq 17 at device 0.1 on pci5
    em1: Using MSI interrupt
    em1: [FILTER]
    em1: Ethernet address: 00:15:17:78:99:71
    em2: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x3000-0x301f mem 0xd032-0xd033,0xd030-0xd031 irq 17 at device 0.0 on pci6
    em2: Using MSI interrupt
    em2: [FILTER]
    em2: Ethernet address: 00:15:17:78:99:72
    em3: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x3020-0x303f mem 0xd036-0xd037,0xd034-0xd035 irq 18 at device 0.1 on pci6
    em3: Using MSI interrupt
    em3: [FILTER]
    em3: Ethernet address: 00:15:17:78:99:73
    em4: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x4000-0x401f mem 0xd040-0xd041 irq 16 at device 0.0 on pci13
    em4: Using MSI interrupt
    em4: [FILTER]
    em4: Ethernet address: 00:30:48:67:14:50
    em5: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x5000-0x501f mem 0xd050-0xd051 irq 17 at device 0.0 on pci15
    em5: Using MSI interrupt
    em5: [FILTER]
    em5: Ethernet address: 00:30:48:67:14:51

    # vmstat -i
    interrupt                total       rate
    irq1: atkbd0               710          0
    irq4: sio0                   3          0
    irq23: atapci0           14943          0
    cpu0: timer          929753417       2000
    irq256: em0          702754836       1511
    irq257: em1                  2          0
    irq260: em4          469338728       1009
    irq261: em5           78605337        169
    cpu1: timer          929753403       2000
    Total               3110221379       6690

    # sysctl dev.em.0.stats=1
    Sep 30 01:08:20 mango kernel: em0: Excessive collisions = 0
    Sep 30 01:08:20 mango kernel: em0: Sequence errors = 0
    Sep 30 01:08:20 mango kernel: em0: Defer count = 0
    Sep 30 01:08:20 mango kernel: em0: Missed Packets = 101469
    Sep 30 01:08:20 mango kernel: em0: Receive No Buffers = 0
    Sep 30 01:08:20 mango kernel: em0: Receive Length Errors = 0
    Sep 30 01:08:20 mango kernel: em0: Receive errors = 0
    Sep 30 01:08:20 mango kernel: em0: Crc errors = 0
    Sep 30 01:08:20 mango kernel: em0: Alignment errors = 0
    Sep 30 01:08:20 mango kernel: em0: Collision/Carrier extension errors = 0
    Sep 30 01:08:20 mango kernel: em0: RX overruns = 0
    Sep 30 01:08:20 mango kernel: em0: watchdog timeouts = 15
    Sep 30 01:08:20 mango kernel: em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
    Sep 30 01:08:20 mango kernel: em0: XON Rcvd = 0
    Sep 30 01:08:20 mango kernel: em0: XON Xmtd = 0
    Sep 30 01:08:20 mango kernel: em0: XOFF Rcvd = 0
    Sep 30 01:08:20 mango kernel: em0: XOFF Xmtd = 0
    Sep 30 01:08:20 mango kernel: em0: Good Packets Rcvd = 1056196797
    Sep 30 01:08:20 mango kernel: em0: Good Packets Xmtd = 1088726903
    Sep 30 01:08:20 mango kernel: em0: TSO Contexts Xmtd = 4088
    Sep 30 01:08:20 mango kernel: em0: TSO Contexts Failed = 0

    # sysctl dev.em.0.debug=1
    Sep 30 01:34:59 mango kernel: em0: Adapter hardware address = 0xc5159420
    Sep 30 01:34:59 mango kernel: em0: CTRL = 0x401c0241 RCTL = 0x8002
    Sep 30 01:34:59 mango kernel: em0: Packet buffer = Tx=16k Rx=32k
    Sep 30 01:34:59 mango kernel: em0: Flow control watermarks high = 30720 low = 29220
    Sep 30 01:34:59 mango kernel: em0: tx_int_delay = 66, tx_abs_int_delay = 66
    Sep 30 01:34:59 mango kernel: em0: rx_int_delay = 0, rx_abs_int_delay = 66
    Sep 30 01:34:59 mango kernel: em0: fifo workaround = 0, fifo_reset_count = 0
    Sep 30 01:34:59 mango kernel: em0: hw tdh = 980, hw tdt = 980
    Sep 30 01:34:59 mango kernel: em0: hw rdh = 203, hw rdt = 202
    Sep 30 01:34:59 mango kernel: em0: Num Tx descriptors avail = 1024
    Sep 30 01:34:59 mango kernel: em0: Tx Descriptors not avail1 = 0
    Sep 30 01:34:59 mango kernel: em0: Tx Descriptors not avail2 = 0
    Sep 30 01:34:59 mango kernel: em0: Std mbuf failed = 0
    Sep 30 01:34:59 mango kernel: em0: Std mbuf cluster
Re: em0 watchdog timeouts
In muc.lists.freebsd.stable, you wrote:
> Rudy wrote:
>> I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
>> http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html
>> link to dcgdis.zip didn't work. Do you have a copy?
>
> Thanks, Jack. Got the file and flashed -- no upgrade needed. So, while the router was offline, I flashed the motherboard's BIOS (Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12 version of the em driver. Still, watchdog timeouts. Sigh.

Hi Rudy,

may I ask which clients have access to your FreeBSD 7.2 server? I had similar problems a few days ago. I have no idea what exactly happened, but an Ubuntu Linux machine (NIS and NFS client) made my em0 time out after a while, too (and even crashed my FreeBSD 7.2 box a few times!). This box was rock solid before; I even thought my Intel NIC was broken...

Anyway, since I had no time (and no clue) to analyze this further, I took the risk and upgraded to 8.0-RC1 and, well, everything is working fine now :-)

HTH

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: em0 watchdog timeouts
Stefan Krueger wrote:
> In muc.lists.freebsd.stable, you wrote:
>> Rudy wrote:
>>> I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
>>> http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html
>>> link to dcgdis.zip didn't work. Do you have a copy?
>>
>> Thanks, Jack. Got the file and flashed -- no upgrade needed. So, while the router was offline, I flashed the motherboard's BIOS (Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12 version of the em driver. Still, watchdog timeouts. Sigh.
>
> Hi Rudy, may I ask which clients have access to your FreeBSD 7.2 server?

None. It is a router and has minimal services on it (bgpd / zebra / snmpd).

Rudy
Re: em0 watchdog timeouts
Rudy wrote:
> Rudy wrote:
>> I am having watchdog timeout issues

Oh, here is some more info from 'pciconf -lcv'. I offloaded half the traffic from em0 to em5 and there has only been one watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday. We do streaming out of our network and the 3 second outage really messes things up...

e...@pci0:5:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:5:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:13:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
e...@pci0:15:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82573L Intel PRO/1000 PL Network Adaptor'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
vgap...@pci0:17:3:0: class=0x03 card=0xd18015d9 chip=0x515e1002 rev=0x02
Re: em0 watchdog timeouts
Increase the size of your TX ring, meaning the number of TX descriptors.

You said this is a quad-port card; what size PCIe slot is it in? On some motherboards the slot connector suggests a certain size, but the slot is not fully wired for it. If the card is not in a x8 slot, move it to one.

What about system tuning?

Some ideas, let me know how it goes.

Jack

On Wed, Sep 30, 2009 at 3:28 PM, Rudy cra...@monkeybrains.net wrote:
> Rudy wrote:
>> Rudy wrote:
>>> I am having watchdog timeout issues
>
> Oh, here is some more info from 'pciconf -lcv'. I offloaded half the traffic from em0 to em5 and there has only been one watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday. We do streaming out of our network and the 3 second outage really messes things up...
>
> [snip]
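Jack's first suggestion in the message above (a larger TX ring) is normally applied as a boot-time loader tunable. The fragment below is a minimal sketch, not something posted in this thread: it assumes the hw.em.txd and hw.em.rxd tunables described in em(4) are present in the driver version in use, and the value 4096 is only illustrative. em(4) lists the limits for each adapter family.

```
# /boot/loader.conf
# Assumption: illustrative values; check em(4) for the range your adapter accepts.
# These tunables apply to every em interface and take effect at the next boot.
hw.em.txd="4096"    # number of transmit descriptors per interface
hw.em.rxd="4096"    # number of receive descriptors per interface
```

After rebooting, the "Num Tx descriptors avail" line printed by 'sysctl dev.em.0.debug=1' should reflect the new ring size on an idle interface.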
em0 watchdog timeouts -- looking for dcgdis.zip
I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html
link to dcgdis.zip didn't work. Do you have a copy?

Thanks in advance,
Rudy
RE: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
Hello, this is Yutaka. My English is not good, so please tell me if anything I write is unclear.

I was pointed to this list from FreeBSD-users-jp (the Japanese mailing list):
http://home.jp.freebsd.org/cgi-bin/showmail/FreeBSD-users-jp/90318

I am replying to this thread:

6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround) Mike Andrews
  * 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround) Jack Vogel
    o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround) Mike Andrews
    o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround) Jeremy Chadwick
      + 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround) Jack Vogel
        # 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround) John Baldwin
    o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround) Mike Andrews

I know this problem; it happens on my system too. Changing one setting improved it: I disabled USB in the BIOS. After that, the problem occurred 3 times following the reboot, but there have been no problems for the last 3 days.

With USB enabled: the problem occurs 4-10 times per hour.
With USB disabled: the problem occurred 3 times after the reboot.

# pciconf -l -v
[EMAIL PROTECTED]:8:0: class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82540EM Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet

dmesg.boot log with USB disabled:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Sat Jan 20 12:12:56 JST 2007
    [EMAIL PROTECTED]:/usr/src/sys/i386/compile/NATBOX
ACPI APIC Table: AMIINT VIA_K7
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: AMD Sempron(tm) (1403.19-MHz 686-class CPU)
  Origin = AuthenticAMD Id = 0x681 Stepping = 1
  Features=0x383fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
  AMD Features=0xc0480800SYSCALL,MP,MMX+,3DNow+,3DNow
real memory = 805240832 (767 MB)
avail memory = 774430720 (738 MB)
ioapic0 Version 0.3 irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: AMIINT VIA_K7 on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: VIA 8377 (Apollo KT400/KT400A/KT600) host to PCI bridge mem 0xe000-0xe7ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci0: display, VGA at device 5.0 (no driver attached)
atapci0: Promise PDC40518 SATA150 controller port 0xec00-0xec7f,0xe800-0xe8ff mem 0xdfffb000-0xdfffbfff,0xdffc-0xdffd irq 17 at device 6.0 on pci0
ata2: ATA channel 0 on atapci0
ata3: ATA channel 1 on atapci0
ata4: ATA channel 2 on atapci0
ata5: ATA channel 3 on atapci0
em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0xe400-0xe43f mem 0xdff8-0xdff9,0xdff6-0xdff7 irq 18 at device 8.0 on pci0
em0: Ethernet address: 00:07:e9:xx:x:xx
xl0: 3Com 3c905B-TX Fast Etherlink XL port 0xe000-0xe07f mem 0xdfffaf80-0xdfffafff irq 17 at device 10.0 on pci0
miibus0: MII bus on xl0
xlphy0: 3Com internal media interface on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:10:5a:xx:xx:xx
atapci1: VIA 6420 SATA150 controller port 0xdc00-0xdc07,0xd800-0xd803,0xd400-0xd407,0xd000-0xd003,0xcc00-0xcc0f,0xc800-0xc8ff irq 20 at device 15.0 on pci0
ata6: ATA channel 0 on atapci1
ata7: ATA channel 1 on atapci1
atapci2: VIA 8237 UDMA133 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: ATA channel 0 on atapci2
ata1: ATA channel 1 on atapci2
isab0: PCI-ISA bridge at device 17.0 on pci0
isa0: ISA bus on isab0
pci0: multimedia, audio at device 17.5 (no driver attached)
acpi_button1: Sleep Button on acpi0
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: floppy drive controller port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3
Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
Jack Vogel wrote:
> On 1/16/07, Mike Andrews [EMAIL PROTECTED] wrote:
>> I have a strange issue with em0 watchdog timeouts that I think is not the same as the ones everyone was having during the 6.2 beta cycle...
>>
>> I have six systems, each with two Intel GigE ports onboard:
>>
>> Systems A and B: Supermicro PDSMi+
>> Systems C and D: Supermicro PDSMi (without the plus)
>> [snip]
>> Several times a day, em0 will go down, give a watchdog timeout error on the console, then come right back up on its own a few seconds later. But here's the weird twist: it ONLY happens on systems A and B, and ONLY when running at gigabit speed. If I knock the two switch ports down to 100 meg, the problem goes away.
>> [snip]
>
> There are some management related issues with this NIC, first if you have not done so make a DOS bootable device, and run this app I am enclosing, it fixes the prom setting that is wrong on some devices.
>
> It will do no harm, and it may solve things. Let me know if it does fix it please.

No problems since running that tool almost 24 hours ago. Looks like a fix. Thanks again!

--
Mike Andrews * [EMAIL PROTECTED] * http://www.bit0.com
It's not news, it's Fark.com. Carpe cavy!
Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
On Tuesday 16 January 2007 22:07, Jack Vogel wrote:
> On 1/16/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
>> On Tue, Jan 16, 2007 at 10:53:04AM -0800, Jack Vogel wrote:
>>> There are some management related issues with this NIC, first if you have not done so make a DOS bootable device, and run this app I am enclosing, it fixes the prom setting that is wrong on some devices.
>>>
>>> It will do no harm, and it may solve things.
>>
>> Jack,
>>
>> Can you expand on what this application changes in the PROM? I have an Intel motherboard which suffers from something similar to what the OP has reported (em0 watchdog timeouts), and was curious what the utility does before firing up the board and trying it. Others may be curious to know, too.
>
> Hmmm, I'm rusty on this, it's now been a year or more since I was first involved in the details, so I may need to amend this later :)
>
> But from memory, the issue is the value programmed into the MANC register by the PROM. I don't remember what bit it was, but one bit is mistakenly set, and it causes the hardware to incorrectly intercept some packets.
>
> I was snowbound today, but I'll doublecheck on the detail tomorrow and amend if needed.
>
> Everyone note that this ONLY affects an 82573 NIC, so make sure of that before anything else.

Is this the IPMI/ASF stuff? If so, you can also work around it by adding 'net.inet.ip.portrange.lowlast=665' to /etc/sysctl.conf.

--
John Baldwin
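John's workaround above can be sketched as a config fragment. The reasoning in the comments is an assumption that the thread itself does not spell out: ASF/IPMI management firmware conventionally claims UDP ports 623 (asf-rmcp) and 664 (asf-secure-rmcp), and FreeBSD hands out reserved ports from net.inet.ip.portrange.lowfirst down to lowlast, so raising lowlast to 665 keeps the host from ever binding those two ports.

```
# /etc/sysctl.conf
# Assumption (not stated in the thread): the NIC's management firmware
# intercepts traffic to ports 623 and 664, so keep the reserved-port
# range (lowfirst down to lowlast) above both of them.
net.inet.ip.portrange.lowlast=665
```

To apply the setting without a reboot, run 'sysctl net.inet.ip.portrange.lowlast=665' as root. This only avoids the port collision; it does not change what the PROM programs into the NIC.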
6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
I have a strange issue with em0 watchdog timeouts that I think is not the same as the ones everyone was having during the 6.2 beta cycle...

I have six systems, each with two Intel GigE ports onboard:

Systems A and B: Supermicro PDSMi+
Systems C and D: Supermicro PDSMi (without the plus)
System E: Tyan S2730U3GN
System F: Supermicro X5DPA-GG

On each system:

em0 is connected to a Cisco Catalyst 2960G layer 2 gigabit ethernet switch.
em1 is connected to a Foundry Serveriron XL layer 4-7 fast ethernet switch.

All six run FreeBSD 6.2-RELEASE i386, even though the first four are capable of running amd64. They all have 2 GB of memory, except E which has 4 GB. The kernel configs are all identical, and are not that far from GENERIC + SMP.

Several times a day, em0 will go down, give a watchdog timeout error on the console, then come right back up on its own a few seconds later. But here's the weird twist: it ONLY happens on systems A and B, and ONLY when running at gigabit speed. If I knock the two switch ports down to 100 meg, the problem goes away. The other four systems C thru F never have watchdog timeout issues; they always work perfectly even at gigabit speed. So I'm trying to figure out if there are any other obvious hardware differences between the plus and non-plus version of the PDSMi that would be causing issues on the plus version.

Fortunately, at the moment we are not (yet) pushing anywhere near even 100 meg worth of traffic through these ports, so it's a tolerable workaround... just kinda annoying. :)

The chipset is a bit different: the PDSMi is the Intel E7230 chipset for Pentium D servers, where the PDSMi+ is the E3000 that adds Core 2 Duo support. But apparently the NIC chips are identical: 82573V for em0 and 82573L for em1. The BIOS is identical too, so the chipsets must be pretty similar.

Nothing shares an IRQ with the NICs. (USB is disabled in the BIOS.)

They do have different disk systems; A and B are SATA gmirror setups, while C and D use LSI Megaraid SCSI cards for their mirrors.

I have tried the obvious switching the cables out. No difference at all. I have NOT yet tried a different gigabit switch.

Hopefully that's enough detail to start; I can get into more specifics as needed. (Kernel configs, dmesg output, IRQ details, disk details, IPMI, running apps, serial console access if needed...)
Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
Jack Vogel wrote:
> On 1/16/07, Mike Andrews [EMAIL PROTECTED] wrote:
>> I have a strange issue with em0 watchdog timeouts that I think is not the same as the ones everyone was having during the 6.2 beta cycle...
>>
>> I have six systems, each with two Intel GigE ports onboard:
>>
>> Systems A and B: Supermicro PDSMi+
>> Systems C and D: Supermicro PDSMi (without the plus)
>> [snip]
>> Several times a day, em0 will go down, give a watchdog timeout error on the console, then come right back up on its own a few seconds later. But here's the weird twist: it ONLY happens on systems A and B, and ONLY when running at gigabit speed. If I knock the two switch ports down to 100 meg, the problem goes away.
>> [snip]
>
> There are some management related issues with this NIC, first if you have not done so make a DOS bootable device, and run this app I am enclosing, it fixes the prom setting that is wrong on some devices.
>
> It will do no harm, and it may solve things. Let me know if it does fix it please.

So far it seems like it DID fix it, but give me another day or two to watch it to be sure. Thanks!

FYI, it only changed the PROM on the first NIC on each PDSMi+ box; it said the second NIC was fine. (But since the first NIC was the one I was having trouble with...) I ran it on the older PDSMi boxes and it said it changed both NICs on those, even though they were (and still are) working fine.

--
Mike Andrews * [EMAIL PROTECTED] * http://www.bit0.com
It's not news, it's Fark.com. Carpe cavy!
Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
On Tue, Jan 16, 2007 at 10:53:04AM -0800, Jack Vogel wrote:
> There are some management related issues with this NIC, first if you have not done so make a DOS bootable device, and run this app I am enclosing, it fixes the prom setting that is wrong on some devices.
>
> It will do no harm, and it may solve things.

Jack,

Can you expand on what this application changes in the PROM? I have an Intel motherboard which suffers from something similar to what the OP has reported (em0 watchdog timeouts), and was curious what the utility does before firing up the board and trying it. Others may be curious to know, too.

Thanks, as always.

--
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |
Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)
On 1/16/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> On Tue, Jan 16, 2007 at 10:53:04AM -0800, Jack Vogel wrote:
>> There are some management related issues with this NIC, first if you have not done so make a DOS bootable device, and run this app I am enclosing, it fixes the prom setting that is wrong on some devices.
>>
>> It will do no harm, and it may solve things.
>
> Jack,
>
> Can you expand on what this application changes in the PROM? I have an Intel motherboard which suffers from something similar to what the OP has reported (em0 watchdog timeouts), and was curious what the utility does before firing up the board and trying it. Others may be curious to know, too.

Hmmm, I'm rusty on this, it's now been a year or more since I was first involved in the details, so I may need to amend this later :)

But from memory, the issue is the value programmed into the MANC register by the PROM. I don't remember what bit it was, but one bit is mistakenly set, and it causes the hardware to incorrectly intercept some packets.

I was snowbound today, but I'll doublecheck on the detail tomorrow and amend if needed.

Everyone note that this ONLY affects an 82573 NIC, so make sure of that before anything else.

Cheers,

Jack