subject:"em0 watchdog timeouts"

Re: em0 watchdog timeouts on 8-STABLE

2011-06-21 Thread Joshua Boyd

If needed, I can reproduce this on demand. Just need to know what sort of
statistics are needed when the problem is occurring. I've had to turn off my
weekly scrubs until I can figure out how to fix this problem.
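One way to gather statistics while the problem is occurring is to snapshot the interrupt and interface counters at a fixed interval during the scrub. The samples taken just before a reset can then be compared with earlier ones. This is a minimal sketch, not something requested in the thread. The function name, default sample count, interval and output path are hypothetical. It assumes the stock FreeBSD vmstat(8), netstat(1) and sysctl(8) tools and the dev.em.N sysctl tree.

```shell
#!/bin/sh
# Hypothetical helper: periodically snapshot interrupt and interface counters
# for one em(4) NIC so samples just before a watchdog reset can be compared.
sample_em_stats() {
    iface="${1:-em0}"                       # interface to watch
    count="${2:-60}"                        # number of samples to take
    interval="${3:-10}"                     # seconds between samples
    out="${4:-/tmp/${iface}-samples.txt}"   # hypothetical output path
    unit="${iface#em}"                      # "em0" -> "0" for dev.em.N

    i=0
    while [ "$i" -lt "$count" ]; do
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$out"
        # Interrupt counts, per-interface counters, driver sysctls.
        # "|| true" keeps sampling even if one of the tools fails.
        { vmstat -i; netstat -I "$iface"; sysctl "dev.em.${unit}"; } >> "$out" 2>&1 || true
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_em_stats em0 360 10` started just before the 03:00 scrub would cover roughly the first hour.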

On Wed, Jun 15, 2011 at 8:37 PM, Joshua Boyd boy...@jbip.net wrote:

 In the kernel. Here's my kernel configuration:

 http://pastebin.com/raw.php?i=4JL814m3


Re: em0 watchdog timeouts on 8-STABLE

2011-06-21 Thread Jack Vogel

I cannot repro this. I used your kernel config (this is on a Dell 1850, btw),
ran netperf stress from 3 clients, and have seen no watchdogs :(

Jack
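To reproduce the network side of the load, a single client can run several netperf TCP_STREAM tests in parallel against the server while `zpool scrub` runs there. This is a minimal sketch. The helper name, stream count and duration are assumptions, not the exact test described above, and netserver must already be running on the target host.

```shell
#!/bin/sh
# Hypothetical helper: run several parallel netperf TCP_STREAM tests from one
# client against a target host, then wait for all of them to finish.
netperf_stress() {
    host="$1"
    streams="${2:-3}"      # parallel streams from this client
    seconds="${3:-600}"    # length of each test in seconds
    i=0
    while [ "$i" -lt "$streams" ]; do
        # -H remote host, -l test length, -t test type
        netperf -H "$host" -l "$seconds" -t TCP_STREAM &
        i=$((i + 1))
    done
    wait                   # block until every stream has finished
}
```

Running it from more than one client at once, as in the test above, multiplies the offered load.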


On Tue, Jun 21, 2011 at 7:59 PM, Joshua Boyd boy...@jbip.net wrote:

 If needed, I can reproduce this on demand. Just need to know what sort of
 statistics are needed when the problem is occurring. I've had to turn off my
 weekly scrubs until I can figure out how to fix this problem.



em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Joshua Boyd

I recently updated my server to the latest 8-STABLE, and upgraded to v28
ZFS. I have not had these problems on any other version of 8-STABLE or
7-STABLE, which this box was upgraded from some time ago.

Now, during my weekly scrub, I get the following messages and em0 is
unresponsive:

Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting
Jun 12 03:07:58 foghornleghorn kernel: em0: link state changed to DOWN
Jun 12 03:08:01 foghornleghorn kernel: em0: link state changed to UP
Jun 12 03:08:47 foghornleghorn kernel: em0: Watchdog timeout -- resetting
Jun 12 03:08:47 foghornleghorn kernel: em0: link state changed to DOWN
Jun 12 03:08:50 foghornleghorn kernel: em0: link state changed to UP

My scrub is scheduled to start at 03:00:00, so it looks like watchdog
timeouts start occurring pretty quickly once I/O ramps up.
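To line the resets up with the scrub schedule, a short filter over the syslog file can count watchdog resets per hour. This is a minimal sketch with a hypothetical helper name. It assumes the default syslogd line format shown in the log excerpt above and only groups by month, day and hour.

```shell
#!/bin/sh
# Hypothetical helper: count "Watchdog timeout" resets per hour for one
# interface. Assumes lines of the form
#   "Jun 12 03:07:58 host kernel: em0: Watchdog timeout -- resetting"
em_watchdog_summary() {
    log="${1:-/var/log/messages}"
    iface="${2:-em0}"
    grep "kernel: ${iface}: Watchdog timeout" "$log" |
        awk '{ split($3, t, ":"); printf "%s %s %s:00\n", $1, $2, t[1] }' |
        sort | uniq -c
}
```

On the excerpt above it would report two resets in the 03:00 hour of Jun 12.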

Here's some possibly relevant information; let me know if anything else
would help with troubleshooting.

FreeBSD foghornleghorn.res.openband.net 8.2-STABLE FreeBSD 8.2-STABLE #17:
Mon Jun  6 19:40:19 EDT 2011
r...@foghornleghorn.res.openband.net:/usr/obj/usr/src/sys/FOGHORNLEGHORN
 amd64

em0: Intel(R) PRO/1000 Legacy Network Connection 1.0.3 port 0xe800-0xe83f
mem 0xfebe-0xfebf,0xfebc-0xfebd irq 20 at device 5.0 on pci7

em0@pci0:7:5:0: class=0x02 card=0x13768086 chip=0x107c8086 rev=0x05
hdr=0x00
vendor = 'Intel Corporation'
device = 'Gigabit Ethernet Controller (Copper) rev 5 (82541PI)'
class  = network
subclass   = ethernet

And, the SAS cards:

dev.mpt.0.%desc: LSILogic SAS/SATA Adapter
dev.mpt.0.%driver: mpt
dev.mpt.0.%location: slot=0 function=0
dev.mpt.0.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9
subdevice=0xa580 class=0x01
dev.mpt.0.%parent: pci1
dev.mpt.0.debug: 3
dev.mpt.0.role: 1
dev.mpt.1.%desc: LSILogic SAS/SATA Adapter
dev.mpt.1.%driver: mpt
dev.mpt.1.%location: slot=0 function=0
dev.mpt.1.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9
subdevice=0xa580 class=0x01
dev.mpt.1.%parent: pci2
dev.mpt.1.debug: 3
dev.mpt.1.role: 1
dev.mpt.2.%desc: LSILogic SAS/SATA Adapter
dev.mpt.2.%driver: mpt
dev.mpt.2.%location: slot=0 function=0
dev.mpt.2.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x1000
subdevice=0x30a0 class=0x01
dev.mpt.2.%parent: pci6
dev.mpt.2.debug: 3
dev.mpt.2.role: 1


-- 
Joshua Boyd
JBipNet

E-mail: boy...@jbip.net
Cell: (513) 375-0157

http://www.jbip.net
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Jeremy Chadwick

On Wed, Jun 15, 2011 at 03:14:43AM -0400, Joshua Boyd wrote:
 I recently updated my server to the latest 8-STABLE, and upgraded to v28
 ZFS. I have not had these problems on any other version of 8-STABLE or
 7-STABLE, which this box was upgraded from some time ago.
 
 Now, during my weekly scrub, I get the following messages and em0 is
 unresponsive:
 
 Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting
 Jun 12 03:07:58 foghornleghorn kernel: em0: link state changed to DOWN
 Jun 12 03:08:01 foghornleghorn kernel: em0: link state changed to UP
 Jun 12 03:08:47 foghornleghorn kernel: em0: Watchdog timeout -- resetting
 Jun 12 03:08:47 foghornleghorn kernel: em0: link state changed to DOWN
 Jun 12 03:08:50 foghornleghorn kernel: em0: link state changed to UP
 
 My scrub is scheduled to start at 03:00:00, so it looks like watchdog
 timeouts start occurring pretty quickly once I/O ramps up.
 

Please provide output from the following commands (as root):

# pciconf -lvcb
# vmstat -i
# sysctl -a | grep msi
# dmesg
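Those four outputs can be collected into a single file to attach to a reply. This is a minimal sketch, assuming it is run as root on the FreeBSD host. The function name and default output path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the requested diagnostics and save them, each
# under a header line, in one report file.
collect_diag() {
    out="${1:-/tmp/em0-diag.txt}"   # hypothetical default path
    : > "$out"                      # start with an empty report
    for cmd in "pciconf -lvcb" "vmstat -i" "sysctl -a | grep msi" "dmesg"; do
        printf '=== # %s ===\n' "$cmd" >> "$out"
        # sh -c so the pipeline in "sysctl -a | grep msi" is honored.
        sh -c "$cmd" >> "$out" 2>&1 || true
    done
}
```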

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |


Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Joshua Boyd

On Wed, Jun 15, 2011 at 3:57 AM, Jeremy Chadwick
free...@jdc.parodius.com wrote:

 Please provide output from the following commands (as root):

 # pciconf -lvcb


hostb0@pci0:0:0:0: class=0x06 card=0x59561002 chip=0x59561002 rev=0x00
hdr=0x00
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'RD790 GFX Dual Slot'
class  = bridge
subclass   = HOST-PCI
pcib1@pci0:0:2:0: class=0x060400 card=0x59561002 chip=0x59781002 rev=0x00
hdr=0x01
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'RD790 PCI to PCI bridge (external gfx0 port A)'
class  = bridge
subclass   = PCI-PCI
pcib2@pci0:0:3:0: class=0x060400 card=0x59561002 chip=0x59791002 rev=0x00
hdr=0x01
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'RD790 PCI to PCI bridge (external gfx0 port B)'
class  = bridge
subclass   = PCI-PCI
pcib3@pci0:0:4:0: class=0x060400 card=0x59561002 chip=0x597a1002 rev=0x00
hdr=0x01
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'RD790 PCI to PCI bridge (PCIe gpp port A)'
class  = bridge
subclass   = PCI-PCI
pcib4@pci0:0:6:0: class=0x060400 card=0x59561002 chip=0x597c1002 rev=0x00
hdr=0x01
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'RD790 PCI to PCI bridge (PCIe gpp port C)'
class  = bridge
subclass   = PCI-PCI
pcib5@pci0:0:7:0: class=0x060400 card=0x59561002 chip=0x597d1002 rev=0x00
hdr=0x01
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'RD790 PCI to PCI bridge (PCIe gpp port D)'
class  = bridge
subclass   = PCI-PCI
pcib6@pci0:0:11:0: class=0x060400 card=0x59561002 chip=0x59801002 rev=0x00
hdr=0x01
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'RD790 PCI to PCI bridge (external gfx1 port A)'
class  = bridge
subclass   = PCI-PCI
atapci4@pci0:0:18:0: class=0x01018f card=0x81ef1043 chip=0x43801002 rev=0x00
hdr=0x00
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'IXP SB600 Serial ATA Controller'
class  = mass storage
subclass   = ATA
ohci0@pci0:0:19:0: class=0x0c0310 card=0x82881043 chip=0x43871002 rev=0x00
hdr=0x00
vendor = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
device = 'IXP SB600 USB Controller

Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Joshua Boyd

In the kernel. Here's my kernel configuration:

http://pastebin.com/raw.php?i=4JL814m3

On Wed, Jun 15, 2011 at 8:20 PM, Jack Vogel jfvo...@gmail.com wrote:

 I have hardware now, am working on reproducing this. Just curious, do you
 have the em driver defined in the kernel, or as a module?

 Jack



Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Jack Vogel

I have hardware now and am working on reproducing this. Just curious, do you
have the em driver compiled into the kernel, or loaded as a module?

Jack



Re: em0 watchdog timeouts

2010-08-11 Thread Vonarburg David

Hi
I am also searching for the dcgdis.zip file to prevent watchdog timeouts on
the em0 device. Where can I get it?
Thanks
David


Re: em0 watchdog timeouts

2010-08-11 Thread Jeremy Chadwick

On Wed, Aug 11, 2010 at 02:26:01PM +0200, Vonarburg David wrote:
 Hi
 i am also searching for the dcgdis.zip file to prevent watchdog timeout on 
 em0 device
 Where can i get it
 Thanks
 David

Which watchdog issue are you referring to?  There are many reported
watchdog timeout issues with em(4) in recent days.

Are you referring to the power-saving bit in the EEPROM, specific to
certain Intel 82573 NICs?  It's discussed here (see "Networking
(hardware and drivers)"):

http://wiki.freebsd.org/BugBusting/Commonly_reported_issues

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


Re: em0 watchdog timeouts

2009-10-05 Thread Daniel Bond


Hi,

I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the
past 6 months too. It looks related.


I've tried replacing the hardware 3 times (2 different IBM x3755
chassis, one IBM x3650 chassis).
I first tried the onboard Broadcom NICs (bce-based, PCI-X), until
I had issues with watchdog timeouts.


I tried replacing them with a 4-port PCI-X Intel NIC, which gave me the same
problems. I was told that the 4-port Intel NICs had an onboard bus
controller that could cause trouble, so I replaced it with a 2-port PCI-e
Intel NIC, which Sepherosa Ziehau told me was the best-performing gig-e
NIC (rx/tx).


Still getting watchdog timeouts, I tried adjusting all sorts of sysctls
I found in mailing-list threads (disabling MSI/MSI-X interrupts, adjusting
rx/tx processing, etc.).
I tried upgrading the BIOS and the firmware on all kinds of components
(disks, BMC, etc.) to the newest versions. I also tried a different QLogic
isp(4) FC controller (PCI-e).


No matter what I tried, I could not diagnose this problem, or at least  
fix it. Also it happened rarely enough, to not be easy to debugging. I  
would get a series of watchdog timeout -- resetting, until the NIC  
would go completly offline - at the point I'd reboot it from console.


This happened about once every 1-10 days, usually about 11-13:00. This  
machine has now been replaced with Linux, unfortunately, just to avoid  
more customer complaints and downtime. The IBM x3755 with FreeBSD7.2  
which was replaced with Linux, is still online, and
can be put at disposal for any developers who would like to debug this  
further.


Like Stefan Krueger mentioned, this machine is also running as NFS  
server, with a mix of BSD and Linux clients, and it's getting hit  
pretty hard by clients.



Hope we can iron this bug out, in the future.


Best regards,


Daniel Bond.



On Oct 2, 2009, at 10:36 PM, Rudy wrote:

Ah, I'll stop messing with them.

I just set them all to 0 to see if that will help and noticed the card
was leaving tx_int_delay=1.

# sysctl dev.em.4.debug=1
Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0

# sysctl dev.em.4
dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
dev.em.4.rx_int_delay: 0
dev.em.4.tx_int_delay: 0
dev.em.4.rx_abs_int_delay: 0
dev.em.4.tx_abs_int_delay: 0

Splitting traffic to different ports has brought down the watchdog
events to once a day.  ... essentially, I have a quad 30Mbps (not quad
1Gbps) card.  heheh.
Would turning off net.inet.ip.fastforwarding or any other setting help?

Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
a feeling that isn't related to the NIC at all, but I'm not sure what
else to try.

Rudy

Jack Vogel wrote:
Watchdog resets the adapter. Messing with these values is of dubious value
anyway.

Jack

On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:

I noticed something interesting.

I set the rx_int_delay to 0:
sysctl dev.em.5.rx_int_delay=0

Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
now 32:
Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

However, running sysctl dev.em.5 shows it as 0:
dev.em.5.rx_int_delay: 0
dev.em.5.tx_int_delay: 66

Seems like the adapter and the kernel don't agree on the rx_int_delay
value.

Rudy


Re: em0 watchdog timeouts

2009-10-05 Thread Robert Blayzor


On Oct 2, 2009, at 4:36 PM, Rudy wrote:
Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
a feeling that isn't related to the NIC at all, but I'm not sure what
else to try.

Just curious, have you tried (or are you using) device polling?

--
Robert Blayzor, BOFH
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/




Re: em0 watchdog timeouts

2009-10-05 Thread Jack Vogel

This posting just muddies the issue. First you talk about having a problem
that involves Broadcom; ok, so post about that on something other than em :)

Then you make some references to hardware that you might have bought
but didn't; I'm not about debugging 'possible worlds problems', though, so
I can't help you there either :)

Finally, you never say what the actual hardware is, other than that a person
whom I do not know told you it was the best performer... so, what exactly is it?

You have a problem once every 10 days, and at a specific time no less;
this almost always means something in your environment: a cron job run
amok, a piece of hardware that resets, I dunno. But the last thing I would
suspect given this description is the driver.
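Whether the resets really cluster at particular hours (Daniel's 11-13:00 window, or whatever a cron job might line up with) can be checked from the logs instead of from memory. A minimal sketch; the function name is hypothetical, and it assumes the default syslog line format seen in this thread (e.g. `Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting`):

```python
import re
from collections import Counter

# Matches default-format syslog lines such as
#   "Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting"
# and captures the hour of day and the interface name.
WATCHDOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:\s+(\w+): Watchdog timeout"
)


def watchdogs_by_hour(lines):
    """Count watchdog resets per (interface, hour-of-day) pair."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match:
            hour, ifname = int(match.group(1)), match.group(2)
            counts[(ifname, hour)] += 1
    return counts
```

It accepts any iterable of lines, so passing an open `/var/log/messages` would cover the current log on a default install (an assumption about your syslog configuration). A sharp peak at one hour points toward something scheduled rather than the driver, the kind of environmental cause Jack has in mind; a flat spread weakens that theory.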

You need a good sysadmin for this debugging I would venture, not a driver
developer.

Jack


On Mon, Oct 5, 2009 at 7:19 AM, Daniel Bond d...@danielbond.org wrote:

 Hi,

 I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the past
 6 months too. It looks related.

 I've tried replacing the hardware 3 times (2 different IBM x3755 chassis,
 one IBM x3650 chassis).
 I first tried the onboard Broadcom NICs (bce-based, PCI-X), until I
 had issues with watchdog timeouts.

 I tried replacing them with a 4-port PCI-X Intel NIC, which gave me the same
 problems. I was told that the 4-port Intel NICs had an onboard
 bus controller that could cause trouble, so I replaced it with a 2-port
 PCI-e Intel NIC, which Sepherosa Ziehau told me was the best-performing
 gig-e NIC (rx/tx).

 Still getting watchdog timeouts, I tried adjusting all sorts of sysctls I
 found in mailing-list threads (disabling MSI/MSI-X interrupts, adjusting
 rx/tx processing, etc.).
 I tried upgrading the BIOS and firmware on all kinds of components (disks,
 BMC, etc.) to the newest versions. I also tried a different QLogic isp(4)
 FC controller (PCI-e).

 No matter what I tried, I could not diagnose this problem, nor fix it.
 It also happened rarely enough that it was not easy to debug. I would
 get a series of "watchdog timeout -- resetting" messages until the NIC
 went completely offline, at which point I'd reboot it from the console.

 This happened about once every 1-10 days, usually around 11-13:00. This
 machine has now been replaced with Linux, unfortunately, just to avoid more
 customer complaints and downtime. The IBM x3755 with FreeBSD 7.2 that was
 replaced with Linux is still online, and can be put at the disposal of any
 developers who would like to debug this further.

 Like Stefan Krueger mentioned, this machine is also running as an NFS server,
 with a mix of BSD and Linux clients, and it's getting hit pretty hard by
 clients.

 Hope we can iron this bug out in the future.

 Best regards,

 Daniel Bond.




 On Oct 2, 2009, at 10:36 PM, Rudy wrote:


 Ah, I'll stop messing with them.


 I just set them all to 0 to see if that will help and noticed the card
 was leaving tx_int_delay=1.

 # sysctl dev.em.4.debug=1
 Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
 Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0

 # sysctl dev.em.4
 dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
 dev.em.4.rx_int_delay: 0
 dev.em.4.tx_int_delay: 0
 dev.em.4.rx_abs_int_delay: 0
 dev.em.4.tx_abs_int_delay: 0

 Splitting traffic to different ports has brought down the watchdog
 events to once a day.  ... essentially, I have a quad 30Mbps (not quad
 1Gbps) card.  heheh.
 Would turning off net.inet.ip.fastforwarding or any other setting help?

 Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
 a feeling that isn't related to the NIC at all, but I'm not sure what
 else to try.

 Rudy



 Jack Vogel wrote:

 Watchdog resets the adapter. Messing with these values is of dubious
 value
 anyway.

 Jack


 On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:


  I noticed something interesting.

 I set the rx_int_delay to 0:
 sysctl dev.em.5.rx_int_delay=0

 Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
 Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

 After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
 now 32:
 Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

 However, running sysctl dev.em.5 shows it as 0:
 dev.em.5.rx_int_delay: 0
 dev.em.5.tx_int_delay: 66

 Seems like the adapter and the kernel don't agree on the rx_int_delay
 value.

 Rudy






Re: em0 watchdog timeouts

2009-10-05 Thread Daniel Bond


Hi Jack,

I'll comment on your mail inline:


On Oct 5, 2009, at 6:57 PM, Jack Vogel wrote:

This posting just muddies the issue. First you talk about having a problem
that involves Broadcom; ok, so post about that on something other than em :)


I only meant to indicate that the problem might exist outside the Intel
driver. I'm also indicating that it happens with several drivers (bge, bce
and em) on several different machines, on both PCI-X and PCI-e.

I'm sorry if this is confusing to you, but I still think it's relevant
to mention.


Then you make some references to hardware that you might have bought
but didn't; I'm not about debugging 'possible worlds problems', though, so
I can't help you there either :)


No. I only made references to hardware I actually used, and had
real-world issues with.


Finally, you never say what the actual hardware is, other than that a person
whom I do not know told you it was the best performer... so, what exactly is it?


Sepherosa is a guy who writes drivers for BSD-based operating systems,
including FreeBSD. He has a lot of knowledge in this area.

http://people.freebsd.org/~sephe/

The NIC you are referring to, the one sephe recommended to me, is an
82571EB. I didn't mention specific hardware, as I think it's more important
to note that this is an issue I'm experiencing across different sets of
hardware and drivers.


You have a problem once every 10 days, and at a specific time no less;
this almost always means something in your environment: a cron job run
amok, a piece of hardware that resets, I dunno. But the last thing I would
suspect given this description is the driver.


This is not what I wrote. I wrote that I had a problem every 1-10 days, but
it would usually happen once every 3-4 days; at worst, every day for
periods.

It's not at any specific time. If you read my email correctly, I say it
*usually* happens around 11-13:00, but it has happened at random times too.

This is my point exactly. I don't think it's the Intel driver; I think
the problem is elsewhere. I had a suspicion it had to do with the
combination of the NIC and the QLogic FC controller, but I have no evidence
of this.


You need a good sysadmin for this debugging, I would venture, not a driver
developer.


What I need is useful advice/help. I never stated I needed a driver
developer.

I'd like to be able to run my favorite OS on cool hardware in the future,
as a high-performing NFS server, without problems like the ones I've
experienced over the past 6 months on a production system.
Please note that I'm managing a server park almost completely based on
FreeBSD, and I'm running many NFS servers on other hardware, for other
services, without issues.

I've seen several other FreeBSD users having problems with this too,
so I think it's important for the project. As I mentioned originally, I'm
happy to put the hardware at the disposal of any FreeBSD developer who
might want to look further into this. Debugging it further is beyond my
skill set; I don't even know where to begin looking, especially since I
can't produce any panics.

I'm sorry to say, but your reply was 0% useful, Jack.



Jack



- Daniel



Re: em0 watchdog timeouts

2009-10-05 Thread Jack Vogel

Sorry, it's a Monday morning, and I was being kinda facetious; guess it didn't
work very well :) I apologize.

I know it must be annoying for you; it's as much so for me when it's something
I can't just fix because it's not reproducible. So, I feel your pain.

Will try to restrain my Monday blues in the future.

Jack


On Mon, Oct 5, 2009 at 11:32 AM, Daniel Bond d...@danielbond.org wrote:

 Hi Jack,

 I'll comment on your mail inline:


 On Oct 5, 2009, at 6:57 PM, Jack Vogel wrote:

 This posting just muddies the issue. First you talk about having a problem
 that involves Broadcom; ok, so post about that on something other than em :)


 I only meant to indicate that the problem might exist outside the Intel
 driver.
 I'm also indicating that it happens with several drivers (bge, bce and em)
 on several different machines, on both PCI-X and PCI-e.

 I'm sorry if this is confusing to you, but I still think it's relevant to
 mention.


 Then you make some references to hardware that you might have bought
 but didn't; I'm not about debugging 'possible worlds problems', though, so
 I can't help you there either :)


 No. I only made references to hardware I actually used, and had real-world
 issues with.


 Finally, you never say what the actual hardware is, other than that a person
 whom I do not know told you it was the best performer... so, what exactly is
 it?


 Sepherosa is a guy who writes drivers for BSD-based operating systems,
 including FreeBSD. He has a lot of knowledge in this area.
 http://people.freebsd.org/~sephe/

 The NIC you are referring to, the one sephe recommended to me, is an 82571EB.
 I didn't mention specific hardware, as I think it's more important
 to note that this is an issue I'm experiencing across different sets of
 hardware and drivers.


 You have a problem once every 10 days, and at a specific time no less;
 this almost always means something in your environment: a cron job run
 amok, a piece of hardware that resets, I dunno. But the last thing I would
 suspect given this description is the driver.


 This is not what I wrote. I wrote that I had a problem every 1-10 days, but
 it would usually happen once every 3-4 days; at worst, every day for periods.

 It's not at any specific time. If you read my email correctly, I say it
 *usually* happens around 11-13:00, but it has happened at random times too.

 This is my point exactly. I don't think it's the Intel driver; I think the
 problem is elsewhere. I had a suspicion it had to do with the combination of
 the NIC and the QLogic FC controller, but I have no evidence of this.


 You need a good sysadmin for this debugging, I would venture, not a driver
 developer.


 What I need is useful advice/help. I never stated I needed a driver
 developer.

 I'd like to be able to run my favorite OS on cool hardware in the future,
 as a high-performing NFS server, without problems like the ones I've
 experienced over the past 6 months on a production system.
 Please note that I'm managing a server park almost completely based on
 FreeBSD, and I'm running many NFS servers on other hardware, for other
 services, without issues.

 I've seen several other FreeBSD users having problems with this too, so I
 think it's important for the project. As I mentioned originally, I'm
 happy to put the hardware at the disposal of any FreeBSD developer
 who might want to look further into this. Debugging it further is beyond my
 skill set; I don't even know where to begin looking, especially since I
 can't produce any panics.

 I'm sorry to say, but your reply was 0% useful, Jack.


 Jack


 - Daniel


Re: em0 watchdog timeouts

2009-10-05 Thread Greg Byshenk

On Mon, Oct 05, 2009 at 08:32:14PM +0200, Daniel Bond wrote:
 
 What I need is useful advice/help. I never stated I needed a driver
 developer.
 
 I'd like to be able to run my favorite OS on cool hardware in the
 future, as a high-performing NFS server, without problems like the ones
 I've experienced over the past 6 months on a production system.
 Please note that I'm managing a server park almost completely based on
 FreeBSD, and I'm running many NFS servers on other hardware, for other
 services, without issues.
 
 I've seen several other FreeBSD users having problems with this too,
 so I think it's important for the project. As I mentioned
 originally, I'm happy to put the hardware at the disposal of any FreeBSD
 developer who might want to look further into this. Debugging it further
 is beyond my skill set; I don't even know where to begin looking,
 especially since I can't produce any panics.

I can give one bit of advice that helped me in a similar situation:
check your motherboards.

I run about a dozen fileservers on FreeBSD, and have always been very
happy with their performance, but some months ago I began to experience
problems with one of them.  These problems were 'watchdog timeout'
errors.  I tried all manner of things (different NICs of different types,
changed settings, etc.), but nothing helped over the long term.  At
some point, when very heavy I/O was going on to our Beowulf cluster, the
'watchdog timeouts' would begin.  What was strange is that other
(supposedly identical) machines handled _more_ I/O without a problem.

Finally, while doing some comparisons, I realized that the motherboard
having the problem was _not_ the same as the others; it was similar, but
not identical.  I changed the motherboard and all the problems went away,
never to reappear.

I don't know if it was a specific problem with that particular
motherboard, or something about that model, but for whatever reason, it
appears that the buses just couldn't handle a RAID card and three active
NICs.


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: em0 watchdog timeouts

2009-10-05 Thread Rudy


Finally, while doing some comparisons, I realized that the motherboard
having the problem was _not_ the same as the others; it was similar, but
not identical.


This is a good piece of info.  I can try swapping out the MB and see 
what happens.


I do want to add: thank you Jack for all your help, and if it does turn out
to be the MB, then double thanks.  Viva Monday!   :)


What would be nice would be MORE info for a watchdog timeout... maybe a 
sysctl dev.watchdog.debug=1 or something where when a watchdog event 
happened --- for whatever driver --- a bunch of stats were dumped 
relating to the event.


Rudy

Re: em0 watchdog timeouts

2009-10-05 Thread Jack Vogel

Hmmm, I did have one of the drivers print more info at watchdog time, but I
just looked and that's not em; time to add that, I guess.

Since you're in the driver, there isn't a huge amount of info that you can
print; it still may not be enough to help.

BTW, I've always been somewhat dissatisfied with the watchdog design and
think it's kinda flawed. I could try and make you an experimental driver with
debug and some changes that you can try, if you'd like.

Jack


On Mon, Oct 5, 2009 at 1:54 PM, Rudy cra...@monkeybrains.net wrote:

 Finally, while doing some comparisons, I realized that the motherboard
 having the problem was _not_ the same as the others; it was similar, but
 not identical.


 This is a good piece of info.  I can try swapping out the MB and see what
 happens.

 I do want to add: thank you Jack for all your help, and if it does turn out to
 be the MB, then double thanks.  Viva Monday!   :)

 What would be nice would be MORE info for a watchdog timeout... maybe a
 sysctl dev.watchdog.debug=1 or something where when a watchdog event
 happened --- for whatever driver --- a bunch of stats were dumped relating
 to the event.

 Rudy


Re: em0 watchdog timeouts

2009-10-05 Thread Rudy (bulk)




BTW, I've always been somewhat dissatisfied with the watchdog design and
think it's kinda flawed. I could try and make you an experimental driver with
debug and some changes that you can try, if you'd like.


I'm game; it would be nice if the machine still reset the watchdog in
3 seconds and didn't cause any more damage from the debug code (e.g. a
panic).  :)


My frequency of watchdog events is about 2 or 3 times per day.
I am running:   Intel(R) PRO/1000 Network Connection 6.9.12




Rudy


Re: em0 watchdog timeouts

2009-10-02 Thread Rudy


I noticed something interesting.

I set the rx_int_delay to 0:
 sysctl dev.em.5.rx_int_delay=0

Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
 Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
now 32:
 Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

However, running sysctl dev.em.5 shows it as 0:
dev.em.5.rx_int_delay: 0
dev.em.5.tx_int_delay: 66

Seems like the adapter and the kernel don't agree on the rx_int_delay value.
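The comparison Rudy did by eye can be made mechanical, which helps when checking several interfaces after each reset. A minimal sketch with hypothetical function names, assuming the two text formats quoted above (the driver's `debug=1` log line and plain `sysctl dev.em.N` output). One caution: the hardware delay timers count in 1.024-microsecond units, so if one side reports raw register ticks and the other microseconds, a small nonzero difference may be a unit artifact (an assumption worth checking against the driver source); a jump from 0 to 32 is not.

```python
import re

# Driver debug line, e.g. "em5: rx_int_delay = 32, rx_abs_int_delay = 66"
DEBUG_RE = re.compile(r"(\w+_int_delay) = (\d+)")
# sysctl output line, e.g. "dev.em.5.rx_int_delay: 0"
SYSCTL_RE = re.compile(r"^dev\.em\.\d+\.(\w+_int_delay): (\d+)$")


def parse_debug(text):
    """Collect *_int_delay values reported by the adapter debug output."""
    return {name: int(value) for name, value in DEBUG_RE.findall(text)}


def parse_sysctl(text):
    """Collect *_int_delay values from `sysctl dev.em.N` output."""
    values = {}
    for line in text.splitlines():
        match = SYSCTL_RE.match(line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def delay_mismatches(debug_text, sysctl_text):
    """Return {name: (sysctl_value, adapter_value)} wherever the two disagree."""
    adapter = parse_debug(debug_text)
    kernel = parse_sysctl(sysctl_text)
    return {
        name: (kernel[name], adapter[name])
        for name in adapter
        if name in kernel and kernel[name] != adapter[name]
    }
```

Fed the debug line captured right after a watchdog event and the sysctl output taken at the same moment, a non-empty result reproduces Rudy's observation that the adapter and the kernel disagree.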

Rudy

Re: em0 watchdog timeouts

2009-10-02 Thread Jack Vogel

Watchdog resets the adapter. Messing with these values is of dubious value
anyway.

Jack


On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:


 I noticed something interesting.

 I set the rx_int_delay to 0:
  sysctl dev.em.5.rx_int_delay=0

 Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
  Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

 After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
 now 32:
  Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

 However, running sysctl dev.em.5 shows it as 0:
 dev.em.5.rx_int_delay: 0
 dev.em.5.tx_int_delay: 66

 Seems like the adapter and the kernel don't agree on the rx_int_delay
 value.

 Rudy


Re: em0 watchdog timeouts

2009-10-02 Thread Rudy


Ah, I'll stop messing with them. 


I just set them all to 0 to see if that will help and noticed the card
was leaving tx_int_delay=1.

# sysctl dev.em.4.debug=1
Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0

# sysctl dev.em.4
dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
dev.em.4.rx_int_delay: 0
dev.em.4.tx_int_delay: 0
dev.em.4.rx_abs_int_delay: 0
dev.em.4.tx_abs_int_delay: 0

Splitting traffic to different ports has brought down the watchdog
events to once a day.  ... essentially, I have a quad 30Mbps (not quad
1Gbps) card.  heheh.
Would turning off net.inet.ip.fastforwarding or any other setting help?

Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
a feeling that isn't related to the NIC at all, but I'm not sure what
else to try.

Rudy



Jack Vogel wrote:
 Watchdog resets the adapter. Messing with these values is of dubious value
 anyway.

 Jack


 On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:

   
 I noticed something interesting.

 I set the rx_int_delay to 0:
  sysctl dev.em.5.rx_int_delay=0

 Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
  Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

 After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
 now 32:
  Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

 However, running sysctl dev.em.5 shows it as 0:
 dev.em.5.rx_int_delay: 0
 dev.em.5.tx_int_delay: 66

 Seems like the adapter and the kernel don't agree on the rx_int_delay
 value.

 Rudy

 

   


Re: em0 watchdog timeouts

2009-10-01 Thread Rudy (bulk)



I have rxd and txd set to 1024.  How high can I safely go?

# add more descriptors to em devices.
hw.em.rxd=1024
hw.em.txd=1024

### other settings... I have tried rx_int_delay=0 and 32 ... doesn't 
seem to make the watchdogs go away.


dev.em.4.rx_int_delay: 32
dev.em.4.tx_int_delay: 66
dev.em.4.rx_abs_int_delay: 66
dev.em.4.tx_abs_int_delay: 66
dev.em.4.rx_processing_limit: 300
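The recurring value 66 in these listings is probably not arbitrary. The PRO/1000 interrupt-delay timers count in units of 1.024 microseconds, and the em(4) driver converts between ticks and microseconds with rounding macros; the sketch below assumes they are the `EM_TICKS_TO_USECS` / `EM_USECS_TO_TICKS` formulas reimplemented here (verify against `if_em.h` in your own source tree). If the driver's default delay is 64 ticks (also an assumption), it is reported as 66 microseconds:

```python
# PRO/1000 interrupt-delay registers count in 1.024-microsecond ticks.
# These helpers reimplement, as an assumption, the integer rounding used by
# the em(4) EM_TICKS_TO_USECS / EM_USECS_TO_TICKS macros.


def ticks_to_usecs(ticks: int) -> int:
    """Convert hardware timer ticks to microseconds, rounding to nearest."""
    return (1024 * ticks + 500) // 1000


def usecs_to_ticks(usecs: int) -> int:
    """Convert microseconds to hardware timer ticks, rounding to nearest."""
    return (1000 * usecs + 512) // 1024


print(ticks_to_usecs(64))  # prints 66
print(usecs_to_ticks(66))  # prints 64
```

This only explains the numbers; as Jack notes elsewhere in the thread, tuning these delays is of dubious value for the watchdog problem itself.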



I am using a PCI-Express x8 slot, according to the motherboard specs:
http://supermicro.com/products/motherboard/Xeon3000/3210/X7SBi.cfm

Rudy



Jack Vogel wrote:

Increase the size of your TX ring, meaning the number of TX descriptors.

You said this is a quad port card; what size PCI E slot are you in? On
some motherboards the slot connector might suggest it's of a certain size,
but it's not really wired fully. If you are not in an x8 lane slot, move it
to one.

What about system tuning?

Some ideas, let me know how it goes.

Jack


On Wed, Sep 30, 2009 at 3:28 PM, Rudy cra...@monkeybrains.net wrote:

  

Rudy wrote:



Rudy wrote:

  

I am having watchdog timeout issues



Oh, here is some more info from 'pciconf -lcv'.

I offloaded half the traffic from em0 to em5 and there has only been one
watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday.  We do
streaming out of our network and the 3 second outage really messes things
up...


e...@pci0:5:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:5:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:13:0:0:class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
e...@pci0:15:0:0:class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82573L Intel PRO/1000 PL Network Adaptor'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
vgap...@pci0:17:3:0:class=0x03 card=0xd18015d9 chip=0x515e1002 rev=0x02
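Jack's warning that a slot may be physically larger than its wiring can be checked from the `link xN(xM)` field in `pciconf -lc` output, where the first number is the negotiated link width and the parenthesized one is the maximum the device supports. A minimal sketch with hypothetical function names, assuming the one-entry-per-device layout shown above with wrapped lines rejoined (the device selectors in the archive are obfuscated, so real input will look slightly different):

```python
import re

# Device header lines start with a selector such as "em0@pci0:5:0:0:".
DEVICE_RE = re.compile(r"^(\S+@pci[\d:]+)")
# PCI-Express capability lines contain e.g. "link x4(x4)":
# negotiated width first, maximum supported width in parentheses.
LINK_RE = re.compile(r"link x(\d+)\(x(\d+)\)")


def link_widths(pciconf_text):
    """Map each device selector to its (negotiated, maximum) PCIe link width."""
    widths = {}
    device = None
    for line in pciconf_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            device = header.group(1).rstrip(":")
            continue
        link = LINK_RE.search(line)
        if link and device is not None:
            widths[device] = (int(link.group(1)), int(link.group(2)))
    return widths


def degraded_links(pciconf_text):
    """Devices whose negotiated link width is below their maximum width."""
    return {dev: w for dev, w in link_widths(pciconf_text).items() if w[0] < w[1]}
```

In the listing above every port reports `x4(x4)` or `x1(x1)`, i.e. full width. If the quad-port card contains an on-board PCIe bridge (the two 82571EB chips appearing on separate buses 5 and 6 suggests one), these lines describe the links behind the bridge, so the bridge's own upstream link is the one that reflects the slot wiring.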




Re: em0 watchdog timeouts

2009-10-01 Thread Rudy (bulk)



I have a quad card in a PCIe 8x port, and there are 2 ports on the 
motherboard.  I just read the manual and see that the on board ports are 
PCIe 1x.


I have been seeing watchdog events on the onboard ports as well as on
the PCIe card.  The router is doing roughly 50Mbps on em0, em4 and em5.


Does i386 vs amd64 make any difference to the em0 driver?

Bumping the TX ring to 2048:
# grep em /boot/loader.conf

if_em_load=YES
hw.em.rxd=2048
hw.em.txd=2048

Rudy





You said this is a quad port card, what size PCI E slot are you in? 



Re: em0 watchdog timeouts

2009-10-01 Thread Jack Vogel

I would say that 1024 should be enough; I thought maybe you were at 256.
amd64 kernels just perform better at a lot of things, however I/O is not
necessarily one of them, so I wouldn't claim it for sure; still, I'd always
default to 64-bit these days unless there's some other reason not to.
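A back-of-the-envelope check of why 1024 descriptors should be enough: the ring only has to absorb traffic for as long as the driver is late in cleaning it. A minimal sketch with a hypothetical helper, assuming one descriptor per frame and ignoring Ethernet preamble and inter-frame gap (both simplifications; multi-segment packets consume several descriptors):

```python
def ring_coverage_ms(descriptors: int, mbps: float, frame_bytes: int = 1500) -> float:
    """Milliseconds of traffic a descriptor ring can hold at a given line rate.

    Simplifying assumptions: one descriptor per frame, and no Ethernet
    preamble or inter-frame gap overhead.
    """
    frames_per_second = mbps * 1_000_000 / (frame_bytes * 8)
    return descriptors / frames_per_second * 1000.0


for ring in (256, 1024, 2048):
    print(ring, round(ring_coverage_ms(ring, 50), 1))
```

At roughly 50 Mbps per port (the load Rudy reports) and full-size frames, 1024 descriptors hold about a quarter of a second of traffic and 2048 about half a second; with 64-byte frames the figures shrink by a factor of about 23 (1500/64). That supports Jack's view that going beyond 1024 is unlikely to be the fix.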

What about system load? Perhaps something is bogging the thing down so that
it cannot adequately service the network interrupts??

The specs of the motherboard are respectable, how much memory does it have?

Another thought: are you using the out-of-band management features (like
IPMI)? If you are not, then go into the BIOS and disable that stuff.

Have you run netstat or some other resource monitor to see if you run out of
anything that might coincide with the watchdogs...

Jack




On Thu, Oct 1, 2009 at 2:12 PM, Rudy (bulk) cra...@monkeybrains.net wrote:


 I have a quad card in a PCIe 8x port, and there are 2 ports on the
 motherboard.  I just read the manual and see that the on board ports are
 PCIe 1x.

 I have been seeing watchdog events on the onboard ports as well as on the
 PCIe card.  The router is doing roughly 50Mbps on em0, em4 and em5.

 Does i386 vs amd64 make any difference to the em0 driver?

 Bumping the TX ring to 2048:
 # grep em /boot/loader.conf

 if_em_load=YES
 hw.em.rxd=2048
 hw.em.txd=2048

 Rudy





 You said this is a quad port card, what size PCI E slot are you in?




Re: em0 watchdog timeouts

2009-10-01 Thread Rudy


 What about system load, perhaps something is bogging the thing down so that
 it cannot adequately service the network interrupts??

Hardly anything is running on the box...
Only things on the box: zebra bgpd (3 peers...)  sshd snmpd


Here is the top of 'top':

load averages:  0.06,  0.08,  0.07   up 7+01:08:16  17:26:39
15 processes:  1 running, 14 sleeping
CPU:  0.0% user,  0.0% nice,  4.5% system,  0.0% interrupt, 95.5% idle
Mem: 193M Active, 42M Inact, 156M Wired, 196K Cache, 83M Buf, 1610M Free


 The specs of the motherboard are respectable, how much memory does it have?

 Another thought, are you using the out-of-band management features (like
 IPMI)?
 If you are not then go into the BIOS and disable that stuff.

No IPMI card is installed on that motherboard (it needs an add-on daughter card).

 Have you run netstat or some other resource monitor to see if you run out of
 anything that might coincide with the watchdogs...
What should I look for?

# netstat -m
4105/4610/8715 mbufs in use (current/cache/total)
4103/2303/6406/25600 mbuf clusters in use (current/cache/total/max)
4103/2297 mbuf+clusters out of packet secondary zone in use (current/cache)
0/44/44/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
9232K/5934K/15166K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/6/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
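As for what to look for: in this kind of mbuf summary, the lines that would point at resource exhaustion are most likely the "denied" and "delayed" counters and the count of protocol drain calls. In Rudy's paste all of them are zero, which suggests (without proving) that the box was not short of mbufs when the snapshot was taken. A minimal sketch for checking this repeatedly, for example from cron around the times the watchdogs fire; mbuf_pressure is a hypothetical helper name, not a system command.

```sh
#!/bin/sh
# Hypothetical helper (mbuf_pressure is not a system command): reads
# `netstat -m` output on stdin and prints each "denied", "delayed" or
# "drain" line whose counters are not all zero.
mbuf_pressure() {
    awk '
        /denied|delayed|drain/ {
            # Field 1 holds the counters: "0", or slash-separated like "0/0/0".
            n = split($1, c, "/")
            for (i = 1; i <= n; i++) {
                if (c[i] + 0 > 0) { print; break }
            }
        }
    '
}
```

Usage: `netstat -m | mbuf_pressure`. No output means every counter was zero at that moment.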



Are there specific router-only tunings that may help?

Here are my sysctl settings:

kern.ipc.somaxconn=256
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.ethernet=0
kern.ipc.nmbcluster=32768

net.inet.icmp.icmplim=1000
net.inet.ip.fastforwarding=1
net.inet.ip.intr_queue_maxlen=92
net.inet.icmp.drop_redirect=1

dev.em.0.rx_processing_limit=200
dev.em.1.rx_processing_limit=200
dev.em.2.rx_processing_limit=200
#dev.em.4.rx_processing_limit=200
# test setting processing limit up to 300
dev.em.4.rx_processing_limit=300
dev.em.5.rx_processing_limit=200
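One thing worth checking in a list like this is that every name is a real OID. As written, the list sets kern.ipc.nmbcluster, while the tunable is named kern.ipc.nmbclusters. The mbuf statistics above also still report a cluster maximum of 25600 rather than 32768, which would be consistent with that line having no effect. A minimal sketch that flags unknown names in a sysctl.conf-style file. It assumes sysctl(8) exits non-zero for an unknown OID. check_sysctl_names is a hypothetical helper, and the SYSCTL override exists only so the helper can be tested with a stub. Loader-only tunables such as hw.em.txd belong in /boot/loader.conf and are out of scope here.

```sh
#!/bin/sh
# Hypothetical helper: checks that every "name=value" entry in a
# sysctl.conf-style file names an OID the running kernel knows.
# Assumes sysctl(8) exits non-zero for an unknown OID.  SYSCTL may be set
# to another command (e.g. a stub for testing); the default is sysctl.
check_sysctl_names() {
    conf=$1
    status=0
    # Drop comments and blank lines, then keep only the part before "=".
    for name in $(sed -e 's/#.*//' -e '/^[[:space:]]*$/d' -e 's/=.*//' "$conf"); do
        if ! ${SYSCTL:-sysctl} -N "$name" >/dev/null 2>&1; then
            echo "unknown OID: $name"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_sysctl_names /etc/sysctl.conf` prints one line per rejected name and returns 1 if there were any.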

Thanks,
Rudy

Re: em0 watchdog timeouts

2009-09-30 Thread Rudy

Rudy wrote:
 I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
 http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html

 link to dcgdis.zip didn't work.  Do you have a copy?
   

Thanks, Jack.  Got the file and flashed -- no upgrade needed.

So, while the router was offline, I flashed the motherboard's BIOS
(Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12
version of the em driver.  Still, watchdog timeouts.  Sigh.

Will the Intel Gigabit ET Quad Port Adapter make the timeouts go away?
Should I be using amd64?
Should tx_int_delay be 0?


Summary:
 2 NICs on the motherboard + quad card in a PCIe slot.
 Watchdog timeouts on the motherboard NICs and on a quad-card NIC when
 bandwidth > 10Mbps.
 There is minimal TCP to the box (BGP sessions)... it only forwards
 packets between interfaces.

# uname -r -m
7.2-STABLE i386

# dmesg | grep ^em
em0: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x2000-0x201f
mem 0xd022-0xd023,0xd020-0xd021 irq 16 at device 0.0 on pci5
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:15:17:78:99:70
em1: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x2020-0x203f
mem 0xd026-0xd027,0xd024-0xd025 irq 17 at device 0.1 on pci5
em1: Using MSI interrupt
em1: [FILTER]
em1: Ethernet address: 00:15:17:78:99:71
em2: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x3000-0x301f
mem 0xd032-0xd033,0xd030-0xd031 irq 17 at device 0.0 on pci6
em2: Using MSI interrupt
em2: [FILTER]
em2: Ethernet address: 00:15:17:78:99:72
em3: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x3020-0x303f
mem 0xd036-0xd037,0xd034-0xd035 irq 18 at device 0.1 on pci6
em3: Using MSI interrupt
em3: [FILTER]
em3: Ethernet address: 00:15:17:78:99:73
em4: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x4000-0x401f
mem 0xd040-0xd041 irq 16 at device 0.0 on pci13
em4: Using MSI interrupt
em4: [FILTER]
em4: Ethernet address: 00:30:48:67:14:50
em5: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x5000-0x501f
mem 0xd050-0xd051 irq 17 at device 0.0 on pci15
em5: Using MSI interrupt
em5: [FILTER]
em5: Ethernet address: 00:30:48:67:14:51


# vmstat -i
interrupt                    total    rate
irq1: atkbd0                   710       0
irq4: sio0                       3       0
irq23: atapci0               14943       0
cpu0: timer              929753417    2000
irq256: em0              702754836    1511
irq257: em1                      2       0
irq260: em4              469338728    1009
irq261: em5               78605337     169
cpu1: timer              929753403    2000
Total                   3110221379    6690
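Jack's question about interrupt load can be checked against this table. However, the "rate" column is an average since boot. For em0, 702754836 interrupts over the roughly 464,877 seconds (about 5.4 days) implied by cpu0's 2000/s timer gives the 1511/s shown. An average like that can hide a burst around a watchdog event. A minimal sketch, assuming two saved `vmstat -i` snapshots taken a known number of seconds apart; interval_rates is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: compares two saved `vmstat -i` snapshots taken SECS
# seconds apart and prints each interrupt source's rate over that interval,
# instead of the since-boot average shown in the "rate" column.
interval_rates() {
    before=$1 after=$2 secs=$3
    awk -v secs="$secs" '
        # Name a row by every column except the last two (total and rate).
        function rowname(   k, i) {
            k = $1
            for (i = 2; i <= NF - 2; i++) k = k " " $i
            return k
        }
        $NF !~ /^[0-9]+$/ { next }              # skip the header line
        FNR == NR { first[rowname()] = $(NF - 1); next }
        {
            name = rowname()
            if (name in first)
                printf "%s %d\n", name, ($(NF - 1) - first[name]) / secs
        }
    ' "$before" "$after"
}
```

Usage: `vmstat -i > /tmp/a; sleep 10; vmstat -i > /tmp/b; interval_rates /tmp/a /tmp/b 10`.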

# sysctl dev.em.0.stats=1
Sep 30 01:08:20 mango kernel: em0: Excessive collisions = 0
Sep 30 01:08:20 mango kernel: em0: Sequence errors = 0
Sep 30 01:08:20 mango kernel: em0: Defer count = 0
Sep 30 01:08:20 mango kernel: em0: Missed Packets = 101469
Sep 30 01:08:20 mango kernel: em0: Receive No Buffers = 0
Sep 30 01:08:20 mango kernel: em0: Receive Length Errors = 0
Sep 30 01:08:20 mango kernel: em0: Receive errors = 0
Sep 30 01:08:20 mango kernel: em0: Crc errors = 0
Sep 30 01:08:20 mango kernel: em0: Alignment errors = 0
Sep 30 01:08:20 mango kernel: em0: Collision/Carrier extension errors = 0
Sep 30 01:08:20 mango kernel: em0: RX overruns = 0
Sep 30 01:08:20 mango kernel: em0: watchdog timeouts = 15
Sep 30 01:08:20 mango kernel: em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
Sep 30 01:08:20 mango kernel: em0: XON Rcvd = 0
Sep 30 01:08:20 mango kernel: em0: XON Xmtd = 0
Sep 30 01:08:20 mango kernel: em0: XOFF Rcvd = 0
Sep 30 01:08:20 mango kernel: em0: XOFF Xmtd = 0
Sep 30 01:08:20 mango kernel: em0: Good Packets Rcvd = 1056196797
Sep 30 01:08:20 mango kernel: em0: Good Packets Xmtd = 1088726903
Sep 30 01:08:20 mango kernel: em0: TSO Contexts Xmtd = 4088
Sep 30 01:08:20 mango kernel: em0: TSO Contexts Failed = 0
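The stats dump is easier to compare between runs if the few counters that matter here are pulled out. For the dump above, 101469 missed packets against 1056196797 good packets received works out to about 0.0096% of received frames. A minimal sketch that reads the syslog lines produced by `sysctl dev.em.N.stats=1` and prints the watchdog count and that percentage; em_stats_summary is a hypothetical helper name. If several dumps are fed in, the last value of each counter wins.

```sh
#!/bin/sh
# Hypothetical helper: reads an em(4) statistics dump (the syslog lines
# written by `sysctl dev.em.N.stats=1`) on stdin and prints the watchdog
# count plus missed packets as a share of all received frames.
em_stats_summary() {
    awk -F' = ' '
        /Missed Packets/     { missed = $2 }
        /Good Packets Rcvd/  { good = $2 }
        /watchdog timeouts/  { wd = $2 }
        END {
            total = good + missed
            pct = (total > 0) ? 100 * missed / total : 0
            printf "watchdog=%d missed=%d (%.4f%% of received)\n", wd, missed, pct
        }
    '
}
```

Usage: `sysctl dev.em.0.stats=1; tail -n 30 /var/log/messages | em_stats_summary`.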

# sysctl dev.em.0.debug=1
Sep 30 01:34:59 mango kernel: em0: Adapter hardware address = 0xc5159420
Sep 30 01:34:59 mango kernel: em0: CTRL = 0x401c0241 RCTL = 0x8002
Sep 30 01:34:59 mango kernel: em0: Packet buffer = Tx=16k Rx=32k
Sep 30 01:34:59 mango kernel: em0: Flow control watermarks high = 30720 low = 29220
Sep 30 01:34:59 mango kernel: em0: tx_int_delay = 66, tx_abs_int_delay = 66
Sep 30 01:34:59 mango kernel: em0: rx_int_delay = 0, rx_abs_int_delay = 66
Sep 30 01:34:59 mango kernel: em0: fifo workaround = 0, fifo_reset_count = 0
Sep 30 01:34:59 mango kernel: em0: hw tdh = 980, hw tdt = 980
Sep 30 01:34:59 mango kernel: em0: hw rdh = 203, hw rdt = 202
Sep 30 01:34:59 mango kernel: em0: Num Tx descriptors avail = 1024
Sep 30 01:34:59 mango kernel: em0: Tx Descriptors not avail1 = 0
Sep 30 01:34:59 mango kernel: em0: Tx Descriptors not avail2 = 0
Sep 30 01:34:59 mango kernel: em0: Std mbuf failed = 0
Sep 30 01:34:59 mango kernel: em0: Std mbuf cluster

Re: em0 watchdog timeouts

2009-09-30 Thread Stefan Krueger

In muc.lists.freebsd.stable, you wrote:
 Rudy wrote:
 I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
 http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html

 link to dcgdis.zip didn't work.  Do you have a copy?
   

 Thanks, Jack.  Got the file and flashed -- no upgrade needed.

 So, while the router was offline, I flashed the motherboards bios
 (Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12
 version of the em driver.  Still, watchdog timeouts.  Sigh.

Hi Rudy,

May I ask which clients have access to your FreeBSD 7.2 server?

I had similar problems a few days ago. I have no idea exactly what
happened, but an Ubuntu Linux box (NIS and NFS client) made my em0
time out after a while, too (and even crashed my FreeBSD 7.2 box
a few times!)

This box was rock solid before, I even thought my Intel NIC was
broken...

Anyway, since I had no time (and clue) to analyze this further, I took
the risk and upgraded to 8.0-RC1 and, well, everything is working fine
now :-)

HTH

Re: em0 watchdog timeouts

2009-09-30 Thread Rudy (bulk)


Stefan Krueger wrote:

In muc.lists.freebsd.stable, you wrote:
  

Rudy wrote:


I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html

link to dcgdis.zip didn't work.  Do you have a copy?
  
  

Thanks, Jack.  Got the file and flashed -- no upgrade needed.

So, while the router was offline, I flashed the motherboards bios
(Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12
version of the em driver.  Still, watchdog timeouts.  Sigh.



Hi Rudy,

may I ask which clients have access to your FreeBSD 7.2 server?


None.  It is a router and has minimal services on it (bgpd / zebra / snmpd).


Rudy

Re: em0 watchdog timeouts

2009-09-30 Thread Rudy


Rudy wrote:

Rudy wrote:

I am having watchdog timeout issues


Oh, here is some more info from 'pciconf -lcv'.

I offloaded half the traffic from em0 to em5 and there has only been one 
watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday.  We 
do streaming out of our network and the 3 second outage really messes 
things up...



e...@pci0:5:0:0:	class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:5:0:1:	class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:0:	class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:1:	class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:13:0:0:	class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
e...@pci0:15:0:0:	class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82573L Intel PRO/1000 PL Network Adaptor'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
vgap...@pci0:17:3:0:	class=0x03 card=0xd18015d9 chip=0x515e1002 rev=0x02


Re: em0 watchdog timeouts

2009-09-30 Thread Jack Vogel

Increase the size of your TX ring, meaning the number of TX descriptors.
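In practice this suggestion corresponds to the hw.em.txd loader tunable (and hw.em.rxd for the receive ring), as in the /boot/loader.conf lines Rudy quotes in this thread. A minimal sanity check to run before rebooting with a new value follows. em_ring_ok is a hypothetical helper, and its limits are assumptions modelled on the em(4) driver of this era: at least 80, at most 4096, and a multiple of 8. If the driver rejects a value, it is expected to fall back to its default. After the reboot it is therefore worth confirming the ring size with `sysctl dev.em.0.debug=1`: on an idle ring, its "Num Tx descriptors avail" line should match the configured size.

```sh
#!/bin/sh
# Hypothetical check for a proposed hw.em.txd / hw.em.rxd value.  The limits
# below are assumptions modelled on the em(4) driver of this era: at least
# 80, at most 4096, and a multiple of 8 (16-byte descriptors, 128-byte
# aligned ring).  Returns 0 if the value looks acceptable.
em_ring_ok() {
    n=$1
    case $n in
        # empty, non-numeric, or leading zero (shell arithmetic reads it as octal)
        ''|*[!0-9]*|0*) return 1 ;;
    esac
    [ "$n" -ge 80 ] && [ "$n" -le 4096 ] && [ $((n % 8)) -eq 0 ]
}
```

Usage: `em_ring_ok 2048 && echo 'value looks usable'`.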

You said this is a quad port card; what size PCI E slot are you in? On
some motherboards the slot connector might suggest it's of a certain size,
but it's not really wired fully. If you are not in an x8 lane slot, move it
to one.
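Part of this can be checked from `pciconf -lc` output, whose PCI-Express capability line ends in "link xN(xM)". Reading that as the negotiated width followed by the maximum the device supports (an assumption about the notation), the paste earlier in this thread shows x4(x4) for the 82571EB ports and x1(x1) for the onboard 82573s. In other words, no port trained narrower than its own maximum. A minimal sketch that flags any device where N is less than M; pcie_narrow_links is a hypothetical helper name. It cannot tell whether an x8-sized slot is wired narrower when the card only needs x4.

```sh
#!/bin/sh
# Hypothetical helper: reads `pciconf -lc` output on stdin and reports
# devices whose PCI Express link trained narrower than the device supports.
# Assumes the "link xN(xM)" notation: N = negotiated width, M = maximum.
pcie_narrow_links() {
    awk '
        /^[^ \t]/ { dev = $1 }                  # device line, e.g. em0@pci0:5:0:0:
        match($0, /link x[0-9]+\(x[0-9]+\)/) {
            s = substr($0, RSTART + 6, RLENGTH - 7)   # e.g. "4(x4"
            split(s, w, /\(x/)
            if (w[1] + 0 < w[2] + 0)
                printf "%s negotiated x%d of x%d\n", dev, w[1], w[2]
        }
    '
}
```

Usage: `pciconf -lc | pcie_narrow_links`.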

What about system tuning?

Some ideas, let me know how it goes.

Jack


On Wed, Sep 30, 2009 at 3:28 PM, Rudy cra...@monkeybrains.net wrote:

 Rudy wrote:

 Rudy wrote:

 I am having watchdog timeout issues


 Oh, here is some more info from 'pciconf -lcv'.

 I offloaded half the traffic from em0 to em5 and there has only been one
 watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday.  We do
 streaming out of our network and the 3 second outage really messes things
 up...


 [snip]


em0 watchdog timeouts -- looking for dcgdis.zip

2009-09-22 Thread Rudy

I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html

link to dcgdis.zip didn't work.  Do you have a copy?

Thanks in advance,
Rudy


RE: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

2007-02-02 Thread NISHIMURA Yutaka

Hello.
This is Yutaka.

My English is not very good. If anything I write is unclear, please
let me know.

I was pointed here from FreeBSD-users-jp (a Japanese mailing list):
http://home.jp.freebsd.org/cgi-bin/showmail/FreeBSD-users-jp/90318

This thread:
 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Mike Andrews
  * 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Jack Vogel
    o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Mike Andrews
    o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Jeremy Chadwick
      + 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Jack Vogel
        # 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  John Baldwin
    o 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)  Mike Andrews

I have seen this problem too; my environment triggers it.

Changing one setting improved things: I disabled USB in the BIOS.

Since rebooting, the problem has occurred 3 times, but there have been
no problems in the last 3 days.

With USB enabled, the problem occurred 4-10 times per hour.
With USB disabled, it has occurred 3 times since reboot.


# pciconf -l -v
[EMAIL PROTECTED]:8:0:   class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82540EM Gigabit Ethernet Controller'
class= network
subclass = ethernet


dmesg.boot with USB disabled:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Sat Jan 20 12:12:56 JST 2007
[EMAIL PROTECTED]:/usr/src/sys/i386/compile/NATBOX
ACPI APIC Table: AMIINT VIA_K7  
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: AMD Sempron(tm) (1403.19-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x681  Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>
real memory  = 805240832 (767 MB)
avail memory = 774430720 (738 MB)
ioapic0 Version 0.3 irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: AMIINT VIA_K7 on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: VIA 8377 (Apollo KT400/KT400A/KT600) host to PCI bridge mem 
0xe000-0xe7ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci0: display, VGA at device 5.0 (no driver attached)
atapci0: Promise PDC40518 SATA150 controller port 0xec00-0xec7f,0xe800-0xe8ff 
mem 0xdfffb000-0xdfffbfff,0xdffc-0xdffd irq 17 at device 6.0 on pci0
ata2: ATA channel 0 on atapci0
ata3: ATA channel 1 on atapci0
ata4: ATA channel 2 on atapci0
ata5: ATA channel 3 on atapci0
em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0xe400-0xe43f 
mem 0xdff8-0xdff9,0xdff6-0xdff7 irq 18 at device 8.0 on pci0
em0: Ethernet address: 00:07:e9:xx:x:xx
xl0: 3Com 3c905B-TX Fast Etherlink XL port 0xe000-0xe07f mem 
0xdfffaf80-0xdfffafff irq 17 at device 10.0 on pci0
miibus0: MII bus on xl0
xlphy0: 3Com internal media interface on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:10:5a:xx:xx:xx
atapci1: VIA 6420 SATA150 controller port 
0xdc00-0xdc07,0xd800-0xd803,0xd400-0xd407,0xd000-0xd003,0xcc00-0xcc0f,0xc800-0xc8ff
 irq 20 at device 15.0 on pci0
ata6: ATA channel 0 on atapci1
ata7: ATA channel 1 on atapci1
atapci2: VIA 8237 UDMA133 controller port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
ata0: ATA channel 0 on atapci2
ata1: ATA channel 1 on atapci2
isab0: PCI-ISA bridge at device 17.0 on pci0
isa0: ISA bus on isab0
pci0: multimedia, audio at device 17.5 (no driver attached)
acpi_button1: Sleep Button on acpi0
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: floppy drive controller port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 
on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3

Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

2007-01-17 Thread Mike Andrews


Jack Vogel wrote:


On 1/16/07, Mike Andrews [EMAIL PROTECTED] wrote:

I have a strange issue with em0 watchdog timeouts that I think is not the
same as the ones everyone was having during the 6.2 beta cycle...

I have six systems, each with two Intel GigE ports onboard:

Systems A and B: Supermicro PDSMi+
Systems C and D: Supermicro PDSMi (without the plus)

[snip]

Several times a day, em0 will go down, give a watchdog timeout error on
the console, then come right back up on its own a few seconds later.  But
here's the weird twist: it ONLY happens on systems A and B, and ONLY when
running at gigabit speed.  If I knock the two switch ports down to 100
meg, the problem goes away.

[snip]

There are some management related issues with this NIC, first if you
have not done so make a DOS bootable device, and run this app I
am enclosing, it fixes the prom setting that is wrong on some devices.
It will do no harm, and it may solve things.

Let me know if it does fix it please.



No problems since running that tool almost 24 hours ago.  Looks like a 
fix.  Thanks again!



--
Mike Andrews  *  [EMAIL PROTECTED]  *  http://www.bit0.com
It's not news, it's Fark.com.  Carpe cavy!

Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

2007-01-17 Thread John Baldwin

On Tuesday 16 January 2007 22:07, Jack Vogel wrote:
 On 1/16/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
  On Tue, Jan 16, 2007 at 10:53:04AM -0800, Jack Vogel wrote:
   There are some management related issues with this NIC, first if you
   have not done so make a DOS bootable device, and run this app I
   am enclosing, it fixes the prom setting that is wrong on some devices.
   It will do no harm, and it may solve things.
 
  Jack,
 
  Can you expand on what this application changes in the PROM?  I have
  an Intel motherboard which suffers from something similar to what the OP has
  reported (em0 watchdog timeouts), and was curious what the utility
  does before firing up the board and trying it.  Others may be curious
  to know, too.
 
 Hmmm, I'm rusty on this; it's now been a year or more since I was
 first involved in the details, so I may need to amend this later :)
 
 But from memory, the issue is the value programmed into the MANC
 register by the PROM, I don't remember what bit it was, but one bit
 is mistakenly set, it causes the hardware to incorrectly intercept some
 packets.
 
 I was snowbound today, but I'll doublecheck on the detail tomorrow
 and amend if needed.
 
 Everyone note that this ONLY affects an 82573 NIC, so make sure of
 that before anything else.

Is this the IPMI/ASF stuff?  If so, you can also work around it by adding
'net.inet.ip.portrange.lowlast=665' to /etc/sysctl.conf.

-- 
John Baldwin
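John's workaround can be applied immediately with sysctl(8) and made persistent in /etc/sysctl.conf. The reason for 665 is not stated in this thread. It is commonly explained as follows: reserved ports (used, for example, by NFS clients) are handed out downward from net.inet.ip.portrange.lowfirst (default 1023) to lowlast (default 600). Raising lowlast to 665 therefore keeps the kernel away from UDP ports 623 and 664, which ASF/IPMI management traffic uses. A minimal sketch, assuming a POSIX shell; append_once is a hypothetical helper that avoids adding the line twice when the snippet is rerun.

```sh
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line (creates FILE if it does not exist).
append_once() {
    file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on the affected host, as root:
#   sysctl net.inet.ip.portrange.lowlast=665
#   append_once /etc/sysctl.conf 'net.inet.ip.portrange.lowlast=665'
```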

6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

2007-01-16 Thread Mike Andrews

I have a strange issue with em0 watchdog timeouts that I think is not the 
same as the ones everyone was having during the 6.2 beta cycle...


I have six systems, each with two Intel GigE ports onboard:

Systems A and B: Supermicro PDSMi+
Systems C and D: Supermicro PDSMi (without the plus)
System E: Tyan S2730U3GN
System F: Supermicro X5DPA-GG

On each system:
em0 is connected to a Cisco Catalyst 2960G layer 2 gigabit ethernet switch.
em1 is connected to a Foundry Serveriron XL layer 4-7 fast ethernet switch.

All six run FreeBSD 6.2-RELEASE i386, even though the first four are 
capable of running amd64.  They all have 2 GB of memory, except E which 
has 4 GB.  The kernel configs are all identical, and are not that far from 
GENERIC + SMP.


Several times a day, em0 will go down, give a watchdog timeout error on 
the console, then come right back up on its own a few seconds later.  But 
here's the weird twist: it ONLY happens on systems A and B, and ONLY when 
running at gigabit speed.  If I knock the two switch ports down to 100 
meg, the problem goes away.


The other four systems C thru F never have watchdog timeout issues; they 
always work perfectly even at gigabit speed.


So I'm trying to figure out if there are any other obvious hardware 
differences between the plus and non-plus version of the PDSMi that would 
be causing issues on the plus version.  Fortunately, at the moment we are 
not (yet) pushing anywhere near even 100 meg worth of traffic through 
these ports, so it's a tolerable workaround...  just kinda annoying. :)


The chipset is a bit different: the PDSMi is the Intel E7230 chipset for 
Pentium D servers, where the PDSMi+ is the E3000 that adds Core 2 Duo 
support.  But apparently the NIC chips are identical: 82573V for em0 and 
82573L for em1.  The BIOS is identical too, so the chipsets must be pretty 
similar.  Nothing shares an IRQ with the NICs.  (USB is disabled in the 
BIOS.)  They do have different disk systems; A and B are SATA gmirror 
setups, while C and D use LSI Megaraid SCSI cards for their mirrors.


I have tried the obvious: swapping out the cables.  No difference at all.

I have NOT yet tried a different gigabit switch.

Hopefully that's enough detail to start; I can get into more specifics as 
needed.  (Kernel configs, dmesg output, IRQ details, disk details, IPMI, 
running apps, serial console access if needed...)


Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

2007-01-16 Thread Mike Andrews


Jack Vogel wrote:


On 1/16/07, Mike Andrews [EMAIL PROTECTED] wrote:

I have a strange issue with em0 watchdog timeouts that I think is not the
same as the ones everyone was having during the 6.2 beta cycle...

I have six systems, each with two Intel GigE ports onboard:

Systems A and B: Supermicro PDSMi+
Systems C and D: Supermicro PDSMi (without the plus)

[snip]


Several times a day, em0 will go down, give a watchdog timeout error on
the console, then come right back up on its own a few seconds later.  But
here's the weird twist: it ONLY happens on systems A and B, and ONLY when
running at gigabit speed.  If I knock the two switch ports down to 100
meg, the problem goes away.

[snip]


There are some management related issues with this NIC, first if you
have not done so make a DOS bootable device, and run this app I
am enclosing, it fixes the prom setting that is wrong on some devices.
It will do no harm, and it may solve things.

Let me know if it does fix it please.


So far it seems like it DID fix it, but give me another day or two to 
watch it to be sure.  Thanks!


FYI, it only changed the PROM on the first NIC on each PDSMi+ box; it 
said the second NIC was fine.  (But since the first NIC was the one I 
was having trouble with...)


I ran it on the older PDSMi boxes and it said it changed both NICs on 
those, even though they were (and still are) working fine.



--
Mike Andrews  *  [EMAIL PROTECTED]  *  http://www.bit0.com
It's not news, it's Fark.com.  Carpe cavy!

Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

2007-01-16 Thread Jeremy Chadwick

On Tue, Jan 16, 2007 at 10:53:04AM -0800, Jack Vogel wrote:
 There are some management related issues with this NIC, first if you
 have not done so make a DOS bootable device, and run this app I
 am enclosing, it fixes the prom setting that is wrong on some devices.
 It will do no harm, and it may solve things.

Jack,

Can you expand on what this application changes in the PROM?  I have
an Intel motherboard which suffers from something similar to what the OP has
reported (em0 watchdog timeouts), and was curious what the utility
does before firing up the board and trying it.  Others may be curious
to know, too.

Thanks, as always.

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networkinghttp://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |


Re: 6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

2007-01-16 Thread Jack Vogel


On 1/16/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:

On Tue, Jan 16, 2007 at 10:53:04AM -0800, Jack Vogel wrote:
 There are some management related issues with this NIC, first if you
 have not done so make a DOS bootable device, and run this app I
 am enclosing, it fixes the prom setting that is wrong on some devices.
 It will do no harm, and it may solve things.

Jack,

Can you expand on what this application changes in the PROM?  I have
an Intel motherboard which suffers from something similar to what the OP has
reported (em0 watchdog timeouts), and was curious what the utility
does before firing up the board and trying it.  Others may be curious
to know, too.


Hmmm, I'm rusty on this; it's now been a year or more since I was
first involved in the details, so I may need to amend this later :)

But from memory, the issue is the value programmed into the MANC
register by the PROM. I don't remember which bit it was, but one bit
is mistakenly set, and it causes the hardware to incorrectly intercept
some packets.

I was snowbound today, but I'll doublecheck on the detail tomorrow
and amend if needed.

Everyone note that this ONLY affects an 82573 NIC, so make sure of
that before anything else.

Cheers,

Jack
