Re: em0 watchdog timeouts on 8-STABLE

2011-06-21 Thread Joshua Boyd
If needed, I can reproduce this on demand. Just need to know what sort of
statistics are needed when the problem is occurring. I've had to turn off my
weekly scrubs until I can figure out how to fix this problem.


Re: em0 watchdog timeouts on 8-STABLE

2011-06-21 Thread Jack Vogel
I cannot repro this, I used your kernel config, this is on a Dell 1850 btw,
I ran netperf stress from 3 clients, and have seen no watchdogs :(

Jack


On Tue, Jun 21, 2011 at 7:59 PM, Joshua Boyd boy...@jbip.net wrote:

 If needed, I can reproduce this on demand. Just need to know what sort of
 statistics are needed when the problem is occurring. I've had to turn off my
 weekly scrubs until I can figure out how to fix this problem.



Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Jeremy Chadwick
On Wed, Jun 15, 2011 at 03:14:43AM -0400, Joshua Boyd wrote:
 I recently updated my server to the latest 8-STABLE, and upgraded to v28
 ZFS. I have not had these problems on any other version of 8-STABLE or
 7-STABLE, which this box was upgraded from some time ago.
 
 Now, during my weekly scrub, I get the following messages and em0 is
 unresponsive:
 
 Jun 12 03:07:58 foghornleghorn kernel: em0: Watchdog timeout -- resetting
 Jun 12 03:07:58 foghornleghorn kernel: em0: link state changed to DOWN
 Jun 12 03:08:01 foghornleghorn kernel: em0: link state changed to UP
 Jun 12 03:08:47 foghornleghorn kernel: em0: Watchdog timeout -- resetting
 Jun 12 03:08:47 foghornleghorn kernel: em0: link state changed to DOWN
 Jun 12 03:08:50 foghornleghorn kernel: em0: link state changed to UP
 
 My scrub is scheduled to start at 03:00:00, so it looks like watchdog
 timeouts start occurring pretty quickly once I/O ramps up.
 
 Here's some possibly relevant information, let me know if anything else
 would be helpful to troubleshoot.
 
 FreeBSD foghornleghorn.res.openband.net 8.2-STABLE FreeBSD 8.2-STABLE #17:
 Mon Jun  6 19:40:19 EDT 2011
 r...@foghornleghorn.res.openband.net:/usr/obj/usr/src/sys/FOGHORNLEGHORN
  amd64
 
 em0: Intel(R) PRO/1000 Legacy Network Connection 1.0.3 port 0xe800-0xe83f
 mem 0xfebe-0xfebf,0xfebc-0xfebd irq 20 at device 5.0 on pci7
 
 em0@pci0:7:5:0: class=0x02 card=0x13768086 chip=0x107c8086 rev=0x05
 hdr=0x00
 vendor = 'Intel Corporation'
 device = 'Gigabit Ethernet Controller (Copper) rev 5 (82541PI)'
 class  = network
 subclass   = ethernet
 
 And, the SAS cards:
 
 dev.mpt.0.%desc: LSILogic SAS/SATA Adapter
 dev.mpt.0.%driver: mpt
 dev.mpt.0.%location: slot=0 function=0
 dev.mpt.0.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9
 subdevice=0xa580 class=0x01
 dev.mpt.0.%parent: pci1
 dev.mpt.0.debug: 3
 dev.mpt.0.role: 1
 dev.mpt.1.%desc: LSILogic SAS/SATA Adapter
 dev.mpt.1.%driver: mpt
 dev.mpt.1.%location: slot=0 function=0
 dev.mpt.1.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x15d9
 subdevice=0xa580 class=0x01
 dev.mpt.1.%parent: pci2
 dev.mpt.1.debug: 3
 dev.mpt.1.role: 1
 dev.mpt.2.%desc: LSILogic SAS/SATA Adapter
 dev.mpt.2.%driver: mpt
 dev.mpt.2.%location: slot=0 function=0
 dev.mpt.2.%pnpinfo: vendor=0x1000 device=0x0058 subvendor=0x1000
 subdevice=0x30a0 class=0x01
 dev.mpt.2.%parent: pci6
 dev.mpt.2.debug: 3
 dev.mpt.2.role: 1

Please provide output from the following commands (as root):

# pciconf -lvcb
# vmstat -i
# sysctl -a | grep msi
# dmesg
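
One way to gather all four outputs in a single pass is sketched below. This is only an illustration: the save_output and collect_diag helper names and the output directory are made up for this example and are not part of FreeBSD; the commands they run are the ones listed above.

```sh
#!/bin/sh
# Hypothetical helpers (not part of FreeBSD): save each diagnostic
# command's output to its own file so it can be attached to a reply.

# save_output FILE CMD [ARGS...]
# Runs CMD, writing stdout and stderr to FILE. A failing command is
# noted in FILE instead of aborting the whole collection.
save_output() {
    file=$1
    shift
    "$@" > "$file" 2>&1 || echo "command failed: $*" >> "$file"
}

# collect_diag DIR
# Runs the four requested commands (as root) and stores the results in DIR.
collect_diag() {
    dir=$1
    mkdir -p "$dir" || return 1
    save_output "$dir/pciconf.txt" pciconf -lvcb
    save_output "$dir/vmstat-i.txt" vmstat -i
    save_output "$dir/sysctl-msi.txt" sh -c 'sysctl -a | grep msi'
    save_output "$dir/dmesg.txt" dmesg
}
```

Running something like `collect_diag /tmp/em0-diag` as root (the path is arbitrary) should leave one text file per command in that directory.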

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Joshua Boyd
On Wed, Jun 15, 2011 at 3:57 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:

 Please provide output from the following commands (as root):

 # pciconf -lvcb


hostb0@pci0:0:0:0: class=0x06 card=0x59561002 chip=0x59561002 rev=0x00 hdr=0x00
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'RD790 GFX Dual Slot'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:2:0: class=0x060400 card=0x59561002 chip=0x59781002 rev=0x00 hdr=0x01
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'RD790 PCI to PCI bridge (external gfx0 port A)'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:3:0: class=0x060400 card=0x59561002 chip=0x59791002 rev=0x00 hdr=0x01
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'RD790 PCI to PCI bridge (external gfx0 port B)'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:4:0: class=0x060400 card=0x59561002 chip=0x597a1002 rev=0x00 hdr=0x01
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'RD790 PCI to PCI bridge (PCIe gpp port A)'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:6:0: class=0x060400 card=0x59561002 chip=0x597c1002 rev=0x00 hdr=0x01
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'RD790 PCI to PCI bridge (PCIe gpp port C)'
    class      = bridge
    subclass   = PCI-PCI
pcib5@pci0:0:7:0: class=0x060400 card=0x59561002 chip=0x597d1002 rev=0x00 hdr=0x01
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'RD790 PCI to PCI bridge (PCIe gpp port D)'
    class      = bridge
    subclass   = PCI-PCI
pcib6@pci0:0:11:0: class=0x060400 card=0x59561002 chip=0x59801002 rev=0x00 hdr=0x01
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'RD790 PCI to PCI bridge (external gfx1 port A)'
    class      = bridge
    subclass   = PCI-PCI
atapci4@pci0:0:18:0: class=0x01018f card=0x81ef1043 chip=0x43801002 rev=0x00 hdr=0x00
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'IXP SB600 Serial ATA Controller'
    class      = mass storage
    subclass   = ATA
ohci0@pci0:0:19:0: class=0x0c0310 card=0x82881043 chip=0x43871002 rev=0x00 hdr=0x00
    vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
    device     = 'IXP SB600 USB Controller

Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Joshua Boyd
In the kernel. Here's my kernel configuration:

http://pastebin.com/raw.php?i=4JL814m3

On Wed, Jun 15, 2011 at 8:20 PM, Jack Vogel jfvo...@gmail.com wrote:

 I have hardware now, am working on reproducing this. Just curious, do you
 have the em driver defined in the kernel, or as a module?

 Jack



Re: em0 watchdog timeouts on 8-STABLE

2011-06-15 Thread Jack Vogel
I have hardware now, am working on reproducing this. Just curious, do you
have the em driver defined in the kernel, or as a module?

Jack



Re: em0 watchdog timeouts

2010-08-11 Thread Vonarburg David
Hi,
I am also looking for the dcgdis.zip file to prevent watchdog timeouts on the
em0 device. Where can I get it?
Thanks,
David



Re: em0 watchdog timeouts

2010-08-11 Thread Jeremy Chadwick
On Wed, Aug 11, 2010 at 02:26:01PM +0200, Vonarburg David wrote:
 Hi
 i am also searching for the dcgdis.zip file to prevent watchdog timeout on 
 em0 device
 Where can i get it
 Thanks
 David

Which watchdog issue are you referring to?  There are many reported
watchdog timeout issues with em(4) in recent days.

Are you referring to the power saving bit in the EEPROM, specific to
certain Intel 82573 NICs?  It's discussed here (see "Networking
(hardware and drivers)"):

http://wiki.freebsd.org/BugBusting/Commonly_reported_issues

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: em0 watchdog timeouts

2009-10-05 Thread Daniel Bond

Hi,

I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the
past 6 months too. It looks related.

I've replaced the hardware 3 times (2 different IBM x3755 chassis, one
IBM x3650 chassis). I first used the onboard Broadcom NICs (bce-based,
PCI-X) until I had issues with watchdog timeouts.

I replaced them with a 4-port PCI-X Intel NIC, which gave me the same
problems. I was told that the 4-port Intel NICs have an onboard bus
controller that could cause trouble, so I replaced this with a 2-port
PCI-e Intel NIC, which Sepherosa Ziehau told me was the best performing
gig-e NIC (rx/tx).

Still getting watchdog timeouts, I tried adjusting all sorts of sysctls
I found in mailing-list threads (disabling MSI/MSI-X interrupts, adjusting
rx/tx processing, etc.). I upgraded the BIOS and the firmware on all kinds
of components (disks, BMC, etc.) to the newest versions. I also tried a
different QLogic isp(4) FC controller (PCI-e).

No matter what I tried, I could not diagnose this problem, let alone fix
it. It also happened rarely enough that it was not easy to debug. I would
get a series of "watchdog timeout -- resetting" messages until the NIC
went completely offline, at which point I'd reboot the machine from the
console.

This happened about once every 1-10 days, usually between 11:00 and 13:00.
The machine has now been replaced with Linux, unfortunately, just to avoid
more customer complaints and downtime. The IBM x3755 with FreeBSD 7.2,
which was replaced with Linux, is still online, and can be put at the
disposal of any developers who would like to debug this further.

As with the machine Stefan Krueger mentioned, this one is also running as
an NFS server, with a mix of BSD and Linux clients, and it's getting hit
pretty hard by clients.

Hope we can iron this bug out in the future.


Best regards,


Daniel Bond.



On Oct 2, 2009, at 10:36 PM, Rudy wrote:



Ah, I'll stop messing with them.

I just set them all to 0 to see if that will help and noticed the card
was leaving tx_int_delay=1.

# sysctl dev.em.4.debug=1
Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0

# sysctl dev.em.4
dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
dev.em.4.rx_int_delay: 0
dev.em.4.tx_int_delay: 0
dev.em.4.rx_abs_int_delay: 0
dev.em.4.tx_abs_int_delay: 0

Splitting traffic to different ports has brought down the watchdog
events to once a day.  ... essentially, I have a quad 30Mbps (not quad
1Gbps) card.  heheh.
Would turning off net.inet.ip.fastforwarding or any other setting help?

Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
a feeling that isn't related to the NIC at all, but I'm not sure what
else to try.

Rudy



Jack Vogel wrote:
Watchdog resets the adapter. Messing with these values is of dubious value
anyway.

Jack


On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:

I noticed something interesting.

I set the rx_int_delay to 0:
sysctl dev.em.5.rx_int_delay=0

Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
now 32:
Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

However, running sysctl dev.em.5 shows it as 0:
dev.em.5.rx_int_delay: 0
dev.em.5.tx_int_delay: 66

Seems like the adapter and the kernel don't agree on the rx_int_delay
value.

Rudy







___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org 







Re: em0 watchdog timeouts

2009-10-05 Thread Robert Blayzor

On Oct 2, 2009, at 4:36 PM, Rudy wrote:
Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
a feeling that isn't related to the NIC at all, but I'm not sure what
else to try.



Just curious, have you tried (or are you using) device polling?

--
Robert Blayzor, BOFH
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/





Re: em0 watchdog timeouts

2009-10-05 Thread Jack Vogel
This posting just muddies the issue, first you talk about having a problem that
involves Broadcom, ok, so post about that on something other than em :)

Then you make some references to hardware that you might have bought
but didn't, I'm not about debugging 'possible worlds problems' though so
can't help you there either :)

Finally you never say what the actual hardware is, other than a person who
I do not know told you it was the best performer... so, what exactly is it?

You have a problem once every 10 days,  and at a specific time no less,
this almost always means something in your environment, a cron job run
amok, a piece of hardware that resets, I dunno, but the last thing I would
suspect given this description is the driver.

You need a good sysadmin for this debugging I would venture, not a driver
developer.

Jack


On Mon, Oct 5, 2009 at 7:19 AM, Daniel Bond d...@danielbond.org wrote:

 Hi,

 I've been struggling with watchdog timeouts in 7.1/7.2-RELEASE for the past
 6 months too. It looks related.

 I've tried to replace the hardware 3 times (2 different IBM x3755 chassis,
 one IBM x3650 chassis).
 I tried first with onboard broadcom NICs (bce-based) PCIx-based, until I
 had issues with watchdog timeout.

 I tried replacing it with a 4-port pci-x Intel NIC, which gave me same
 problems. I was told that the 4-port intel NICs had an onboard
 bus-controller, that
 could cause trouble, so I replaced this with a 2-port PCI-e intel, which I
 was told by a Sepherosa Ziehau was the best performing gig-e NIC (rx/tx).

 Still getting watchdog timeouts, I tried adjusting all sorts of sysctls I
 found in mailing-list threads (disable msi/msix interrupts, adjust rx/tx
 processing, etc, etc).
 I tried upgrading BIOS, firmware on all kinds of stuff (disks, BMC, etc,
 etc) to newest version. I also tried using a different qlogic isp(4)
 FC-controller (PCI-e).

 No matter what I tried, I could not diagnose this problem, or at least fix
 it. It also happened rarely enough that it was not easy to debug. I would get
 a series of "watchdog timeout -- resetting" messages, until the NIC would go
 completely offline, at which point I'd reboot it from the console.

 This happened about once every 1-10 days, usually about 11-13:00. This
 machine has now been replaced with Linux, unfortunately, just to avoid more
 customer complaints and downtime. The IBM x3755 with FreeBSD7.2 which was
 replaced with Linux, is still online, and
 can be put at disposal for any developers who would like to debug this
 further.

 Like Stefan Krueger mentioned, this machine is also running as NFS server,
 with a mix of BSD and Linux clients, and it's getting hit pretty hard by
 clients.


 Hope we can iron this bug out, in the future.


 Best regards,


 Daniel Bond.




 On Oct 2, 2009, at 10:36 PM, Rudy wrote:


 Ah, I'll stop messing with them.


 I just set them all to 0 to see if that will help and noticed the card
 was leaving tx_int_delay=1.

 # sysctl dev.em.4.debug=1
 Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
 Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0

 # sysctl dev.em.4
 dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
 dev.em.4.rx_int_delay: 0
 dev.em.4.tx_int_delay: 0
 dev.em.4.rx_abs_int_delay: 0
 dev.em.4.tx_abs_int_delay: 0

 Splitting traffic to different ports has brought down the watchdog
 events to once a day.  ... essentially, I have a quad 30Mbps (not quad
 1Gbps) card.  heheh.
 Would turning off net.inet.ip.fastforwarding or any other setting help?

 Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
 a feeling that isn't related to the NIC at all, but I'm not sure what
 else to try.

 Rudy



 Jack Vogel wrote:

 Watchdog resets the adapter. Messing with these values is of dubious
 value
 anyway.

 Jack


 On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:


  I noticed something interesting.

 I set the rx_int_delay to 0:
 sysctl dev.em.5.rx_int_delay=0

 Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
 Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

 After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
 now 32:
 Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

 However, running sysctl dev.em.5 shows it as 0:
 dev.em.5.rx_int_delay: 0
 dev.em.5.tx_int_delay: 66

 Seems like the adapter and the kernel don't agree on the rx_int_delay
 value.

 Rudy










Re: em0 watchdog timeouts

2009-10-05 Thread Daniel Bond

Hi Jack,

I'll comment on your mail inline:


On Oct 5, 2009, at 6:57 PM, Jack Vogel wrote:

This posting just muddies the issue, first you talk about having a problem that
involves Broadcom, ok, so post about that on something other than em :)


I only meant to indicate that the problem might exist outside the intel driver.
I'm also indicating that it happens with several drivers (bge, bce and em)
on several different machines, on both pci-x and pci-e.

I'm sorry if this is confusing to you, but I still think it's relevant to
mention.




Then you make some references to hardware that you might have bought
but didn't, I'm not about debugging 'possible worlds problems' though so
can't help you there either :)


No. I only made references to hardware I actually used, and had real-world
issues with.




Finally you never say what the actual hardware is, other than a person who
I do not know told you it was the best performer... so, what exactly is it?


Sepherosa is a guy that writes drivers for BSD based operating systems,
including FreeBSD. He has a lot of knowledge in this area.

http://people.freebsd.org/~sephe/

The NIC you are referring to, the one sephe recommended to me, is an 82571EB.
I didn't mention specific hardware, as I think it's more important
to note this is an issue I'm experiencing across different sets of hardware
and drivers.




You have a problem once every 10 days,  and at a specific time no less,
this almost always means something in your environment, a cron job run
amok, a piece of hardware that resets, I dunno, but the last thing I would
suspect given this description is the driver.


This is not what I wrote. I wrote I had a problem every 1-10 days, but it
would usually happen once every 3-4 days. At worst, every day in periods.

It's not at any specific time. If you read my email correctly, I say it
*usually* happens around 11-13:00,
but it has happened at random times too.

This is my point exactly. I don't think it's the Intel-driver, I think the
problem is elsewhere. I had a suspicion it had to do with the combination of
nic + qlogic fc-controller, but I have no evidence of this.




You need a good sysadmin for this debugging I would venture, not a driver
developer.


What I need is useful advice/help. I never stated I needed a driver
developer.

I'd like to be able to run my favorite OS on cool hardware, in the future,
for a high-performing NFS-server, without problems like I've experienced the
past 6 months, on a production system.
Please note that I'm managing a server-park almost completely based on
FreeBSD, and I'm running many NFS servers on other hardware, for other
services, without issues.

I've seen several other FreeBSD-users having problems with this too, so I
think it's of importance for the project. As I mentioned originally, I'm
happy to make the hardware available to any FreeBSD developer
that might want to look further into this. Debugging it further is above my
skill-set, I don't even know where to begin looking, especially since I
can't produce any panics.


I'm sorry to say, but your reply was 0% useful, Jack.



Jack



- Daniel




Re: em0 watchdog timeouts

2009-10-05 Thread Jack Vogel
Sorry, it's a Monday morning, I was being kinda facetious, guess it didn't
work very well :) I apologize.

I know it must be annoying for you, it's as much so for me when it's something
I can't just fix because it's not reproducible. So, I feel your pain.

Will try to restrain my Monday blues in the future.

Jack


On Mon, Oct 5, 2009 at 11:32 AM, Daniel Bond d...@danielbond.org wrote:

 Hi Jack,

 I'll comment on your mail inline:


 On Oct 5, 2009, at 6:57 PM, Jack Vogel wrote:

  This posting just muddies the issue, first you talk about having a problem
 that
 involves Broadcom, ok, so post about that on something other than em :)


 I only meant to indicate that the problem might exist outside the intel
 driver.
 I'm also indicating that it happens with several drivers (bge, bce and em)
 on several different machines, on both pci-x and pci-e.

 I'm sorry if this is confusing to you, but I still think it's relevant to
 mention.


 Then you make some references to hardware that you might have bought
 but didn't, I'm not about debugging 'possible worlds problems' though so
 can't help you there either :)


 No. I only made references to hardware I actually used, and had real-world
 issues with.


 Finally you never say what the actual hardware is, other than a person who
 I do not know told you it was the best performer... so, what exactly is
 it?


 Sepherosa is a guy that writes drivers for BSD based operating systems.
 Including FreeBSD. He has a lot of knowledge in this area.
 http://people.freebsd.org/~sephe/

 The NIC you are referring to, the one sephe recommended to me, is an 82571EB. I
 didn't mention specific hardware, as I think it's more important
 to note this is an issue I'm experiencing across different sets of hardware
 and drivers.


 You have a problem once every 10 days,  and at a specific time no less,
 this almost always means something in your environment, a cron job run
 amok, a piece of hardware that resets, I dunno, but the last thing I would
 suspect given this description is the driver.


 This is not what I wrote. I wrote I had a problem every 1-10 days, but it
 would usually happen once every 3-4 days. At worst, every day in periods.

 It's not at any specific time. If you read my email correctly, I say it
 *usually* happens around 11-13:00,
 but it has happened at random times too.

 This is my point exactly. I don't think it's the Intel-driver, I think the
 problem is elsewhere. I had a suspicion it had to do with the combination of
 nic + qlogic fc-controller, but I have no evidence of this.


 You need a good sysadmin for this debugging I would venture, not a driver
 developer.


 What I need is useful advice/help. I never stated I needed a driver
 developer.

 I'd like to be able to run my favorite OS on cool hardware, in the future,
 for a high-performing NFS-server, without problems like I've experienced the
 past 6 months, on a production system.
 Please note that I'm managing a server-park almost completely based on
 FreeBSD, and I'm running many NFS servers on other hardware, for other
 services, without issues.

 I've seen several other FreeBSD-users having problems with this too, so I
 think it's of importance for the project. As I mentioned originally, I'm
 happy to make the hardware available to any FreeBSD developer
 that might want to look further into this. Debugging it further is above my
 skill-set, I don't even know where to begin looking, especially since I
 can't produce any panics.

 I'm sorry to say, but your reply was 0% useful, Jack.


 Jack


 - Daniel



Re: em0 watchdog timeouts

2009-10-05 Thread Greg Byshenk
On Mon, Oct 05, 2009 at 08:32:14PM +0200, Daniel Bond wrote:
 
 What I need is useful advice/help. I never stated I needed a driver  
 developer.
 
 I'd like to be able to run my favorite OS on cool hardware, in the
 future, for a high-performing NFS-server, without problems like I've
 experienced the past 6 months, on a production system.
 Please note that I'm managing a server-park almost completely based on  
 FreeBSD, and I'm running many NFS servers on other hardware, for other  
 services, without issues.
 
 I've seen several other FreeBSD-users having problems with this too,
 so I think it's of importance for the project. As I mentioned
 originally, I'm happy to make the hardware available to any FreeBSD developer
 that might want to look further into this. Debugging it further is  
 above my skill-set, I don't even know where to begin looking,  
 especially since I can't produce any panics.

I can give one bit of advice that helped me in a similar situation:
check your motherboards.

I run about a dozen fileservers on FreeBSD, and have always been very
happy with their performance, but some months ago I began to experience
problems with one of them.  These problems were 'watchdog timeout'
errors.  Tried all manner of things, different NICs of different types,
changing settings, etc., but nothing helped over the long term.  At 
some point, when very heavy i/o was going on to our Beowulf cluster, the
'watchdog timeouts' would begin.  What was strange is that other 
(supposedly identical) machines handled _more_ i/o without a problem.

Finally, while doing some comparisons, I realized that the motherboard
having the problem was _not_ the same as the others; it was similar, but
not identical.  I changed the motherboard and all the problems went away,
never to reappear.

I don't know if it was a specific problem with that particular
motherboard, or something about that model, but for whatever reason, it
appears that the buses just couldn't handle a RAID card and three active
NICs.


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL


Re: em0 watchdog timeouts

2009-10-05 Thread Rudy

Finally, while doing some comparisons, I realized that the motherboard
having the problem was _not_ the same as the others; it was similar, but
not identical.


This is a good piece of info.  I can try swapping out the MB and see 
what happens.


I do want to add: thank you Jack for all your help and if it does turn out
to be the MB, then double thanks.  Viva Monday!   :)


What would be nice would be MORE info for a watchdog timeout... maybe a 
sysctl dev.watchdog.debug=1 or something where when a watchdog event 
happened --- for whatever driver --- a bunch of stats were dumped 
relating to the event.


Rudy


Re: em0 watchdog timeouts

2009-10-05 Thread Jack Vogel
Hmmm, I did have one of the drivers print more info at watchdog time, but I
just looked and that's not em, time to add that I guess.

Since you're in the driver there isn't a huge amount of info that you can
print, it still may not be enough to help.

BTW, I've always been somewhat dissatisfied with the watchdog design and
think it's kinda flawed, I could try and make you an experimental driver with
debug and some changes that you can try if you'd like.

Jack


On Mon, Oct 5, 2009 at 1:54 PM, Rudy cra...@monkeybrains.net wrote:

 Finally, while doing some comparisons, I realized that the motherboard
 having the problem was _not_ the same as the others; it was similar, but
 not identical.


 This is a good piece of info.  I can try swapping out the MB and see what
 happens.

 I do want to add: thank you Jack for all your help and if it does turn out to
 be the MB, then double thanks.  Viva Monday!   :)

 What would be nice would be MORE info for a watchdog timeout... maybe a
 sysctl dev.watchdog.debug=1 or something where when a watchdog event
 happened --- for whatever driver --- a bunch of stats were dumped relating
 to the event.

 Rudy




Re: em0 watchdog timeouts

2009-10-05 Thread Rudy (bulk)



BTW, I've always been somewhat dissatisfied with the watchdog design and
think it's kinda flawed, I could try and make you an experimental driver with
debug and some changes that you can try if you'd like.
  


I'm game -- it would be nice if the machine still reset the watchdog in 
3 seconds and didn't cause any more damage from the debug code (eg a 
panic).  :)


My frequency of watchdog events is about 2 or 3 times per day.
I am running:   Intel(R) PRO/1000 Network Connection 6.9.12




Rudy



Re: em0 watchdog timeouts

2009-10-02 Thread Rudy

I noticed something interesting.

I set the rx_int_delay to 0:
 sysctl dev.em.5.rx_int_delay=0

Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
 Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
now 32:
 Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

However, running sysctl dev.em.5 shows it as 0:
dev.em.5.rx_int_delay: 0
dev.em.5.tx_int_delay: 66

Seems like the adapter and the kernel don't agree on the rx_int_delay value.

Rudy
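
The comparison described above can be scripted. What follows is only a minimal sketch in Python, assuming the two text formats quoted in this message (the kernel line printed after dev.em.N.debug=1 and the output of sysctl dev.em.N). The function names are made up for illustration and are not part of any FreeBSD tool.

```python
import re

# Matches "em5: rx_int_delay = 32, rx_abs_int_delay = 66" style debug lines.
DEBUG_RE = re.compile(r"(em\d+): (\w+) = (\d+), (\w+) = (\d+)")
# Matches "dev.em.5.rx_int_delay: 0" style sysctl output lines.
SYSCTL_RE = re.compile(r"dev\.em\.(\d+)\.(\w+): (\d+)")


def parse_debug(lines):
    """Return {(ifname, setting): value} as reported by the adapter."""
    values = {}
    for line in lines:
        m = DEBUG_RE.search(line)
        if m:
            ifname = m.group(1)
            values[(ifname, m.group(2))] = int(m.group(3))
            values[(ifname, m.group(4))] = int(m.group(5))
    return values


def parse_sysctl(lines):
    """Return {(ifname, setting): value} as reported by sysctl."""
    values = {}
    for line in lines:
        m = SYSCTL_RE.search(line)
        if m:
            values[("em" + m.group(1), m.group(2))] = int(m.group(3))
    return values


def mismatches(debug_lines, sysctl_lines):
    """Return {(ifname, setting): (adapter_value, sysctl_value)} where they differ."""
    hw = parse_debug(debug_lines)
    sw = parse_sysctl(sysctl_lines)
    return {k: (hw[k], sw[k]) for k in hw if k in sw and hw[k] != sw[k]}


if __name__ == "__main__":
    debug = ["Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66"]
    sysctl = ["dev.em.5.rx_int_delay: 0", "dev.em.5.tx_int_delay: 66"]
    print(mismatches(debug, sysctl))  # {('em5', 'rx_int_delay'): (32, 0)}
```

Only settings that appear in both outputs are compared, so a value missing from one side is not reported as a mismatch.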


Re: em0 watchdog timeouts

2009-10-02 Thread Jack Vogel
Watchdog resets the adapter. Messing with these values is of dubious value
anyway.

Jack


On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:


 I noticed something interesting.

 I set the rx_int_delay to 0:
  sysctl dev.em.5.rx_int_delay=0

 Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
  Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

 After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
 now 32:
  Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

 However, running sysctl dev.em.5 shows it as 0:
 dev.em.5.rx_int_delay: 0
 dev.em.5.tx_int_delay: 66

 Seems like the adapter and the kernel don't agree on the rx_int_delay
 value.

 Rudy



Re: em0 watchdog timeouts

2009-10-02 Thread Rudy

Ah, I'll stop messing with them. 


I just set them all to 0 to see if that will help and noticed the card
was leaving tx_int_delay=1.

# sysctl dev.em.4.debug=1
Oct  2 13:26:07 mango kernel: em4: tx_int_delay = 1, tx_abs_int_delay = 0
Oct  2 13:26:07 mango kernel: em4: rx_int_delay = 0, rx_abs_int_delay = 0

# sysctl dev.em.4
dev.em.4.%desc: Intel(R) PRO/1000 Network Connection 6.9.12
dev.em.4.rx_int_delay: 0
dev.em.4.tx_int_delay: 0
dev.em.4.rx_abs_int_delay: 0
dev.em.4.tx_abs_int_delay: 0

Splitting traffic to different ports has brought down the watchdog
events to once a day.  ... essentially, I have a quad 30Mbps (not quad
1Gbps) card.  heheh.
Would turning off net.inet.ip.fastforwarding or any other setting help?

Today, I set net.inet.ip.fw.enable=0 and I'll see if that helps.  I have
a feeling that isn't related to the NIC at all, but I'm not sure what
else to try.

Rudy



Jack Vogel wrote:
 Watchdog resets the adapter. Messing with these values is of dubious value
 anyway.

 Jack


 On Fri, Oct 2, 2009 at 11:36 AM, Rudy cra...@monkeybrains.net wrote:

   
 I noticed something interesting.

 I set the rx_int_delay to 0:
  sysctl dev.em.5.rx_int_delay=0

 Checking via sysctl dev.em.5.debug=1 shows rx_int_delay is indeed 0:
  Oct  1 17:32:41 mango kernel: em5: rx_int_delay = 0, rx_abs_int_delay = 66

 After a watchdog event, sysctl dev.em.5.debug=1 shows rx_int_delay is
 now 32:
  Oct  2 11:29:49 mango kernel: em5: rx_int_delay = 32, rx_abs_int_delay = 66

 However, running sysctl dev.em.5 shows it as 0:
 dev.em.5.rx_int_delay: 0
 dev.em.5.tx_int_delay: 66

 Seems like the adapter and the kernel don't agree on the rx_int_delay
 value.

 Rudy

 

   



Re: em0 watchdog timeouts

2009-10-01 Thread Rudy (bulk)


I have rxd and txd set to 1024.  How high can I safely go?

# add more descriptors to em devices.
hw.em.rxd=1024
hw.em.txd=1024
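
On the question of how high rxd/txd can go: em(4) documents a default of 256 descriptors, a ceiling of 256 for 82542/82543-based adapters and 4096 for the others. The Python sketch below checks loader.conf values against those limits. The lower bound of 80 and the multiple-of-8 rule are assumptions based on the driver header (if_em.h) as I understand it and should be verified against the driver version in use. The helper names are hypothetical.

```python
import re

EM_MAX_DESC = 4096        # em(4): adapters other than 82542/82543
EM_MAX_DESC_82543 = 256   # em(4): 82542 and 82543-based adapters
EM_MIN_DESC = 80          # ASSUMPTION: lower bound taken from if_em.h
EM_DESC_MULTIPLE = 8      # ASSUMPTION: keeps the ring 128-byte aligned

# Accepts both hw.em.rxd=1024 and hw.em.rxd="1024"; commented lines are skipped.
TUNABLE_RE = re.compile(r'^\s*(hw\.em\.[rt]xd)\s*=\s*"?(\d+)"?', re.MULTILINE)


def parse_loader_conf(text):
    """Return {'hw.em.rxd': n, 'hw.em.txd': n} for the tunables present."""
    return {name: int(value) for name, value in TUNABLE_RE.findall(text)}


def check_descriptors(count, is_82543=False):
    """Return a list of reasons why `count` is out of range (empty if fine)."""
    problems = []
    upper = EM_MAX_DESC_82543 if is_82543 else EM_MAX_DESC
    if count < EM_MIN_DESC:
        problems.append(f"{count} is below the minimum of {EM_MIN_DESC}")
    if count > upper:
        problems.append(f"{count} is above the maximum of {upper}")
    if count % EM_DESC_MULTIPLE:
        problems.append(f"{count} is not a multiple of {EM_DESC_MULTIPLE}")
    return problems


if __name__ == "__main__":
    conf = "# add more descriptors to em devices.\nhw.em.rxd=1024\nhw.em.txd=1024\n"
    for name, value in parse_loader_conf(conf).items():
        print(name, value, check_descriptors(value) or "ok")
```

Under these assumptions both 1024 and 2048 pass the check, so neither value tried in this thread is out of range.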

### other settings... I have tried rx_int_delay=0 and 32 ... doesn't 
seem to make the watchdogs go away.


dev.em.4.rx_int_delay: 32
dev.em.4.tx_int_delay: 66
dev.em.4.rx_abs_int_delay: 66
dev.em.4.tx_abs_int_delay: 66
dev.em.4.rx_processing_limit: 300



I am using a PCI-Express x8 slot according to the motherboard specs:
http://supermicro.com/products/motherboard/Xeon3000/3210/X7SBi.cfm

Rudy



Jack Vogel wrote:

Increase the size of your TX ring, meaning the number of TX descriptors.

You said this is a quad port card, what size PCI E slot are you in? On
some motherboards slot connectors might suggest it's of a certain size
but it's not really wired fully. If you are not in an x8 lane slot move it to
one.

What about system tuning?

Some ideas, let me know how it goes.

Jack


On Wed, Sep 30, 2009 at 3:28 PM, Rudy cra...@monkeybrains.net wrote:

  

Rudy wrote:



Rudy wrote:

  

I am having watchdog timeout issues



Oh, here is some more info from 'pciconf -lcv'.

I offloaded half the traffic from em0 to em5 and there has only been one
watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday.  We do
streaming out of our network and the 3 second outage really messes things
up...
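
Counts like "one today vs. 10 yesterday" can be taken straight from the system log. A minimal sketch in Python, assuming syslog lines of the form "Mon DD HH:MM:SS host kernel: emN: Watchdog timeout -- resetting" as quoted elsewhere in this thread. The sample lines and the function name are illustrative only.

```python
import re
from collections import Counter

# Month, day, time, hostname, then "kernel: <ifname>: Watchdog timeout".
WATCHDOG_RE = re.compile(
    r"^(\w{3})\s+(\d+) \d\d:\d\d:\d\d \S+ kernel: (\w+): [Ww]atchdog timeout"
)


def count_watchdogs(lines):
    """Return a Counter keyed by (month, day, ifname)."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.match(line)
        if m:
            month, day, ifname = m.groups()
            counts[(month, int(day), ifname)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice the lines would come from the syslog file.
    sample = [
        "Oct  1 03:07:58 mango kernel: em0: Watchdog timeout -- resetting",
        "Oct  1 03:07:58 mango kernel: em0: link state changed to DOWN",
        "Oct  1 04:15:02 mango kernel: em0: Watchdog timeout -- resetting",
        "Oct  2 11:29:40 mango kernel: em5: Watchdog timeout -- resetting",
    ]
    for (month, day, ifname), n in sorted(count_watchdogs(sample).items()):
        print(month, day, ifname, n)
```

Grouping by interface as well as by day makes it easier to see whether moving traffic between ports changes the count or only moves it.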


e...@pci0:5:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:5:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:6:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82571EB Gigabit Ethernet Controller'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
e...@pci0:13:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
e...@pci0:15:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
   vendor = 'Intel Corporation'
   device = '82573L Intel PRO/1000 PL Network Adaptor'
   class  = network
   subclass   = ethernet
   cap 01[c8] = powerspec 2  supports D0 D3  current D0
   cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
vgap...@pci0:17:3:0: class=0x03 card=0xd18015d9 chip=0x515e1002 rev=0x02
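
The slot-width question can be answered from this output: the "link xN(xM)" field on the PCI-Express capability line is, as far as I know, the negotiated width followed by the maximum the device supports. A small Python sketch that flags devices running below their maximum, assuming the pciconf -lcv layout shown above. The function names and the second sample device are hypothetical.

```python
import re

# Device header line, e.g. "em0@pci0:5:0:0: class=... card=...".
SELECTOR_RE = re.compile(r"^(\S+@pci\d+:\d+:\d+:\d+):")
# PCI-Express capability field, e.g. "link x4(x4)".
LINK_RE = re.compile(r"link x(\d+)\(x(\d+)\)")


def link_widths(pciconf_output):
    """Return {selector: (negotiated_width, max_width)} for PCIe devices."""
    widths = {}
    current = None
    for line in pciconf_output.splitlines():
        sel = SELECTOR_RE.match(line)
        if sel:
            current = sel.group(1)
            continue
        link = LINK_RE.search(line)
        if link and current is not None:
            widths[current] = (int(link.group(1)), int(link.group(2)))
    return widths


def downtrained(widths):
    """Return selectors whose negotiated width is below the device maximum."""
    return [sel for sel, (cur, maximum) in widths.items() if cur < maximum]


if __name__ == "__main__":
    sample = (
        "em0@pci0:5:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086\n"
        "   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)\n"
        "em9@pci0:7:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086\n"
        "   cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x4)\n"
    )
    print(downtrained(link_widths(sample)))  # ['em9@pci0:7:0:0']
```

In the listing above every function of the quad card reports x4(x4) and both onboard ports report x1(x1), so by this check none of them is running below its own maximum.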




  




Re: em0 watchdog timeouts

2009-10-01 Thread Rudy (bulk)


I have a quad card in a PCIe 8x port, and there are 2 ports on the 
motherboard.  I just read the manual and see that the on board ports are 
PCIe 1x.


I have been seeing watchdog events on the onboard ports as well as on 
the PCIe card.  The router is doing roughly 50Mbps on em0, em4 & em5.


Does i386 vs amd64 make any difference to the em0 driver?

Bumping the TX ring to 2048.  Output of grep em /boot/loader.conf:

if_em_load=YES
hw.em.rxd=2048
hw.em.txd=2048

Rudy





You said this is a quad port card, what size PCI E slot are you in? 




Re: em0 watchdog timeouts

2009-10-01 Thread Jack Vogel
I would say that 1024 should be enough, I thought maybe you were at 256.
amd64 kernels just perform better at a lot of things, however I/O is not
necessarily one of them, so I wouldn't claim it for sure, still I'd always
default to 64 bit these days unless there's some other reason not to.

What about system load, perhaps something is bogging the thing down so that
it cannot adequately service the network interrupts??

The specs of the motherboard are respectable, how much memory does it have?

Another thought, are you using the out-of-band management features (like
IPMI)?
If you are not then go into the BIOS and disable that stuff.

Have you run netstat or some other resource monitor to see if you run out of
anything that might coincide with the watchdogs...

Jack




On Thu, Oct 1, 2009 at 2:12 PM, Rudy (bulk) cra...@monkeybrains.net wrote:


 I have a quad card in a PCIe 8x port, and there are 2 ports on the
 motherboard.  I just read the manual and see that the on board ports are
 PCIe 1x.

 I have been seeing watchdog events on the onboard ports as well as on the
 PCIe card.  The router is doing roughly 50Mbps on em0, em4 & em5.

 Does i386 vs amd64 make any difference to the em0 driver?

 bumping TX Ring to 2048.  grep em /boot/loader.conf

 if_em_load=YES
 hw.em.rxd=2048
 hw.em.txd=2048

 Rudy





 You said this is a quad port card, what size PCI E slot are you in?





Re: em0 watchdog timeouts

2009-10-01 Thread Rudy

 What about system load, perhaps something is bogging the thing down so that
 it cannot adequately service the network interrupts??

Hardly anything is running on the box...
Only things on the box: zebra bgpd (3 peers...)  sshd snmpd


Here is the top of 'top':

load averages:  0.06,  0.08,  0.07   up 7+01:08:16  17:26:39
15 processes:  1 running, 14 sleeping
CPU:  0.0% user,  0.0% nice,  4.5% system,  0.0% interrupt, 95.5% idle
Mem: 193M Active, 42M Inact, 156M Wired, 196K Cache, 83M Buf, 1610M Free


 The specs of the motherboard are respectable, how much memory does it have?

 Another thought, are you using the out-of-band management features (like
 IPMI)?
 If you are not then go into the BIOS and disable that stuff.

No IPMI card added to that motherboard (you have to add a daughter card).

 Have you run netstat or some other resource monitor to see if you run out of
 anything that might coincide with the watchdogs...

What should I look for?

# netstat -m
4105/4610/8715 mbufs in use (current/cache/total)
4103/2303/6406/25600 mbuf clusters in use (current/cache/total/max)
4103/2297 mbuf+clusters out of packet secondary zone in use (current/cache)
0/44/44/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
9232K/5934K/15166K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/6/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
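
One way to read the output above is to look for any nonzero "denied" counter and for mbuf cluster usage close to its maximum. A small sketch in Python, assuming the mbuf statistics layout shown above (the format printed by netstat -m). The function names are hypothetical.

```python
import re

# e.g. "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
DENIED_RE = re.compile(r"^([\d/]+) requests for (.+?) denied")
# e.g. "4103/2303/6406/25600 mbuf clusters in use (current/cache/total/max)"
CLUSTERS_RE = re.compile(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def denied_requests(output):
    """Return {resource: sum of its slash-separated 'denied' counters}."""
    denied = {}
    for line in output.splitlines():
        m = DENIED_RE.match(line.strip())
        if m:
            denied[m.group(2)] = sum(int(n) for n in m.group(1).split("/"))
    return denied


def cluster_usage(output):
    """Return total/max mbuf cluster usage as a fraction, or None if absent."""
    for line in output.splitlines():
        m = CLUSTERS_RE.match(line.strip())
        if m:
            return int(m.group(3)) / int(m.group(4))
    return None


if __name__ == "__main__":
    sample = (
        "4103/2303/6406/25600 mbuf clusters in use (current/cache/total/max)\n"
        "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)\n"
        "0/0/0 requests for jumbo clusters denied (4k/9k/16k)\n"
        "0 requests for sfbufs denied\n"
    )
    print(denied_requests(sample))
    print(f"{cluster_usage(sample):.0%} of mbuf clusters in use")
```

The summed counters are only meant as a nonzero/zero indicator, since the mbuf line reports overlapping categories. For the figures posted here every denied counter is zero and cluster usage is about a quarter of the maximum.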



Are there specific router-only tunings that may help?

Here are my sysctl settings:

kern.ipc.somaxconn=256
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.ethernet=0
kern.ipc.nmbcluster=32768

net.inet.icmp.icmplim=1000
net.inet.ip.fastforwarding=1
net.inet.ip.intr_queue_maxlen=92
net.inet.icmp.drop_redirect=1

dev.em.0.rx_processing_limit=200
dev.em.1.rx_processing_limit=200
dev.em.2.rx_processing_limit=200
#dev.em.4.rx_processing_limit=200
# test setting processing limit up to 300
dev.em.4.rx_processing_limit=300
dev.em.5.rx_processing_limit=200

Thanks,
Rudy


Re: em0 watchdog timeouts

2009-09-30 Thread Rudy
Rudy wrote:
 I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
 http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html

 link to dcgdis.zip didn't work.  Do you have a copy?
   

Thanks, Jack.  Got the file and flashed -- no upgrade needed.

So, while the router was offline, I flashed the motherboard's BIOS
(Supermicro X7SBi), upgraded to 7.2-STABLE, and downloaded the 6.9.12
version of the em driver.  Still, watchdog timeouts.  Sigh.

Will the Intel Gigabit ET Quad Port Adapter make the timeouts go away???
Should I be using amd64???
Should tx_int_delay=0?


Summary:
 2 Nics on Motherboard + quad card in PCIe slot. 
 Watchdog timeouts on motherboard NICs and on quad card NIC when
bandwidth > 10Mbps
 There is minimal (bgp session) TCP to the box... it only forwards
packets between interfaces.

# uname -r -m
7.2-STABLE i386

# dmesg | grep ^em
em0: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x2000-0x201f
mem 0xd022-0xd023,0xd020-0xd021 irq 16 at device 0.0 on pci5
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:15:17:78:99:70
em1: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x2020-0x203f
mem 0xd026-0xd027,0xd024-0xd025 irq 17 at device 0.1 on pci5
em1: Using MSI interrupt
em1: [FILTER]
em1: Ethernet address: 00:15:17:78:99:71
em2: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x3000-0x301f
mem 0xd032-0xd033,0xd030-0xd031 irq 17 at device 0.0 on pci6
em2: Using MSI interrupt
em2: [FILTER]
em2: Ethernet address: 00:15:17:78:99:72
em3: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x3020-0x303f
mem 0xd036-0xd037,0xd034-0xd035 irq 18 at device 0.1 on pci6
em3: Using MSI interrupt
em3: [FILTER]
em3: Ethernet address: 00:15:17:78:99:73
em4: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x4000-0x401f
mem 0xd040-0xd041 irq 16 at device 0.0 on pci13
em4: Using MSI interrupt
em4: [FILTER]
em4: Ethernet address: 00:30:48:67:14:50
em5: Intel(R) PRO/1000 Network Connection 6.9.12 port 0x5000-0x501f
mem 0xd050-0xd051 irq 17 at device 0.0 on pci15
em5: Using MSI interrupt
em5: [FILTER]
em5: Ethernet address: 00:30:48:67:14:51


# vmstat -i
interrupt                          total       rate
irq1: atkbd0                         710          0
irq4: sio0                             3          0
irq23: atapci0                     14943          0
cpu0: timer                    929753417       2000
irq256: em0                    702754836       1511
irq257: em1                            2          0
irq260: em4                    469338728       1009
irq261: em5                     78605337        169
cpu1: timer                    929753403       2000
Total                         3110221379       6690
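
To compare the per-NIC interrupt rates in this table at a glance, a
minimal Python sketch along these lines could be used. It assumes the
`vmstat -i` rows are whitespace-separated with the total and rate as the
last two columns; the helper names (parse_vmstat_i, nic_rates) are made
up for illustration and are not part of any FreeBSD tool.

```python
def parse_vmstat_i(text):
    """Parse `vmstat -i` style output into {source: (total, rate)}.

    Assumption: every data row ends with two integer columns (total, rate).
    The header row and anything else that does not fit is skipped.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue
        rows[name.strip()] = (int(total), int(rate))
    return rows


def nic_rates(rows, prefix="em"):
    """Return {device: rate} for interrupt sources whose device starts with prefix."""
    rates = {}
    for name, (_total, rate) in rows.items():
        device = name.split(":")[-1].strip()
        if device.startswith(prefix):
            rates[device] = rate
    return rates


# Counters copied from the vmstat -i output in this message.
SAMPLE = """\
interrupt  total  rate
irq1: atkbd0  710  0
irq4: sio0  3  0
irq23: atapci0  14943  0
cpu0: timer  929753417  2000
irq256: em0  702754836  1511
irq257: em1  2  0
irq260: em4  469338728  1009
irq261: em5  78605337  169
cpu1: timer  929753403  2000
Total  3110221379  6690
"""

if __name__ == "__main__":
    rates = nic_rates(parse_vmstat_i(SAMPLE))
    for device in sorted(rates, key=rates.get, reverse=True):
        print(device, rates[device])
```

With the figures above, em0 carries the highest interrupt rate (1511/s),
followed by em4 (1009/s).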

# sysctl dev.em.0.stats=1
Sep 30 01:08:20 mango kernel: em0: Excessive collisions = 0
Sep 30 01:08:20 mango kernel: em0: Sequence errors = 0
Sep 30 01:08:20 mango kernel: em0: Defer count = 0
Sep 30 01:08:20 mango kernel: em0: Missed Packets = 101469
Sep 30 01:08:20 mango kernel: em0: Receive No Buffers = 0
Sep 30 01:08:20 mango kernel: em0: Receive Length Errors = 0
Sep 30 01:08:20 mango kernel: em0: Receive errors = 0
Sep 30 01:08:20 mango kernel: em0: Crc errors = 0
Sep 30 01:08:20 mango kernel: em0: Alignment errors = 0
Sep 30 01:08:20 mango kernel: em0: Collision/Carrier extension errors = 0
Sep 30 01:08:20 mango kernel: em0: RX overruns = 0
Sep 30 01:08:20 mango kernel: em0: watchdog timeouts = 15
Sep 30 01:08:20 mango kernel: em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
Sep 30 01:08:20 mango kernel: em0: XON Rcvd = 0
Sep 30 01:08:20 mango kernel: em0: XON Xmtd = 0
Sep 30 01:08:20 mango kernel: em0: XOFF Rcvd = 0
Sep 30 01:08:20 mango kernel: em0: XOFF Xmtd = 0
Sep 30 01:08:20 mango kernel: em0: Good Packets Rcvd = 1056196797
Sep 30 01:08:20 mango kernel: em0: Good Packets Xmtd = 1088726903
Sep 30 01:08:20 mango kernel: em0: TSO Contexts Xmtd = 4088
Sep 30 01:08:20 mango kernel: em0: TSO Contexts Failed = 0
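
To put the Missed Packets counter in proportion, the stats lines above
can be parsed and a miss ratio computed. This is only a sketch: the
regular expression assumes the syslog format shown here, the function
names are hypothetical, and treating missed packets as not already
counted in Good Packets Rcvd is an assumption.

```python
import re

# Matches lines such as:
#   Sep 30 01:08:20 mango kernel: em0: Missed Packets = 101469
STAT_RE = re.compile(r"kernel: (em\d+): (.+?) = (\d+)\s*$")


def parse_em_stats(lines):
    """Collect {device: {counter name: value}} from em(4) stats log lines."""
    stats = {}
    for line in lines:
        match = STAT_RE.search(line)
        if match:
            device, name, value = match.groups()
            stats.setdefault(device, {})[name] = int(value)
    return stats


def missed_ratio(device_stats):
    """Fraction of arriving packets that were missed.

    Assumption: missed packets are not included in 'Good Packets Rcvd',
    so the denominator is the sum of both counters.
    """
    missed = device_stats["Missed Packets"]
    good = device_stats["Good Packets Rcvd"]
    return missed / (missed + good)


# Two lines copied from the output in this message.
SAMPLE = """\
Sep 30 01:08:20 mango kernel: em0: Missed Packets = 101469
Sep 30 01:08:20 mango kernel: em0: Good Packets Rcvd = 1056196797
"""

if __name__ == "__main__":
    em0 = parse_em_stats(SAMPLE.splitlines())["em0"]
    print("em0 missed ratio: {:.4%}".format(missed_ratio(em0)))
```

For em0 this works out to roughly 0.01% of received packets.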

# sysctl dev.em.0.debug=1
Sep 30 01:34:59 mango kernel: em0: Adapter hardware address = 0xc5159420
Sep 30 01:34:59 mango kernel: em0: CTRL = 0x401c0241 RCTL = 0x8002
Sep 30 01:34:59 mango kernel: em0: Packet buffer = Tx=16k Rx=32k
Sep 30 01:34:59 mango kernel: em0: Flow control watermarks high = 30720 low = 29220
Sep 30 01:34:59 mango kernel: em0: tx_int_delay = 66, tx_abs_int_delay = 66
Sep 30 01:34:59 mango kernel: em0: rx_int_delay = 0, rx_abs_int_delay = 66
Sep 30 01:34:59 mango kernel: em0: fifo workaround = 0, fifo_reset_count = 0
Sep 30 01:34:59 mango kernel: em0: hw tdh = 980, hw tdt = 980
Sep 30 01:34:59 mango kernel: em0: hw rdh = 203, hw rdt = 202
Sep 30 01:34:59 mango kernel: em0: Num Tx descriptors avail = 1024
Sep 30 01:34:59 mango kernel: em0: Tx Descriptors not avail1 = 0
Sep 30 01:34:59 mango kernel: em0: Tx Descriptors not avail2 = 0
Sep 30 01:34:59 mango kernel: em0: Std mbuf failed = 0
Sep 30 01:34:59 mango kernel: em0: Std mbuf cluster 

Re: em0 watchdog timeouts

2009-09-30 Thread Stefan Krueger
In muc.lists.freebsd.stable, you wrote:
> Rudy wrote:
>> I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
>> http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html
>>
>> link to dcgdis.zip didn't work.  Do you have a copy?
>
> Thanks, Jack.  Got the file and flashed -- no upgrade needed.
>
> So, while the router was offline, I flashed the motherboard's BIOS
> (Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12
> version of the em driver.  Still, watchdog timeouts.  Sigh.

Hi Rudy,

may I ask which clients have access to your FreeBSD 7.2 server?

I had similar problems a few days ago; I have no idea what exactly
happened, but Ubuntu Linux (NIS and NFS client) made my em0
time out after a while, too (and even crashed my FreeBSD 7.2 box
a few times!)

This box was rock solid before, I even thought my Intel NIC was
broken...

Anyway, since I had no time (and clue) to analyze this further, I took
the risk and upgraded to 8.0-RC1 and, well, everything is working fine
now :-)

HTH


Re: em0 watchdog timeouts

2009-09-30 Thread Rudy (bulk)

Stefan Krueger wrote:
> In muc.lists.freebsd.stable, you wrote:
>> Rudy wrote:
>>> I am having watchdog timeout issues with my Intel 82573 Pro/1000 ...
>>> http://lists.freebsd.org/pipermail/freebsd-net/2008-May/018075.html
>>>
>>> link to dcgdis.zip didn't work.  Do you have a copy?
>>
>> Thanks, Jack.  Got the file and flashed -- no upgrade needed.
>>
>> So, while the router was offline, I flashed the motherboard's BIOS
>> (Supermicro X7Sbi), upgraded to 7.2-STABLE, and downloaded the 6.9.12
>> version of the em driver.  Still, watchdog timeouts.  Sigh.
>
> Hi Rudy,
>
> may I ask which clients have access to your FreeBSD 7.2 server?

None.  It is a router and has minimal services on it (bgpd / zebra /
snmpd).


Rudy


Re: em0 watchdog timeouts

2009-09-30 Thread Rudy

Rudy wrote:
> Rudy wrote:
>> I am having watchdog timeout issues

Oh, here is some more info from 'pciconf -lcv'.

I offloaded half the traffic from em0 to em5 and there has only been one 
watchdog timeout today (on em5) vs. 10 watchdog timeouts yesterday.  We 
do streaming out of our network and the 3 second outage really messes 
things up...



em0@pci0:5:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
em1@pci0:5:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
em2@pci0:6:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
em3@pci0:6:0:1: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82571EB Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
em4@pci0:13:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82573E Intel Corporation 82573E Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
em5@pci0:15:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82573L Intel PRO/1000 PL Network Adaptor'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
vgapci0@pci0:17:3:0: class=0x03 card=0xd18015d9 chip=0x515e1002 rev=0x02



Re: em0 watchdog timeouts

2009-09-30 Thread Jack Vogel
Increase the size of your TX ring, meaning the number of TX descriptors.
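
As a sketch of what that could look like: the em(4) driver reads its
descriptor counts from loader tunables at boot. The value below is an
illustrative assumption, not a tested recommendation. The em(4) manual
page gives 4096 as the upper limit for adapters newer than the 82543,
and the debug output earlier in the thread reports 1024 TX descriptors
available on em0.

```
# /boot/loader.conf (illustrative value; takes effect after a reboot)
# Number of transmit descriptors per em(4) interface.
hw.em.txd=4096
# hw.em.rxd is the receive-side counterpart, left at its default here.
```

After rebooting, the "Num Tx descriptors avail" line printed by
`sysctl dev.em.0.debug=1` should reflect the new ring size when the
ring is idle.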

You said this is a quad port card, what size PCI-E slot are you in? On
some motherboards the slot connector might suggest it's of a certain
size, but it's not really wired fully. If you are not in an x8 lane
slot, move it to one.
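
One way to check link widths without opening the case, sketched in
Python: `pciconf -lcv` prints the PCI-Express capability as
"link xN(xM)", where N is the negotiated width and M the maximum the
device supports. The helper names below are hypothetical, the sample
uses two entries from the pciconf output posted earlier in the thread
with the device names written out in full, and on a quad-port card the
controllers may sit behind an on-card bridge, in which case the
slot-facing width would show up on the bridge entry instead.

```python
import re

# Device selector line, e.g. "em0@pci0:5:0:0: class=..."
DEVICE_RE = re.compile(r"^(\S+)@pci\d+:\d+:\d+:\d+:")
# Capability line, e.g. "... PCI-Express 1 endpoint max data 128(256) link x4(x4)"
LINK_RE = re.compile(r"PCI-Express.*\blink x(\d+)\(x(\d+)\)")


def link_widths(pciconf_text):
    """Return {device: (negotiated, maximum)} from `pciconf -lcv` output."""
    widths = {}
    device = None
    for line in pciconf_text.splitlines():
        match = DEVICE_RE.match(line.strip())
        if match:
            device = match.group(1)
            continue
        match = LINK_RE.search(line)
        if match and device is not None:
            widths[device] = (int(match.group(1)), int(match.group(2)))
    return widths


def underwired(widths):
    """Devices whose negotiated link is narrower than their maximum."""
    return sorted(dev for dev, (cur, cap) in widths.items() if cur < cap)


SAMPLE = """\
em0@pci0:5:0:0: class=0x02 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x4(x4)
em4@pci0:13:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
"""

if __name__ == "__main__":
    widths = link_widths(SAMPLE)
    print(widths)
    print("narrower than maximum:", underwired(widths))
```

For the two sample entries the negotiated width equals the maximum, so
nothing is flagged.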

What about system tuning?

Some ideas, let me know how it goes.

Jack

