[E1000-devel] Detected Tx Unit Hang Issue

2011-01-22 Thread Stephen Palmateer
Hello All,

I would like to report a problem with the e1000e driver on a CentOS 5.4 machine 
with a custom kernel.

We are experiencing interface timeouts and failures on a regular basis, which 
render the management interface useless.

I am seeing the following error repeatedly in dmesg and on stdout:

:04:00.0: eth0: Detected Tx Unit Hang:
  TDH                  143
  TDT                  12e
  next_to_use          12e
  next_to_clean        142
buffer_info[next_to_clean]:
  time_stamp           100de9410
  next_to_watch        144
  jiffies              100de952f
  next_to_watch.status 0
:04:00.0: eth0: Detected Tx Unit Hang:
  TDH                  143
  TDT                  12e
  next_to_use          12e
  next_to_clean        142
buffer_info[next_to_clean]:
  time_stamp           100de9410
  next_to_watch        144
  jiffies              100de95f7
  next_to_watch.status 0
:04:00.0: eth0: Detected Tx Unit Hang:
  TDH                  143
  TDT                  12e
  next_to_use          12e
  next_to_clean        142
buffer_info[next_to_clean]:
  time_stamp           100de9410
  next_to_watch        144
  jiffies              100de96bf
  next_to_watch.status 0
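
For anyone reading the dump above, the hexadecimal time_stamp and jiffies fields can be turned into a rough stall duration. The following is only a minimal sketch: the values are copied from the three reports, and HZ = 1000 is an assumption about this kernel's timer frequency (check CONFIG_HZ on the actual machine), not something stated in the output.

```python
# Values copied from the three "Detected Tx Unit Hang" reports (hexadecimal).
REPORTS = [
    {"TDH": "143", "TDT": "12e", "time_stamp": "100de9410", "jiffies": "100de952f"},
    {"TDH": "143", "TDT": "12e", "time_stamp": "100de9410", "jiffies": "100de95f7"},
    {"TDH": "143", "TDT": "12e", "time_stamp": "100de9410", "jiffies": "100de96bf"},
]

HZ = 1000  # ASSUMPTION: kernel timer frequency; verify against CONFIG_HZ


def stall_jiffies(report):
    """Jiffies elapsed between queuing the descriptor and the hang report."""
    return int(report["jiffies"], 16) - int(report["time_stamp"], 16)


stalls = [stall_jiffies(r) for r in REPORTS]
# Difference in stall time between consecutive reports.
gaps = [later - earlier for earlier, later in zip(stalls, stalls[1:])]

for number, stall in enumerate(stalls, start=1):
    print(f"report {number}: stalled {stall} jiffies "
          f"(~{stall / HZ:.2f} s if HZ={HZ})")
print("jiffies between reports:", gaps)
```

Because time_stamp, TDH and TDT are identical in all three reports, the same descriptor appears to be stuck each time, and the reports themselves arrive 200 jiffies apart.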

The weird part is that eth2 carries far more traffic and is not seeing any issue.

I'll try to provide as much info as I can below.

[admin@filter1 ~]$ uname -a
Linux filter1.yemen.net.ye 2.6.18-164.15.1.el5.netsw #1 SMP Mon Apr 26 15:01:04 EDT 2010 i686 i686 i386 GNU/Linux

[root@filter1 ~]# ethtool -i eth0
driver: e1000e
version: 1.0.2-k2
firmware-version: 5.10-2
bus-info: :05:00.0

[root@filter1 ~]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off

[root@filter1 ~]# lspci -vv | grep net
04:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
        Subsystem: Sun Microsystems Computer Corp. x4 PCI-Express Quad Gigabit Ethernet UTP Low Profile Adapter
04:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
        Subsystem: Sun Microsystems Computer Corp. x4 PCI-Express Quad Gigabit Ethernet UTP Low Profile Adapter
05:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
        Subsystem: Sun Microsystems Computer Corp. x4 PCI-Express Quad Gigabit Ethernet UTP Low Profile Adapter
05:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
        Subsystem: Sun Microsystems Computer Corp. x4 PCI-Express Quad Gigabit Ethernet UTP Low Profile Adapter
07:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
07:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)

[root@filter1 ~]# modinfo e1000e
filename:       /lib/modules/2.6.18-164.15.1.el5.netsw/kernel/drivers/net/e1000e/e1000e.ko
version:        1.0.2-k2
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, linux.n...@intel.com
srcversion:     D6678FCB5D0D64FDE5CC3DF
alias:  pci:v8086d10F0sv*sd*bc*sc*i*
alias:  pci:v8086d10EFsv*sd*bc*sc*i*
alias:  pci:v8086d10EBsv*sd*bc*sc*i*
alias:  pci:v8086d10EAsv*sd*bc*sc*i*
alias:  pci:v8086d10DFsv*sd*bc*sc*i*
alias:  pci:v8086d10DEsv*sd*bc*sc*i*
alias:  pci:v8086d10CEsv*sd*bc*sc*i*
alias:  pci:v8086d10CDsv*sd*bc*sc*i*
alias:  pci:v8086d10CCsv*sd*bc*sc*i*
alias:  pci:v8086d10CBsv*sd*bc*sc*i*
alias:  pci:v8086d10F5sv*sd*bc*sc*i*
alias:  pci:v8086d10BFsv*sd*bc*sc*i*
alias:  pci:v8086d10E5sv*sd*bc*sc*i*
alias:  pci:v8086d294Csv*sd*bc*sc*i*
alias:  pci:v8086d10BDsv*sd*bc*sc*i*
alias:  pci:v8086d10C3sv*sd*bc*sc*i*
alias:  pci:v8086d10C2sv*sd*bc*sc*i*
alias:  pci:v8086d10C0sv*sd*bc*sc*i*
alias:  pci:v8086d1049sv*sd*bc*sc*i*
alias:  pci:v8086d104Dsv*sd*bc*sc*i*
alias:  pci:v8086d104Bsv*sd*bc*sc*i*
alias:  pci:v8086d104Asv*sd*bc*sc*i*
alias:  pci:v8086d10C4sv*sd*bc*sc*i*
alias:  pci:v8086d10C5sv*sd*bc*sc*i*
alias:  pci:v8086d104Csv*sd*bc*sc*i*
alias:  pci:v8086d10BBsv*sd*bc*sc*i*
alias:  pci:v8086d1098sv*sd*bc*sc*i*
alias:  pci:v8086d10BAsv*sd*bc*sc*i*
alias:  pci:v8086d1096sv*sd*bc*sc*i*
alias:  pci:v8086d150Csv*sd*bc*sc*i*
alias:  pci:v8086d10F6sv*sd*bc*sc*i*
alias:  pci:v8086d10D3sv*sd*bc*sc*i*
alias:  

Re: [E1000-devel] Detected Tx Unit Hang Issue

2011-01-22 Thread Stephen Palmateer
I just found Intel's "Network Adapter Driver for PCI-E Gigabit Network 
Connections under Linux*", version 1.2.20.
Intel's README suggests that this will fix the driver-generated interrupts.

Since our e1000e driver is only version 1.0.2, I'm going to copy the tarball 
provided by Intel to the machine with WinSCP and follow Intel's instructions 
for installation.

Intel's website, 
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=YDwnldID=15817
suggests that this version of the driver is valid for the Intel® 82571EB 
Gigabit Ethernet Controllers we're working with.

I'll update this email thread when I'm finished.

Thanks again,
Stephen Palmateer

- Original Message -
From: Stephen Palmateer stephen.palmat...@netsweeper.com
To: E1000-devel@lists.sourceforge.net
Cc: ali a...@yemen.net.ye, Assem Alwadee assem1...@gmail.com, Jeremy 
Erb jeremy@netsweeper.com, Tamer Abu-Elsaad 
tamer.abu-els...@netsweeper.com
Sent: Saturday, January 22, 2011 4:44:01 PM
Subject: Detected Tx Unit Hang Issue
