On Wed, 11 Mar 2009, Gary W. Smith wrote:
> I asked this last week but didn't get a response.  I have a supermicro

apologies for the slow response.

> server with a dual intel nic that uses the e1000 driver.  I'm using
> CentOS 5.2 and when I do anything network intensive I lose connectivity
> for a few seconds.  Then we get this in the log.  I downloaded, compiled
> and installed the latest e1000 driver.  I see that the driver is in the
> proper location (based on timestamp).

thank you for downloading the latest driver.  I assume it is 8.0.9?

please load the driver with the module parameter TxDescriptorStep=4,4

you can modify /etc/modprobe.conf and add
options e1000 TxDescriptorStep=4,4
(if you only have two ports)
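The "4,4" form is one value per port.  A minimal Python sketch that builds the modprobe.conf line for any port count, assuming (my assumption, not stated above) that values are matched to ports in probe order; the helper name e1000_options_line is hypothetical:

```python
def e1000_options_line(num_ports: int, step: int = 4) -> str:
    """Build an /etc/modprobe.conf line that gives every e1000 port the
    same TxDescriptorStep value, one comma-separated entry per port."""
    if num_ports < 1:
        raise ValueError("need at least one port")
    values = ",".join(str(step) for _ in range(num_ports))
    return f"options e1000 TxDescriptorStep={values}"


if __name__ == "__main__":
    print(e1000_options_line(2))  # the dual-port case discussed here
    print(e1000_options_line(4))  # e.g. a hypothetical quad-port box
```

For the two-port server in this thread it reproduces the line given above.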

or just load the driver with
modprobe e1000 TxDescriptorStep=4,4
and then use ethtool to increase the number of tx descriptors.
ethtool -G eth0 tx 1024
this workaround only uses one in every four descriptors, which is why the
tx ring should be enlarged to compensate.
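The ring-size arithmetic, as a rough Python sketch.  It assumes that with TxDescriptorStep=N only every Nth slot carries work, and that the stock e1000 tx ring is 256 entries; both the constant and the helper names are illustrative assumptions:

```python
# Hypothetical helpers for the descriptor arithmetic behind the
# TxDescriptorStep workaround.  Assumption: with TxDescriptorStep=N the
# driver places work in only every Nth slot of the tx ring.

DEFAULT_TX_RING = 256  # assumption: stock e1000 tx ring size


def usable_tx_descriptors(ring_size: int, step: int = 4) -> int:
    """Descriptors actually available for transmit work."""
    return ring_size // step


def tx_ring_for(usable: int, step: int = 4) -> int:
    """Ring size to request with `ethtool -G ethX tx <n>` to get `usable` slots."""
    return usable * step


if __name__ == "__main__":
    print("usable with default ring:", usable_tx_descriptors(DEFAULT_TX_RING))
    print("ring needed to keep 256 usable:", tx_ring_for(DEFAULT_TX_RING))
```

Under those assumptions, a 1024-entry ring with a step of 4 gives back roughly the 256 usable descriptors of an unmodified ring.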

> How can I fix this problem on this server?  I have tried to manually
> disable TSO and other offloads, but this doesn't seem to help.  I've
> also tried forcing the link down to 100/full, to no avail.  It appears to
> be a TX, not an RX, issue.  I say this because I run dstat in the
> background, and when the link hangs and then comes back, dstat quickly
> dumps a full screen of entries (normally one per second), which I'm
> assuming means that TCP is buffering the packets.

please attach the full output of lspci -vvv for your system, make sure
that you have the latest BIOS update and that the system's BIOS settings
are at their defaults, and in particular check that any settings having
to do with "write combining" or PCI transaction combining are disabled.


> Things I've tried.
> 
> /sbin/ethtool -K eth0 tso off
> /sbin/ethtool -K eth0 rx off
> /sbin/ethtool -K eth0 tx off
> /sbin/ethtool -K eth0 sg off
> 
> 
> Mar 11 18:50:01 vcsoaknas01 kernel: e1000: eth0: e1000_clean_tx_irq:
> Detected Tx Unit Hang
> Mar 11 18:50:01 vcsoaknas01 kernel:   Tx Queue             <0>
> Mar 11 18:50:01 vcsoaknas01 kernel:   TDH                  <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel:   TDT                  <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_use          <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_clean        <24>
> Mar 11 18:50:01 vcsoaknas01 kernel: buffer_info[next_to_clean]
> Mar 11 18:50:01 vcsoaknas01 kernel:   time_stamp           <1004de0b1>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_watch        <24>
> Mar 11 18:50:01 vcsoaknas01 kernel:   jiffies              <1004dec18>
> Mar 11 18:50:01 vcsoaknas01 kernel:   next_to_watch.status <0>
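The time_stamp and jiffies values in the report are hex.  A rough Python sketch that turns them into a stall duration, assuming the kernel was built with HZ=1000 (an assumption, not stated in the thread); the helper names are hypothetical:

```python
# Estimate how long the tx queue had been stuck when the hang was reported,
# using the hex values from the "Detected Tx Unit Hang" log above.
# Assumption: the kernel was built with HZ=1000; check your kernel config.

HZ = 1000  # assumed timer frequency (jiffies per second)

TIME_STAMP = 0x1004DE0B1  # buffer_info[next_to_clean] time_stamp from the log
JIFFIES = 0x1004DEC18     # jiffies at the moment the hang was reported


def stall_jiffies(time_stamp: int, jiffies: int) -> int:
    """Jiffies elapsed since the oldest uncleaned descriptor was queued."""
    return jiffies - time_stamp


def stall_seconds(time_stamp: int, jiffies: int, hz: int = HZ) -> float:
    """Same interval converted to seconds for the assumed HZ."""
    return stall_jiffies(time_stamp, jiffies) / hz


if __name__ == "__main__":
    ticks = stall_jiffies(TIME_STAMP, JIFFIES)
    print(f"{ticks} jiffies, about {stall_seconds(TIME_STAMP, JIFFIES):.1f} s")
```

With these numbers the difference is 2919 jiffies, roughly three seconds at HZ=1000, which is consistent with the "few seconds" of lost connectivity described above.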

this really indicates that the adapter has finished all the work, but the
descriptor write-back indicating that the work was completed is not making
it back to main memory.  We have seen this a lot with AMD systems, in
particular ones with VIA chipsets.  There is a bad bug in those machines
when an IO device and the processor both write to the same cache line.
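That reading can be checked against the indices in the log with a small Python sketch.  TDH equals TDT (both 0xf7, same as next_to_use), so the hardware has processed everything it was given, while next_to_clean is still 0x24: 211 descriptors are waiting for a completion the driver never saw, and next_to_watch.status <0> means the done bit never showed up in memory.  The field meanings are my reading of the e1000 hang report; the 256-entry ring, the DD bit value, and the class/function names are assumptions for illustration:

```python
# Interpret the ring indices from the "Detected Tx Unit Hang" report.
# Assumptions: ring_size is the tx ring length in use (256 is taken as the
# e1000 default), and bit 0 of the descriptor status byte is the
# "descriptor done" (DD) bit the hardware sets when it writes back.
from dataclasses import dataclass

TXD_STAT_DD = 0x01  # assumed descriptor-done bit in the status field


@dataclass
class TxHangReport:
    tdh: int            # hardware head: next descriptor the NIC will process
    tdt: int            # hardware tail: one past the last descriptor queued
    next_to_use: int    # driver's copy of the tail
    next_to_clean: int  # oldest descriptor the driver has not reclaimed
    status: int         # next_to_watch.status


def decode(r: TxHangReport, ring_size: int = 256) -> dict:
    return {
        # head == tail == next_to_use: hardware queue is empty
        "hw_caught_up": r.tdh == r.tdt == r.next_to_use,
        # descriptors queued by the driver but not yet seen as completed
        "awaiting_writeback": (r.next_to_use - r.next_to_clean) % ring_size,
        # DD clear: the driver has not seen the completion write-back
        "done_bit_seen": bool(r.status & TXD_STAT_DD),
    }


if __name__ == "__main__":
    report = TxHangReport(tdh=0xF7, tdt=0xF7, next_to_use=0xF7,
                          next_to_clean=0x24, status=0x0)
    print(decode(report))
```

The combination "hardware idle, driver still waiting, done bit clear" is the pattern that points at a lost write-back rather than a stalled adapter.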

also, if the above workaround doesn't help, we'll want you to install the
dump patch from the patches section of e1000.sourceforge.net and send us
the output when you get a tx hang.

hope this helps, 
 Jesse

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel