On Wed, 11 Mar 2009, Gary W. Smith wrote:
> I asked this last week but didn't get a response.

Apologies for the slow response.

> I have a supermicro server with a dual intel nic that uses the e1000
> driver. I'm using CentOS 5.2 and when I do anything network intensive
> I lose connectivity for a few seconds. Then we get this in the log. I
> downloaded, compiled and installed the latest e1000 driver. I see that
> the driver is in the proper location (based on timestamp).

Thank you for downloading the latest driver. It is probably 8.0.9?

Please load the driver with the module parameter TxDescriptorStep=4,4.
You can modify /etc/modprobe.conf and add

  options e1000 TxDescriptorStep=4,4

(if you only have two ports), or just load the driver with

  modprobe e1000 TxDescriptorStep=4,4

and then use ethtool to increase the number of tx descriptors:

  ethtool -G eth0 tx 1024

This workaround only uses one in every four descriptors.

> How can I fix this problem on this server. I have tried to manually
> disable the tso and other entries but this doesn't seem to help. I've
> also tried setting it down to 100/full to no avail. It appears to be a
> TX, not RX issue. I say this because I run dstat in the background and
> when it hangs and then comes back it will quickly dump a full screen of
> dstat entries, which should be one per second, which I'm assuming that
> TCP is buffering the packets.

Please attach the full lspci -vvv output for your system. Also make sure
that you have the latest BIOS update, that the system's BIOS settings are
at their defaults, and in particular that any settings having to do with
"write combining" or PCI transaction combining are disabled.

> Things I've tried.
>
> /sbin/ethtool -K eth0 tso off
> /sbin/ethtool -K eth0 rx off
> /sbin/ethtool -K eth0 tx off
> /sbin/ethtool -K eth0 sg off
>
>
> Mar 11 18:50:01 vcsoaknas01 kernel: e1000: eth0: e1000_clean_tx_irq:
> Detected Tx Unit Hang
> Mar 11 18:50:01 vcsoaknas01 kernel: Tx Queue <0>
> Mar 11 18:50:01 vcsoaknas01 kernel: TDH <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel: TDT <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel: next_to_use <f7>
> Mar 11 18:50:01 vcsoaknas01 kernel: next_to_clean <24>
> Mar 11 18:50:01 vcsoaknas01 kernel: buffer_info[next_to_clean]
> Mar 11 18:50:01 vcsoaknas01 kernel: time_stamp <1004de0b1>
> Mar 11 18:50:01 vcsoaknas01 kernel: next_to_watch <24>
> Mar 11 18:50:01 vcsoaknas01 kernel: jiffies <1004dec18>
> Mar 11 18:50:01 vcsoaknas01 kernel: next_to_watch.status <0>

This really indicates that the adapter is finishing all the work, but
that the descriptor write indicating the work was completed is not
making it back to main memory. We have seen this a lot with AMD systems,
in particular ones with VIA chipsets. There is a bad bug in those
machines when an IO device and the processor both write to the same
cache line.

Also, if the above workaround doesn't help, we'll want you to install
the dump patch from the patches section of e1000.sourceforge.net and
send us the output when you get a tx hang.

Hope this helps,
  Jesse

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
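For anyone reading a similar "Detected Tx Unit Hang" dump, the arithmetic behind the fields can be sketched in plain POSIX shell. This is only a rough aid, not anything shipped with the driver: the function names are made up for illustration, the 256-entry ring size is an assumption (the dump does not print it), and converting jiffies to seconds assumes a kernel built with HZ=1000.

```shell
#!/bin/sh
# Rough helpers for reading an e1000 Tx hang dump.
# All function names here are hypothetical, not part of e1000 or ethtool.

# Descriptors the driver has queued but not yet cleaned.
# Args: next_to_use (hex), next_to_clean (hex), ring size (decimal, assumed).
tx_pending() {
    ntu=$(( 0x$1 ))
    ntc=$(( 0x$2 ))
    ring=$3
    # Add the ring size before the modulo so wrap-around stays non-negative.
    echo $(( (ntu - ntc + ring) % ring ))
}

# Age of the oldest uncleaned buffer, in jiffies.
# Args: jiffies (hex), time_stamp (hex).
tx_stall_jiffies() {
    echo $(( 0x$1 - 0x$2 ))
}

# Succeeds when the hardware head (TDH) equals the tail (TDT), i.e. the
# adapter has nothing left to fetch from the ring.
hw_ring_idle() {
    [ $(( 0x$1 )) -eq $(( 0x$2 )) ]
}

# Descriptors actually used when only one in every <step> is used.
# Args: ring size (decimal), step (decimal).
tx_usable() {
    echo $(( $1 / $2 ))
}

# Values taken from the dump quoted in this message.
tx_pending f7 24 256                    # prints 211
tx_stall_jiffies 1004dec18 1004de0b1    # prints 2919
hw_ring_idle f7 f7 && echo "hardware ring idle"
tx_usable 1024 4                        # prints 256
```

Read this way, TDH equals TDT, so the adapter has consumed everything it was given, yet the driver still sees 211 descriptors waiting to be cleaned and the oldest one is 2919 jiffies old (roughly 2.9 seconds if HZ=1000). That matches the description above of completions not reaching main memory. The last line shows why the ring is raised to 1024 when TxDescriptorStep=4 is used: only 256 of those descriptors actually carry traffic.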