On Fri, 16 Apr 2010, Lars Ehrhardt wrote:
> > Hi Lars, just looking back over old emails, and I did notice that at
> > least with one of your messages the stats still showed
> > tx_tcp_seg_good: 13136
> >
> > Which means that you still had TSO enabled.
>
> Yes, true. I disabled TSO after a while. Is there a recommended setup of
> all offload/checksum options for this kind of problems?

Well, it is a good start to turn off all offloads like you already have.

> The offloading setup for the interface right now is:
>
> Offload parameters for aur-mgt:
> Cannot get device flags: Operation not supported
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: on
> large receive offload: off
>
> I have loaded the module (version 8.0.19) with the following options:
> TxDescriptorStep=4 TxDescriptors=1024
>
> There are not any messages related to hangs in the logs now, but there
> are still outages. I guess a restart routine is kicking in after a while
> because I see the following entries in the log after an outage:
>
> Apr 11 09:22:50 gw kernel: [1179488.074216] e1000: aur-mgt:
> e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control:
> None

Hm, that's very interesting. Depending on your kernel version, they made
the NETDEV_WATCHDOG message a WARN_ONCE only, but the driver's tx_timeout
counter will still increment if the OS resets us due to not completing
transmits.

> However, there are not any "Link is Down" messages in the logs.

I don't think you're losing link until you get reset.

> > Can you make absolutely sure that ethtool -K ethX tso off is done on
> > each 82541 interface?
> >
> > The other thing that might be relevant is if you have >= 4GB ram.
>
> Nope, the machine has 2 GB RAM. Could it be that the problem is related
> to hyperthreading? What I find odd is that the problem occurs on 2
> machines, while it does not occur on 3 other machines. I cannot find a
> difference in the system settings though. Unfortunately there are stickers
> on the network chips, so I can't say if there are different revisions of
> the 82541 chips in those machines.

It shouldn't be related to hyperthreading, at least I've never heard of
such. You can compare dmidecode output of the machines, maybe compare
/dev/nvram output of them too.
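
If it helps with the comparing, here is a minimal sketch (not a polished
tool) that prints only the lines that differ between two saved command
outputs. It assumes you captured the output on a working machine and on a
failing machine into text files first; the file names and the
diff_outputs helper are made up for illustration.

```python
import difflib
import sys


def diff_outputs(good, bad, good_name="good", bad_name="bad"):
    """Return unified-diff lines between two captured command outputs."""
    return list(
        difflib.unified_diff(
            good.splitlines(),
            bad.splitlines(),
            fromfile=good_name,
            tofile=bad_name,
            lineterm="",
            n=0,  # no context lines, show only what actually differs
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python compare_outputs.py good-dmidecode.txt bad-dmidecode.txt
    good_path, bad_path = sys.argv[1], sys.argv[2]
    with open(good_path) as f:
        good_text = f.read()
    with open(bad_path) as f:
        bad_text = f.read()
    for line in diff_outputs(good_text, bad_text, good_path, bad_path):
        print(line)
```

Serial numbers and the like will always differ between machines, so expect
some noise; the interesting lines are the ones that differ between the
good and the bad group but not within a group.
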
lspci -vvv should show enough information to see if they are different
parts (they will have different revisions). Can you also compare the
ethtool -e outputs on the machines?

What about the slot they are plugged into? Could the two machines with
issues have heat problems for some reason (different case maybe?)

Jesse

_______________________________________________
E1000-devel mailing list
https://lists.sourceforge.net/lists/listinfo/e1000-devel
