On Fri, 16 Apr 2010, Lars Ehrhardt wrote:
> > Hi Lars, just looking back over old emails, and I did notice that at
> > least with one of your messages the stats still showed
> > tx_tcp_seg_good: 13136
> >
> > Which means that you still had TSO enabled.
> 
> Yes, true. I disabled TSO after a while. Is there a recommended setup of
> all offload/checksum options for this kind of problem?

Well, it is a good start to turn off all offloads like you already have.
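
If it helps to double-check which offloads are really off, here is a
rough sketch that parses `ethtool -k` style text and lists anything
still reported as "on". The sample text is the listing quoted below;
the function name is made up for illustration.

```python
def offloads_still_on(ethtool_k_output):
    """Return the feature names reported as 'on' in `ethtool -k` style text."""
    enabled = []
    for line in ethtool_k_output.splitlines():
        name, sep, state = line.partition(":")
        # Header and error lines either have nothing after the colon
        # or something other than "on", so they are skipped.
        if sep and state.strip() == "on":
            enabled.append(name.strip())
    return enabled


# Sample copied from the offload listing quoted in this thread.
SAMPLE = """\
Offload parameters for aur-mgt:
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
large receive offload: off
"""

if __name__ == "__main__":
    for feature in offloads_still_on(SAMPLE):
        print("still on:", feature)
```

On that sample it reports rx-checksumming and generic segmentation
offload, so those two would be the next candidates to switch off.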
 
> The offloading setup for the interface right now is:
> 
> Offload parameters for aur-mgt:
> Cannot get device flags: Operation not supported
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: on
> large receive offload: off
> 
> I have loaded the module (version 8.0.19) with the following options:
> TxDescriptorStep=4 TxDescriptors=1024
> 
> There are not any messages related to hangs in the logs now, but there
> are still outages. I guess a restart routine is kicking in after a while
> because I see the following entries in the log after an outage:
> 
> Apr 11 09:22:50 gw kernel: [1179488.074216] e1000: aur-mgt:
> e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control:
> None

Hm, that's very interesting.  Depending on your kernel version, the
NETDEV_WATCHDOG message may have been changed to a WARN_ONCE, so it is
only printed once, but the driver's tx_timeout counter will still
increment each time the OS resets us due to not completing transmits.
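
One way to watch for that is to take two `ethtool -S ethX` snapshots,
one before and one after an outage, and see which counters moved. A
minimal sketch, assuming the counter shows up as `tx_timeout_count`
(check your own output for the exact name); the function names and the
timeout values in the sample are illustrative only, not real data.

```python
def parse_counters(ethtool_s_output):
    """Parse `ethtool -S` style text into a {name: int} dict."""
    counters = {}
    for line in ethtool_s_output.splitlines():
        name, _, value = line.partition(":")
        try:
            counters[name.strip()] = int(value)
        except ValueError:
            continue  # header line, blank line, or non-numeric value
    return counters


def counters_that_grew(before, after, watch):
    """Return {name: increase} for watched counters that went up."""
    old, new = parse_counters(before), parse_counters(after)
    return {
        name: new[name] - old.get(name, 0)
        for name in watch
        if new.get(name, 0) > old.get(name, 0)
    }


# Illustrative snapshots; the tx_timeout_count values are made up.
BEFORE = "NIC statistics:\n     tx_tcp_seg_good: 13136\n     tx_timeout_count: 0\n"
AFTER = "NIC statistics:\n     tx_tcp_seg_good: 13136\n     tx_timeout_count: 2\n"

if __name__ == "__main__":
    watched = ["tx_tcp_seg_good", "tx_timeout_count"]
    print(counters_that_grew(BEFORE, AFTER, watched))
```

If tx_tcp_seg_good keeps growing, TSO is still active on that
interface; if the timeout counter grows, the stack is resetting the
device even when nothing shows up in the log.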

> However, there are not any "Link is Down" messages in the logs.

I don't think you're losing link until you get reset.

> > Can you make absolutely sure that ethtool -K ethX tso off is done on
> > each 82541 interface?
> > 
> > The other thing that might be relevant is if you have >= 4GB ram.
> 
> Nope, the machine has 2 GB RAM. Could it be that the problem is related
> to hyperthreading? What I find odd is that the problem occurs on 2
> machines, while it does not occur on 3 other machines. I cannot find a
> difference in the system settings though. Unfortunately there are stickers
> on the network chips, so I can't say if there are different revisions of
> the 82541 chips in those machines.

It shouldn't be related to hyperthreading; at least I've never heard of
such a case.  You can compare the dmidecode output of the machines, and
maybe compare the /dev/nvram output too.  lspci -vvv should show enough
information to see if they are different parts (they will have different
revisions).

Can you also compare the ethtool -e outputs on the machines?  What about
the slots they are plugged into?  Could the two machines with issues have
heat problems for some reason (a different case, maybe)?
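
For the comparisons, something along these lines would do once the
output of each command is saved to a file per machine. This is only a
sketch using Python's difflib; the function name and the two sample
strings are hypothetical placeholders, not real lspci output.

```python
import difflib


def differing_lines(good_text, bad_text):
    """Return only the lines that differ between two saved command outputs.

    Lines prefixed "- " appear only in good_text, "+ " only in bad_text.
    """
    diff = difflib.ndiff(good_text.splitlines(), bad_text.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical placeholder text standing in for saved `lspci -vvv` output.
GOOD_HOST = "Ethernet controller: EXAMPLE 82541 (rev AA)\nLatency: 64"
BAD_HOST = "Ethernet controller: EXAMPLE 82541 (rev BB)\nLatency: 64"

if __name__ == "__main__":
    for line in differing_lines(GOOD_HOST, BAD_HOST):
        print(line)
```

The same function works for dmidecode and ethtool -e dumps, since it
only compares text line by line.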

Jesse

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel