Hi Lars, looking back over old emails, I noticed that in at least one of
your messages the stats still showed tx_tcp_seg_good: 13136, which means
that you still had TSO enabled.

Can you make absolutely sure that ethtool -K ethX tso off has been run on
each 82541 interface?
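
If it helps to check this across several boxes, here is a minimal Python
sketch. It only parses text you have already captured from "ethtool -k ethX"
and "ethtool -S ethX"; the function names are made up for illustration. It
assumes the feature line reads either "tcp segmentation offload" or
"tcp-segmentation-offload" depending on the ethtool version, and that
tx_tcp_seg_good is a cumulative counter, so two samples taken some time
apart are compared rather than trusting one nonzero value.

```python
import re


def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO state in `ethtool -k` text, None if absent."""
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Assumption: older ethtool prints spaces, newer prints hyphens.
        name = key.strip().lower().replace("-", " ")
        if name == "tcp segmentation offload":
            state = value.split()
            return state[0] == "on" if state else None
    return None


def counter(ethtool_s_output, name):
    """Return one integer counter from `ethtool -S` text, or None if missing."""
    pattern = r"^\s*" + re.escape(name) + r":\s*(\d+)\s*$"
    match = re.search(pattern, ethtool_s_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def tso_still_in_use(stats_before, stats_after):
    """True if tx_tcp_seg_good grew between two samples, None if unknown."""
    before = counter(stats_before, "tx_tcp_seg_good")
    after = counter(stats_after, "tx_tcp_seg_good")
    if before is None or after is None:
        return None
    return after > before
```

Take one "ethtool -S" sample, wait while the interface carries TCP traffic,
take a second, and pass both to tso_still_in_use(). If the counter keeps
growing, TSO is presumably still active on that interface.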

The other thing that might be relevant is whether you have >= 4GB of RAM.



-----Original Message-----
From: Lars Ehrhardt [mailto:[email protected]] 
Sent: Monday, March 22, 2010 4:15 PM
To: Ronciak, John
Cc: [email protected]
Subject: Re: [E1000-devel] Network stalls with e1000 driver and 82541 network 
chips

Dear John,

Ronciak, John wrote:

> Thanks, it's a bit hard to try and translate this into something we can 
> understand.  :-(

Let me know if there is anything specific you'd like to know and I'll
try to translate those bits.

>> I am getting dropped packets on the 82572EI interfaces as well.
> That's not good.  This means that the interrupts are not being 
> serviced fast enough to keep up with the traffic.  With 5 networking 
> ports it doesn't surprise me.  What kind of tests are you running to 
> cause this?  It's unclear if this system can withstand the traffic 
> from these ports.  Have you tried to run the test on a single port to
>  see if the drops happen then as well?  Try to see where the problem 
> starts to happen.  Are interrupts being shared between the devices? 
> What OS are you running?

We are running Debian Lenny with different kernel versions. At the
moment we are testing with 2.6.30 bpo version. Reading through the
archives I've tried the module options
TxDescriptorStep=4 TxDescriptors=1024
with the e1000 module.

This changes the behaviour. We no longer have tx unit hang messages in
the log, but the link nevertheless goes down sporadically and comes back
after some seconds. There is no down message in the logs, just the up
message:

syslog:Mar 22 17:34:09 gw kernel: [765361.074217] e1000: aur-mgt: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
syslog:Mar 22 22:28:21 gw kernel: [783013.698259] e1000: aur-mgt: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

Interrupts of the 82541 ports are shared with USB, interrupts of the
82572 ports are not shared. I think that the error rate corresponds to
the load of the interfaces somehow. Interfaces with little traffic have
a smaller value of rx_no_buffer_count and rx_missed_errors than
interfaces with lots of traffic.

            CPU0       CPU1
   0:       2676          0   IO-APIC-edge      timer
   1:          2          0   IO-APIC-edge      i8042
   3:     896339          0   IO-APIC-edge      serial
   4:         11          0   IO-APIC-edge
   7:          0          0   IO-APIC-edge      parport0
   8:         56          0   IO-APIC-edge      rtc0
   9:          0          0   IO-APIC-fasteoi   acpi
  14:          0          0   IO-APIC-edge      ide0
  16:      14766          0   IO-APIC-fasteoi   uhci_hcd:usb4, ath
  18:  193566354          0   IO-APIC-fasteoi   uhci_hcd:usb3, aur-mgt
  19:    1904270          0   IO-APIC-fasteoi   uhci_hcd:usb2, gst
  23:          0          0   IO-APIC-fasteoi   uhci_hcd:usb1, ehci_hcd:usb5
  28:    6776261          0   PCI-MSI-edge      pbr-Q0
  29:          2          0   PCI-MSI-edge      pbr
  30:   66797611          0   PCI-MSI-edge      dmz-Q0
  31:        871          0   PCI-MSI-edge      dmz
  32:  218588672          0   PCI-MSI-edge      inet-Q0
  33:     475455          0   PCI-MSI-edge      inet
  35:    5566356          0   PCI-MSI-edge      ahci
 NMI:          0          0   Non-maskable interrupts
 LOC:   70790723   55485197   Local timer interrupts
 SPU:          0          0   Spurious interrupts
 RES:     303286     277829   Rescheduling interrupts
 CAL:        127        293   Function call interrupts
 TLB:     237854      61406   TLB shootdowns
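
To make the sharing easier to spot, here is a rough Python sketch that lists
the IRQ lines with more than one handler. It works on the text of
/proc/interrupts; the function name is made up for illustration, and it
assumes the layout shown above, where the interrupt type (for example
IO-APIC-fasteoi) is a single column. Other kernel versions may split that
column differently.

```python
def shared_irqs(proc_interrupts_text):
    """Map IRQ number -> handler names for lines with more than one handler."""
    shared = {}
    for line in proc_interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # skips the CPU header and the NMI/LOC/... summary rows
        fields = rest.split()
        # Step over the per-CPU counts at the start of the row.
        idx = 0
        while idx < len(fields) and fields[idx].isdigit():
            idx += 1
        # Assumption: exactly one type column follows the counts.
        names = " ".join(fields[idx + 1:])
        handlers = [name.strip() for name in names.split(",") if name.strip()]
        if len(handlers) > 1:
            shared[int(irq)] = handlers
    return shared
```

Run against the table above, this should report IRQs 16, 18, 19 and 23 as
shared, while the MSI lines of the 82572 ports (28 to 33) each have a single
handler.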

The strange thing is that we have five of those devices: two show this
behaviour, three don't.  I might be able to dedicate a single device for
further testing. Would it help to diagnose further if you had shell access
to one of those devices?

Best regards,

Lars

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
