On Mon, 08/02/2010 at 14:03 -0800, Duyck, Alexander H wrote:
> Покотиленко Костик wrote:
> > On Fri, 29 Jan 2010 01:29:05 +0200, "Покотиленко Костик" wrote:
> >
> >> On Thu, 28/01/2010 at 14:32 -0800, Alexander Duyck wrote:
> >>> On Wed, 2010-01-27 at 04:14 -0800, Покотиленко Костик wrote:
> >>>> Using serial console I've figured out:
> >>>>
> >>>> - system working fine except for the NIC
> >>>> - ifconfig shows only RX dropped increasing on eth1 (client side);
> >>>> other counters stalled.
> >>>> - ethtool -t eth0:
> >>>>
> >>>> The test result is FAIL
> >>>> The test extra info:
> >>>> Register test (offline) 0
> >>>> Eeprom test (offline) 0
> >>>> Interrupt test (offline) 0
> >>>> Loopback test (offline) 13
> >>>> Link test (on/offline) 0
> >>>>
> >>>> - ethtool -t eth1
> >>>>
> >>>> The test result is FAIL
> >>>> The test extra info:
> >>>> Register test (offline) 0
> >>>> Eeprom test (offline) 0
> >>>> Interrupt test (offline) 0
> >>>> Loopback test (offline) 13
> >>>> Link test (on/offline) 0
> >>>>
> >>>> - After doing:
> >>>>
> >>>> ifdown -a; rmmod igb; rmmod dca; modprobe igb; ifup -a
> >>>>
> >>>> both ethtool commands (The test result is FAIL) and ifconfig show
> >>>> the same result
> >>>>
> >>>> So it seems like a NIC hardware hang.
> >>>
> >>> The next time this occurs could you go through and run the ethtool
> >>> test on all of the network ports? I'm wondering if it is only
> >>> eth0/1 that are blocked or if eth3/4 are stopped as well.
> >>
> >> Sure.
> >
> > Last time we changed some BIOS options to:
> >
> > Execute Disable Bit: Disabled
> > ACPI 1.0 Support: Enabled (When Disabled it's 3.0(??))
> >
> > After that the system worked for almost 9 days with 2.6.30, then the
> > same problem occurred.
> >
> > Forgot to do ethtool test for all ports :/
>
> Based on the results it seems like what is failing is the hardware's
> ability to handle DMA transactions. Ideally if possible it would be
> best if you could do an lspci -t dump of the system and work your way
> up until you find at which point in the tree we have the failure. The
> ethtool -t test seems to show the failure as a loopback test so we
> should be able to at least test this up to the PCIe bridge on the
> adapter.
The lspci -tv output is attached.
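
To work up the tree from a failing port, something like the following could list the bridges above it. This is only a sketch: it assumes the usual Linux sysfs layout, where a PCI device's resolved path nests under its upstream bridges, and the names `upstream_chain` and `device_path` are made up here for illustration.

```python
import os
import re

# A PCI address as it appears in sysfs: domain:bus:device.function
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_chain(sysfs_path):
    """Return the PCI addresses in a resolved sysfs path, root port first."""
    return [part for part in sysfs_path.split("/") if BDF.match(part)]


def device_path(address):
    """Resolve the sysfs symlink for a PCI address (only works on a live Linux system)."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", address))


# Path written out by hand from the attached tree (assumed, not read from the machine).
example = "/sys/devices/pci0000:00/0000:00:05.0/0000:01:00.0/0000:02:02.0/0000:03:00.1"
for address in upstream_chain(example):
    print(address)
```

On the real machine `upstream_chain(device_path("0000:03:00.1"))` should give the same list, and each address can then be inspected in turn with `lspci -vv -s <address>`.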
During the last 2 days the system rebooted twice shortly after the problem
occurred, so no ethtool tests yet.
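
So that no port is forgotten next time, a rough sketch like this could run the self-test on every port and report which sub-tests returned a non-zero code. The interface names and the helper names (`parse_selftest`, `failed_tests`, `run_all`) are assumptions; the parser only expects the `ethtool -t` output layout quoted above.

```python
import re
import subprocess


def parse_selftest(text):
    """Return (overall_result, {test_name: code}) from `ethtool -t` output."""
    overall = None
    codes = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"The test result is (\w+)", line)
        if m:
            overall = m.group(1)
            continue
        # Sub-test lines end in an integer result code, e.g. "Loopback test (offline) 13"
        m = re.match(r"(.+?)\s+(-?\d+)$", line)
        if m:
            codes[m.group(1)] = int(m.group(2))
    return overall, codes


def failed_tests(codes):
    """Names of the sub-tests whose result code is non-zero."""
    return [name for name, code in codes.items() if code != 0]


def run_all(ifaces):
    """Run `ethtool -t` on each interface (needs root; offline tests interrupt traffic)."""
    for iface in ifaces:
        proc = subprocess.run(["ethtool", "-t", iface], capture_output=True, text=True)
        overall, codes = parse_selftest(proc.stdout)
        print(iface, overall, failed_tests(codes))
```

Calling `run_all(["eth0", "eth1", "eth2", "eth3"])` from the serial console would cover all four 82576 ports, if those are the right names on this box.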
BTW, I see many "UDP: bad checksum" messages before the issue occurs,
like this:
Feb 8 18:49:16 lan-r kernel: [99067.458074] UDP: bad checksum. From 95.169.150.116:48810 to 89.28.200.210:1126 ulen 181
Feb 8 18:49:24 lan-r kernel: [99074.976709] __ratelimit: 29 callbacks suppressed
Also today there was:
Feb 9 09:57:33 lan-r kernel: [53517.383722] igb 0000:03:00.1: Detected Tx Unit Hang
Feb 9 09:57:33 lan-r kernel: [53517.383725] Tx Queue <0>
Feb 9 09:57:33 lan-r kernel: [53517.383729] TDH <aa>
Feb 9 09:57:33 lan-r kernel: [53517.383730] TDT <e8>
Feb 9 09:57:33 lan-r kernel: [53517.383730] next_to_use <e8>
Feb 9 09:57:33 lan-r kernel: [53517.383731] next_to_clean <aa>
Feb 9 09:57:33 lan-r kernel: [53517.383732] buffer_info[next_to_clean]
Feb 9 09:57:33 lan-r kernel: [53517.383732] time_stamp <cb1921>
Feb 9 09:57:33 lan-r kernel: [53517.383733] next_to_watch <ab>
Feb 9 09:57:33 lan-r kernel: [53517.383734] jiffies <cb1c48>
Feb 9 09:57:33 lan-r kernel: [53517.383734] desc.status <158000>
But the system is still alive.
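
For reference, the hex fields in that hang report can be turned into numbers with a few lines of Python. This is just a sketch of the arithmetic: the ring size of 256 descriptors is an assumption (the log does not state it), and the helper names are invented here.

```python
import re

# Field lines copied from the "Detected Tx Unit Hang" report, syslog prefix removed.
LOG = """\
Tx Queue <0>
TDH <aa>
TDT <e8>
next_to_use <e8>
next_to_clean <aa>
buffer_info[next_to_clean]
time_stamp <cb1921>
next_to_watch <ab>
jiffies <cb1c48>
desc.status <158000>
"""


def parse_hang_fields(text):
    """Map each 'name <hex>' pair in the report to an integer."""
    pairs = re.findall(r"([\w.]+)\s*<([0-9a-fA-F]+)>", text)
    return {name: int(value, 16) for name, value in pairs}


def pending_descriptors(tdh, tdt, ring_size):
    """Descriptors between hardware head (TDH) and tail (TDT), allowing for wrap-around."""
    return (tdt - tdh) % ring_size


fields = parse_hang_fields(LOG)
# ring_size=256 is an assumed value, not taken from the log.
pending = pending_descriptors(fields["TDH"], fields["TDT"], ring_size=256)
age = fields["jiffies"] - fields["time_stamp"]
print(pending, "descriptors pending,", age, "jiffies since time_stamp")
```

With the values above that gives 62 descriptors between head and tail and 807 jiffies since the buffer was time-stamped. How long that is in seconds depends on the kernel's HZ setting, which the log does not show (about 3.2 s if HZ=250).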
> Also if ACPI is having an effect on the issue one other thing you
> might try changing in the BIOS would be to disable all CPU C-states.
> The system will consume more power as a result, but the CPU also ends
> up usually being much more responsive as a result, and we have seen in
> the past that this can sometimes resolve performance issues.
I'll turn those off:
CPU C State=1 ;Options: 1=Enabled: 0=Disabled
C1E=1 ;Options: 1=Enabled: 0=Disabled
Full current BIOS config attached.
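
Note that the attached dump is easy to misread, because some settings use 0=Enabled and others 1=Enabled. A small sketch like this resolves each value against its own ";Options:" list. It assumes one setting per line (wrapped lines joined first), and `parse_setting` and `describe` are names made up for the example.

```python
def parse_setting(line):
    """Split 'Key=Value ;Options: a=X: b=Y' into (key, value, {code: label})."""
    setting, _, comment = line.partition(";")
    key, _, value = setting.partition("=")
    options = {}
    comment = comment.strip()
    if comment.startswith("Options:"):
        # Options are separated by ': ', each one is 'code=label'
        for item in comment[len("Options:"):].split(": "):
            code, _, label = item.strip().partition("=")
            if label:
                options[code] = label
    return key.strip(), value.strip(), options


def describe(line):
    """Return 'Key: label' using the option list on the same line."""
    key, value, options = parse_setting(line)
    return f"{key}: {options.get(value, 'unknown (' + value + ')')}"


for line in (
    "CPU C State=1 ;Options: 1=Enabled: 0=Disabled",
    "C1E=1 ;Options: 1=Enabled: 0=Disabled",
    "Intel(R) Hyper-Threading Tech=1 ;Options: 0=Enabled: 1=Disabled",
):
    print(describe(line))
```

Values that do not appear in their own option list (for example "IP source=0") come out as unknown rather than being guessed.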
--
Покотиленко Костик <[email protected]>
-[0000:00]-+-00.0  Intel Corporation Core Processor DMI
           +-05.0-[0000:01-08]----00.0-[0000:02-08]--+-02.0-[0000:03-05]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
           |                                         |                    \-00.1  Intel Corporation 82576 Gigabit Network Connection
           |                                         \-04.0-[0000:06-08]--+-00.0  Intel Corporation 82576 Gigabit Network Connection
           |                                                              \-00.1  Intel Corporation 82576 Gigabit Network Connection
           +-08.0  Intel Corporation Core Processor System Management Registers
           +-08.1  Intel Corporation Core Processor Semaphore and Scratchpad Registers
           +-08.2  Intel Corporation Core Processor System Control and Status Registers
           +-08.3  Intel Corporation Core Processor Miscellaneous Registers
           +-10.0  Intel Corporation Core Processor QPI Link
           +-10.1  Intel Corporation Core Processor QPI Routing and Protocol Registers
           +-19.0  Intel Corporation 82578DM Gigabit Network Connection
           +-1a.0  Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller
           +-1c.0-[0000:09]--
           +-1c.4-[0000:0a]----00.0  Intel Corporation 82574L Gigabit Network Connection
           +-1c.6-[0000:0b]----00.0  Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1)
           +-1c.7-[0000:0c]--
           +-1d.0  Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller
           +-1e.0-[0000:0d]--
           +-1f.0  Intel Corporation 3400 Series Chipset LPC Interface Controller
           +-1f.2  Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller
           +-1f.3  Intel Corporation 5 Series/3400 Series Chipset SMBus Controller
           \-1f.5  Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller
; Warning!!! Warning!!! Warning!!!
; ---------------------------------
; This file has been generated in a system with the BIOS/Firmware
; specifications as mentioned under [SYSTEM] section. Please do not
; modify or edit any information in this section. Attempt to restore
; these information in incompatible systems could cause serious
; problems to the sytems and could lead the system non-functional.
; Note: The file is best seen using wordpad.
[SYSTEM]
BIOSVersion=S3420GP.86B.01.00.0027.091120091739 ; This field should not be edited
FWBootVersion=22 ; This field should not be edited
FWOpcodeVersion=14 ; This field should not be edited
PIAVersion=14 ; This field should not be edited
[BIOS]
[BIOS::ADVANCED]
[BIOS::ADVANCED::MEMORY CONFIGURATION]
Margin Ranks=0 ;Options: 2=DOE Margin Check: 1=Enable: 0=Disable
[BIOS::ADVANCED::MASS STORAGE CONTROLLER CONFIGURATION]
Onboard SATA Controller=1 ;Options: 1=Enabled: 0=Disabled
SATA Mode=0 ;Options: 4=Matrix RAID: 2=Intel ESRT2: 1=AHCI: 3=COMPATIBILITY: 0=ENHANCED
SATA Mode=0 ;Options: 4=Matrix RAID: 1=AHCI: 3=COMPATIBILITY: 0=ENHANCED
[BIOS::ADVANCED::SERIAL PORT CONFIGURATION]
[BIOS::ADVANCED::SERIAL PORT CONFIGURATION::SERIAL A ENABLE]
Serial A Enable=1 ;Options: 1=Enabled: 0=Disabled
Address=1016 ;Options: 744=2E8: 1000=3E8: 760=2F8: 1016=3F8
IRQ=4 ;Options: 4=4: 3=3
[BIOS::ADVANCED::SERIAL PORT CONFIGURATION::SERIAL B ENABLE]
Serial B Enable=1 ;Options: 1=Enabled: 0=Disabled
Address=760 ;Options: 744=2E8: 1000=3E8: 760=2F8: 1016=3F8
IRQ=3 ;Options: 4=4: 3=3
[BIOS::ADVANCED::USB CONFIGURATION]
USB Controller=1 ;Options: 1=Enabled: 0=Disabled
Legacy USB Support=0 ;Options: 2=Auto: 1=Disabled: 0=Enabled
Port 60/64 Emulation=1 ;Options: 1=Enabled: 0=Disabled
Make USB Devices Non-Bootable=0 ;Options: 1=Enabled: 0=Disabled
Device Reset Timeout=1 ;Options: 3=40 seconds: 2=30 seconds: 1=20 seconds: 0=10 seconds
[BIOS::ADVANCED::PCI CONFIGURATION]
Memory Mapped I/O above 4GB=0 ;Options: 1=Enabled: 0=Disabled
Onboard Video=0 ;Options: 1=Disabled: 0=Enabled
Dual Monitor Video=0 ;Options: 1=Enabled: 0=Disabled
Onboard NIC1 ROM=1 ;Options: 1=Enabled: 0=Disabled
Onboard NIC2 ROM=1 ;Options: 1=Enabled: 0=Disabled
[BIOS::ADVANCED::SYSTEM ACOUSTICS AND PERFORMANCE CONFIGURATION]
Set Throttling Mode=0 ;Options: 2=CLTT: 1=OLTT: 0=Auto
Altitude=900 ;Options: 3000=Higher than 1500m: 1500=901m - 1500m: 900=301m - 900m: 300=300m or less
Set Fan Profile=2 ;Options: 2=Acoustic: 1=Performance
[BIOS::MEMORY CONFIGURATION]
[BIOS::DIMM DISABLE]
[BIOS::THERMAL THROTTLING]
[BIOS::MEMORY MAP]
[BIOS::TYLERSBURG]
[BIOS::TYLERSBURG IOH 0]
[BIOS::TYLERSBURG CONFIGURATION]
[BIOS::INTEL(R) VT FOR DIRECTED I/O (VT-D)]
[BIOS::IOH DEVICE AND FUNCTION HIDE OPTIONS]
[BIOS::PCI EXPRESS PORT 0]
[BIOS::PCI EXPRESS PORT 1]
[BIOS::PCI EXPRESS PORT 2]
[BIOS::PCI EXPRESS PORT 3]
[BIOS::PCI EXPRESS PORT 4]
[BIOS::PCI EXPRESS PORT 5]
[BIOS::PCI EXPRESS PORT 6]
[BIOS::PCI EXPRESS PORT 7]
[BIOS::PCI EXPRESS PORT 8]
[BIOS::PCI EXPRESS PORT 9]
[BIOS::PCI EXPRESS PORT 10]
[BIOS::ICH9/ICH10 CONFIGURATION]
[BIOS::ICH PCIE CONFIGURATION]
[BIOS::ICH MISC DEVICES CONFIGURATION]
System State After Power Failure=1 ;Options: 1=On: 0=Off
[BIOS::ICH SATA CONFIGURATION]
[BIOS::ICH USB CONFIGURATION]
[BIOS::PROCESSOR CONFIGURATION]
Intel(R) QPI Frequency Select=0 ;Options: 32=Auto Strap: 3=6.4 GT/s: 2=5.866 GT/s: 1=4.8 GT/s: 0=Auto Max
Turbo Mode=0 ;Options: 1=Enabled: 0=Disabled
Enhanced Intel SpeedStep(R) Tech=0 ;Options: 1=Enabled: 0=Disabled
CPU C State=1 ;Options: 1=Enabled: 0=Disabled
Processor C3 report=0 ;Options: 2=ACPI C3: 1=ACPI C2: 0=Disabled
C1E=1 ;Options: 1=Enabled: 0=Disabled
Processor C6 report=0 ;Options: 1=Enabled: 0=Disabled
Intel(R) Hyper-Threading Tech=1 ;Options: 0=Enabled: 1=Disabled
Core Multi-Processing=0 ;Options: 2=2: 1=1: 0=All
Execute Disable Bit=0 ;Options: 1=Enabled: 0=Disabled
Intel(R) Virtualization Technology=0 ;Options: 1=Enabled: 0=Disabled
Intel(R) VT for Directed I/O=0 ;Options: 1=Enabled: 0=Disabled
Hardware Prefetcher=0 ;Options: 0=Enabled: 1=Disabled
Adjacent Cache Line Prefetch=0 ;Options: 0=Enabled: 1=Disabled
Spread Spectrum=1 ;Options: 1=Enabled: 0=Disabled
DB1200 Configuration Setting=0 ;Options: 3=HI_BW, ByPass: 2=HI_BW, PLL: 1=Low_BW, ByPass: 0=Low_BW, PLL
[BIOS::MAIN]
Quiet Boot=1 ;Options: 1=Enabled: 0=Disabled
POST Error Pause=0 ;Options: 1=Enabled: 0=Disabled
[BIOS::SECURITY]
Front Panel Lockout=0 ;Options: 1=Enabled: 0=Disabled
[BIOS::SERVER MANAGEMENT]
Assert NMI on SERR=1 ;Options: 1=Enabled: 0=Disabled
Assert NMI on PERR=1 ;Options: 1=Enabled: 0=Disabled
Resume on AC Power Loss=2 ;Options: 2=Reset: 1=Last state: 0=Stay Off
Clear System Event Log=0 ;Options: 1=Enabled: 0=Disabled
FRB-2 Enable=1 ;Options: 1=Enabled: 0=Disabled
OS Boot Watchdog Timer=0 ;Options: 1=Enabled: 0=Disabled
Plug & Play BMC Detection=0 ;Options: 1=Enabled: 0=Disabled
ACPI 1.0 Support=1 ;Options: 1=Enabled: 0=Disabled
[BIOS::SERVER MANAGEMENT::CONSOLE REDIRECTION]
Console Redirection=0 ;Options: 2=Serial Port B: 1=Serial Port A: 0=Disabled
[BIOS::SERVER MANAGEMENT::BMC LAN CONFIGURATION]
IP source=0 ;Options: 2=Dynamic: 1=Static
IP source=0 ;Options: 2=Dynamic: 1=Static
User ID=0 ;Options: 5=User5: 4=User4: 3=User3: 2=root: 1=anonymous
[BIOS::SYSTEM BOOTORDER]
1=Primary Slave Hard Disk
2=Internal EFI Shell
3=IBA GE Slot 00C8 v1335
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired