On Fri, 29 Jan 2010 01:29:05 +0200, "Покотиленко Костик" wrote:

> On Thu, 28/01/2010 at 14:32 -0800, Alexander Duyck wrote:
>> On Wed, 2010-01-27 at 04:14 -0800, Покотиленко Костик wrote:
>> > Using the serial console, I've found the following:
>> >
>> > - the system works fine except for the NIC
>> > - ifconfig shows only the RX dropped counter increasing on eth1 (client
>> > side); the other counters have stalled.
>> > - ethtool -t eth0:
>> >
>> > The test result is FAIL
>> > The test extra info:
>> > Register test  (offline)         0
>> > Eeprom test    (offline)         0
>> > Interrupt test (offline)         0
>> > Loopback test  (offline)         13
>> > Link test   (on/offline)         0
>> >
>> > - ethtool -t eth1:
>> >
>> > The test result is FAIL
>> > The test extra info:
>> > Register test  (offline)         0
>> > Eeprom test    (offline)         0
>> > Interrupt test (offline)         0
>> > Loopback test  (offline)         13
>> > Link test   (on/offline)         0
>> >
>> > - After doing:
>> >
>> > ifdown -a; rmmod igb; rmmod dca; modprobe igb; ifup -a
>> >
>> > both ethtool commands and ifconfig show the same results (the test
>> > result is still FAIL)
>> >
>> > So it looks like a NIC hardware hang.
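
To pin down when the RX dropped counter starts climbing (and match it against the serial console log), the counters can be sampled with a timestamp. This is a minimal sketch assuming the standard Linux sysfs statistics files under /sys/class/net/<iface>/statistics/; the function name `snapshot_counters`, the chosen counters, the log path and the interval are illustrative placeholders, not something from the original report.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped line with a few RX/TX counters
# read from the standard sysfs statistics directory of an interface.
# Counters that cannot be read are reported as "n/a".
snapshot_counters() {
    iface=$1
    dir=/sys/class/net/$iface/statistics
    line="$(date '+%F %T') $iface"
    for c in rx_packets rx_dropped rx_errors tx_packets tx_errors; do
        if [ -r "$dir/$c" ]; then
            line="$line $c=$(cat "$dir/$c")"
        else
            line="$line $c=n/a"
        fi
    done
    echo "$line"
}

# Example loop (assumption: eth1 is the client-side port from the report):
#   while true; do snapshot_counters eth1 >> /var/log/eth1-counters.log; sleep 10; done
```

Sampling every few seconds keeps the log small while still showing whether rx_packets freezes at the same moment rx_dropped starts to rise.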
>>
>> The next time this occurs, could you go through and run the ethtool test
>> on all of the network ports?  I'm wondering if it is only eth0/1 that
>> are blocked or if eth3/4 are stopped as well.
>
> Sure.

Last time we changed the following BIOS options:

Execute Disable Bit: Disabled
ACPI 1.0 Support: Enabled (when Disabled, it's 3.0 (??))

After that, the system worked for almost 9 days with 2.6.30, and then the
same problem occurred.

I forgot to run the ethtool test on all ports :/
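
For the next occurrence, a small loop can run the self-test on every port in one pass, so eth0/1 and the remaining ports are captured at the same moment. This is a sketch that assumes the ports are named eth*, that ethtool is installed and the script runs as root, and that briefly disrupting traffic is acceptable (the offline test takes the link down while it runs). `ethtool -t <iface> offline` is the standard self-test invocation; the function name `test_all_ports` and the log path are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: run the ethtool self-test on every ethN interface
# and print each result with a timestamp, so all ports can be compared.
test_all_ports() {
    if ! command -v ethtool >/dev/null 2>&1; then
        echo "ethtool not found" >&2
        return 1
    fi
    found=0
    for path in /sys/class/net/eth*; do
        [ -e "$path" ] || continue      # glob matched nothing
        found=1
        iface=${path##*/}
        echo "=== $(date '+%F %T') $iface ==="
        # offline mode runs the register/EEPROM/interrupt/loopback tests
        ethtool -t "$iface" offline 2>&1
    done
    [ "$found" -eq 1 ] || echo "no eth* interfaces found" >&2
}

# Usage (as root): test_all_ports | tee /tmp/ethtool-selftest.log
```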

>> > I don't think this problem is related to anything other than the NIC /
>> > igb driver. If there were HW problems such as memory or power, I would
>> > notice other system problems, not just the NIC, wouldn't I?
>>
>> I'm wondering if this issue might somehow be a PCIe problem. The fact
>> that the loopback test is failing tells me that the issue is likely
>> somehow related to the NIC's ability to perform DMA transactions since
>> that is essentially all the loopback test does.
>>
>> One of the reasons why I am thinking it is something in the system is
>> because both eth0 and eth1 fail at the same time.  From the software's
>> perspective these ports appear as two separate devices, but there are
>> certain physical items that are shared such as the PCIe physical link
>> and it is possible that there may be some sort of issue there that is
>> causing the hangs and resets.  By doing an ethtool test on eth3/4 we
>> will at least know if the issue extends to the bridge on the NIC or if
>> it is only eth0/1.
>
> I think this is one of the most probable sources of the problem,
> considering that we have also had exactly the same problem with the
> onboard e1000e cards, plus deep hangs or reboots.
>
> The question is how to debug this.
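
One possible starting point on the PCIe side, given the shared-link theory above, is to capture the PCIe link and error status of each port's PCI function once on a healthy boot and again while the ports are hung, then compare. This is a sketch assuming pciutils' `lspci` is installed and run as root (without root the capability registers are not shown), and that each interface has a /sys/class/net/<iface>/device link; the function names are hypothetical. In `lspci -vvv` output, `LnkSta` shows the negotiated speed and width, `LnkCtl` the ASPM setting, and `DevSta` the correctable/uncorrectable error bits.

```shell
#!/bin/sh
# Hypothetical filter: keep only the link capability/control/status and
# device status lines from `lspci -vvv` output read on stdin.
pcie_status_lines() {
    grep -E '^[[:space:]]*(LnkCap|LnkCtl|LnkSta|DevSta):'
}

# Hypothetical helper: resolve the PCI address behind a network interface
# via sysfs and print that function's PCIe status lines.
pcie_status_for() {
    iface=$1
    link=/sys/class/net/$iface/device
    [ -e "$link" ] || { echo "$iface: no PCI device link" >&2; return 1; }
    addr=$(basename "$(readlink -f "$link")")   # e.g. 0000:03:00.0
    echo "== $iface ($addr)"
    lspci -vvv -s "$addr" | pcie_status_lines
}

# Usage (as root), once while healthy and once during a hang:
#   for p in /sys/class/net/eth*; do pcie_status_for "${p##*/}"; done
```

A link that has dropped to a narrower width or lower speed, or `DevSta` error bits that become `+` only during the hang, would support the shared-link explanation; identical output in both states would point elsewhere.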
>
> My guess is that this is due to the hardware being too new, and maybe
> some kernel subsystem has not had enough testing.
>
> Maybe I should also join a kernel driver list to discuss this problem,
> but I don't know which one.
>
>> > If I can do more testing, let me know. Moving the NIC to another server
>> > isn't an option for me.
>> >
>> > The server is quite new; could it be an IRQ-related problem, i.e. the
>> > motherboard not being fully supported by kernels <= 2.6.30?
>> >
>>
>> I don't suspect an IRQ problem because the loopback test doesn't do
>> anything with interrupts.  Also, one of the tests performed by the
>> ethtool self-test is an interrupt test, and the fact that it passed
>> means that interrupts are behaving as expected.
>
> Someone in a bug comment said that "pcie_aspm=off" solved the problem
> with the 82574L for him. I tried this option, hoping it would also help
> with the 82576, but with no luck. He also suggested switching off "PCIe
> Powermanagement" in the BIOS, but I don't have such a BIOS option.
>
> Maybe there are other kernel options to try?
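
Since `pcie_aspm=off` did not help, one thing worth ruling out before trying further options is that the parameter never reached the running kernel (for example, because a different boot-loader entry was booted). A sketch assuming a Linux system: /proc/cmdline is standard, while /sys/module/pcie_aspm/parameters/policy only exists when the kernel was built with PCIe ASPM support, so it is treated as optional. The function name `has_kernel_param` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact word $1 appears in the kernel
# command line given as $2 (defaults to the running kernel's /proc/cmdline).
has_kernel_param() {
    want=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null || true)}
    for word in $cmdline; do
        [ "$word" = "$want" ] && return 0
    done
    return 1
}

if has_kernel_param pcie_aspm=off; then
    echo "pcie_aspm=off is on the running kernel's command line"
else
    echo "pcie_aspm=off is NOT on the running kernel's command line"
fi

# Optional: current ASPM policy, if the kernel exposes it
# (the active policy is shown in brackets).
policy=/sys/module/pcie_aspm/parameters/policy
if [ -r "$policy" ]; then
    echo "ASPM policy: $(cat "$policy")"
fi
```

`lspci -vvv` (as root) also reports the per-device ASPM state on each `LnkCtl` line, which shows whether the setting actually applies to the 82576 functions.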
>
> Also, a BIOS update recently became available from Intel; would you
> suggest updating? I have the previous version.
>
> Planning to try 2.6.32 tomorrow.
>
> --
> Покотиленко Костик <[email protected]>
>
>
> ------------------------------------------------------------------------------
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the business
> Choose flexible plans and management services without long-term contracts
> Personal 24x7 support from experience hosting pros just a phone call away.
> http://p.sf.net/sfu/theplanet-com
> _______________________________________________
> E1000-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel&#174; Ethernet, visit  
> http://communities.intel.com/community/wired
>


