On Fri, 29 Jan 2010 01:29:05 +0200, "Покотиленко Костик" wrote:
> On Thu, 28/01/2010 at 14:32 -0800, Alexander Duyck wrote:
>> On Wed, 2010-01-27 at 04:14 -0800, Покотиленко Костик wrote:
>> > Using the serial console I've figured out:
>> >
>> > - the system works fine except for the NIC
>> > - ifconfig shows only "RX dropped" increasing on eth1 (client side); the
>> > other counters have stalled.
>> > - ethtool -t eth0:
>> >
>> > The test result is FAIL
>> > The test extra info:
>> > Register test (offline) 0
>> > Eeprom test (offline) 0
>> > Interrupt test (offline) 0
>> > Loopback test (offline) 13
>> > Link test (on/offline) 0
>> >
>> > - ethtool -t eth1:
>> >
>> > The test result is FAIL
>> > The test extra info:
>> > Register test (offline) 0
>> > Eeprom test (offline) 0
>> > Interrupt test (offline) 0
>> > Loopback test (offline) 13
>> > Link test (on/offline) 0
>> >
>> > - After doing:
>> >
>> > ifdown -a; rmmod igb; rmmod dca; modprobe igb; ifup -a
>> >
>> > both ethtool commands (The test result is FAIL) and ifconfig show the
>> > same result.
>> >
>> > So it looks like a NIC hardware hang.
>>
>> The next time this occurs could you go through and run the ethtool test
>> on all of the network ports? I'm wondering if it is only eth0/1 that
>> are blocked or if eth3/4 are stopped as well.
>
> Sure. Last time we changed some BIOS options to:
>
> Execute Disable Bit: Disabled
> ACPI 1.0 Support: Enabled (when disabled it's 3.0(??))
>
> After that the system worked for almost 9 days with 2.6.30. Then the same
> problem. I forgot to do the ethtool test for all ports :/
>
>> > I don't think this problem is related to anything other than the NIC /
>> > igb driver. If there were HW problems like memory or power I would
>> > notice other system problems, not just the NIC, wouldn't I?
>>
>> I'm wondering if this issue might somehow be a PCIe problem. The fact
>> that the loopback test is failing tells me that the issue is likely
>> somehow related to the NIC's ability to perform DMA transactions, since
>> that is essentially all the loopback test does.
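A minimal sketch for running the self-test on every port in one pass and listing only the sub-tests that returned a non-zero code, assuming a POSIX shell, root privileges, and the "ethtool -t" output layout quoted above. The helper name failing_selftests is hypothetical. Per the ethtool documentation, the offline test may interrupt normal traffic on the port while it runs.

```shell
#!/bin/sh
# Sketch: run the ethtool offline self-test on each interface named on the
# command line and report which individual sub-tests failed.
# Assumes root and the "The test result is ..." / "The test extra info:"
# output layout; the offline test may interrupt traffic on the port.

# Hypothetical helper: read "ethtool -t" output on stdin and print
# "<test name>: <code>" for every sub-test whose result code is non-zero.
failing_selftests() {
    awk '
        /^The test extra info:/ { in_info = 1; next }
        in_info && $NF ~ /^[0-9]+$/ && $NF != 0 {
            name = $0
            sub(/[ \t]*\(.*$/, "", name)   # drop "(offline) <code>"
            print name ": " $NF
        }
    '
}

# Does nothing when no interface names are given.
for iface in "$@"; do
    output=$(ethtool -t "$iface" offline 2>&1)
    case "$output" in
        *"The test result is PASS"*)
            echo "$iface: PASS" ;;
        *"The test result is FAIL"*)
            echo "$iface: FAIL"
            printf '%s\n' "$output" | failing_selftests | sed 's/^/    /' ;;
        *)
            # ethtool missing, no permission, unknown interface, etc.
            echo "$iface: self-test did not run: $output" ;;
    esac
done
```

Usage example: sh selftest_ports.sh eth0 eth1 eth2 eth3 (the file name is hypothetical). With the output quoted above, each port would be reported as FAIL with the single line "Loopback test: 13".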
>>
>> One of the reasons why I am thinking it is something in the system is
>> because both eth0 and eth1 fail at the same time. From the software's
>> perspective these ports appear as two separate devices, but there are
>> certain physical items that are shared, such as the PCIe physical link,
>> and it is possible that there may be some sort of issue there that is
>> causing the hangs and resets. By doing an ethtool test on eth3/4 we
>> will at least know if the issue extends to the bridge on the NIC or if
>> it is only eth0/1.
>
> I think this is one of the most probable sources of the problem,
> considering that we have also had exactly the same problem with e1000e
> onboard cards, plus deep hangs or reboots.
>
> The question is how to debug this.
>
> My guess is that this is due to the HW being too new, and maybe some
> kernel subsystem has not had enough testing.
>
> Maybe I should also join some kernel driver list to discuss this
> problem, but I don't know which one.
>
>> > If I can do more testing let me know. Moving the NIC to another server
>> > isn't an option for me.
>> >
>> > The server is quite new; could it be an IRQ-related problem, i.e. the
>> > motherboard not being fully supported by <=2.6.30?
>> >
>>
>> I'm not suspecting an IRQ problem because the loopback test doesn't do
>> anything with the interrupts. Also, one of the tests performed by the
>> ethtool self-test is an interrupt test, and the fact that it passed
>> means that interrupts are behaving as expected.
>
> Someone in a bug comment said that "pcie_aspm=off" solved the problem
> with the 82574L for him. I tried this option hoping it could also help
> with the 82576, with no luck. He also suggested switching off "PCIe
> Powermanagement" in the BIOS, but I don't have such a BIOS option.
>
> Maybe there are other kernel options to try?
>
> Also, Intel recently released a BIOS update; would you suggest
> updating? I have the previous version.
>
> Planning to try 2.6.32 tomorrow.
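To check which ports actually share a PCIe link, one option is to look at where each interface sits in the sysfs PCI hierarchy: ports whose address chains share a common prefix sit behind the same bridge and therefore share the link above it. A minimal sketch, assuming Linux sysfs, readlink -f (GNU coreutils or busybox) and pciutils; pci_chain is a hypothetical helper. lspci -vv usually needs root to print the LnkCtl line, which reports whether ASPM is enabled on the link and so shows whether pcie_aspm=off took effect.

```shell
#!/bin/sh
# Sketch: for each network interface named on the command line, print the
# chain of PCI addresses from the root port down to the device, followed by
# the PCIe link control/status lines reported by lspci.
# Assumes Linux sysfs, readlink -f and pciutils (run as root for LnkCtl).

# Hypothetical helper: extract the PCI addresses (dddd:bb:dd.f) from a sysfs
# device path and join them with " -> ".
pci_chain() {
    printf '%s\n' "$1" | awk -F/ '{
        chain = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]$/)
                chain = chain (chain == "" ? "" : " -> ") $i
        print chain
    }'
}

# Does nothing when no interface names are given.
for iface in "$@"; do
    dev=$(readlink -f "/sys/class/net/$iface/device")
    chain=$(pci_chain "$dev")
    if [ -z "$chain" ]; then
        echo "$iface: not a PCI device (or interface not found)"
        continue
    fi
    echo "$iface: $chain"
    # ${dev##*/} is the port's own PCI address, e.g. 0000:05:00.1
    lspci -vv -s "${dev##*/}" | grep -E 'LnkCtl:|LnkSta:' | sed 's/^[[:space:]]*/    /'
done
```

Usage example: sh pcie_ports.sh eth0 eth1 eth2 eth3 (the file name is hypothetical). Two functions of the same controller differ only in the last digit of their own address; ports behind a switch on the card share the switch's addresses earlier in the chain.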
>
> --
> Покотиленко Костик <[email protected]>
>
>
> ------------------------------------------------------------------------------
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the business
> Choose flexible plans and management services without long-term contracts
> Personal 24x7 support from experience hosting pros just a phone call away.
> http://p.sf.net/sfu/theplanet-com
> _______________________________________________
> E1000-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit
> http://communities.intel.com/community/wired

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
