On Tue, 26/01/2010 at 09:35 -0800, Duyck, Alexander H wrote:
> Покотиленко Костик wrote:
> > Hi,
> >
> > Can somebody please investigate? The bug was posted on 19.01.2010.
> >
> > I have tried:
> > - 2.6.29 + igb 2.0.6
> > - 2.6.30 + igb 2.0.6
> > - 2.6.30 + igb 2.1.9
> >
> > All of them resulted in a deep hang, the network going down, or a
> > reboot within 1-20 hours, at random.
> >
> > I have only 3 more variations to try:
> > - 2.6.30 + in-kernel igb
> > - 2.6.32 + in-kernel igb
> > - 2.6.32 + igb 2.1.9
> >
Today I switched to 2.6.30 + in-kernel igb 1.3.16-k2. It has been working fine for 6+ hours so far. I noticed that by default it uses 4 rx queues and 4 tx queues for each NIC and spreads the load across all available cores. Versions 2.0.6 and 2.1.9 used 1 core per NIC by default.

> > And please can somebody tell which one of the drivers is to be
> > considered more stable, the one in kernel or the one from sf.net?

> I'm curious. You say the device is causing reboots. Is this due to a
> kernel panic followed by a reboot or does the system just reboot?

Regarding the latest bug, ID 2934941: the system becomes disconnected from the network, and at the same time a lot of "Detected Tx Unit Hang" messages are printed to the console and logs. Sometimes it just stays in this state (disconnected, with the errors still being printed, but the system otherwise responding). Sometimes, after a few minutes in this state, it simply reboots. I have never had a chance to see a "kernel panic" message. Most of the time the system gets disconnected when nobody is around, so we just power-cycle it remotely through a CLI such as IPMI.

Today I set up a serial console connected to a nearby router that has an independent Internet connection. That way I can "see" what happens when the system gets disconnected, and if it is still alive I can do a clean reboot.

> If the entire system is rebooting I would suspect a bigger issue such
> as problems in the system memory, power issues, or an issue in the
> kernel.

Good guess, but until the "Detected Tx Unit Hang" messages appear there are no other signs of instability. Everything works perfectly until then.

> In 2907473 you mentioned also having SATA issues. This leads me to
> wonder if there is a problem with the Mainboard or components in the
> system you are currently using.

In that case, too, everything worked perfectly until the NIC problems started. I should note that we monitor the system with Nagios and Munin. I was also working on the console when a few of those problems occurred.

> In the bug you mentioned that you had recently upgraded to this
> server.
> Would it be possible to try installing the ET Quad port
> server adapter in that system and run the same tests that you are
> currently running in this system.

If you mean installing the ET Quad port server adapter in the old system, that's impossible: it had a PCI-only board.

> My main concern is that this issue could be due to something outside
> of our control since the SATA seemed to be experiencing an I/O stall
> at the same time as the network adapter.

Well, first, the SATA and NIC problems appeared at the same time only in the 2907473 case, with the 82574L. Now, with the ET Quad port, I see nothing except the NIC problems. Also, this hardware has successfully compiled the kernel with CONCURRENCY_LEVEL=10 many times.

> If we can test this in a known good platform we might be able to
> verify if the issue is a problem in the server or not.

Agreed, but we don't have any spare server with PCI-e x4 v2.0 :(

> In the bugs that you filed you mentioned that you have been putting
> additional patches on top of the kernel. In the tests you have
> recently done have any of the kernels you tested not included the
> patches you mentioned? If not you may want to try running just a
> plain kernel and see if the same issues occur.

I thought about that. But the router is tightly integrated with our billing software, and the whole solution requires ipset and imq, so running such a test would mean leaving the network down. Also, the problem may not occur for more than 20 hours; with the ET Quad port the record is ~36 hours.

-- 
Покотиленко Костик <[email protected]>

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
