On Thu, 28/01/2010 at 14:32 -0800, Alexander Duyck wrote:
> On Wed, 2010-01-27 at 04:14 -0800, Покотиленко Костик wrote:
> > Using serial console I've figured out:
> > 
> > - system working fine except for the NIC
> > - ifconfig shows only the RX dropped counter increasing on eth1 (client
> > side); the other counters have stalled.
> > - ethtool -t eth0:
> > 
> > The test result is FAIL
> > The test extra info:
> > Register test  (offline)         0
> > Eeprom test    (offline)         0
> > Interrupt test (offline)         0
> > Loopback test  (offline)         13
> > Link test   (on/offline)         0
> > 
> > - ethtool -t eth1:
> > 
> > The test result is FAIL
> > The test extra info:
> > Register test  (offline)         0
> > Eeprom test    (offline)         0
> > Interrupt test (offline)         0
> > Loopback test  (offline)         13
> > Link test   (on/offline)         0
> > 
> > - After doing:
> > 
> > ifdown -a; rmmod igb; rmmod dca; modprobe igb; ifup -a
> > 
> > both ethtool commands (The test result is FAIL) and ifconfig show the
> > same result.
> > 
> > So it seems like a NIC hardware hang.
> 
> The next time this occurs, could you go through and run the ethtool test
> on all of the network ports?  I'm wondering if it is only eth0/1 that
> are blocked or if eth3/4 are stopped as well.

Sure.
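To make comparing the ports easier next time, here is a minimal sketch that runs the self-test on each given port and prints only the sub-tests with a non-zero result. In the output above, the driver reports 0 for passing sub-tests and only the loopback test is non-zero. The function names are my own. The parser assumes the "The test extra info:" layout shown above. Note that "ethtool -t" runs in offline mode by default, so it interrupts traffic on the port and needs root.

```sh
# Read "ethtool -t" output on stdin and print every sub-test whose
# result is non-zero, as "<test name>: <code>".
ethtool_failed_tests() {
    awk '
        /^The test extra info:/ { in_info = 1; next }
        in_info && NF >= 2 && $NF ~ /^[0-9]+$/ && $NF != 0 {
            code = $NF
            $NF = ""                 # drop the code, keep the test name
            sub(/[ \t]+$/, "")       # trim the trailing separator
            print $0 ": " code
        }
    '
}

# Run the self-test on every interface given as an argument (hypothetical
# helper). "ethtool -t" defaults to offline mode: it interrupts traffic
# and needs root.
run_selftests() {
    for dev in "$@"; do
        echo "== $dev"
        ethtool -t "$dev" | ethtool_failed_tests
    done
}
```

Usage would be something like `run_selftests eth0 eth1 eth2 eth3`, adjusted to the actual port names. A port that prints nothing under its header had no failing sub-test.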

> > I don't think this problem is related to anything other than the NIC /
> > igb driver. If there were HW problems like memory or power, I would
> > notice other system problems, not just the NIC, wouldn't I?
> 
> I'm wondering if this issue might somehow be a PCIe problem. The fact
> that the loopback test is failing tells me that the issue is likely
> somehow related to the NIC's ability to perform DMA transactions since
> that is essentially all the loopback test does.  
> 
> One of the reasons why I am thinking it is something in the system is
> because both eth0 and eth1 fail at the same time.  From the software's
> perspective these ports appear as two separate devices, but there are
> certain physical items that are shared such as the PCIe physical link
> and it is possible that there may be some sort of issue there that is
> causing the hangs and resets.  By doing an ethtool test on eth3/4 we
> will at least know if the issue extends to the bridge on the NIC or if
> it is only eth0/1.

I think this is one of the most probable sources of the problem,
considering that we have also had exactly the same problem with the
e1000e onboard cards, plus deep hangs or reboots.
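To see which ports actually share hardware, each interface can be mapped to its PCI address through its /sys/class/net/<iface>/device symlink. Ports whose addresses differ only in the trailing function number are functions of one PCIe device and share its link. If the card has an on-board bridge, each pair will typically show up on its own bus. This is a minimal sketch. The helper names are hypothetical, and it assumes the interfaces of interest are PCI devices.

```sh
# Print "<iface> <pci-address>" for each interface that has a device link
# (hypothetical helper; assumes PCI-backed NICs).
list_pci_ports() {
    for path in /sys/class/net/*; do
        [ -e "$path/device" ] || continue    # skip lo, bridges, VLANs, ...
        addr=$(basename "$(readlink -f "$path/device")")
        printf '%s %s\n' "${path##*/}" "$addr"
    done
}

# Read "<iface> <domain:bus:dev.fn>" lines and group the interfaces by
# PCI slot (domain:bus:dev); functions of one slot share a PCIe link.
group_by_slot() {
    awk '{
        slot = $2
        sub(/\.[0-7]$/, "", slot)            # strip the ".fn" suffix
        if (slot in ports) ports[slot] = ports[slot] " " $1
        else ports[slot] = $1
    } END {
        for (s in ports) print s ": " ports[s]
    }' | sort
}
```

Usage: `list_pci_ports | group_by_slot`. This prints one line per slot, listing the ports behind it.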

The question is how to debug this.
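One thing that might help is recording exactly which counters still move while a port is hung, rather than eyeballing ifconfig. Here is a minimal sketch, assuming the per-interface counters the kernel exposes under /sys/class/net/<iface>/statistics. The helper names are hypothetical.

```sh
# Write "<counter> <value>" for every statistics counter of interface $1.
# $2 optionally overrides the sysfs base directory (default /sys/class/net).
snapshot_counters() {
    local f
    for f in "${2:-/sys/class/net}/$1/statistics"/*; do
        printf '%s %s\n' "${f##*/}" "$(cat "$f")"
    done
}

# Compare two snapshot files and print each counter that changed,
# together with how much it changed.
diff_counters() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && $2 != before[$1] { print $1, $2 - before[$1] }' "$1" "$2"
}
```

For example, run `snapshot_counters eth1 > /tmp/before; sleep 10; snapshot_counters eth1 > /tmp/after; diff_counters /tmp/before /tmp/after`. During the hang described earlier, one would expect only rx_dropped to show up.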

My guess is that this is because the HW is too new, and maybe some
kernel subsystem has not had enough testing.

Maybe I should also join a kernel driver list to discuss this problem,
but I don't know which one.

> > If I can do more testing, let me know. Moving the NIC to another server
> > isn't an option for me.
> > 
> > The server is quite new; could it be an IRQ-related problem, i.e. the
> > motherboard not being fully supported by kernels <= 2.6.30?
> > 
> 
> I'm not suspecting an IRQ problem because the loopback test doesn't do
> anything with the interrupts.  Also one of the tests that are performed
> in the ethtool testing is an interrupt test and the fact that it passed
> means that interrupts are behaving as expected.

Someone in a bug comment said that "pcie_aspm=off" solved the problem
with the 82574L for him. I tried this option, hoping it would also help
with the 82576, but with no luck. He also suggested switching off "PCIe
Powermanagement" in the BIOS, but I don't have such a BIOS option.
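Because pcie_aspm=off made no difference here, it is worth double-checking that the option actually reached the running kernel and was not added to a different boot entry, for example. Here is a minimal sketch that checks /proc/cmdline for the exact token. The function name is hypothetical. Also note that the option can only have an effect on a kernel built with ASPM support (CONFIG_PCIEASPM).

```sh
# Succeed if kernel parameter $1 (e.g. "pcie_aspm=off") appears as a whole
# token in command line $2, which defaults to the running kernel's.
has_kernel_param() {
    local cmdline="${2-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `has_kernel_param pcie_aspm=off && echo "pcie_aspm=off is active"`.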

Maybe there are other kernel options to try?

Also, Intel recently released a BIOS update; would you suggest
updating? I have the previous version.

Planning to try 2.6.32 tomorrow.

-- 
Покотиленко Костик <[email protected]>


_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
