Re: [CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)
SOLVED

On Tue, Feb 9, 2010 at 9:48 AM, Arun Khan knu...@gmail.com wrote:
> I am going to request the hardware vendor to send their engineer and
> to take on the next steps.

The hardware vendor finally sent their technical support team.

> Will post final findings over here when the problem is solved.

After investigation, they concluded that the card was overheating and that this was the possible cause of the link going up/down. They have moved the server to a different location. Hopefully, the problem will not manifest again.

-- Arun Khan
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
[CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)
File Server OS: CentOS 5.3 (x86_64)
Kernel: CentOS Plus kernel (need XFS fs drivers)

The file server has a Chelsio T310 10GBASE-CX4 RNIC (rev 3) PCI Express x8 MSI-X (eth0); the driver and firmware are stock from the CentOS Plus kernel.

Using ethtool I have verified the driver association for the 3 NICs on the system (eth1 and eth2 are not connected to any switch).

Driver for eth0
    driver: cxgb3
    version: 1.1.3-ko
    firmware-version: T 7.4.0 TP 1.1.0

Driver for eth1
    driver: e1000e
    version: 1.0.2-k2
    firmware-version: 1.0-0

Driver for eth2
    driver: e1000e
    version: 1.0.2-k2
    firmware-version: 1.0-0

Over the last 3-4 weeks, I have noticed that the eth0 link keeps going up and down, confirmed by dmesg output as well as in /var/log/messages (dmesg sample shown below).

    eth0: link down
    eth0: link up, 10Gbps, full-duplex
    eth0: link down
    eth0: link up, 10Gbps, full-duplex
    eth0: link down
    eth0: link up, 10Gbps, full-duplex

The kernel RPM verification shows no errors:

    # uname --kernel-release
    2.6.18-164.2.1.el5.plus
    # rpm --verify kernel-2.6.18-164.2.1.el5.plus

The hardware vendor tells me that the card either fails completely (kaput) or works; there is no grey area. He is of the opinion that the problem is with the driver. Verification of the kernel RPM tells me that the md5sums of all files, including the cxgb3 driver file, are OK.

I would like to hear from anyone with the same NIC, or another rev., using the same driver. Are you seeing similar link up/down events on your system? How did you solve the problem?

TIA
-- Arun Khan
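For anyone wanting to quantify how often the link bounces, here is a minimal sketch in Python. It assumes the kernel messages look exactly like the dmesg sample quoted above; the names `LINK_RE` and `count_link_flaps` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import re

# Matches the kernel messages quoted above, e.g.
#   "eth0: link down"  or  "eth0: link up, 10Gbps, full-duplex"
# (assumption: other driver versions may word these messages differently)
LINK_RE = re.compile(r"\b(?P<iface>\w+): link (?P<state>up|down)\b")


def count_link_flaps(lines, iface="eth0"):
    """Count down->up cycles for one interface in an iterable of log lines."""
    flaps = 0
    last_state = None
    for line in lines:
        m = LINK_RE.search(line)
        if not m or m.group("iface") != iface:
            continue
        state = m.group("state")
        # A flap is a "link down" immediately followed by a "link up".
        if last_state == "down" and state == "up":
            flaps += 1
        last_state = state
    return flaps


# The dmesg sample from the message above.
sample = """\
eth0: link down
eth0: link up, 10Gbps, full-duplex
eth0: link down
eth0: link up, 10Gbps, full-duplex
eth0: link down
eth0: link up, 10Gbps, full-duplex
""".splitlines()

print(count_link_flaps(sample))
```

The same function can be fed the lines of /var/log/messages (for example `count_link_flaps(open("/var/log/messages"))`), since it searches for the pattern anywhere in each line. With syslog timestamps it would also be possible to check whether the flaps cluster at particular times of day.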
Re: [CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)
Hi Arun,

On Mon, Feb 8, 2010 at 6:43 PM, Arun Khan knu...@gmail.com wrote:
> The file server has a Chelsio T310 10GBASE-CX4 RNIC (rev 3) PCI
> Express x8 MSI-X (eth0), driver and firmware is stock from the
> CentOS Plus kernel.

Way, way back, I had similar problems on a bunch of servers with 3Com cards and some 3Com switches. It turned out that autonegotiation on the 3Com cards and switches we had at the time was buggy. Some idiot had set the switches to 100Mbit full duplex and left the cards on autoneg, so the cards kept re-initiating autonegotiation and the buggy cards kept settling on the wrong rate. The result was poor performance and a constant link down/up cycle.

It's worth checking what the switch side is set to. Setting both sides to the same fixed value might help, or else remove all forced settings and leave both sides on autonegotiation.

> The hardware vendor tells me that the card either fails completely
> (kaput) or works - there is no grey area. He is of the opinion that
> the problem is with the driver.

One last thing: IMHO, never trust a supplier trying to wriggle out of a support case :) I've seen plenty of network cards that are on the way out but not dead yet.

-- Hakan (m1fcj) - http://www.hititgunesi.org
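As a rough illustration of the host-side half of that check, the sketch below reads the speed, duplex and autonegotiation state that `ethtool <iface>` reports. It assumes the field labels `Speed`, `Duplex`, `Auto-negotiation` and `Link detected` as commonly printed by ethtool on Linux; the function names are hypothetical, and the switch side still has to be checked on the switch itself.

```python
import subprocess


def parse_ethtool(text):
    """Extract speed, duplex and autonegotiation from `ethtool <iface>` output."""
    # Map ethtool's field labels (assumed spelling) to short keys.
    wanted = {
        "Speed": "speed",
        "Duplex": "duplex",
        "Auto-negotiation": "autoneg",
        "Link detected": "link",
    }
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in wanted:
            settings[wanted[key]] = value.strip()
    return settings


def read_link_settings(iface="eth0"):
    """Run `ethtool <iface>` (usually needs root) and parse the result."""
    out = subprocess.run(
        ["ethtool", iface], capture_output=True, text=True, check=True
    ).stdout
    return parse_ethtool(out)
```

Calling `read_link_settings("eth0")` on the server returns a small dict that can be compared against whatever the switch port is configured for. A forced speed on one side with autonegotiation on the other is the kind of mismatch described above.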
Re: [CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)
Hi Jobst, Brent, and Hakan,

Thanks for your input. I sincerely appreciate your suggestions and your sharing of your experience. I have posted a support query with Chelsio as well.

In my case the connection is a fiber cable. On the switch side there is only one 10GB port, so there is not much I can do. Luckily, the switch and the cable, as well as the NIC, were all supplied by the same vendor.

I am going to request the hardware vendor to send their engineer and to take on the next steps.

Will post final findings over here when the problem is solved.

-- Arun Khan