Re: [CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)

2010-02-11 Thread Arun Khan
SOLVED

On Tue, Feb 9, 2010 at 9:48 AM, Arun Khan knu...@gmail.com wrote:

> I am going to request the hardware vendor to send their engineer and
> to take on the next steps.

The hardware vendor finally sent their technical support team.

> Will post final findings over here when the problem is solved.

After investigation, they concluded that the card was overheating and
that this was the possible cause of the link going up and down.
They have moved the server to a different location.  Hopefully, the
problem will not recur.

-- Arun Khan
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


[CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)

2010-02-08 Thread Arun Khan
File Server OS: CentOS 5.3 (x86_64)
Kernel: CentOS Plus kernel (needed for the XFS filesystem drivers)

The file server has a Chelsio T310 10GBASE-CX4 RNIC (rev 3) PCI
Express x8 MSI-X (eth0); the driver and firmware are stock from the
CentOS Plus kernel.

Using ethtool, I have verified which driver is associated with each of
the 3 NICs on the system (eth1 and eth2 are not connected to any
switch):

Driver for eth0
driver: cxgb3
version: 1.1.3-ko
firmware-version: T 7.4.0 TP 1.1.0

Driver for eth1
driver: e1000e
version: 1.0.2-k2
firmware-version: 1.0-0

Driver for eth2
driver: e1000e
version: 1.0.2-k2
firmware-version: 1.0-0


Over the last 3-4 weeks, I have noticed that the eth0 link keeps going
up and down.  This is confirmed by the dmesg output as well as by
/var/log/messages (dmesg sample shown below).

eth0: link down
eth0: link up, 10Gbps, full-duplex
eth0: link down
eth0: link up, 10Gbps, full-duplex
eth0: link down
eth0: link up, 10Gbps, full-duplex
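
To put a number on how often the link is flapping, the messages above
can be counted with a small script.  This is only a minimal sketch: the
function name count_link_flaps and the regular expression are my own
assumptions, based solely on the message format in the sample above, and
other drivers or kernel versions may word these messages differently.

```python
import re

# Matches kernel messages such as "eth0: link down" and
# "eth0: link up, 10Gbps, full-duplex" as shown in the dmesg sample.
# The pattern is an assumption derived from that sample only.
LINK_EVENT = re.compile(r"(?P<iface>\w+): link (?P<state>up|down)\b")


def count_link_flaps(lines, iface="eth0"):
    """Return the number of down -> up transitions seen for iface."""
    flaps = 0
    last_state = None
    for line in lines:
        match = LINK_EVENT.search(line)
        if match is None or match.group("iface") != iface:
            continue  # not a link event for the interface we care about
        state = match.group("state")
        if last_state == "down" and state == "up":
            flaps += 1
        last_state = state
    return flaps


# The dmesg sample from this message, used as test input.
sample = """\
eth0: link down
eth0: link up, 10Gbps, full-duplex
eth0: link down
eth0: link up, 10Gbps, full-duplex
eth0: link down
eth0: link up, 10Gbps, full-duplex
""".splitlines()

print(count_link_flaps(sample))
```

The same function can be fed the lines of /var/log/messages, since it
searches for the pattern anywhere in each line rather than only at the
start.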

The kernel RPM verification shows no errors (rpm --verify prints nothing):

# uname --kernel-release
2.6.18-164.2.1.el5.plus

# rpm --verify kernel-2.6.18-164.2.1.el5.plus

The hardware vendor tells me that the card either fails completely
(kaput) or works - there is no grey area.  He is of the opinion that
the problem is with the driver.

Verification of the kernel RPM tells me that all files, including the
cxgb3 driver module, pass the md5sum check.

I would like to hear from anyone with the same NIC, or another rev.
of it, using the same driver.
Are you seeing similar link up/down events on your system?
How did you solve the problem?

TIA
-- Arun Khan


Re: [CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)

2010-02-08 Thread Hakan Koseoglu
Hi Arun,

On Mon, Feb 8, 2010 at 6:43 PM, Arun Khan knu...@gmail.com wrote:
> The file server has a Chelsio T310 10GBASE-CX4 RNIC (rev 3) PCI
> Express x8 MSI-X (eth0), driver and firmware is stock from the CentOS
> Plus kernel.
Way, way back, I had similar problems on a bunch of servers with 3Com
cards and some 3Com switches.
It turned out that the autonegotiation on the 3Com cards and switches
we had at that time was buggy.  Some idiot had set the switches to
100Mbit full duplex and the cards to autoneg, so the cards kept
initiating autonegotiation and kept settling on the wrong rate.  The
result was poor performance and a constant link down/up cycle.

It's worth checking what the switch side is set to.  Setting both
sides to the same fixed value might help, or else remove all fixed
settings and leave both sides on autonegotiation.
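
On the server side, running ethtool against the interface reports the
speed, duplex and autonegotiation state, which can then be compared
with the switch port configuration.  Below is a rough sketch of pulling
those fields out of that output.  The function name
parse_ethtool_settings is made up for illustration, and the SAMPLE text
is hypothetical output, not something captured from the server in this
thread.

```python
def parse_ethtool_settings(text):
    """Collect the 'Key: value' lines printed by `ethtool <iface>`."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip headings such as "Settings for eth0:" that carry no value.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical output, for illustration only.
SAMPLE = """\
Settings for eth0:
    Speed: 10000Mb/s
    Duplex: Full
    Auto-negotiation: off
    Link detected: yes
"""

settings = parse_ethtool_settings(SAMPLE)
for key in ("Speed", "Duplex", "Auto-negotiation", "Link detected"):
    print(key, "=", settings.get(key, "unknown"))
```

If the values reported here disagree with what the switch port is
configured for, that mismatch is the first thing to fix.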

> The hardware vendor tells me that the card either fails completely
> (kaput) or works - there is no grey area.  He is of the opinion that
> the problem is with the driver.
One last thing: IMHO, never trust a supplier trying to wriggle out of
a support case :) I've seen plenty of network cards that were on the
way out but not dead yet.

-- 
Hakan (m1fcj) - http://www.hititgunesi.org


Re: [CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)

2010-02-08 Thread Arun Khan
Hi Jobst, Brent, and Hakan,

Thanks for your inputs.  I sincerely appreciate your suggestions and
sharing your experience.

I have posted a support query with Chelsio as well.

In my case the connection is a Fiber cable.  On the switch side there
is only one 10Gb port, so there is not much I can do.  Luckily, the
switch and the cable, as well as the NIC, were all supplied by the
same vendor.

I am going to ask the hardware vendor to send their engineer to take
on the next steps.

Will post final findings over here when the problem is solved.

-- Arun Khan