Hi,
We have a fleet of Dell PowerEdge R640 servers, all with very similar
configurations. The important detail is that they use Intel 10GbE Ethernet
cards, as shown below:
lspci | grep Ether
19:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
19:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
1a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
1a:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
Only two of them fail to auto-negotiate the correct link speed. Both ports
advertise 10000baseT/Full but come up at 1000Mb/s:
ethtool eno1
Settings for eno1:
        Supported ports: [ TP ]
        Supported link modes:   100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes
ethtool eno2
Settings for eno2:
        Supported ports: [ TP ]
        Supported link modes:   100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes
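
For anyone wanting to check other hosts for the same symptom, here is a minimal
sketch that reads `ethtool <iface>` output and flags a port whose negotiated
speed is below the fastest mode it advertises. The function names
(parse_ethtool, is_degraded) and the shortened SAMPLE text are illustrative
assumptions, not part of any existing tool, and the parsing only covers the
output layout shown above.

```python
import re


def parse_ethtool(text):
    """Return (advertised_speeds_mbps, negotiated_speed_mbps_or_None)."""
    advertised = []
    negotiated = None
    in_advertised = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Advertised link modes:"):
            # First advertised mode sits on the same line as the label.
            in_advertised = True
            line = line.split(":", 1)[1].strip()
        elif ":" in line:
            # Any other "key: value" line ends the advertised-modes list.
            in_advertised = False
        if in_advertised:
            # Continuation lines look like "10000baseT/Full".
            for m in re.finditer(r"(\d+)base", line):
                advertised.append(int(m.group(1)))
        m = re.match(r"Speed:\s*(\d+)Mb/s", line)
        if m:
            negotiated = int(m.group(1))
    return advertised, negotiated


def is_degraded(text):
    """True when the link came up slower than the fastest advertised mode."""
    advertised, negotiated = parse_ethtool(text)
    return bool(advertised) and negotiated is not None and negotiated < max(advertised)


# Shortened copy of the eno1 output from this post, used only as sample input.
SAMPLE = """Settings for eno1:
        Supported link modes:   100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised link modes:  100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric
        Speed: 1000Mb/s
        Duplex: Full
"""

if __name__ == "__main__":
    print(parse_ethtool(SAMPLE), is_degraded(SAMPLE))
```

On a live host the text could come from running ethtool through subprocess
instead of the SAMPLE string. A flagged port only tells you the link settled
below its advertised maximum; it does not say whether the NIC, the cable, or the
switch port is responsible.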
They also sometimes lose connectivity entirely for extended periods of up to
4 hours. Here are our switch logs, which usually indicate the problem:
lacpd[20416]: %DAEMON-5-LACPD_TIMEOUT: xe-10/0/16: lacp current while timer expired current Receive State: CURRENT
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-10/0/16 - ATTACHED state - acting as standby link
rpd[1866]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-10/0/16.0 has got color 0
lacpd[20416]: %DAEMON-5-LACPD_TIMEOUT: xe-11/0/16: lacp current while timer expired current Receive State: CURRENT
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-11/0/16 - ATTACHED state - acting as standby link
lacpd[20416]: %DAEMON-5-LACP_INTF_DOWN: ae125: Interface marked down due to lacp timeout on member xe-11/0/16
rpd[1866]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-11/0/16.0 has got color 0
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-11/0/16 - CD state - ready to carry traffic
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-10/0/16 - CD state - ready to carry traffic
rpd[1866]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
rpd[1866]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-11/0/16.0 has got color 0
mcsnoopd[94056]: %DAEMON-6: received iff message xe-11/0/16.0 ifl 8c6fcf0 op 2 flag 0
mcsnoopd[94056]: %DAEMON-6: KRT Ifstate: Decode iff message - ifl(xe-11/0/16.0) without mesh-group tlv
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-10/0/16.0 has got color 0
mcsnoopd[94056]: %DAEMON-6: received iff message xe-10/0/16.0 ifl 8be35a0 op 2 flag 0
mcsnoopd[94056]: %DAEMON-6: KRT Ifstate: Decode iff message - ifl(xe-10/0/16.0) without mesh-group tlv
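
Most of the lines above are noise; the ones that matter are the LACPD_TIMEOUT
and LACP_INTF_DOWN entries. A rough sketch for pulling just those out of a
larger log capture follows. The function name summarize_lacp and the sample
list are hypothetical, and the patterns are only based on the message format
seen in the lines above, so they may need adjusting for other switch software
versions.

```python
import re
from collections import Counter

# Patterns derived from the log excerpts in this post (an assumption about
# the general format, not an official message grammar).
TIMEOUT_RE = re.compile(r"LACPD_TIMEOUT: (\S+):")
DOWN_RE = re.compile(r"LACP_INTF_DOWN: (\S+):")


def summarize_lacp(lines):
    """Count LACP timeouts per member port and down events per bundle."""
    timeouts = Counter()
    bundles_down = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1
        m = DOWN_RE.search(line)
        if m:
            bundles_down[m.group(1)] += 1
    return timeouts, bundles_down


# Three lines copied from the switch log in this post, as sample input.
SAMPLE_LOG = [
    "lacpd[20416]: %DAEMON-5-LACPD_TIMEOUT: xe-10/0/16: lacp current while timer expired current Receive State: CURRENT",
    "lacpd[20416]: %DAEMON-5-LACPD_TIMEOUT: xe-11/0/16: lacp current while timer expired current Receive State: CURRENT",
    "lacpd[20416]: %DAEMON-5-LACP_INTF_DOWN: ae125: Interface marked down due to lacp timeout on member xe-11/0/16",
]

if __name__ == "__main__":
    timeouts, bundles_down = summarize_lacp(SAMPLE_LOG)
    print(dict(timeouts), dict(bundles_down))
```

Counting per member port makes it easier to see whether both links of the
bundle time out together, as they do in this excerpt, or whether one side is
consistently the first to drop.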
We have upgraded to the 4.14.52 kernel, hoping it might include an ixgbe patch
that fixes this, but the problem persists.
I am posting here to seek advice on how to diagnose and, ideally, fix this
problem.
Thanks!
Abejide Ayodele
It always seems impossible until it's done. --Nelson Mandela
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel