Hi,
We have found a very strange bug in Open vSwitch: when it is connected to a
Cisco switch port, the port randomly gets err-disabled.
We have 76 Debian servers running Open vSwitch (2.4.0), each connected to a
port on a Cisco 3110 switch. Roughly once every week or two, one of these
ports goes err-disabled. From the Cisco switch's perspective, the port was
disabled because it detected a loopback: it received a keepalive message
that had originated from that same switch port.
The keepalive message looks like this:
11:37:01.749102 e8:04:62:c8:6e:81 > e8:04:62:c8:6e:81, ethertype Loopback (0x9000), length 60: Loopback, skipCount 0, Reply, receipt number 0, data (40 octets)
        0x0000:  0000 0100 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
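For anyone who wants to spot these frames in their own captures, here is a minimal Python sketch of the check, assuming a raw Ethernet frame is already available as bytes. The function names are hypothetical and only for illustration; the example frame is rebuilt by hand from the capture above (same MAC as source and destination, ethertype 0x9000, 46 payload octets).

```python
import struct

# Ethertype that tcpdump labels "Loopback" in the capture above.
ETHERTYPE_LOOPBACK = 0x9000


def parse_ethernet_header(frame: bytes):
    """Return (dst_mac, src_mac, ethertype) from the first 14 bytes of a frame."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    return struct.unpack("!6s6sH", frame[:14])


def format_mac(mac: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in mac)


def is_self_addressed_loopback(frame: bytes) -> bool:
    """True for a 0x9000 frame whose source and destination MAC are identical,
    which is the shape of the keepalive shown in the capture."""
    dst, src, ethertype = parse_ethernet_header(frame)
    return ethertype == ETHERTYPE_LOOPBACK and dst == src


# Rebuild the captured frame: 14-byte header plus the 46 payload octets
# from the hex dump (00 00 01 00 followed by zeros), 60 bytes in total.
switch_mac = bytes.fromhex("e80462c86e81")
payload = bytes.fromhex("00000100") + bytes(42)
example_frame = (
    switch_mac + switch_mac + struct.pack("!H", ETHERTYPE_LOOPBACK) + payload
)

if __name__ == "__main__":
    dst, src, ethertype = parse_ethernet_header(example_frame)
    print(format_mac(src), ">", format_mac(dst), hex(ethertype))
    print("keepalive:", is_self_addressed_loopback(example_frame))
```

This only classifies a frame. It says nothing about why Open vSwitch would send such a frame back out of the port it arrived on.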
Our first guess is that Open vSwitch occasionally sends the keepalive
message it received back out of the same port, which puts the port into the
err-disabled state. Normally Open vSwitch discards this message, but about
once every week or two across the 76 servers, the message gets back to the
port on the Cisco switch and the port is err-disabled.
The workarounds we are using now are either to disable keepalive messages
on the Cisco switch or to explicitly add a flow rule on Open vSwitch that
discards the keepalive message.
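As an illustration of the second workaround, the following Python sketch builds an ovs-ofctl command that drops every frame with ethertype 0x9000. The helper name, the priority value of 1000, and the choice of br0 (the bridge that holds eth0 in the configuration below) are assumptions made for this example, not the exact rule we deployed. The priority only needs to be higher than that of the default NORMAL flow on the bridge.

```python
import shlex


def build_drop_loopback_flow(bridge: str, priority: int = 1000) -> list:
    """Build an ovs-ofctl argv that drops all ethertype 0x9000 frames.

    The bridge name and priority are illustrative assumptions.
    """
    flow = f"priority={priority},dl_type=0x9000,actions=drop"
    return ["ovs-ofctl", "add-flow", bridge, flow]


command = build_drop_loopback_flow("br0")

if __name__ == "__main__":
    # Print the command for review instead of running it; to apply it,
    # pass the list to subprocess.run(command, check=True) as root.
    print(shlex.join(command))
```

Printing the command first makes it easy to check the match fields before touching a production bridge. Note that flows added this way are lost when the flow table is reset, so the rule has to be reinstalled in that case.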
The Open vSwitch version is:
ovs-vswitchd (Open vSwitch) 2.4.0
Compiled Aug 31 2015 16:53:51
The configuration of the switch is:
    Bridge "acc_10064"
        Port "acc_10064"
            Interface "acc_10064"
                type: internal
        Port "vxnet2"
            Interface "vxnet2"
        Port "10064_88ad7aaa"
            Interface "10064_88ad7aaa-02"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP2"}
            Interface "10064_88ad7aaa-01"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP3"}
    Bridge "acc_10050"
        Port "10050_0977455a"
            Interface "10050_0977455a-01"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP4"}
            Interface "10050_0977455a-02"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP5"}
        Port "vxnet0"
            Interface "vxnet0"
        Port "acc_10050"
            Interface "acc_10050"
                type: internal
        Port "vxnet1"
            Interface "vxnet1"
    Bridge "br0"
        Port "eth0"
            Interface "eth0"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.4.0"
The kernel version is:
Linux version 3.16.0-4-amd64 ([email protected]) (gcc version
4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04)
The ovs-dpctl show output is:
system@ovs-system:
        lookups: hit:536177037 missed:17196786 lost:0
        flows: 182
        masks: hit:1130706939 total:9 hit/pkt:2.04
        port 0: ovs-system (internal)
        port 1: acc_10050 (internal)
        port 2: vxlan_sys_4789 (vxlan)
        port 3: eth0
        port 4: br0 (internal)
        port 5: vxnet0
        port 6: vxnet1
        port 7: acc_10064 (internal)
        port 8: vxnet2
Open vSwitch does not have a controller connected and is configured as a
normal L2 switch.
We found a similar case via Google, but it was never answered:
https://forums.gentoo.org/viewtopic-p-7884924.html?sid=12abe544bda8782c840fa5c70df6e65e
Thanks!
Best Regards,
Liang Dong