I have also encountered the same problem as Liang Dong.
http://thread.gmane.org/gmane.linux.network.openvswitch.general/11704
Here is the message body, copied from Liang Dong's post:
Hi
We have found a very strange bug in Open vSwitch: when it is connected to a
Cisco switch port, the port randomly gets err-disabled.
We have 76 Debian servers running Open vSwitch 2.4.0, each connected to a
port on a Cisco Switch 3110. Roughly once every week or two, one of these ports
on the Cisco switch becomes err-disabled. From the Cisco switch's perspective,
the port was disabled because it detected a loopback: it received a keepalive
message that had originated from that same switch port.
The keepalive message looked like this:
11:37:01.749102 e8:04:62:c8:6e:81 > e8:04:62:c8:6e:81, ethertype Loopback (0x9000), length 60: Loopback, skipCount 0, Reply, receipt number 0, data (40 octets)
    0x0000:  0000 0100 0000 0000 0000 0000 0000 0000  ................
    0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
    0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
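To make the hex dump easier to read, here is a small Python sketch that decodes it. It assumes the field layout of the Ethernet Configuration Testing Protocol (ethertype 0x9000): a 16-bit little-endian skipCount, then a message whose 16-bit function code is 1 for a reply, followed by a receipt number and data. The helper names (`mac_bytes`, `build_frame`, `parse_loopback`) are hypothetical and not part of any tool.

```python
import struct

# Frame bytes after the 14-byte Ethernet header, copied from the hex dump above.
PAYLOAD_HEX = (
    "0000 0100 0000 0000 0000 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000 0000 "
    "0000 0000 0000 0000 0000 0000 0000"
)
KEEPALIVE_MAC = "e8:04:62:c8:6e:81"  # both source and destination in the capture
LOOPBACK_ETHERTYPE = 0x9000

# Assumed function codes: 1 = reply, 2 = forward data.
FUNCTIONS = {1: "Reply", 2: "Forward Data"}


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' notation to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_frame(dst: str, src: str, payload: bytes) -> bytes:
    """Prepend an Ethernet II header (ethertype 0x9000, big-endian) to the payload."""
    header = mac_bytes(dst) + mac_bytes(src) + struct.pack("!H", LOOPBACK_ETHERTYPE)
    return header + payload


def parse_loopback(payload: bytes) -> dict:
    """Decode the message that skipCount points at.

    Assumption: 16-bit fields are little-endian, and a reply message is
    function (2 bytes) + receipt number (2 bytes) + data.
    """
    (skip_count,) = struct.unpack_from("<H", payload, 0)
    msg = 2 + skip_count  # offset of the current message
    (function,) = struct.unpack_from("<H", payload, msg)
    info = {"skip_count": skip_count, "function": FUNCTIONS.get(function, function)}
    if function == 1:
        (receipt,) = struct.unpack_from("<H", payload, msg + 2)
        info["receipt_number"] = receipt
        info["data_len"] = len(payload) - (msg + 4)
    return info


if __name__ == "__main__":
    payload = bytes.fromhex(PAYLOAD_HEX)
    frame = build_frame(KEEPALIVE_MAC, KEEPALIVE_MAC, payload)
    print(len(frame), parse_loopback(payload))
```

The decoded fields agree with the summary line of the capture (skipCount 0, Reply, receipt number 0, 40 data octets, 60-byte frame), and the source and destination MAC are the same address, which is why the switch treats the frame's return as a loop.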
Our first guess was that Open vSwitch accidentally sends a received keepalive
message back out the port it arrived on, which puts the Cisco port into the
err-disabled state. Normally Open vSwitch discards this message, but roughly
once every week or two across the 76 servers, the message is sent back to the
Cisco switch port and the port becomes err-disabled.
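Why the frame is normally discarded can be illustrated with a toy model of a MAC-learning L2 switch. This is a hypothetical sketch, not Open vSwitch's forwarding code: it assumes the keepalive arrives on eth0 and that the source address is learned before the destination lookup. Because source and destination MAC are identical, the destination resolves to the ingress port and the frame is dropped; an unknown destination is flooded, but never back out the ingress port.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of a MAC-learning L2 switch (hypothetical, not OVS code)."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, str] = {}  # MAC address -> port it was learned on

    def forward(self, in_port: str, src: str, dst: str) -> List[str]:
        """Return the output ports for a frame (empty list means drop)."""
        # Learn (or re-learn) the source address on the ingress port first.
        self.mac_table[src] = in_port
        out = self.mac_table.get(dst)
        if out is None:
            # Unknown destination: flood to every port except the ingress one.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination is behind the ingress port: nothing to forward.
            return []
        return [out]


if __name__ == "__main__":
    sw = LearningSwitch(["eth0", "br0"])
    mac = "e8:04:62:c8:6e:81"
    print(sw.forward("eth0", mac, mac))  # keepalive with src == dst
```

In this model `forward` never returns the ingress port, so a keepalive reappearing on the Cisco port is not something a plain learning switch should produce.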
The workarounds we are currently using are either disabling keepalive messages
on the Cisco switch or explicitly adding a flow rule on Open vSwitch that
discards the keepalive message.
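The flow-rule workaround can be sketched as follows. This is an illustration under assumptions, not the exact rule used here: it assumes the Cisco-facing uplink is eth0 on bridge br0 (as in the configuration), matches every ethertype 0x9000 frame on that bridge, and picks priority 100 on the assumption that it only needs to outrank the priority-0 NORMAL flow of a controller-less bridge. The helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List

LOOPBACK_ETHERTYPE = 0x9000  # Configuration Testing Protocol ("Loopback")


def drop_loopback_flow_cmd(bridge: str, priority: int = 100) -> List[str]:
    """Build an ovs-ofctl command that drops ethertype 0x9000 frames on `bridge`."""
    flow = f"priority={priority},dl_type={LOOPBACK_ETHERTYPE:#06x},actions=drop"
    return ["ovs-ofctl", "add-flow", bridge, flow]


def apply_flow(cmd: List[str]) -> None:
    """Run the command; needs ovs-ofctl installed and sufficient privileges."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = drop_loopback_flow_cmd("br0")
    print(shlex.join(cmd))  # review the command before calling apply_flow(cmd)
```

After installing the rule, `ovs-ofctl dump-flows br0` can be used to confirm it is present alongside the default NORMAL flow.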
The Open vSwitch version is:
ovs-vswitchd (Open vSwitch) 2.4.0
Compiled Aug 31 2015 16:53:51
The Open vSwitch configuration is:
    Bridge "acc_10064"
        Port "acc_10064"
            Interface "acc_10064"
                type: internal
        Port "vxnet2"
            Interface "vxnet2"
        Port "10064_88ad7aaa"
            Interface "10064_88ad7aaa-02"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP2"}
            Interface "10064_88ad7aaa-01"
                type: vxlan
                options: {key="10064", local_ip="IP1", remote_ip="IP3"}
    Bridge "acc_10050"
        Port "10050_0977455a"
            Interface "10050_0977455a-01"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP4"}
            Interface "10050_0977455a-02"
                type: vxlan
                options: {key="10050", local_ip="IP1", remote_ip="IP5"}
        Port "vxnet0"
            Interface "vxnet0"
        Port "acc_10050"
            Interface "acc_10050"
                type: internal
        Port "vxnet1"
            Interface "vxnet1"
    Bridge "br0"
        Port "eth0"
            Interface "eth0"
        Port "br0"
            Interface "br0"
                type: internal
    ovs_version: "2.4.0"
The kernel version is:
Linux version 3.16.0-4-amd64
([email protected]) (gcc version 4.8.4
(Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04)
The ovs-dpctl show output is:
system@ovs-system:
    lookups: hit:536177037 missed:17196786 lost:0
    flows: 182
    masks: hit:1130706939 total:9 hit/pkt:2.04
    port 0: ovs-system (internal)
    port 1: acc_10050 (internal)
    port 2: vxlan_sys_4789 (vxlan)
    port 3: eth0
    port 4: br0 (internal)
    port 5: vxnet0
    port 6: vxnet1
    port 7: acc_10064 (internal)
    port 8: vxnet2
Open vSwitch has no controller connected and is configured as a normal L2
switch.
We found a similar case via Google, but it is unanswered:
https://forums.gentoo.org/viewtopic-p-7884924.html?sid=12abe544bda8782c840fa5c70df6e65e
Any ideas?
Thanks,
Zhang Haoyu
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev